Abstract:
Context: Recently, the machine learning (ML) field has been growing rapidly, mainly owing to the availability of historical datasets and advanced computational power. This growth still faces a set of challenges, such as the interpretability of ML models. In the medical field in particular, interpretability is a real bottleneck to the adoption of ML by physicians. Numerous interpretability techniques have therefore been proposed and evaluated to help ML gain the trust of its users.

Methods: This review was carried out according to the well-established systematic mapping and review process to analyze the literature on interpretability techniques applied in the medical field with regard to several aspects: publication venues and publication years, contribution and empirical types, medical and ML disciplines and objectives, the ML black-box techniques interpreted, the interpretability techniques investigated, their performance and the best-performing techniques, and the datasets used when evaluating interpretability techniques.

Results: A total of 179 articles (1994–2020) were selected from six digital libraries: ScienceDirect, IEEE Xplore, ACM Digital Library, SpringerLink, Wiley, and Google Scholar. The results showed that the number of studies dealing with interpretability increased over the years, with solution proposals and experiment-based empirical studies dominating. Diagnosis, oncology, and classification were the most frequently studied medical task, discipline, and ML objective, respectively. Artificial neural networks were the ML black-box technique most widely investigated for interpretability. Global interpretability techniques focusing on a specific black-box model, such as rules, were the dominant explanation type, and the metrics most often used to evaluate interpretability were accuracy, fidelity, and the number of rules. Moreover, the variety of techniques used in the selected papers did not allow categorization at the technique level, and the high total number of evaluations (671) across the articles raised suspicions of subjectivity. Datasets containing numerical and categorical attributes were the most frequently used in the selected studies.

Conclusions: Further effort is needed on medical tasks and ML objectives other than diagnosis and classification. Global techniques such as rules are the most widely used because they are comprehensible to doctors, but new local techniques should be explored more in the medical field to gain deeper insight into model behavior. More experiments and comparisons against existing techniques are encouraged to determine the best-performing techniques. Lastly, quantitative evaluation of interpretability and the involvement of physicians in evaluating interpretability techniques are highly recommended to assess how the techniques will perform in real-world scenarios; this can ensure the soundness of the techniques and help black-box models gain trust in medical environments.

© 2022 Elsevier B.V. All rights reserved.
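To make the Results concrete: a global surrogate with rule-style explanations, evaluated with the three metrics the review found most common (accuracy, fidelity, and number of rules), can be sketched as below. This is a minimal illustration, not a method from any of the reviewed papers; the choice of scikit-learn, a random forest as the black box, a shallow decision tree as the surrogate, and the breast cancer dataset are all assumptions made for the example.

```python
# Minimal sketch (illustrative only): approximate a black-box model with a
# global surrogate decision tree and report accuracy, fidelity, and the
# number of rules. Dataset and model choices are assumptions for the example.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Black-box model whose behavior we want to explain.
black_box = RandomForestClassifier(n_estimators=200, random_state=0)
black_box.fit(X_train, y_train)

# Global surrogate: a shallow tree trained on the black-box's predictions
# rather than on the ground-truth labels, so it mimics the black box.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))

bb_pred = black_box.predict(X_test)
sg_pred = surrogate.predict(X_test)

# Accuracy: agreement of the surrogate with the ground truth.
print("surrogate accuracy:", accuracy_score(y_test, sg_pred))
# Fidelity: agreement of the surrogate with the black-box predictions.
print("fidelity:", accuracy_score(bb_pred, sg_pred))
# Number of rules: each leaf of the surrogate tree corresponds to one rule.
print("number of rules:", surrogate.get_n_leaves())
```

In this reading, fidelity measures how faithfully the explanation reproduces the black box, while the rule count is a simple proxy for how comprehensible the explanation is to a physician; the trade-off between the two is governed here by the surrogate's depth.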