Abstract | Background Artificial intelligence (AI)-based medical devices and digital health technologies, including medical sensors, wearable health trackers, telemedicine, mobile health (mHealth), large language models (LLMs), and digital care twins (DCTs), significantly influence clinical decision support systems (CDSS) in healthcare and medical applications. However, given the complexity of medical decisions, results generated by AI tools must be not only correct but also carefully evaluated, understandable, and explainable to end-users, especially clinicians. A lack of interpretability in how AI systems communicate clinical decisions can lead to mistrust among decision-makers and a reluctance to use these technologies. Objective This paper systematically reviews the processes and challenges associated with interpretable machine learning (IML) and explainable artificial intelligence (XAI) in the healthcare and medical domains. Its main goals are to examine the processes of IML and XAI, their related methods and applications, and the challenges of implementing them in digital health interventions (DHIs), particularly from a quality control perspective, in order to understand and improve communication between AI systems and clinicians. The IML process is categorized into pre-processing interpretability, interpretable modeling, and post-processing interpretability. By reviewing related experimental results, the paper aims to foster a comprehensive understanding of why a robust interpretability approach matters in CDSS. The goal is to give future researchers insights for creating clinician-AI tools that communicate more effectively in healthcare decision support, along with a deeper understanding of the associated challenges. 
Methods Our research questions, eligibility criteria, and primary goals were defined using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline and the PICO (population, intervention, control, and outcomes) method. We systematically searched the PubMed, Scopus, and Web of Science databases using sensitive and specific search strings. Duplicate papers were then removed using EndNote and Covidence. A two-phase selection process was carried out in Covidence, starting with title and abstract screening and followed by a full-text appraisal. The Meta Quality Appraisal Tool (MetaQAT) was used to assess quality and risk of bias. Finally, a standardized data extraction tool was employed for reliable data extraction. Results The searches yielded 2,241 records, from which 555 duplicate papers were removed. The title and abstract screening step excluded 958 papers, and the full-text review step excluded 482 studies. A further 172 papers were removed during the quality and risk of bias assessment. A total of 74 publications were selected for data extraction, comprising 10 reviews and 64 related experimental studies. Conclusion The paper provides general definitions of XAI in the medical domain and introduces a framework for interpretability in clinical decision support systems structured across three levels. It explores XAI-related health applications within each tier of this framework, underpinned by a review of related experimental findings. The paper also discusses in detail the quality assessment tools for evaluating XAI in intelligent health systems and presents a step-by-step roadmap for implementing XAI in clinical settings. To direct future research toward bridging current gaps, the paper examines the importance of XAI models from various angles and acknowledges their limitations. |
---|
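
The study-selection counts reported in the Results can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the review's own tooling; the function name `prisma_flow` and the stage labels are hypothetical and chosen only for illustration, while the numbers are those stated in the abstract.

```python
# Counts as reported in the Results paragraph of the abstract.
records_identified = 2241
exclusions = [
    ("duplicates removed", 555),
    ("title/abstract screening", 958),
    ("full-text review", 482),
    ("quality and risk-of-bias assessment", 172),
]


def prisma_flow(start, steps):
    """Return (stage, excluded, remaining) for each exclusion step.

    Hypothetical helper for illustration; it simply subtracts each
    reported exclusion count from the running total.
    """
    remaining = start
    flow = []
    for stage, excluded in steps:
        remaining -= excluded
        flow.append((stage, excluded, remaining))
    return flow


flow = prisma_flow(records_identified, exclusions)
for stage, excluded, remaining in flow:
    print(f"{stage}: -{excluded} -> {remaining} remaining")

# The final count should match the 74 included publications,
# which the abstract splits into 10 reviews and 64 experimental studies.
included = flow[-1][2]
assert included == 74
assert included == 10 + 64
```

Under these figures the four exclusion steps account for every record between the 2,241 identified and the 74 included, so the reported flow is internally consistent.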