
Medical explainable AI


| Publish Date | Title | Authors | Homepage | Code |
|---|---|---|---|---|
| 2024-11-01 | Enhancing Osteoporosis Detection: An Explainable Multi-Modal Learning Framework with Feature Fusion and Variable Clustering | Mehdi Hosseini Chagahi et al. | 2411.00916v2 | null |
| 2024-10-25 | A Review of Deep Learning Approaches for Non-Invasive Cognitive Impairment Detection | Muath Alsuhaibani et al. | 2410.19898v1 | null |
| 2024-10-23 | An Ontology-Enabled Approach For User-Centered and Knowledge-Enabled Explanations of AI Systems | Shruthi Chari et al. | 2410.17504v1 | link |
| 2024-10-22 | Contrasting Attitudes Towards Current and Future AI Applications for Computerised Interpretation of ECG: A Clinical Stakeholder Interview Study | Lukas Hughes-Noehrer et al. | 2410.16879v1 | null |
| 2024-10-19 | Pathologist-like explainable AI for interpretable Gleason grading in prostate cancer | Gesa Mittmann et al. | 2410.15012v1 | null |
| 2024-10-15 | Explainable AI Methods for Multi-Omics Analysis: A Survey | Ahmad Hussein et al. | 2410.11910v1 | null |
| 2024-10-14 | Study on the Helpfulness of Explainable Artificial Intelligence | Tobias Labarta et al. | 2410.11896v1 | link |
| 2024-10-12 | Use of What-if Scenarios to Help Explain Artificial Intelligence Models for Neonatal Health | Abdullah Mamun et al. | 2410.09635v1 | link |
| 2024-10-10 | Artificial intelligence techniques in inherited retinal diseases: A review | Han Trinh et al. | 2410.09105v1 | null |
| 2024-10-07 | CasiMedicos-Arg: A Medical Question Answering Dataset Annotated with Explanatory Argumentative Structures | Ekaterina Sviridova et al. | 2410.05235v2 | link |
| 2024-10-01 | Explainable Diagnosis Prediction through Neuro-Symbolic Integration | Qiuhao Lu et al. | 2410.01855v1 | null |
| 2024-10-01 | Easydiagnos: a framework for accurate feature selection for automatic diagnosis in smart healthcare | Prasenjit Maji et al. | 2410.00366v1 | null |
| 2024-09-20 | Dermatologist-like explainable AI enhances melanoma diagnosis accuracy: eye-tracking study | Tirtha Chanda et al. | 2409.13476v1 | null |
| 2024-09-19 | Explainable AI for Autism Diagnosis: Identifying Critical Brain Regions Using fMRI Data | Suryansh Vidya et al. | 2409.15374v1 | null |
| 2024-09-19 | Improving Prototypical Parts Abstraction for Case-Based Reasoning Explanations Designed for the Kidney Stone Type Recognition | Daniel Flores-Araiza et al. | 2409.12883v1 | null |
| 2024-09-18 | Towards Interpretable End-Stage Renal Disease (ESRD) Prediction: Utilizing Administrative Claims Data with Explainable AI Techniques | Yubo Li et al. | 2409.12087v3 | null |
| 2024-09-09 | Explainable AI: Definition and attributes of a good explanation for health AI | Evangelia Kyrimi et al. | 2409.15338v1 | null |
| 2024-08-30 | Exploring the Effect of Explanation Content and Format on User Comprehension and Trust | Antonio Rago et al. | 2408.17401v1 | null |
| 2024-08-29 | A Survey for Large Language Models in Biomedicine | Chong Wang et al. | 2409.00133v1 | null |
| 2024-08-27 | Aligning XAI with EU Regulations for Smart Biomedical Devices: A Methodology for Compliance Analysis | Francesco Sovrano et al. | 2408.15121v1 | null |
| 2024-08-24 | Towards Case-based Interpretability for Medical Federated Learning | Laura Latorre et al. | 2408.13626v1 | null |
| 2024-08-22 | AI in radiological imaging of soft-tissue and bone tumours: a systematic review evaluating against CLAIM and FUTURE-AI guidelines | Douwe J. Spaanderman et al. | 2408.12491v1 | null |
| 2024-08-14 | Evaluating Explainable AI Methods in Deep Learning Models for Early Detection of Cerebral Palsy | Kimji N. Pellano et al. | 2409.00001v1 | null |
| 2024-08-06 | MicroXercise: A Micro-Level Comparative and Explainable System for Remote Physical Therapy | Hanchen David Wang et al. | 2408.11837v1 | null |
| 2024-08-05 | The Literature Review Network: An Explainable Artificial Intelligence for Systematic Literature Reviews, Meta-analyses, and Method Development | Joshua Morriss et al. | 2408.05239v1 | null |
| 2024-08-05 | Enhancing Medical Learning and Reasoning Systems: A Boxology-Based Comparative Analysis of Design Patterns | Chi Him Ng et al. | 2408.02709v1 | null |
| 2024-08-05 | Bayesian Kolmogorov Arnold Networks (Bayesian_KANs): A Probabilistic Approach to Enhance Accuracy and Interpretability | Masoud Muhammed Hassan et al. | 2408.02706v1 | null |
| 2024-07-26 | MLtoGAI: Semantic Web based with Machine Learning for Enhanced Disease Prediction and Personalized Recommendations using Generative AI | Shyam Dongre et al. | 2407.20284v1 | null |
| 2024-07-25 | Introducing δ-XAI: a novel sensitivity-based method for local AI explanations | Alessandro De Carlo et al. | 2407.18343v2 | null |
| 2024-07-24 | Enhanced Deep Learning Methodologies and MRI Selection Techniques for Dementia Diagnosis in the Elderly Population | Nikolaos Ntampakis et al. | 2407.17324v2 | null |
| 2024-07-24 | Using Large Language Models to Compare Explainable Models for Smart Home Human Activity Recognition | Michele Fiori et al. | 2408.06352v1 | null |
| 2024-07-21 | Explainable AI-based Intrusion Detection System for Industry 5.0: An Overview of the Literature, associated Challenges, the existing Solutions, and Potential Research Directions | Naseem Khan et al. | 2408.03335v1 | null |
| 2024-07-18 | A Comparative Study on Automatic Coding of Medical Letters with Explainability | Jamie Glen et al. | 2407.13638v1 | link |
| 2024-07-09 | Explainable AI for Enhancing Efficiency of DL-based Channel Estimation | Abdul Karim Gizzini et al. | 2407.07009v1 | null |
| 2024-07-07 | Explainable AI: Comparative Analysis of Normal and Dilated ResNet Models for Fundus Disease Classification | P. N. Karthikayan et al. | 2407.05440v2 | null |
| 2024-07-03 | A Survey on Trustworthiness in Foundation Models for Medical Image Analysis | Congzhen Shi et al. | 2407.15851v2 | null |
| 2024-07-01 | The Impact of an XAI-Augmented Approach on Binary Classification with Scarce Data | Ximing Wen et al. | 2407.06206v1 | null |
| 2024-06-28 | Can GPT-4 Help Detect Quit Vaping Intentions? An Exploration of Automatic Data Annotation Approach | Sai Krishna Revanth Vuruma et al. | 2407.00167v1 | null |
| 2024-06-25 | Towards Compositional Interpretability for XAI | Sean Tull et al. | 2406.17583v1 | null |
| 2024-06-17 | Slicing Through Bias: Explaining Performance Gaps in Medical Image Analysis using Slice Discovery Methods | Vincent Olesen et al. | 2406.12142v2 | link |
| 2024-06-11 | Unlocking the Potential of Metaverse in Innovative and Immersive Digital Health | Fatemeh Ebrahimzadeh et al. | 2406.07114v2 | null |
| 2024-06-10 | AI-Driven Predictive Analytics Approach for Early Prognosis of Chronic Kidney Disease Using Ensemble Learning and Explainable AI | K M Tawsik Jawad et al. | 2406.06728v1 | null |
| 2024-06-10 | Explainable AI for Mental Disorder Detection via Social Media: A survey and outlook | Yusif Ibrahimov et al. | 2406.05984v1 | null |
| 2024-06-09 | Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance | Zhan Zhang et al. | 2406.05746v1 | null |
| 2024-06-07 | Advancing Histopathology-Based Breast Cancer Diagnosis: Insights into Multi-Modality and Explainability | Faseela Abdullakutty et al. | 2406.12897v1 | null |
| 2024-06-07 | Revisiting Attention Weights as Interpretations of Message-Passing Neural Networks | Yong-Min Shin et al. | 2406.04612v1 | link |
| 2024-06-04 | Using Explainable AI for EEG-based Reduced Montage Neonatal Seizure Detection | Dinuka Sandun Udayantha et al. | 2406.16908v3 | link |
| 2024-06-01 | Breast Cancer Diagnosis: A Comprehensive Exploration of Explainable Artificial Intelligence (XAI) Techniques | Samita Bai et al. | 2406.00532v1 | null |
| 2024-06-01 | Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion Recognition | Alaa Nfissi et al. | 2406.01624v2 | link |
| 2024-05-31 | The Explanation Necessity for Healthcare AI | Michail Mamalakis et al. | 2406.00216v1 | null |
| 2024-05-29 | Interdisciplinary Expertise to Advance Equitable Explainable AI | Chloe R. Bennett et al. | 2406.18563v1 | null |
| 2024-05-27 | "It depends": Configuring AI to Improve Clinical Usefulness Across Contexts | Hubert D. Zając et al. | 2407.11978v1 | null |
| 2024-05-26 | Improving Health Professionals' Onboarding with AI and XAI for Trustworthy Human-AI Collaborative Decision Making | Min Hun Lee et al. | 2405.16424v1 | null |
| 2024-05-26 | Exploring Nutritional Impact on Alzheimer's Mortality: An Explainable AI Approach | Ziming Liu et al. | 2405.17502v1 | null |
| 2024-05-24 | Explainable AI Enhances Glaucoma Referrals, Yet the Human-AI Team Still Falls Short of the AI Alone | Catalina Gomez et al. | 2407.11974v1 | null |
| 2024-05-23 | Decoding Decision Reasoning: A Counterfactual-Powered Model for Knowledge Discovery | Yingying Fang et al. | 2406.18552v1 | null |
| 2024-05-21 | The Role of Emotions in Informational Support Question-Response Pairs in Online Health Communities: A Multimodal Deep Learning Approach | Mohsen Jozani et al. | 2405.13099v1 | null |
| 2024-05-17 | ChatGPT in Classrooms: Transforming Challenges into Opportunities in Education | Harris Bin Munawar et al. | 2405.10645v1 | null |
| 2024-05-13 | Evaluating the Explainable AI Method Grad-CAM for Breath Classification on Newborn Time Series Data | Camelia Oprea et al. | 2405.07590v1 | null |
| 2024-05-10 | XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare | Fatemeh Nazary et al. | 2405.06270v3 | null |
| 2024-05-09 | To Trust or Not to Trust: Towards a novel approach to measure trust for XAI systems | Miquel Miró-Nicolau et al. | 2405.05766v1 | null |
| 2024-05-05 | Region-specific Risk Quantification for Interpretable Prognosis of COVID-19 | Zhusi Zhong et al. | 2405.02815v1 | link |
| 2024-04-26 | Rad4XCNN: a new agnostic method for post-hoc global explanation of CNN-derived features by means of radiomics | Francesco Prinzi et al. | 2405.02334v1 | null |
| 2024-04-25 | Attributing Responsibility in AI-Induced Incidents: A Computational Reflective Equilibrium Framework for Accountability | Yunfei Ge et al. | 2404.16957v1 | null |
| 2024-04-19 | Explainable AI for Fair Sepsis Mortality Predictive Model | Chia-Hsuan Chang et al. | 2404.13139v1 | null |
| 2024-04-19 | Multi Class Depression Detection Through Tweets using Artificial Intelligence | Muhammad Osama Nusrat et al. | 2404.13104v1 | link |
| 2024-04-19 | COIN: Counterfactual inpainting for weakly supervised semantic segmentation for medical images | Dmytro Shvetsov et al. | 2404.12832v2 | link |
| 2024-04-15 | Hybrid Intelligence for Digital Humanities | Victor de Boer et al. | 2406.15374v1 | null |
| 2024-04-14 | Ethical Framework for Responsible Foundational Models in Medical Imaging | Abhijit Das et al. | 2406.11868v1 | null |
| 2024-04-09 | Advancements in Radiomics and Artificial Intelligence for Thyroid Cancer Diagnosis | Milad Yousefi et al. | 2404.07239v1 | null |
| 2024-04-06 | Predictive Modeling for Breast Cancer Classification in the Context of Bangladeshi Patients: A Supervised Machine Learning Approach with Explainable AI | Taminul Islam et al. | 2404.04686v1 | null |
| 2024-04-05 | Enhancing Breast Cancer Diagnosis in Mammography: Evaluation and Integration of Convolutional Neural Networks and Explainable AI | Maryam Ahmed et al. | 2404.03892v3 | null |
| 2024-03-30 | Advancing Multimodal Data Fusion in Pain Recognition: A Strategy Leveraging Statistical Correlation and Human-Centered Perspectives | Xingrui Gu et al. | 2404.00320v2 | null |
| 2024-03-26 | Addressing Social Misattributions of Large Language Models: An HCXAI-based Approach | Andrea Ferrario et al. | 2403.17873v1 | null |
| 2024-03-26 | Clinical Domain Knowledge-Derived Template Improves Post Hoc AI Explanations in Pneumothorax Classification | Han Yuan et al. | 2403.18871v1 | link |
| 2024-03-03 | Enhancing Neural Machine Translation of Low-Resource Languages: Corpus Development, Human Evaluation and Explainable AI Architectures | Séamus Lankford et al. | 2403.01580v1 | null |
| 2024-02-28 | Cause and Effect: Can Large Language Models Truly Understand Causality? | Swagata Ashwani et al. | 2402.18139v3 | null |
| 2024-02-28 | Artificial Intelligence and Diabetes Mellitus: An Inside Look Through the Retina | Yasin Sadeghi Bazargani et al. | 2402.18600v1 | null |
| 2024-02-22 | Multi-stakeholder Perspective on Responsible Artificial Intelligence and Acceptability in Education | A. J. Karran et al. | 2402.15027v2 | null |
| 2024-02-12 | Deciphering Heartbeat Signatures: A Vision Transformer Approach to Explainable Atrial Fibrillation Detection from ECG Signals | Aruna Mohan et al. | 2402.09474v2 | null |
| 2024-02-05 | Illuminate: A novel approach for depression detection with explainable analysis and proactive therapy using prompt engineering | Aryan Agrawal et al. | 2402.05127v1 | null |
| 2024-01-24 | Information That Matters: Exploring Information Needs of People Affected by Algorithmic Decisions | Timothée Schmude et al. | 2401.13324v6 | null |
| 2024-01-02 | Evaluating Large Language Models on the GMAT: Implications for the Future of Business Education | Vahid Ashrafimoghari et al. | 2401.02985v1 | null |
| 2023-12-29 | XAI for In-hospital Mortality Prediction via Multimodal ICU Data | Xingqiao Li et al. | 2312.17624v1 | link |
| 2023-12-22 | Joining Forces for Pathology Diagnostics with AI Assistance: The EMPAIA Initiative | Norman Zerbe et al. | 2401.09450v2 | null |
| 2023-12-18 | Robust Stochastic Graph Generator for Counterfactual Explanations | Mario Alfonso Prado-Romero et al. | 2312.11747v2 | null |
| 2023-12-10 | Evaluating the Utility of Model Explanations for Model Development | Shawn Im et al. | 2312.06032v1 | null |
| 2023-12-05 | Building Trustworthy NeuroSymbolic AI Systems: Consistency, Reliability, Explainability, and Safety | Manas Gaur et al. | 2312.06798v1 | null |
| 2023-12-04 | Class-Discriminative Attention Maps for Vision Transformers | Lennart Brocki et al. | 2312.02364v3 | null |
| 2023-11-28 | Deployment of a Robust and Explainable Mortality Prediction Model: The COVID-19 Pandemic and Beyond | Jacob R. Epifano et al. | 2311.17133v1 | null |
| 2023-11-27 | Variational Autoencoders for Feature Exploration and Malignancy Prediction of Lung Lesions | Benjamin Keel et al. | 2311.15719v1 | link |
| 2023-11-24 | MRxaI: Black-Box Explainability for Image Classifiers in a Medical Setting | Nathan Blake et al. | 2311.14471v1 | null |
| 2023-11-21 | Moderating Model Marketplaces: Platform Governance Puzzles for AI Intermediaries | Robert Gorwa et al. | 2311.12573v3 | null |
| 2023-11-20 | Ovarian Cancer Data Analysis using Deep Learning: A Systematic Review from the Perspectives of Key Features of Data Analysis and AI Assurance | Muta Tah Hira et al. | 2311.11932v1 | null |
| 2023-11-18 | Representing visual classification as a linear combination of words | Shobhit Agarwal et al. | 2311.10933v1 | link |
| 2023-11-03 | Towards objective and systematic evaluation of bias in artificial intelligence for medical imaging | Emma A. M. Stanley et al. | 2311.02115v2 | link |
| 2023-10-29 | Predicting recovery following stroke: deep learning, multimodal data and feature selection using explainable AI | Adam White et al. | 2310.19174v1 | null |
| 2023-10-03 | Trainable Noise Model as an XAI evaluation method: application on Sobol for remote sensing image segmentation | Hossein Shreim et al. | 2310.01828v2 | link |
| 2023-09-26 | Creating Trustworthy LLMs: Dealing with Hallucinations in Healthcare AI | Muhammad Aurangzeb Ahmad et al. | 2311.01463v1 | null |
| 2023-09-20 | When to Trust AI: Advances and Challenges for Certification of Neural Networks | Marta Kwiatkowska et al. | 2309.11196v1 | null |

Abstracts

Enhancing Osteoporosis Detection: An Explainable Multi-Modal Learning Framework with Feature Fusion and Variable Clustering

2411.00916v2 by Mehdi Hosseini Chagahi, Saeed Mohammadi Dashtaki, Niloufar Delfan, Nadia Mohammadi, Alireza Samari, Behzad Moshiri, Md. Jalil Piran, Oliver Faust

Osteoporosis is a common condition that increases fracture risk, especially in older adults. Early diagnosis is vital for preventing fractures, reducing treatment costs, and preserving mobility. However, healthcare providers face challenges like limited labeled data and difficulties in processing medical images. This study presents a novel multi-modal learning framework that integrates clinical and imaging data to improve diagnostic accuracy and model interpretability. The model utilizes three pre-trained networks (VGG19, InceptionV3, and ResNet50) to extract deep features from X-ray images. These features are transformed using PCA to reduce dimensionality and focus on the most relevant components. A clustering-based selection process identifies the most representative components, which are then combined with preprocessed clinical data and processed through a fully connected network (FCN) for final classification. A feature importance plot highlights key variables, showing that Medical History, BMI, and Height were the main contributors, emphasizing the significance of patient-specific data. While imaging features were valuable, they had lower importance, indicating that clinical data are crucial for accurate predictions. This framework promotes precise and interpretable predictions, enhancing transparency and building trust in AI-driven diagnoses for clinical integration.
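The fusion step described in this abstract (per-backbone deep features, PCA reduction, concatenation with clinical variables) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the feature matrices are random stand-ins for real VGG19, InceptionV3, and ResNet50 embeddings, the dimensions and the helper names `pca_reduce` and `fuse` are hypothetical, and the clustering-based component selection and the final fully connected classifier are omitted.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project the rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0)
    # SVD of the centered matrix; the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def fuse(deep_feature_sets, clinical, n_components=8):
    """Reduce each backbone's features with PCA, then append the clinical columns."""
    reduced = [pca_reduce(f, n_components) for f in deep_feature_sets]
    return np.hstack(reduced + [clinical])

rng = np.random.default_rng(0)
n_patients = 40
# Synthetic stand-ins for three backbones' embeddings (dimensions are illustrative).
deep = [rng.normal(size=(n_patients, d)) for d in (512, 2048, 2048)]
# Synthetic stand-in for preprocessed clinical variables (e.g. BMI, height).
clinical = rng.normal(size=(n_patients, 5))

fused = fuse(deep, clinical)
print(fused.shape)  # (40, 29): 3 backbones x 8 components + 5 clinical columns
```

The fused matrix is what a downstream classifier would consume; keeping the clinical columns unreduced is what would let a feature importance plot refer to them by name.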


A Review of Deep Learning Approaches for Non-Invasive Cognitive Impairment Detection

2410.19898v1 by Muath Alsuhaibani, Ali Pourramezan Fard, Jian Sun, Farida Far Poor, Peter S. Pressman, Mohammad H. Mahoor

This review paper explores recent advances in deep learning approaches for non-invasive cognitive impairment detection. We examine various non-invasive indicators of cognitive decline, including speech and language, facial, and motoric mobility. The paper provides an overview of relevant datasets, feature-extracting techniques, and deep-learning architectures applied to this domain. We have analyzed the performance of different methods across modalities and observed that speech and language-based methods generally achieved the highest detection performance. Studies combining acoustic and linguistic features tended to outperform those using a single modality. Facial analysis methods showed promise for visual modalities but were less extensively studied. Most papers focused on binary classification (impaired vs. non-impaired), with fewer addressing multi-class or regression tasks. Transfer learning and pre-trained language models emerged as popular and effective techniques, especially for linguistic analysis. Despite significant progress, several challenges remain, including data standardization and accessibility, model explainability, longitudinal analysis limitations, and clinical adaptation. Lastly, we propose future research directions, such as investigating language-agnostic speech analysis methods, developing multi-modal diagnostic systems, and addressing ethical considerations in AI-assisted healthcare. By synthesizing current trends and identifying key obstacles, this review aims to guide further development of deep learning-based cognitive impairment detection systems to improve early diagnosis and ultimately patient outcomes.


An Ontology-Enabled Approach For User-Centered and Knowledge-Enabled Explanations of AI Systems

2410.17504v1 by Shruthi Chari

Explainable Artificial Intelligence (AI) focuses on helping humans understand the working of AI systems or their decisions and has been a cornerstone of AI for decades. Recent research in explainability has focused on explaining the workings of AI models or model explainability. There have also been several position statements and review papers detailing the needs of end-users for user-centered explainability but fewer implementations. Hence, this thesis seeks to bridge some gaps between model and user-centered explainability. We create an explanation ontology (EO) to represent literature-derived explanation types via their supporting components. We implement a knowledge-augmented question-answering (QA) pipeline to support contextual explanations in a clinical setting. Finally, we are implementing a system to combine explanations from different AI methods and data modalities. Within the EO, we can represent fifteen different explanation types, and we have tested these representations in six exemplar use cases. We find that knowledge augmentations improve the performance of base large language models in the contextualized QA, and the performance is variable across disease groups. In the same setting, clinicians also indicated that they prefer to see actionability as one of the main foci in explanations. In our explanations combination method, we plan to use similarity metrics to determine the similarity of explanations in a chronic disease detection setting. Overall, through this thesis, we design methods that can support knowledge-enabled explanations across different use cases, accounting for the methods in today's AI era that can generate the supporting components of these explanations and domain knowledge sources that can enhance them.


Contrasting Attitudes Towards Current and Future AI Applications for Computerised Interpretation of ECG: A Clinical Stakeholder Interview Study

2410.16879v1 by Lukas Hughes-Noehrer, Leda Channer, Gabriel Strain, Gregory Yates, Richard Body, Caroline Jay

Objectives: To investigate clinicians' attitudes towards current automated interpretation of ECG and novel AI technologies and their perception of computer-assisted interpretation. Materials and Methods: We conducted a series of interviews with clinicians in the UK. Our study: (i) explores the potential for AI, specifically future 'human-like' computing approaches, to facilitate ECG interpretation and support clinical decision making, and (ii) elicits their opinions about the importance of explainability and trustworthiness of AI algorithms. Results: We performed inductive thematic analysis on interview transcriptions from 23 clinicians and identified the following themes: (i) a lack of trust in current systems, (ii) positive attitudes towards future AI applications and requirements for these, (iii) the relationship between the accuracy and explainability of algorithms, and (iv) opinions on education, possible deskilling, and the impact of AI on clinical competencies. Discussion: Clinicians do not trust current computerised methods, but welcome future 'AI' technologies. Where clinicians trust future AI interpretation to be accurate, they are less concerned that it is explainable. They also preferred ECG interpretation that demonstrated the results of the algorithm visually. Whilst clinicians do not fear job losses, they are concerned about deskilling and the need to educate the workforce to use AI responsibly. Conclusion: Clinicians are positive about the future application of AI in clinical decision-making. Accuracy is a key factor of uptake and visualisations are preferred over current computerised methods. This is viewed as a potential means of training and upskilling, in contrast to the deskilling that automation might be perceived to bring.


Pathologist-like explainable AI for interpretable Gleason grading in prostate cancer

2410.15012v1 by Gesa Mittmann, Sara Laiouar-Pedari, Hendrik A. Mehrtens, Sarah Haggenmüller, Tabea-Clara Bucher, Tirtha Chanda, Nadine T. Gaisa, Mathias Wagner, Gilbert Georg Klamminger, Tilman T. Rau, Christina Neppl, Eva Maria Compérat, Andreas Gocht, Monika Hämmerle, Niels J. Rupp, Jula Westhoff, Irene Krücken, Maximillian Seidl, Christian M. Schürch, Marcus Bauer, Wiebke Solass, Yu Chun Tam, Florian Weber, Rainer Grobholz, Jaroslaw Augustyniak, Thomas Kalinski, Christian Hörner, Kirsten D. Mertz, Constanze Döring, Andreas Erbersdobler, Gabriele Deubler, Felix Bremmer, Ulrich Sommer, Michael Brodhun, Jon Griffin, Maria Sarah L. Lenon, Kiril Trpkov, Liang Cheng, Fei Chen, Angelique Levi, Guoping Cai, Tri Q. Nguyen, Ali Amin, Alessia Cimadamore, Ahmed Shabaik, Varsha Manucha, Nazeel Ahmad, Nidia Messias, Francesca Sanguedolce, Diana Taheri, Ezra Baraban, Liwei Jia, Rajal B. Shah, Farshid Siadat, Nicole Swarbrick, Kyung Park, Oudai Hassan, Siamak Sakhaie, Michelle R. Downes, Hiroshi Miyamoto, Sean R. Williamson, Tim Holland-Letz, Carolin V. Schneider, Jakob Nikolas Kather, Yuri Tolkach, Titus J. Brinker

The aggressiveness of prostate cancer, the most common cancer in men worldwide, is primarily assessed based on histopathological data using the Gleason scoring system. While artificial intelligence (AI) has shown promise in accurately predicting Gleason scores, these predictions often lack inherent explainability, potentially leading to distrust in human-machine interactions. To address this issue, we introduce a novel dataset of 1,015 tissue microarray core images, annotated by an international group of 54 pathologists. The annotations provide detailed localized pattern descriptions for Gleason grading in line with international guidelines. Utilizing this dataset, we develop an inherently explainable AI system based on a U-Net architecture that provides predictions leveraging pathologists' terminology. This approach circumvents post-hoc explainability methods while maintaining or exceeding the performance of methods trained directly for Gleason pattern segmentation (Dice score: 0.713 $\pm$ 0.003 trained on explanations vs. 0.691 $\pm$ 0.010 trained on Gleason patterns). By employing soft labels during training, we capture the intrinsic uncertainty in the data, yielding strong results in Gleason pattern segmentation even in the context of high interobserver variability. With the release of this dataset, we aim to encourage further research into segmentation in medical tasks with high levels of subjectivity and to advance the understanding of pathologists' reasoning processes.
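For readers unfamiliar with the metric quoted above, the Dice score between a predicted mask A and a reference mask B is 2|A∩B| / (|A| + |B|). The sketch below computes it with NumPy and shows one simple way to build soft labels by averaging several annotators' binary masks. The averaging rule and the function names `dice_score` and `soft_label` are assumptions made for illustration; the abstract does not specify how the authors construct their soft labels.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice = 2*|A∩B| / (|A| + |B|); also works for soft (probabilistic) masks."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    intersection = (pred * target).sum()
    # eps avoids division by zero when both masks are empty.
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def soft_label(annotator_masks):
    """Average several binary masks into one per-pixel soft label (assumed rule)."""
    return np.mean(np.stack(annotator_masks), axis=0)

# Two toy 2x2 annotations that disagree on one pixel.
mask_a = np.array([[1, 1], [0, 0]])
mask_b = np.array([[1, 0], [0, 0]])

print(round(dice_score(mask_a, mask_b), 4))  # 0.6667
print(soft_label([mask_a, mask_b]))          # disputed pixel becomes 0.5
```

A pixel on which annotators disagree receives an intermediate target value, which is how soft labels can encode interobserver variability instead of forcing a single hard decision.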


Explainable AI Methods for Multi-Omics Analysis: A Survey

2410.11910v1 by Ahmad Hussein, Mukesh Prasad, Ali Braytee

Advancements in high-throughput technologies have led to a shift from traditional hypothesis-driven methodologies to data-driven approaches. Multi-omics refers to the integrative analysis of data derived from multiple 'omes', such as genomics, proteomics, transcriptomics, metabolomics, and microbiomics. This approach enables a comprehensive understanding of biological systems by capturing different layers of biological information. Deep learning methods are increasingly utilized to integrate multi-omics data, offering insights into molecular interactions and enhancing research into complex diseases. However, these models, with their numerous interconnected layers and nonlinear relationships, often function as black boxes, lacking transparency in decision-making processes. To overcome this challenge, explainable artificial intelligence (xAI) methods are crucial for creating transparent models that allow clinicians to interpret and work with complex data more effectively. This review explores how xAI can improve the interpretability of deep learning models in multi-omics research, highlighting its potential to provide clinicians with clear insights, thereby facilitating the effective application of such models in clinical settings.


Study on the Helpfulness of Explainable Artificial Intelligence

2410.11896v1 by Tobias Labarta, Elizaveta Kulicheva, Ronja Froelian, Christian Geißler, Xenia Melman, Julian von Klitzing

Explainable Artificial Intelligence (XAI) is essential for building advanced machine learning-powered applications, especially in critical domains such as medical diagnostics or autonomous driving. Legal, business, and ethical requirements motivate using effective XAI, but the increasing number of different methods makes it challenging to pick the right ones. Further, as explanations are highly context-dependent, measuring the effectiveness of XAI methods without users can only reveal a limited amount of information, excluding human factors such as the ability to understand it. We propose to evaluate XAI methods via the user's ability to successfully perform a proxy task, designed such that a good performance is an indicator for the explanation to provide helpful information. In other words, we address the helpfulness of XAI for human decision-making. Further, a user study on state-of-the-art methods was conducted, showing differences in their ability to generate trust and skepticism and the ability to judge the rightfulness of an AI decision correctly. Based on the results, we highly recommend using and extending this approach for more objective-based human-centered user studies to measure XAI performance in an end-to-end fashion.


Use of What-if Scenarios to Help Explain Artificial Intelligence Models for Neonatal Health

2410.09635v1 by Abdullah Mamun, Lawrence D. Devoe, Mark I. Evans, David W. Britt, Judith Klein-Seetharaman, Hassan Ghasemzadeh

Early detection of intrapartum risk enables interventions to potentially prevent or mitigate adverse labor outcomes such as cerebral palsy. Currently, there is no accurate automated system to predict such events to assist with clinical decision-making. To fill this gap, we propose "Artificial Intelligence (AI) for Modeling and Explaining Neonatal Health" (AIMEN), a deep learning framework that not only predicts adverse labor outcomes from maternal, fetal, obstetrical, and intrapartum risk factors but also provides the model's reasoning behind the predictions made. The latter can provide insights into what modifications in the input variables of the model could have changed the predicted outcome. We address the challenges of imbalance and small datasets by synthesizing additional training data using Adaptive Synthetic Sampling (ADASYN) and Conditional Tabular Generative Adversarial Networks (CTGAN). AIMEN uses an ensemble of fully-connected neural networks as the backbone for its classification with the data augmentation supported by either ADASYN or CTGAN. AIMEN, supported by CTGAN, outperforms AIMEN supported by ADASYN in classification. AIMEN can predict a high risk for adverse labor outcomes with an average F1 score of 0.784. It also provides counterfactual explanations that can be achieved by changing 2 to 3 attributes on average. Resources available: https://github.com/ab9mamun/AIMEN.
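A counterfactual ("what-if") explanation answers the question: which few input attributes would have to change for the predicted outcome to flip? The sketch below shows one generic greedy search on a toy linear risk model. It is not AIMEN's algorithm (see the repository linked in the abstract for that). The weights, the bias, the candidate values, and the function names are all hypothetical and chosen only to make the idea concrete.

```python
import numpy as np

def predict_high_risk(x, weights, bias):
    """Toy linear classifier: high risk when the score is positive."""
    return float(x @ weights + bias) > 0.0

def greedy_counterfactual(x, weights, bias, candidates, max_changes=3):
    """Change one attribute per step, choosing the swap that lowers the score
    most, until the prediction flips or max_changes is reached."""
    x_cf = x.astype(float).copy()
    changed = []
    for _ in range(max_changes):
        if not predict_high_risk(x_cf, weights, bias):
            break
        best = None  # (score reduction, feature index, new value)
        for j, values in candidates.items():
            if j in changed:
                continue
            for v in values:
                gain = weights[j] * (x_cf[j] - v)
                if best is None or gain > best[0]:
                    best = (gain, j, v)
        if best is None or best[0] <= 0:
            break  # no remaining change reduces the risk score
        _, j, v = best
        x_cf[j] = v
        changed.append(j)
    return x_cf, changed

# Hypothetical model and patient; feature 3 is treated as immutable.
weights = np.array([1.5, 1.0, 0.5, 2.0])
bias = -3.0
patient = np.array([1.0, 1.0, 1.0, 1.0])
candidates = {0: [0.0], 1: [0.0], 2: [0.0]}

x_cf, changed = greedy_counterfactual(patient, weights, bias, candidates)
print(changed)                                 # [0, 1]
print(predict_high_risk(x_cf, weights, bias))  # False
```

In this toy case two attribute changes are enough to flip the prediction, which mirrors the kind of "2 to 3 attributes on average" figure reported above, although the numbers here are invented.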

摘要:產程中風險的早期偵測有助於進行干預措施,以預防或減輕不利的生產結果,例如腦性麻痺。目前,沒有準確的自動化系統可以預測此類事件,以協助臨床決策。為了填補這一空白,我們提出「用於建模和解釋新生兒健康的人工智慧」(AIMEN),這是一個深度學習架構,它不僅可以根據孕產婦、胎兒、產科和產程風險因素預測不利的生產結果,還能提供模型做出預測背後的原因。後者可以提供見解,說明模型輸入變數中的哪些修改可能會改變預測結果。我們透過使用適應性合成抽樣 (ADASYN) 和條件表格生成對抗網路 (CTGAN) 來合成額外的訓練資料,以解決不平衡和小型資料集的挑戰。AIMEN 使用全連接神經網路的集合作為其分類的骨幹,並透過 ADASYN 或 CTGAN 支援資料擴充。由 CTGAN 支援的 AIMEN 在分類方面優於由 ADASYN 支援的 AIMEN。AIMEN 可以預測不利的生產結果的高風險,平均 F1 分數為 0.784。它還提供反事實解釋,可透過平均變更 2 至 3 個屬性來達成。可用資源:https://github.com/ab9mamun/AIMEN。
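The idea of a counterfactual explanation that changes only a few attributes can be illustrated with a small sketch. This is not the AIMEN implementation: the logistic model, its weights, and the feature names below are hypothetical placeholders, and the brute-force search is just one simple way to find a minimal set of changes that flips a prediction.

```python
import math
from itertools import combinations, product

# Hypothetical toy risk model (NOT the AIMEN ensemble): logistic score over three made-up features.
WEIGHTS = {"feature_a": 1.5, "feature_b": 1.0, "feature_c": 0.5}
BIAS = -1.0


def predict_risk(x):
    """Return the probability of the 'high risk' class for a feature dict x."""
    z = BIAS + sum(WEIGHTS[name] * x[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def find_counterfactual(x, candidates, threshold=0.5):
    """Find the smallest set of feature changes that flips the predicted class.

    candidates maps each feature name to the alternative values to try.
    Returns a dict of changed features, or None if no combination flips the outcome.
    """
    original_high = predict_risk(x) >= threshold
    names = sorted(candidates)
    for size in range(1, len(names) + 1):  # try the fewest changes first
        for subset in combinations(names, size):
            for values in product(*(candidates[n] for n in subset)):
                changed = dict(zip(subset, values))
                flipped = (predict_risk({**x, **changed}) >= threshold) != original_high
                if flipped:
                    return changed
    return None


# Made-up example case and the alternative values we allow for each feature.
case = {"feature_a": 1.0, "feature_b": 1.0, "feature_c": 1.0}
alternatives = {"feature_a": [0.0], "feature_b": [0.0], "feature_c": [0.0]}
print(find_counterfactual(case, alternatives))
```

In this toy setting no single change is enough to flip the prediction, so the search returns a two-attribute counterfactual. The exhaustive search grows quickly with the number of features, so it only suits small illustrations.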

Artificial intelligence techniques in inherited retinal diseases: A review

2410.09105v1 by Han Trinh, Jordan Vice, Jason Charng, Zahra Tajbakhsh, Khyber Alam, Fred K. Chen, Ajmal Mian

Inherited retinal diseases (IRDs) are a diverse group of genetic disorders that lead to progressive vision loss and are a major cause of blindness in working-age adults. The complexity and heterogeneity of IRDs pose significant challenges in diagnosis, prognosis, and management. Recent advancements in artificial intelligence (AI) offer promising solutions to these challenges. However, the rapid development of AI techniques and their varied applications have led to fragmented knowledge in this field. This review consolidates existing studies, identifies gaps, and provides an overview of AI's potential in diagnosing and managing IRDs. It aims to structure pathways for advancing clinical applications by exploring AI techniques like machine learning and deep learning, particularly in disease detection, progression prediction, and personalized treatment planning. Special focus is placed on the effectiveness of convolutional neural networks in these areas. Additionally, the integration of explainable AI is discussed, emphasizing its importance in clinical settings to improve transparency and trust in AI-based systems. The review addresses the need to bridge existing gaps in focused studies on AI's role in IRDs, offering a structured analysis of current AI techniques and outlining future research directions. It concludes with an overview of the challenges and opportunities in deploying AI for IRDs, highlighting the need for interdisciplinary collaboration and the continuous development of robust, interpretable AI models to advance clinical applications.

摘要:遺傳性視網膜疾病 (IRD) 是一組多樣化的遺傳疾病, 會導致視力逐漸喪失,是工作年齡成人失明的主要原因。IRD 的複雜性和異質性對診斷、預後和管理提出了重大挑戰。最近人工智能 (AI) 的進步為這些挑戰提供了有希望的解決方案。 然而,AI 技術的快速發展及其多種應用導致了該領域的知識分散。本綜述整合了現有研究,找出差距,並概述了 AI 在診斷和管理 IRD 中的潛力。它旨在通過探索機器學習和深度學習等 AI 技術,特別是在疾病檢測、進程預測和個性化治療計劃中,為推進臨床應用構建途徑。特別關注這些領域中卷積神經網路的有效性。此外,討論了可解釋 AI 的整合,強調了其在臨床環境中提高透明度和對基於 AI 的系統的信任的重要性。該綜述解決了彌合 AI 在 IRD 中作用的重點研究中現有差距的必要性,提供了對當前 AI 技術的結構化分析,並概述了未來的研究方向。最後概述了在 IRD 中部署 AI 的挑戰和機遇,強調了跨學科合作和持續開發強大、可解釋的 AI 模型以推進臨床應用的必要性。

CasiMedicos-Arg: A Medical Question Answering Dataset Annotated with Explanatory Argumentative Structures

2410.05235v2 by Ekaterina Sviridova, Anar Yeginbergen, Ainara Estarrona, Elena Cabrio, Serena Villata, Rodrigo Agerri

Explaining Artificial Intelligence (AI) decisions is a major challenge nowadays in AI, in particular when applied to sensitive scenarios like medicine and law. However, the need to explain the rationale behind decisions is a main issue also for human-based deliberation as it is important to justify why a certain decision has been taken. Resident medical doctors for instance are required not only to provide a (possibly correct) diagnosis, but also to explain how they reached a certain conclusion. Developing new tools to aid residents to train their explanation skills is therefore a central objective of AI in education. In this paper, we follow this direction, and we present, to the best of our knowledge, the first multilingual dataset for Medical Question Answering where correct and incorrect diagnoses for a clinical case are enriched with a natural language explanation written by doctors. These explanations have been manually annotated with argument components (i.e., premise, claim) and argument relations (i.e., attack, support), resulting in the Multilingual CasiMedicos-Arg dataset which consists of 558 clinical cases in four languages (English, Spanish, French, Italian) with explanations, where we annotated 5021 claims, 2313 premises, 2431 support relations, and 1106 attack relations. We conclude by showing how competitive baselines perform over this challenging dataset for the argument mining task.

摘要:解釋人工智慧 (AI) 的決策是現在 AI 的一項重大挑戰,特別是應用於像醫學和法律等敏感情境時。然而,解釋決策背後理由的需求也是基於人類的考量的一個主要問題,因為有必要證明為什麼做出某個決策。例如,住院醫師不僅需要提供(可能是正確的)診斷,還需要解釋他們如何達成某個結論。因此,開發新的工具來幫助住院醫師訓練他們的解釋技巧是教育中 AI 的一項核心目標。在本文中,我們遵循這個方向,並且根據我們的了解,提出第一個多語言醫學問答資料集,其中臨床病例的正確和不正確診斷都附有由醫生撰寫的自然語言解釋。這些解釋已使用論證組成(即前提、主張)和論證關係(即攻擊、支持)進行手動註解,產生多語言 CasiMedicos-Arg 資料集,其中包含 558 個具有解釋的四種語言(英語、西班牙語、法語、義大利語)的臨床病例,我們註解了 5021 個主張、2313 個前提、2431 個支持關係和 1106 個攻擊關係。我們最後展示了競爭基準如何針對論證探勘任務執行此具挑戰性的資料集。
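The annotation scheme described above (argument components plus relations between them) maps naturally onto a small data structure. The sketch below is an assumption about how such annotations could be represented in memory; the class names, field names, and example text are hypothetical and are not taken from the dataset's actual file format.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Component:
    id: str
    kind: str  # "claim" or "premise"
    text: str


@dataclass
class Relation:
    source: str  # id of the component that supports or attacks
    target: str  # id of the component being supported or attacked
    kind: str    # "support" or "attack"


@dataclass
class AnnotatedCase:
    case_id: str
    language: str
    components: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def counts(self):
        """Count components and relations by kind."""
        tally = Counter(c.kind for c in self.components)
        tally.update(r.kind for r in self.relations)
        return dict(tally)


# Made-up example: one premise supporting one claim.
example = AnnotatedCase(
    case_id="case-001",
    language="en",
    components=[
        Component("c1", "claim", "Option A is the most likely diagnosis."),
        Component("p1", "premise", "The reported symptoms match option A."),
    ],
    relations=[Relation(source="p1", target="c1", kind="support")],
)
print(example.counts())
```

Summing `counts()` over all cases is the kind of tally that would produce corpus-level totals like those quoted in the abstract.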

Explainable Diagnosis Prediction through Neuro-Symbolic Integration

2410.01855v1 by Qiuhao Lu, Rui Li, Elham Sagheb, Andrew Wen, Jinlian Wang, Liwei Wang, Jungwei W. Fan, Hongfang Liu

Diagnosis prediction is a critical task in healthcare, where timely and accurate identification of medical conditions can significantly impact patient outcomes. Traditional machine learning and deep learning models have achieved notable success in this domain but often lack interpretability, which is a crucial requirement in clinical settings. In this study, we explore the use of neuro-symbolic methods, specifically Logical Neural Networks (LNNs), to develop explainable models for diagnosis prediction. Essentially, we design and implement LNN-based models that integrate domain-specific knowledge through logical rules with learnable thresholds. Our models, particularly $M_{\text{multi-pathway}}$ and $M_{\text{comprehensive}}$, demonstrate superior performance over traditional models such as Logistic Regression, SVM, and Random Forest, achieving higher accuracy (up to 80.52%) and AUROC scores (up to 0.8457) in the case study of diabetes prediction. The learned weights and thresholds within the LNN models provide direct insights into feature contributions, enhancing interpretability without compromising predictive power. These findings highlight the potential of neuro-symbolic approaches in bridging the gap between accuracy and explainability in healthcare AI applications. By offering transparent and adaptable diagnostic models, our work contributes to the advancement of precision medicine and supports the development of equitable healthcare solutions. Future research will focus on extending these methods to larger and more diverse datasets to further validate their applicability across different medical conditions and populations.

摘要:診斷預測是醫療保健中的一項關鍵任務,及時且準確地識別醫療狀況會對患者的結果產生重大影響。傳統機器學習和深度學習模型已在此領域取得顯著成功,但通常缺乏可解釋性,這是臨床環境中的關鍵要求。在本研究中,我們探討了神經符號方法,特別是邏輯神經網路 (LNN),以開發可解釋的診斷預測模型。基本上,我們設計並實作了基於 LNN 的模型,該模型透過邏輯規則和可學習的閾值整合領域特定的知識。我們的模型,特別是 $M_{\text{multi-pathway}}$ 和 $M_{\text{comprehensive}}$,表現出優於傳統模型(如邏輯迴歸、SVM 和隨機森林)的卓越效能,在糖尿病預測的案例研究中,達到了更高的準確度(高達 80.52%)和 AUROC 分數(高達 0.8457)。LNN 模型中學習的權重和閾值提供了對特徵貢獻的直接見解,增強了可解釋性,同時不損害預測能力。這些發現突顯了神經符號方法在彌合醫療保健 AI 應用中準確性和可解釋性差距方面的潛力。透過提供透明且適應性強的診斷模型,我們的研究有助於精準醫療的進步,並支援公平醫療保健解決方案的開發。未來的研究將專注於將這些方法擴展到更大且更多樣化的資料集,以進一步驗證其在不同醫療狀況和人群中的適用性。
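To make "logical rules with learnable thresholds" more concrete, here is a minimal sketch of one rule. It assumes a sigmoid that turns a raw feature into a truth value around a threshold, and a weighted Łukasiewicz-style conjunction as commonly used in Logical Neural Networks. The feature names, thresholds, and weights are arbitrary illustration values, not the paper's learned parameters, and the paper's actual models may be built differently.

```python
import math


def soft_threshold(value, threshold, sharpness=1.0):
    """Map a raw feature to a truth value in (0, 1).

    In an LNN-style model the threshold would be learned; here it is fixed by hand.
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (value - threshold)))


def weighted_and(truths, weights, beta=1.0):
    """Weighted Lukasiewicz-style conjunction, clamped to [0, 1].

    Each input that is far from true (truth near 0) subtracts up to its weight.
    """
    total = beta - sum(w * (1.0 - t) for t, w in zip(truths, weights))
    return max(0.0, min(1.0, total))


def example_rule(glucose, bmi, glucose_thr=125.0, bmi_thr=30.0):
    """Hypothetical rule: (glucose is high) AND (BMI is high) -> elevated risk.

    All numeric values are placeholders for illustration only.
    """
    truths = [
        soft_threshold(glucose, glucose_thr, sharpness=0.1),
        soft_threshold(bmi, bmi_thr, sharpness=0.5),
    ]
    return weighted_and(truths, weights=[1.0, 0.5])


print(example_rule(glucose=180.0, bmi=35.0))
print(example_rule(glucose=100.0, bmi=22.0))
```

Because the rule output depends only on a few thresholds and weights, inspecting those values shows how much each feature contributes. That is the kind of interpretability the abstract attributes to the learned weights and thresholds.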

Easydiagnos: a framework for accurate feature selection for automatic diagnosis in smart healthcare

2410.00366v1 by Prasenjit Maji, Amit Kumar Mondal, Hemanta Kumar Mondal, Saraju P. Mohanty

The rapid advancements in artificial intelligence (AI) have revolutionized smart healthcare, driving innovations in wearable technologies, continuous monitoring devices, and intelligent diagnostic systems. However, security, explainability, robustness, and performance optimization challenges remain critical barriers to widespread adoption in clinical environments. This research presents an innovative algorithmic method using the Adaptive Feature Evaluator (AFE) algorithm to improve feature selection in healthcare datasets and overcome these problems. By integrating Genetic Algorithms (GA), Explainable Artificial Intelligence (XAI), and Permutation Combination Techniques (PCT), the algorithm optimizes Clinical Decision Support Systems (CDSS), thereby enhancing predictive accuracy and interpretability. The proposed method is validated across three diverse healthcare datasets using six distinct machine learning algorithms, demonstrating its robustness and superiority over conventional feature selection techniques. The results underscore the transformative potential of AFE in smart healthcare, enabling personalized and transparent patient care. Notably, the AFE algorithm, when combined with a Multi-layer Perceptron (MLP), achieved an accuracy of up to 98.5%, highlighting its capability to improve clinical decision-making processes in real-world healthcare applications.

摘要:人工智慧 (AI) 的快速進展徹底改變了智慧醫療保健,推動了可穿戴技術、持續監控裝置和智慧診斷系統的創新。然而,安全性、可解釋性、穩健性和效能最佳化挑戰仍然是臨床環境中廣泛採用的關鍵障礙。本研究提出一個創新的演算法方法,使用自適應特徵評估器 (AFE) 演算法來改善醫療保健資料集中的特徵選取並克服問題。AFE 整合了遺傳演算法 (GA)、可解釋人工智慧 (XAI) 和排列組合技術 (PCT),該演算法最佳化了臨床決策支援系統 (CDSS),從而提高了預測準確性和可解釋性。所提出的方法使用六種不同的機器學習演算法驗證了三個不同的醫療保健資料集,證明了其穩健性和優於傳統特徵選取技術。結果強調了 AFE 在智慧醫療保健中的轉變潛力,實現了個人化和透明的患者照護。值得注意的是,AFE 演算法與多層感知器 (MLP) 結合使用時,準確度高達 98.5%,突顯了其改善實際醫療保健應用中臨床決策制定流程的能力。

Dermatologist-like explainable AI enhances melanoma diagnosis accuracy: eye-tracking study

2409.13476v1 by Tirtha Chanda, Sarah Haggenmueller, Tabea-Clara Bucher, Tim Holland-Letz, Harald Kittler, Philipp Tschandl, Markus V. Heppt, Carola Berking, Jochen S. Utikal, Bastian Schilling, Claudia Buerger, Cristian Navarrete-Dechent, Matthias Goebeler, Jakob Nikolas Kather, Carolin V. Schneider, Benjamin Durani, Hendrike Durani, Martin Jansen, Juliane Wacker, Joerg Wacker, Reader Study Consortium, Titus J. Brinker

Artificial intelligence (AI) systems have substantially improved dermatologists' diagnostic accuracy for melanoma, with explainable AI (XAI) systems further enhancing clinicians' confidence and trust in AI-driven decisions. Despite these advancements, there remains a critical need for objective evaluation of how dermatologists engage with both AI and XAI tools. In this study, 76 dermatologists participated in a reader study, diagnosing 16 dermoscopic images of melanomas and nevi using an XAI system that provides detailed, domain-specific explanations. Eye-tracking technology was employed to assess their interactions. Diagnostic performance was compared with that of a standard AI system lacking explanatory features. Our findings reveal that XAI systems improved balanced diagnostic accuracy by 2.8 percentage points relative to standard AI. Moreover, diagnostic disagreements with AI/XAI systems and complex lesions were associated with elevated cognitive load, as evidenced by increased ocular fixations. These insights have significant implications for clinical practice, the design of AI tools for visual tasks, and the broader development of XAI in medical diagnostics.

摘要:人工智慧 (AI) 系統已大幅改善皮膚科醫師對黑色素瘤的診斷準確度,而可解釋 AI (XAI) 系統進一步提升臨床醫師對 AI 驅動決策的信心與信賴。儘管有這些進展,對於皮膚科醫師如何使用 AI 和 XAI 工具,仍有客觀評估的迫切需求。在這項研究中,76 位皮膚科醫師參與了一項讀者研究,使用 XAI 系統診斷 16 張黑色素瘤和痣的皮膚鏡影像,該系統提供詳細的領域特定說明。採用眼球追蹤技術來評估他們的互動。將診斷表現與缺乏說明功能的標準 AI 系統進行比較。我們的研究結果顯示,XAI 系統相較於標準 AI,將平衡診斷準確度提升了 2.8 個百分點。此外,與 AI/XAI 系統的診斷分歧和複雜的病灶與認知負擔升高有關,這由增加的眼睛注視次數所證實。這些見解對臨床實務、視覺任務 AI 工具的設計和醫學診斷中 XAI 的廣泛發展具有重大意義。
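The reported gain is expressed in balanced accuracy, which averages sensitivity and specificity so that melanomas and nevi count equally regardless of how many of each are shown. The sketch below shows the calculation on made-up labels; the data is purely illustrative and unrelated to the study's readings.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity and specificity for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Made-up readings: 1 = melanoma, 0 = nevus.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
reader_with_ai = [1, 1, 1, 0, 0, 0, 1, 1]
reader_with_xai = [1, 1, 1, 1, 0, 0, 0, 1]

ai_score = balanced_accuracy(truth, reader_with_ai)
xai_score = balanced_accuracy(truth, reader_with_xai)
print(f"difference: {(xai_score - ai_score) * 100:.1f} percentage points")
```

A difference in percentage points is the plain subtraction of the two scores, which is how a figure such as "2.8 percentage points" should be read, as opposed to a relative (percent) change.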

Explainable AI for Autism Diagnosis: Identifying Critical Brain Regions Using fMRI Data

2409.15374v1 by Suryansh Vidya, Kush Gupta, Amir Aly, Andy Wills, Emmanuel Ifeachor, Rohit Shankar

Early diagnosis and intervention for Autism Spectrum Disorder (ASD) have been shown to significantly improve the quality of life of autistic individuals. However, diagnostic methods for ASD rely on assessments based on clinical presentation, which are prone to bias and can make it challenging to arrive at an early diagnosis. There is a need for objective biomarkers of ASD which can help improve diagnostic accuracy. Deep learning (DL) has achieved outstanding performance in diagnosing diseases and conditions from medical imaging data. Extensive research has been conducted on creating models that classify ASD using resting-state functional Magnetic Resonance Imaging (fMRI) data. However, existing models lack interpretability. This research aims to improve the accuracy and interpretability of ASD diagnosis by creating a DL model that can not only accurately classify ASD but also provide explainable insights into its working. The dataset used is a preprocessed version of the Autism Brain Imaging Data Exchange (ABIDE) with 884 samples. Our findings show a model that can accurately classify ASD and highlight critical brain regions differing between ASD and typical controls, with potential implications for early diagnosis and understanding of the neural basis of ASD. These findings are validated by studies in the literature that use different datasets and modalities, confirming that the model actually learned characteristics of ASD and not just the dataset. This study advances the field of explainable AI in medical imaging by providing a robust and interpretable model, thereby contributing to a future with objective and reliable ASD diagnostics.

摘要:自閉症譜系障礙 (ASD) 的早期診斷和介入已被證實能顯著改善自閉症患者的生活品質。然而,ASD 的診斷方法依賴於基於臨床表現的評估,容易產生偏見,且可能難以做出早期診斷。有必要找出 ASD 的客觀生物標記,以幫助提高診斷準確性。深度學習 (DL) 在從醫學影像資料診斷疾病和病症方面取得傑出的表現。已經針對建立使用靜態功能性磁振造影 (fMRI) 資料對 ASD 進行分類的模型進行廣泛的研究。然而,現有的模型缺乏可解釋性。本研究旨在透過建立一個不僅能準確分類 ASD,還能提供可解釋見解說明其運作原理的 DL 模型,來改善 ASD 診斷的準確性和可解釋性。所使用的資料集是自閉症大腦影像資料交換 (ABIDE) 的預處理版本,包含 884 個樣本。我們的研究結果顯示,該模型能準確分類 ASD,並強調 ASD 與典型對照組之間存在差異的關鍵腦區,對於 ASD 的早期診斷和神經基礎的理解具有潛在的意義。這些研究結果已由使用不同資料集和方式的文獻研究驗證,證實該模型實際上學習了 ASD 的特徵,而不僅僅是資料集。本研究透過提供一個強健且可解釋的模型,推動了醫學影像中可解釋 AI 的領域,從而為未來提供客觀且可靠的 ASD 診斷做出貢獻。

Improving Prototypical Parts Abstraction for Case-Based Reasoning Explanations Designed for the Kidney Stone Type Recognition

2409.12883v1 by Daniel Flores-Araiza, Francisco Lopez-Tiro, Clément Larose, Salvador Hinojosa, Andres Mendez-Vazquez, Miguel Gonzalez-Mendoza, Gilberto Ochoa-Ruiz, Christian Daul

The in-vivo identification of kidney stone types during a ureteroscopy would be a major medical advance in urology, as it could reduce the time of the tedious renal calculi extraction process while diminishing infection risks. Furthermore, such an automated procedure would make it possible to prescribe anti-recurrence treatments immediately. Nowadays, only a few experienced urologists are able to recognize kidney stone types in the images of the videos displayed on a screen during the endoscopy. Thus, several deep learning (DL) models have recently been proposed to automatically recognize kidney stone types using ureteroscopic images. However, these DL models are of a black-box nature, which limits their applicability in clinical settings. This contribution proposes a case-based reasoning DL model which uses prototypical parts (PPs) and generates local and global descriptors. The PPs encode, for each class (i.e., kidney stone type), visual feature information (hue, saturation, intensity and textures) similar to that used by biologists. The PPs are optimally generated due to a new loss function used during model training. Moreover, the local and global descriptors of PPs make it possible to explain the decisions ("what" information, "where in the images") in a way that biologists and urologists can understand. The proposed DL model has been tested on a database including images of the six most widespread kidney stone types. The overall average classification accuracy was 90.37. When comparing this result with those of the eight other state-of-the-art DL models for kidney stone recognition, it can be seen that the valuable gain in explainability was not reached at the expense of accuracy, which was even slightly increased with respect to that (88.2) of the best method in the literature. These promising and interpretable results also encourage urologists to put their trust in AI-based solutions.

摘要:尿路鏡檢查中腎結石類型的體內識別將是泌尿科的一項重大進展,因為它可以減少繁瑣的腎結石取出過程的時間,同時降低感染風險。此外,這種自動化程序將使立即開立抗復發治療成為可能。如今,只有少數經驗豐富的泌尿科醫生能夠在內視鏡檢查期間屏幕上顯示的視頻圖像中識別腎結石類型。因此,最近已提出多種深度學習 (DL) 模型,以使用輸尿管鏡圖像自動識別腎結石類型。然而,這些 DL 模型本質上是黑盒子,這限制了它們在臨床環境中的應用性。本文提出了一個基於案例推理的 DL 模型,它使用原型部分 (PP) 並生成局部和全局描述符。PP 為每種類型(即腎結石類型)編碼視覺特徵信息(色調、飽和度、強度和紋理),類似於生物學家使用的信息。由於在模型訓練期間使用的新損失函數,PP 得到了最佳生成。此外,PP 的局部和全局描述符允許以生物學家和泌尿科醫生可以理解的方式解釋決策(“什麼”信息,“圖像中的什麼位置”)。所提出的 DL 模型已在一個包含六種最廣泛的腎結石類型圖像的數據庫上進行了測試。總體平均分類準確率為 90.37。將此結果與腎結石最先進的八個其他 DL 模型的結果進行比較時,可以看出,可解釋性的寶貴增益並未以準確性為代價,甚至略有增加與文獻中最好的方法 (88.2) 相比。這些有希望且可解釋的結果也鼓勵泌尿科醫生相信基於人工智能的解決方案。

Towards Interpretable End-Stage Renal Disease (ESRD) Prediction: Utilizing Administrative Claims Data with Explainable AI Techniques

2409.12087v3 by Yubo Li, Saba Al-Sayouri, Rema Padman

This study explores the potential of utilizing administrative claims data, combined with advanced machine learning and deep learning techniques, to predict the progression of Chronic Kidney Disease (CKD) to End-Stage Renal Disease (ESRD). We analyze a comprehensive, 10-year dataset provided by a major health insurance organization to develop prediction models for multiple observation windows using traditional machine learning methods such as Random Forest and XGBoost as well as deep learning approaches such as Long Short-Term Memory (LSTM) networks. Our findings demonstrate that the LSTM model, particularly with a 24-month observation window, exhibits superior performance in predicting ESRD progression, outperforming existing models in the literature. We further apply SHapley Additive exPlanations (SHAP) analysis to enhance interpretability, providing insights into the impact of individual features on predictions at the individual patient level. This study underscores the value of leveraging administrative claims data for CKD management and predicting ESRD progression.

摘要:本研究探討利用行政申報資料,結合先進機器學習與深度學習技術,預測慢性腎臟病 (CKD) 進展至末期腎臟疾病 (ESRD) 的可能性。我們分析一家大型健康保險組織提供的 10 年綜合資料集,使用傳統機器學習方法(例如隨機森林和 XGBoost)以及深度學習方法(例如長期短期記憶 (LSTM) 網路)開發多個觀察視窗的預測模型。我們的研究結果顯示,LSTM 模型(尤其是 24 個月觀察視窗)在預測 ESRD 進展方面表現優異,優於文獻中的現有模型。我們進一步應用 SHapley 可加性解釋 (SHAP) 分析以增強可解釋性,深入了解個別特徵對個別患者層級預測的影響。本研究強調了利用行政申報資料進行 CKD 管理和預測 ESRD 進展的價值。
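SHAP attributes a single prediction to individual features using Shapley values. As a rough illustration of what those values are, the sketch below computes them exactly by enumerating every feature coalition for a tiny toy function. This is a brute-force sketch, not the SHAP library and not the study's LSTM pipeline. The model, the input, and the all-zero baseline are assumptions made for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one input, replacing absent features with baseline values."""
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their real value; the rest use the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                # Marginal contribution of feature i when added to this coalition.
                phi += weight * (value(members | {i}) - value(members))
        phis.append(phi)
    return phis


def toy_model(v):
    """Made-up score with one additive term and one interaction term."""
    return 2.0 * v[0] + v[1] * v[2]


point = [1.0, 2.0, 3.0]
reference = [0.0, 0.0, 0.0]
contributions = shapley_values(toy_model, point, reference)
print(contributions)
```

The contributions sum to the difference between the prediction for the input and the prediction for the baseline, which is the additivity property that makes per-patient attributions easy to read. The cost grows exponentially with the number of features, which is why practical tools rely on approximations or model-specific shortcuts.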

Explainable AI: Definition and attributes of a good explanation for health AI

2409.15338v1 by Evangelia Kyrimi, Scott McLachlan, Jared M Wohlgemut, Zane B Perkins, David A. Lagnado, William Marsh, the ExAIDSS Expert Group

Proposals of artificial intelligence (AI) solutions based on increasingly complex and accurate predictive models are becoming ubiquitous across many disciplines. As the complexity of these models grows, transparency and users' understanding often diminish. This suggests that accurate prediction alone is insufficient for making an AI-based solution truly useful. In the development of healthcare systems, this introduces new issues related to accountability and safety. Understanding how and why an AI system makes a recommendation may require complex explanations of its inner workings and reasoning processes. Although research on explainable AI (XAI) has significantly increased in recent years and there is high demand for XAI in medicine, defining what constitutes a good explanation remains ad hoc, and providing adequate explanations continues to be challenging. To fully realize the potential of AI, it is critical to address two fundamental questions about explanations for safety-critical AI applications, such as health-AI: (1) What is an explanation in health-AI? and (2) What are the attributes of a good explanation in health-AI? In this study, we examined published literature and gathered expert opinions through a two-round Delphi study. The research outputs include (1) a definition of what constitutes an explanation in health-AI and (2) a comprehensive list of attributes that characterize a good explanation in health-AI.

摘要:隨著越來越複雜且準確的預測模型,基於人工智慧 (AI) 解決方案的提案在許多領域中變得無處不在。隨著這些模型複雜性的增加,透明度和使用者的理解力往往會降低。這表示僅有準確的預測並不足以讓 AI 解決方案真正有用。在醫療保健系統的開發中,這引入了與問責制和安全性相關的新問題。瞭解 AI 系統如何以及為何提出建議可能需要對其內部運作和推理過程進行複雜的說明。儘管近年來對可解釋 AI (XAI) 的研究已大幅增加,且醫學領域對 XAI 有很高的需求,但定義什麼構成一個好的解釋仍是臨時性的,而提供適當的解釋仍然具有挑戰性。為了充分發揮 AI 的潛力,對於安全關鍵型 AI 應用(例如健康 AI)的解釋,探討兩個基本問題至關重要:(1) 什麼是健康 AI 中的解釋?以及 (2) 健康 AI 中一個好的解釋有哪些屬性?在本研究中,我們檢視了已發表的文獻,並透過兩輪德爾菲研究收集了專家意見。研究成果包括:(1) 健康 AI 中什麼構成解釋的定義,以及 (2) 健康 AI 中一個好解釋的屬性清單。

Exploring the Effect of Explanation Content and Format on User Comprehension and Trust

2408.17401v1 by Antonio Rago, Bence Palfi, Purin Sukpanichnant, Hannibal Nabli, Kavyesh Vivek, Olga Kostopoulou, James Kinross, Francesca Toni

In recent years, various methods have been introduced for explaining the outputs of "black-box" AI models. However, it is not well understood whether users actually comprehend and trust these explanations. In this paper, we focus on explanations for a regression tool for assessing cancer risk and examine the effect of the explanations' content and format on the user-centric metrics of comprehension and trust. Regarding content, we experiment with two explanation methods: the popular SHAP, based on game-theoretic notions and thus potentially complex for everyday users to comprehend, and occlusion-1, based on feature occlusion which may be more comprehensible. Regarding format, we present SHAP explanations as charts (SC), as is conventional, and occlusion-1 explanations as charts (OC) as well as text (OT), to which their simpler nature also lends itself. The experiments amount to user studies questioning participants, with two different levels of expertise (the general population and those with some medical training), on their subjective and objective comprehension of and trust in explanations for the outputs of the regression tool. In both studies we found a clear preference in terms of subjective comprehension and trust for occlusion-1 over SHAP explanations in general, when comparing based on content. However, direct comparisons of explanations when controlling for format only revealed evidence for OT over SC explanations in most cases, suggesting that the dominance of occlusion-1 over SHAP explanations may be driven by a preference for text over charts as explanations. Finally, we found no evidence of a difference between the explanation types in terms of objective comprehension. Thus overall, the choice of the content and format of explanations needs careful attention, since in some contexts format, rather than content, may play the critical role in improving user experience.

摘要:近年來,已經引進各種方法來解釋「黑箱」AI 模型的輸出。然而,目前並不清楚使用者是否實際理解和信任這些解釋。在本文中,我們專注於評估癌症風險的回歸工具的解釋,並探討解釋的內容和格式對以使用者為中心的理解和信任指標的影響。關於內容,我們實驗了兩種解釋方法:流行的 SHAP,基於博弈論概念,因此對於日常使用者來說可能很複雜,以及基於特徵遮蔽的 occlusion-1,可能更易於理解。關於格式,我們將 SHAP 解釋呈現為圖表 (SC),這是慣例,而將 occlusion-1 解釋呈現為圖表 (OC) 以及文字 (OT),其較為簡單的性質也適用於此。這些實驗等同於使用者研究,詢問參與者,具有兩種不同程度的專業知識(一般民眾和具備一些醫學訓練的人),他們對回歸工具輸出解釋的主觀和客觀理解和信任。在兩項研究中,我們發現,在基於內容進行比較時,一般來說,occlusion-1 優於 SHAP 解釋,在主觀理解和信任方面有明顯的偏好。然而,在僅控制格式的情況下直接比較解釋,在大多數情況下只顯示 OT 優於 SC 解釋的證據,這表明 occlusion-1 優於 SHAP 解釋的主導地位可能是由偏好文字而非圖表作為解釋所驅動的。最後,我們沒有發現解釋類型在客觀理解方面的差異證據。因此,總體而言,對解釋的內容和格式的選擇需要仔細注意,因為在某些情況下,格式而非內容,可能在改善使用者體驗方面發揮關鍵作用。
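The contrast between content and format can be made concrete with a short sketch of a single-feature occlusion attribution rendered as text. It assumes that occlusion-1 scores each feature by how much the output changes when that feature alone is replaced by a baseline value. The toy regression model, feature names, and wording of the sentences are hypothetical and not taken from the study.

```python
def occlusion_1(model, x, baseline):
    """Attribute to each feature the output drop caused by occluding it alone."""
    full = model(x)
    attributions = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline[i]  # occlude exactly one feature
        attributions.append(full - model(occluded))
    return attributions


def as_text(names, attributions):
    """Render attributions as plain sentences (the 'text' format)."""
    lines = []
    for name, value in zip(names, attributions):
        verb = "raises" if value > 0 else "lowers"
        lines.append(f"{name} {verb} the predicted score by {abs(value):.2f}")
    return lines


def toy_risk(v):
    """Made-up regression score."""
    return 0.1 + 0.2 * v[0] + 0.05 * v[1]


feature_names = ["feature_1", "feature_2"]
scores = occlusion_1(toy_risk, [1.0, 2.0], [0.0, 0.0])
for line in as_text(feature_names, scores):
    print(line)
```

The same numbers could equally be drawn as a bar chart. The study's finding that format may matter more than content suggests the rendering step deserves as much design attention as the attribution method itself.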

A Survey for Large Language Models in Biomedicine

2409.00133v1 by Chong Wang, Mengyao Li, Junjun He, Zhongruo Wang, Erfan Darzi, Zan Chen, Jin Ye, Tianbin Li, Yanzhou Su, Jing Ke, Kaili Qu, Shuxin Li, Yi Yu, Pietro Liò, Tianyun Wang, Yu Guang Wang, Yiqing Shen

Recent breakthroughs in large language models (LLMs) offer unprecedented natural language understanding and generation capabilities. However, existing surveys on LLMs in biomedicine often focus on specific applications or model architectures, lacking a comprehensive analysis that integrates the latest advancements across various biomedical domains. This review, based on an analysis of 484 publications sourced from databases including PubMed, Web of Science, and arXiv, provides an in-depth examination of the current landscape, applications, challenges, and prospects of LLMs in biomedicine, distinguishing itself by focusing on the practical implications of these models in real-world biomedical contexts. Firstly, we explore the capabilities of LLMs in zero-shot learning across a broad spectrum of biomedical tasks, including diagnostic assistance, drug discovery, and personalized medicine, among others, with insights drawn from 137 key studies. Then, we discuss adaptation strategies of LLMs, including fine-tuning methods for both uni-modal and multi-modal LLMs to enhance their performance in specialized biomedical contexts where zero-shot learning falls short, such as medical question answering and efficient processing of biomedical literature. Finally, we discuss the challenges that LLMs face in the biomedicine domain including data privacy concerns, limited model interpretability, issues with dataset quality, and ethics due to the sensitive nature of biomedical data, the need for highly reliable model outputs, and the ethical implications of deploying AI in healthcare. To address these challenges, we also identify future research directions for LLMs in biomedicine including federated learning methods to preserve data privacy and integrating explainable AI methodologies to enhance the transparency of LLMs.

摘要:大型語言模型 (LLM) 的最新突破提供了前所未有的自然語言理解和生成能力。然而,現有關於生物醫學中 LLM 的調查通常專注於特定應用或模型架構,缺乏整合各種生物醫學領域最新進展的全面分析。本綜述基於對來自 PubMed、Web of Science 和 arXiv 等數據庫的 484 篇出版物的分析,深入探討了生物醫學中 LLM 的當前現況、應用、挑戰和前景,其特點是關注這些模型在現實世界生物醫學背景中的實際應用。首先,我們探討了 LLM 在廣泛的生物醫學任務中的零次學習能力,包括診斷輔助、藥物發現和個性化醫療等,並從 137 項關鍵研究中汲取見解。然後,我們討論了 LLM 的適應策略,包括單模態和多模態 LLM 的微調方法,以增強它們在零次學習無法實現的專業生物醫學背景中的性能,例如醫療問題解答和生物醫學文獻的有效處理。最後,我們討論了 LLM 在生物醫學領域面臨的挑戰,包括數據隱私問題、模型可解釋性有限、數據集質量問題以及由於生物醫學數據的敏感性、對高度可靠模型輸出的需求以及在醫療保健中部署 AI 的倫理影響而產生的倫理問題。為了應對這些挑戰,我們還確定了生物醫學中 LLM 未來的研究方向,包括用於保護數據隱私的聯合學習方法以及整合可解釋 AI 方法以增強 LLM 的透明度。

Aligning XAI with EU Regulations for Smart Biomedical Devices: A Methodology for Compliance Analysis

2408.15121v1 by Francesco Sovrano, Michael Lognoul, Giulia Vilone

Significant investment and development have gone into integrating Artificial Intelligence (AI) in medical and healthcare applications, leading to advanced control systems in medical technology. However, the opacity of AI systems raises concerns about essential characteristics needed in such sensitive applications, like transparency and trustworthiness. Our study addresses these concerns by investigating a process for selecting the most adequate Explainable AI (XAI) methods to comply with the explanation requirements of key EU regulations in the context of smart bioelectronics for medical devices. The adopted methodology starts with categorising smart devices by their control mechanisms (open-loop, closed-loop, and semi-closed-loop systems) and delving into their technology. Then, we analyse these regulations to define their explainability requirements for the various devices and related goals. Simultaneously, we classify XAI methods by their explanatory objectives. This allows for matching legal explainability requirements with XAI explanatory goals and determining the suitable XAI algorithms for achieving them. Our findings provide a nuanced understanding of which XAI algorithms align better with EU regulations for different types of medical devices. We demonstrate this through practical case studies on different neural implants, from chronic disease management to advanced prosthetics. This study fills a crucial gap in aligning XAI applications in bioelectronics with stringent provisions of EU regulations. It provides a practical framework for developers and researchers, ensuring their AI innovations advance healthcare technology and adhere to legal and ethical standards.

摘要:人工智慧(AI)在醫療和保健應用中投入了大量的投資和開發,進而導致醫療技術中的先進控制系統。然而,AI 系統的不透明性引發了對此類敏感應用中所需基本特性的擔憂,例如透明度和可信度。我們的研究透過調查一個程序來解決這些問題,用於選擇最充分的可解釋 AI(XAI)方法,以符合歐盟法規在醫療器材的智慧型生物電子學中的說明要求。採用的方法從透過其控制機制(開迴路、閉迴路和半閉迴路系統)對智慧型裝置進行分類,並深入探討其技術開始。然後,我們分析這些法規以定義其對各種裝置和相關目標的可解釋性要求。同時,我們透過其說明目標對 XAI 方法進行分類。這允許將法律可解釋性要求與 XAI 說明目標相匹配,並確定適當的 XAI 演算法來達成它們。我們的研究結果提供了對哪些 XAI 演算法更符合歐盟法規以適用於不同類型的醫療器材的細緻理解。我們透過不同神經植入物的實際案例研究來證明這一點,從慢性疾病管理到先進的義肢。這項研究填補了將生物電子學中的 XAI 應用與歐盟法規的嚴格規定相符的重要空白。它為開發人員和研究人員提供了一個實用的架構,確保其 AI 創新能促進醫療技術並遵守法律和道德標準。

Towards Case-based Interpretability for Medical Federated Learning

2408.13626v1 by Laura Latorre, Liliana Petrychenko, Regina Beets-Tan, Taisiya Kopytova, Wilson Silva

We explore deep generative models to generate case-based explanations in a medical federated learning setting. Explaining AI model decisions through case-based interpretability is paramount to increasing trust and allowing widespread adoption of AI in clinical practice. However, medical AI training paradigms are shifting towards federated learning settings in order to comply with data protection regulations. In a federated scenario, past data is inaccessible to the current user. Thus, we use a deep generative model to generate synthetic examples that protect privacy and explain decisions. Our proof-of-concept focuses on pleural effusion diagnosis and uses publicly available Chest X-ray data.

摘要:我們探索深度生成模型,在醫療聯邦學習設置中生成基於案例的說明。透過基於案例的可解釋性來解釋 AI 模型決策,對於增加信任並允許 AI 在臨床實務中廣泛採用至關重要。然而,醫療 AI 訓練範例正轉向聯邦學習設置,以符合資料保護法規。在聯邦情境中,過去的資料對目前的使用者而言是無法取得的。因此,我們使用深度生成模型來產生保護隱私和解釋決策的合成範例。我們的概念驗證著重於胸腔積液診斷,並使用公開可取得的胸部 X 光資料。

AI in radiological imaging of soft-tissue and bone tumours: a systematic review evaluating against CLAIM and FUTURE-AI guidelines

2408.12491v1 by Douwe J. Spaanderman, Matthew Marzetti, Xinyi Wan, Andrew F. Scarsbrook, Philip Robinson, Edwin H. G. Oei, Jacob J. Visser, Robert Hemke, Kirsten van Langevelde, David F. Hanff, Geert J. L. H. van Leenders, Cornelis Verhoef, Dirk J. Grünhagen, Wiro J. Niessen, Stefan Klein, Martijn P. A. Starmans

Soft-tissue and bone tumours (STBT) are rare, diagnostically challenging lesions with variable clinical behaviours and treatment approaches. This systematic review provides an overview of Artificial Intelligence (AI) methods using radiological imaging for diagnosis and prognosis of these tumours, highlighting challenges in clinical translation, and evaluating study alignment with the Checklist for AI in Medical Imaging (CLAIM) and the FUTURE-AI international consensus guidelines for trustworthy and deployable AI to promote the clinical translation of AI methods. The review covered literature from several bibliographic databases, including papers published before 17/07/2024. Original research in peer-reviewed journals focused on radiology-based AI for diagnosing or prognosing primary STBT was included. Exclusion criteria were animal, cadaveric, or laboratory studies, and non-English papers. Abstracts were screened by two of three independent reviewers for eligibility. Eligible papers were assessed against guidelines by one of three independent reviewers. The search identified 15,015 abstracts, from which 325 articles were included for evaluation. Most studies performed moderately on CLAIM, averaging a score of 28.9$\pm$7.5 out of 53, but poorly on FUTURE-AI, averaging 5.1$\pm$2.1 out of 30. Imaging-AI tools for STBT remain at the proof-of-concept stage, indicating significant room for improvement. Future efforts by AI developers should focus on design (e.g. define unmet clinical need, intended clinical setting and how AI would be integrated in clinical workflow), development (e.g. build on previous work, explainability), evaluation (e.g. evaluating and addressing biases, evaluating AI against best practices), and data reproducibility and availability (making documented code and data publicly available). Following these recommendations could improve clinical translation of AI methods.

摘要:軟組織和骨骼腫瘤(STBT)是罕見、診斷具有挑戰性的病灶,其臨床行為和治療方法各不相同。這篇系統性回顧提供了使用放射影像進行診斷和預後的人工智慧 (AI) 方法的概觀,重點說明了臨床轉譯的挑戰,並評估研究與醫療影像 AI 核查表 (CLAIM) 和 FUTURE-AI 可信賴且可部署 AI 的國際共識準則的一致性,以促進 AI 方法的臨床轉譯。這篇回顧涵蓋了幾個書目資料庫中的文獻,包括在 2024 年 7 月 17 日之前發表的論文。納入了以放射為基礎的 AI 診斷或預後原發性 STBT 的同行評審期刊中的原始研究。排除標準是動物、屍體或實驗室研究,以及非英文論文。摘要由三位獨立審查員中的兩位篩選資格。合格的論文由三位獨立審查員中的一位根據準則進行評估。搜索識別出 15,015 篇摘要,其中 325 篇文章被納入評估。大多數研究在 CLAIM 中表現中等,平均得分為 53 分中的 28.9±7.5 分,但在 FUTURE-AI 中表現不佳,平均得分為 30 分中的 5.1±2.1 分。STBT 的影像 AI 工具仍處於概念驗證階段,表明有顯著的改進空間。AI 開發人員未來的努力應集中在設計(例如定義未滿足的臨床需求、預期的臨床環境以及 AI 如何整合到臨床工作流程中)、開發(例如建立在先前的工作、可解釋性)、評估(例如評估和解決偏差、評估 AI 與最佳實務)、以及數據可複製性和可用性(公開提供文件化的代碼和數據)。遵循這些建議可以改善 AI 方法的臨床轉譯。

Evaluating Explainable AI Methods in Deep Learning Models for Early Detection of Cerebral Palsy

2409.00001v1 by Kimji N. Pellano, Inga Strümke, Daniel Groos, Lars Adde, Espen Alexander F. Ihlen

Early detection of Cerebral Palsy (CP) is crucial for effective intervention and monitoring. This paper tests the reliability and applicability of Explainable AI (XAI) methods using a deep learning method that predicts CP by analyzing skeletal data extracted from video recordings of infant movements. Specifically, we use XAI evaluation metrics -- namely faithfulness and stability -- to quantitatively assess the reliability of Class Activation Mapping (CAM) and Gradient-weighted Class Activation Mapping (Grad-CAM) in this specific medical application. We utilize a unique dataset of infant movements and apply skeleton data perturbations without distorting the original dynamics of the infant movements. Our CP prediction model utilizes an ensemble approach, so we evaluate XAI metric performance for both the overall ensemble and the individual models. Our findings indicate that both XAI methods effectively identify key body points influencing CP predictions and that the explanations are robust against minor data perturbations. Grad-CAM significantly outperforms CAM in the RISv metric, which measures stability in terms of velocity. In contrast, CAM performs better in the RISb metric, which relates to bone stability, and the RRS metric, which assesses internal representation robustness. Individual models within the ensemble show varied results, and neither CAM nor Grad-CAM consistently outperforms the other, with the ensemble approach providing a representation of outcomes from its constituent models.

MicroXercise: A Micro-Level Comparative and Explainable System for Remote Physical Therapy

2408.11837v1 by Hanchen David Wang, Nibraas Khan, Anna Chen, Nilanjan Sarkar, Pamela Wisniewski, Meiyi Ma

Recent global estimates suggest that as many as 2.41 billion individuals have health conditions that would benefit from rehabilitation services. Home-based Physical Therapy (PT) faces significant challenges in providing interactive feedback and meaningful observation for therapists and patients. To fill this gap, we present MicroXercise, which integrates micro-motion analysis with wearable sensors, providing therapists and patients with a comprehensive feedback interface, including video, text, and scores. Crucially, it employs multi-dimensional Dynamic Time Warping (DTW) and attribution-based explainable methods to analyze the existing deep learning neural networks in monitoring exercises, focusing on a high granularity of exercise. This synergistic approach is pivotal, providing output matching the input size to precisely highlight critical subtleties and movements in PT, thus transforming complex AI analysis into clear, actionable feedback. By highlighting these micro-motions in different metrics, such as stability and range of motion, MicroXercise significantly enhances the understanding and relevance of feedback for end-users. Comparative performance metrics underscore its effectiveness over traditional methods, such as a 39% and 42% improvement in Feature Mutual Information (FMI) and Continuity. MicroXercise is a step ahead in home-based physical therapy, providing a technologically advanced and intuitively helpful solution to enhance patient care and outcomes.

The Literature Review Network: An Explainable Artificial Intelligence for Systematic Literature Reviews, Meta-analyses, and Method Development

2408.05239v1 by Joshua Morriss, Tod Brindle, Jessica Bah Rösman, Daniel Reibsamen, Andreas Enz

Systematic literature reviews are the highest quality of evidence in research. However, the review process is hindered by significant resource and data constraints. The Literature Review Network (LRN) is the first-of-its-kind explainable AI platform adhering to PRISMA 2020 standards, designed to automate the entire literature review process. LRN was evaluated in the domain of surgical glove practices using 3 search strings developed by experts to query PubMed. A non-expert trained all LRN models. Performance was benchmarked against an expert manual review. Explainability and performance metrics assessed LRN's ability to replicate the experts' review. Concordance was measured with the Jaccard index and confusion matrices. Researchers were blinded to each other's results until study completion. Overlapping studies were integrated into an LRN-generated systematic review. LRN models demonstrated superior classification accuracy without expert training, achieving 84.78% and 85.71% accuracy. The highest performance model achieved high interrater reliability (k = 0.4953) and explainability metrics, linking 'reduce', 'accident', and 'sharp' with 'double-gloving'. Another LRN model covered 91.51% of the relevant literature despite diverging from the non-expert's judgments (k = 0.2174), with the terms 'latex', 'double' (gloves), and 'indication'. LRN outperformed the manual review (19,920 minutes over 11 months), reducing the entire process to 288.6 minutes over 5 days. This study demonstrates that explainable AI does not require expert training to successfully conduct PRISMA-compliant systematic literature reviews like an expert. LRN summarized the results of surgical glove studies and identified themes that were nearly identical to the clinical researchers' findings. Explainable AI can accurately expedite our understanding of clinical practices, potentially revolutionizing healthcare research.
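
The concordance measures named in the LRN abstract can be sketched in a few lines. The Jaccard index compares the sets of included studies. Reading the reported k values as Cohen's kappa is an assumption, since the abstract does not name the statistic. The study identifiers and include/exclude labels below are invented for illustration.

```python
def jaccard_index(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cohen_kappa(labels_a: list, labels_b: list) -> float:
    """Chance-corrected agreement between two raters over the same items."""
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Expected agreement if both raters labelled independently at their own rates.
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical category
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical screening of six studies, s1..s6 (1 = include, 0 = exclude).
expert_included = {"s1", "s2", "s3", "s4"}
model_included = {"s2", "s3", "s4", "s5"}
expert_labels = [1, 1, 1, 1, 0, 0]
model_labels = [0, 1, 1, 1, 1, 0]

overlap = jaccard_index(expert_included, model_included)
kappa = cohen_kappa(expert_labels, model_labels)
```

In this toy case the two reviewers share three of five distinct included studies, and kappa is lower than raw agreement because part of that agreement is expected by chance.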

Enhancing Medical Learning and Reasoning Systems: A Boxology-Based Comparative Analysis of Design Patterns

2408.02709v1 by Chi Him Ng

This study analyzes hybrid AI systems' design patterns and their effectiveness in clinical decision-making using the boxology framework. It categorizes and compares various architectures combining machine learning and rule-based reasoning to provide insights into their structural foundations and healthcare applications. Addressing two main questions, how to categorize these systems against established design patterns and how to extract insights through comparative analysis, the study uses design patterns from software engineering to understand and optimize healthcare AI systems. Boxology helps identify commonalities and create reusable solutions, enhancing these systems' scalability, reliability, and performance. Five primary architectures are examined: REML, MLRB, RBML, RMLT, and PERML. Each has unique strengths and weaknesses, highlighting the need for tailored approaches in clinical tasks. REML excels in high-accuracy prediction for datasets with limited data; MLRB in handling large datasets and complex data integration; RBML in explainability and trustworthiness; RMLT in managing high-dimensional data; and PERML, though limited in analysis, shows promise in urgent care scenarios. The study introduces four new patterns, creates five abstract categorization patterns, and refines those five further to specific systems. These contributions enhance Boxology's taxonomical organization and offer novel approaches to integrating expert knowledge with machine learning. Boxology's structured, modular approach offers significant advantages in developing and analyzing hybrid AI systems, revealing commonalities, and promoting reusable solutions. In conclusion, this study underscores hybrid AI systems' crucial role in advancing healthcare and Boxology's potential to drive further innovation in AI integration, ultimately improving clinical decision support and patient outcomes.

Bayesian Kolmogorov Arnold Networks (Bayesian_KANs): A Probabilistic Approach to Enhance Accuracy and Interpretability

2408.02706v1 by Masoud Muhammed Hassan

Because of its strong predictive power, deep learning has emerged as an essential tool in many industries, including healthcare. Traditional deep learning models, on the other hand, frequently lack interpretability and fail to account for prediction uncertainty, two crucial components of clinical decision making. In order to produce explainable and uncertainty-aware predictions, this study presents a novel framework called Bayesian Kolmogorov Arnold Networks (BKANs), which combines the expressive capacity of Kolmogorov Arnold Networks with Bayesian inference. We employ BKANs on two medical datasets, which are widely used benchmarks for assessing machine learning models in medical diagnostics: the Pima Indians Diabetes dataset and the Cleveland Heart Disease dataset. Our method provides useful insights into prediction confidence and decision boundaries and outperforms traditional deep learning models in terms of prediction accuracy. Moreover, BKANs' capacity to represent aleatoric and epistemic uncertainty guarantees doctors receive more solid and trustworthy decision support. According to experimental results, our Bayesian strategy improves the interpretability of the model and considerably reduces overfitting, which is important for small and imbalanced medical datasets. We present possible extensions to further use BKANs in more complicated multimodal datasets and address the significance of these discoveries for future research in building reliable AI systems for healthcare. This work paves the way for a new paradigm in deep learning model deployment in vital sectors where transparency and reliability are crucial.
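
The distinction between aleatoric and epistemic uncertainty in the abstract above can be made concrete with a common entropy-based decomposition. This is a generic sketch for any Bayesian classifier that yields several posterior samples of the class probabilities. It is not the BKAN implementation, and the function names and sample values are hypothetical.

```python
import numpy as np

def entropy(p: np.ndarray, axis: int = -1) -> np.ndarray:
    """Shannon entropy in nats, with clipping to avoid log(0)."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=axis)

def decompose_uncertainty(probs: np.ndarray):
    """Split predictive uncertainty for one patient.

    probs has shape (S, C): S posterior samples of a C-class probability vector.
    total     = entropy of the averaged prediction
    aleatoric = average entropy of each sampled prediction (noise in the data)
    epistemic = total - aleatoric (disagreement between posterior samples)
    """
    probs = np.asarray(probs, dtype=float)
    total = float(entropy(probs.mean(axis=0)))
    aleatoric = float(entropy(probs).mean())
    return total, aleatoric, total - aleatoric

# Samples agree that the case is a coin flip: uncertainty is aleatoric.
agree = np.array([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])
# Samples are individually confident but contradict each other: epistemic.
disagree = np.array([[1.0, 0.0], [0.0, 1.0]])

t1, a1, e1 = decompose_uncertainty(agree)
t2, a2, e2 = decompose_uncertainty(disagree)
```

Both cases have the same total uncertainty (the averaged prediction is 50/50), yet only the second could be reduced by collecting more training data, which is why reporting the two parts separately can be more useful to a clinician than a single confidence score.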

MLtoGAI: Semantic Web based with Machine Learning for Enhanced Disease Prediction and Personalized Recommendations using Generative AI

2407.20284v1 by Shyam Dongre, Ritesh Chandra, Sonali Agarwal

In modern healthcare, addressing the complexities of accurate disease prediction and personalized recommendations is both crucial and challenging. This research introduces MLtoGAI, which integrates Semantic Web technology with Machine Learning (ML) to enhance disease prediction and offer user-friendly explanations through ChatGPT. The system comprises three key components: a reusable disease ontology that incorporates detailed knowledge about various diseases, a diagnostic classification model that uses patient symptoms to detect specific diseases accurately, and the integration of Semantic Web Rule Language (SWRL) with ontology and ChatGPT to generate clear, personalized health advice. This approach significantly improves prediction accuracy and ensures results that are easy to understand, addressing the complexity of diseases and diverse symptoms. The MLtoGAI system demonstrates substantial advancements in accuracy and user satisfaction, contributing to developing more intelligent and accessible healthcare solutions. This innovative approach combines the strengths of ML algorithms with the ability to provide transparent, human-understandable explanations through ChatGPT, achieving significant improvements in prediction accuracy and user comprehension. By leveraging semantic technology and explainable AI, the system enhances the accuracy of disease prediction and ensures that the recommendations are relevant and easily understood by individual patients. Our research highlights the potential of integrating advanced technologies to overcome existing challenges in medical diagnostics, paving the way for future developments in intelligent healthcare systems. Additionally, the system is validated using 200 synthetic patient data records, ensuring robust performance and reliability.

Introducing δ-XAI: a novel sensitivity-based method for local AI explanations

2407.18343v2 by Alessandro De Carlo, Enea Parimbelli, Nicola Melillo, Giovanna Nicora

Explainable Artificial Intelligence (XAI) is central to the debate on integrating Artificial Intelligence (AI) and Machine Learning (ML) algorithms into clinical practice. High-performing AI/ML models, such as ensemble learners and deep neural networks, often lack interpretability, hampering clinicians' trust in their predictions. To address this, XAI techniques are being developed to describe AI/ML predictions in human-understandable terms. One promising direction is the adaptation of sensitivity analysis (SA) and global sensitivity analysis (GSA), which inherently rank model inputs by their impact on predictions. Here, we introduce a novel delta-XAI method that provides local explanations of ML model predictions by extending the delta index, a GSA metric. The delta-XAI index assesses the impact of each feature's value on the predicted output for individual instances in both regression and classification problems. We formalize the delta-XAI index and provide code for its implementation. The delta-XAI method was evaluated on simulated scenarios using linear regression models, with Shapley values serving as a benchmark. Results showed that the delta-XAI index is generally consistent with Shapley values, with notable discrepancies in models with highly impactful or extreme feature values. The delta-XAI index demonstrated higher sensitivity in detecting dominant features and handling extreme feature values. Qualitatively, the delta-XAI provides intuitive explanations by leveraging probability density functions, making feature rankings clearer and more explainable for practitioners. Overall, the delta-XAI method appears promising for robustly obtaining local explanations of ML model predictions. Further investigations in real-world clinical settings will be conducted to evaluate its impact on AI-assisted clinical workflows.
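
The abstract above benchmarks against Shapley values on linear regression models. The sketch below shows only that benchmark, not the delta-XAI index, whose definition is not given in the abstract. It computes exact Shapley values by enumerating feature subsets, assuming an interventional value function that replaces absent features with values from a background sample. All names, coefficients, and data are invented for illustration.

```python
import itertools
import math
import numpy as np

def shapley_values(predict, x: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Exact Shapley values for one instance x (feasible for few features only)."""
    d = len(x)

    def value(subset) -> float:
        # Features in `subset` are fixed to x; the rest keep their background values.
        data = background.copy()
        idx = list(subset)
        if idx:
            data[:, idx] = x[idx]
        return float(predict(data).mean())

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for r in range(d):
            weight = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
            for s in itertools.combinations(others, r):
                phi[i] += weight * (value(s + (i,)) - value(s))
    return phi

# Hypothetical linear model f(x) = 1 + 2*x0 - 1*x1 + 0.5*x2.
beta = np.array([2.0, -1.0, 0.5])
predict = lambda X: 1.0 + X @ beta

rng = np.random.default_rng(0)
background = rng.normal(size=(200, 3))
x = np.array([1.0, 3.0, -2.0])

phi = shapley_values(predict, x, background)
closed_form = beta * (x - background.mean(axis=0))
```

For a linear model under this value function, the enumeration reproduces the closed form beta_i * (x_i - mean_i), so a feature's attribution grows linearly with its distance from the background mean. That property is worth keeping in mind when reading the reported discrepancies for extreme feature values.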

Enhanced Deep Learning Methodologies and MRI Selection Techniques for Dementia Diagnosis in the Elderly Population

2407.17324v2 by Nikolaos Ntampakis, Konstantinos Diamantaras, Ioanna Chouvarda, Vasileios Argyriou, Panagiotis Sarigianndis

Dementia, a debilitating neurological condition affecting millions worldwide, presents significant diagnostic challenges. In this work, we introduce a novel methodology for the classification of demented and non-demented elderly patients using 3D brain Magnetic Resonance Imaging (MRI) scans. Our approach features a unique technique for selectively processing MRI slices, focusing on the most relevant brain regions and excluding less informative sections. This methodology is complemented by a confidence-based classification committee composed of three custom deep learning models: Dem3D ResNet, Dem3D CNN, and Dem3D EfficientNet. These models work synergistically to enhance decision-making accuracy, leveraging their collective strengths. Tested on the Open Access Series of Imaging Studies (OASIS) dataset, our method achieved an impressive accuracy of 94.12%, surpassing existing methodologies. Furthermore, validation on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset confirmed the robustness and generalizability of our approach. The use of explainable AI (XAI) techniques and comprehensive ablation studies further substantiate the effectiveness of our techniques, providing insights into the decision-making process and the importance of our methodology. This research offers a significant advancement in dementia diagnosis, providing a highly accurate and efficient tool for clinical applications.
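
The abstract does not say how the confidence-based committee combines its three models. One plausible rule, shown here purely as an assumption, is to accept the prediction of whichever member is most confident. The function name and probability values are hypothetical.

```python
import numpy as np

def committee_predict(probs):
    """Pick the prediction of the most confident committee member.

    probs has shape (M, C): class probabilities from M models for one scan.
    Returns (predicted class, its probability, index of the deciding model).
    """
    probs = np.asarray(probs, dtype=float)
    # A model's confidence is taken to be its largest class probability.
    model = int(np.argmax(probs.max(axis=1)))
    cls = int(np.argmax(probs[model]))
    return cls, float(probs[model, cls]), model

# Hypothetical outputs of three models for one scan: [non-demented, demented].
scan_probs = [
    [0.60, 0.40],
    [0.20, 0.80],
    [0.55, 0.45],
]
label, confidence, decided_by = committee_predict(scan_probs)
```

Alternatives such as averaging the probabilities or majority voting would be equally consistent with the abstract's wording. In this toy case they disagree with the rule above, which illustrates why the choice matters.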

Using Large Language Models to Compare Explainable Models for Smart Home Human Activity Recognition

2408.06352v1 by Michele Fiori, Gabriele Civitarese, Claudio Bettini

Recognizing daily activities with unobtrusive sensors in smart environments enables various healthcare applications. Monitoring how subjects perform activities at home and their changes over time can reveal early symptoms of health issues, such as cognitive decline. Most approaches in this field use deep learning models, which are often seen as black boxes mapping sensor data to activities. However, non-expert users like clinicians need to trust and understand these models' outputs. Thus, eXplainable AI (XAI) methods for Human Activity Recognition have emerged to provide intuitive natural language explanations from these models. Different XAI methods generate different explanations, and their effectiveness is typically evaluated through user surveys, which are often challenging in terms of costs and fairness. This paper proposes an automatic evaluation method using Large Language Models (LLMs) to identify, in a pool of candidates, the best XAI approach for non-expert users. Our preliminary results suggest that LLM evaluation aligns with user surveys.

Explainable AI-based Intrusion Detection System for Industry 5.0: An Overview of the Literature, associated Challenges, the existing Solutions, and Potential Research Directions

2408.03335v1 by Naseem Khan, Kashif Ahmad, Aref Al Tamimi, Mohammed M. Alani, Amine Bermak, Issa Khalil

Industry 5.0, which focuses on human and Artificial Intelligence (AI) collaboration for performing different tasks in manufacturing, involves a higher number of robots, Internet of Things (IoT) devices and interconnections, Augmented/Virtual Reality (AR/VR), and other smart devices. The heavy involvement of these devices and interconnections in various critical areas, such as economy, health, education and defense systems, poses several types of potential security flaws. AI itself has proven to be a very effective and powerful tool in different areas of cybersecurity, such as intrusion detection, malware detection, and phishing detection, among others. Just as in many application areas, cybersecurity professionals were reluctant to accept black-box ML solutions for cybersecurity applications. This reluctance pushed forward the adoption of eXplainable Artificial Intelligence (XAI) as a tool that helps explain how decisions are made in ML-based systems. In this survey, we present a comprehensive study of different XAI-based intrusion detection systems for Industry 5.0, and we also examine the impact of explainability and interpretability on cybersecurity practices through the lens of Adversarial XIDS (Adv-XIDS) approaches. Furthermore, we analyze the possible opportunities and challenges in XAI cybersecurity systems for Industry 5.0 that elicit future research toward XAI-based solutions to be adopted by high-stakes Industry 5.0 applications. We believe this rigorous analysis will establish a foundational framework for subsequent research endeavors within the specified domain.

A Comparative Study on Automatic Coding of Medical Letters with Explainability

2407.13638v1 by Jamie Glen, Lifeng Han, Paul Rayson, Goran Nenadic

This study aims to explore the implementation of Natural Language Processing (NLP) and machine learning (ML) techniques to automate the coding of medical letters with visualised explainability and lightweight local computer settings. Currently in clinical settings, coding is a manual process that involves assigning codes to each condition, procedure, and medication in a patient's paperwork (e.g., 56265001 heart disease using SNOMED CT code). There is preliminary research on automatic coding in this field using state-of-the-art ML models; however, due to the complexity and size of the models, real-world deployment has not been achieved. To further facilitate the possibility of automatic coding practice, we explore some solutions in a local computer setting; in addition, we explore the function of explainability for transparency of AI models. We used the publicly available MIMIC-III database and the HAN/HLAN network models for ICD code prediction purposes. We also experimented with the mapping between ICD and SNOMED CT knowledge bases. In our experiments, the models provided useful information for 97.98% of codes. The result of this investigation can shed some light on implementing automatic clinical coding in practice, such as in hospital settings, on the local computers used by clinicians. Project page: https://github.com/Glenj01/Medical-Coding

Explainable AI for Enhancing Efficiency of DL-based Channel Estimation

2407.07009v1 by Abdul Karim Gizzini, Yahia Medjahdi, Ali J. Ghandour, Laurent Clavier

The support of artificial intelligence (AI) based decision-making is a key element in future 6G networks, where the concept of native AI will be introduced. Moreover, AI is widely employed in different critical applications such as autonomous driving and medical diagnosis. In such applications, using AI as black-box models is risky and challenging. Hence, it is crucial to understand and trust the decisions taken by these models. Tackling this issue can be achieved by developing explainable AI (XAI) schemes that aim to explain the logic behind the black-box model behavior, and thus, ensure its efficient and safe deployment. Recently, we proposed a novel perturbation-based XAI-CHEST framework that is oriented toward channel estimation in wireless communications. The core idea of the XAI-CHEST framework is to identify the relevant model inputs by inducing high noise on the irrelevant ones. This manuscript provides the detailed theoretical foundations of the XAI-CHEST framework. In particular, we derive the analytical expressions of the XAI-CHEST loss functions and the noise threshold fine-tuning optimization problem. Hence the designed XAI-CHEST delivers a smart input feature selection methodology that can further improve the overall performance while optimizing the architecture of the employed model. Simulation results show that the XAI-CHEST framework provides valid interpretations, where it offers an improved bit error rate performance while reducing the required computational complexity in comparison to the classical DL-based channel estimation.

Explainable AI: Comparative Analysis of Normal and Dilated ResNet Models for Fundus Disease Classification

2407.05440v2 by P. N. Karthikayan, Yoga Sri Varshan V, Hitesh Gupta Kattamuri, Umarani Jayaraman

This paper presents dilated Residual Network (ResNet) models for disease classification from retinal fundus images. Dilated convolution filters are used to replace normal convolution filters in the higher layers of the ResNet model (dilated ResNet) in order to improve the receptive field compared to the normal ResNet model for disease classification. This study introduces computer-assisted diagnostic tools that employ deep learning, enhanced with explainable AI techniques. These techniques aim to make the tool's decision-making process transparent, thereby enabling medical professionals to understand and trust the AI's diagnostic decision. They are particularly relevant in today's healthcare landscape, where there is a growing demand for transparency in AI applications to ensure their reliability and ethical use. The dilated ResNet is used as a replacement for the normal ResNet to enhance the classification accuracy of retinal eye diseases and reduce the required computing time. The dataset used in this work is the Ocular Disease Intelligent Recognition (ODIR) dataset which is a structured ophthalmic database with eight classes covering most of the common retinal eye diseases. The evaluation metrics used in this work include precision, recall, accuracy, and F1 score. In this work, a comparative study has been made between normal ResNet models and dilated ResNet models on five variants namely ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152. The dilated ResNet model shows promising results as compared to normal ResNet with an average F1 score of 0.71, 0.70, 0.69, 0.67, and 0.70 respectively for the above respective variants in ODIR multiclass disease classification.
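
The claim that dilated filters enlarge the receptive field can be checked with standard arithmetic. A kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions. The layer stacks below are hypothetical and do not reproduce the paper's ResNet configurations.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Input span covered by one dilated kernel along one axis."""
    return kernel + (kernel - 1) * (dilation - 1)

def receptive_field(layers) -> int:
    """Receptive field of a stack of conv layers along one axis.

    layers is a list of (kernel, stride, dilation) tuples, input side first.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel, stride, dilation in layers:
        rf += (effective_kernel(kernel, dilation) - 1) * jump
        jump *= stride
    return rf

# Three stacked 3x3 convolutions, stride 1.
normal = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]

rf_normal = receptive_field(normal)
rf_dilated = receptive_field(dilated)
```

Both stacks have the same number of weights, but the dilated one sees a window roughly twice as wide, which is the mechanism the abstract relies on when replacing normal filters in the higher layers.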

A Survey on Trustworthiness in Foundation Models for Medical Image Analysis

2407.15851v2 by Congzhen Shi, Ryan Rezai, Jiaxi Yang, Qi Dou, Xiaoxiao Li

The rapid advancement of foundation models in medical imaging represents a significant leap toward enhancing diagnostic accuracy and personalized treatment. However, the deployment of foundation models in healthcare necessitates a rigorous examination of their trustworthiness, encompassing privacy, robustness, reliability, explainability, and fairness. The current body of survey literature on foundation models in medical imaging reveals considerable gaps, particularly in the area of trustworthiness. Additionally, existing surveys on the trustworthiness of foundation models do not adequately address their specific variations and applications within the medical imaging domain. This survey aims to fill that gap by presenting a novel taxonomy of foundation models used in medical imaging and analyzing the key motivations for ensuring their trustworthiness. We review current research on foundation models in major medical imaging applications, focusing on segmentation, medical report generation, medical question answering (Q&A), and disease diagnosis. These areas are highlighted because they have seen a relatively mature and substantial number of foundation models compared to other applications. We focus on literature that discusses trustworthiness in medical image analysis manuscripts. We explore the complex challenges of building trustworthy foundation models for each application, summarizing current concerns and strategies for enhancing trustworthiness. Furthermore, we examine the potential of these models to revolutionize patient care. Our analysis underscores the imperative for advancing towards trustworthy AI in medical image analysis, advocating for a balanced approach that fosters innovation while ensuring ethical and equitable healthcare delivery.

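Summary: The rapid progress of foundation models in medical imaging marks a major step toward better diagnostic accuracy and personalized treatment. Deploying foundation models in healthcare, however, requires rigorous scrutiny of their trustworthiness, including privacy, robustness, reliability, explainability, and fairness. The existing survey literature on foundation models in medical imaging shows considerable gaps, particularly regarding trustworthiness. In addition, existing surveys on the trustworthiness of foundation models do not adequately address their specific variations and applications in the medical imaging domain. This survey aims to fill that gap by proposing a new taxonomy of foundation models used in medical imaging and analyzing the key motivations for ensuring their trustworthiness. We review current research on foundation models in major medical imaging applications, focusing on segmentation, medical report generation, medical question answering (Q&A), and disease diagnosis. These areas are highlighted because, compared with other applications, they have a relatively mature and large body of foundation models. We focus on literature that discusses trustworthiness in medical image analysis. We explore the complex challenges of building trustworthy foundation models for each application and summarize current concerns and strategies for enhancing trustworthiness. We also examine the potential of these models to transform patient care. Our analysis underscores the need to move toward trustworthy AI in medical image analysis and advocates a balanced approach that fosters innovation while ensuring ethical and equitable healthcare delivery.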

The Impact of an XAI-Augmented Approach on Binary Classification with Scarce Data

2407.06206v1 by Ximing Wen, Rosina O. Weber, Anik Sen, Darryl Hannan, Steven C. Nesbit, Vincent Chan, Alberto Goffi, Michael Morris, John C. Hunninghake, Nicholas E. Villalobos, Edward Kim, Christopher J. MacLellan

Point-of-Care Ultrasound (POCUS) is the practice of clinicians conducting and interpreting ultrasound scans right at the patient's bedside. However, the expertise needed to interpret these images is considerable and may not always be present in emergency situations. This reality makes algorithms such as machine learning classifiers extremely valuable to augment human decisions. POCUS devices are becoming available at a reasonable cost in the size of a mobile phone. The challenge of turning POCUS devices into life-saving tools is that interpretation of ultrasound images requires specialist training and experience. Unfortunately, the difficulty of obtaining positive training images is an important obstacle to building efficient and accurate classifiers. Hence, the problem we investigate is how to increase the accuracy of classifiers trained with scarce data. We hypothesize that training with only a few data instances may not suffice for classifiers to generalize, causing them to overfit. We use an Explainable AI-Augmented approach to help the algorithm learn more from less and potentially help the classifier generalize better.

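Summary: Point-of-Care Ultrasound (POCUS) is the practice of clinicians performing and interpreting ultrasound scans at the patient's bedside. The expertise needed to interpret these images is considerable, however, and may not always be available in emergencies. This makes algorithms such as machine learning classifiers extremely valuable for augmenting human decisions. POCUS devices are becoming available at reasonable cost and at the size of a mobile phone. The challenge in turning POCUS devices into life-saving tools is that interpreting ultrasound images requires specialist training and experience. Unfortunately, the difficulty of obtaining positive training images is a major obstacle to building efficient and accurate classifiers. The problem we investigate is therefore how to increase the accuracy of classifiers trained with scarce data. We hypothesize that training with only a few data instances may not be enough for classifiers to generalize, causing them to overfit. Our approach uses Explainable AI augmentation to help the algorithm learn more from less data and potentially help the classifier generalize better.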

Can GPT-4 Help Detect Quit Vaping Intentions? An Exploration of Automatic Data Annotation Approach

2407.00167v1 by Sai Krishna Revanth Vuruma, Dezhi Wu, Saborny Sen Gupta, Lucas Aust, Valerie Lookingbill, Wyatt Bellamy, Yang Ren, Erin Kasson, Li-Shiun Chen, Patricia Cavazos-Rehg, Dian Hu, Ming Huang

In recent years, the United States has witnessed a significant surge in the popularity of vaping or e-cigarette use, leading to a notable rise in cases of e-cigarette and vaping use-associated lung injury (EVALI) that caused hospitalizations and fatalities during the EVALI outbreak in 2019, highlighting the urgency to comprehend vaping behaviors and develop effective strategies for cessation. Due to the ubiquity of social media platforms, over 4.7 billion users worldwide use them for connectivity, communications, news, and entertainment with a significant portion of the discourse related to health, thereby establishing social media data as an invaluable organic data resource for public health research. In this study, we extracted a sample dataset from one vaping sub-community on Reddit to analyze users' quit-vaping intentions. Leveraging OpenAI's latest large language model GPT-4 for sentence-level quit vaping intention detection, this study compares the outcomes of this model against layman and clinical expert annotations. Using different prompting strategies such as zero-shot, one-shot, few-shot and chain-of-thought prompting, we developed 8 prompts with varying levels of detail to explain the task to GPT-4 and also evaluated the performance of the strategies against each other. These preliminary findings emphasize the potential of GPT-4 in social media data analysis, especially in identifying users' subtle intentions that may elude human detection.

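Summary: In recent years, the United States has seen a sharp rise in vaping or e-cigarette use, leading to a notable increase in cases of e-cigarette and vaping use-associated lung injury (EVALI), which caused hospitalizations and deaths during the 2019 EVALI outbreak and highlighted the urgency of understanding vaping behaviors and developing effective cessation strategies. Because social media platforms are ubiquitous, more than 4.7 billion users worldwide use them for connection, communication, news, and entertainment, and a large share of the discourse relates to health, which makes social media data an invaluable organic resource for public health research. In this study, we extracted a sample dataset from one vaping sub-community on Reddit to analyze users' quit-vaping intentions. Using OpenAI's latest large language model, GPT-4, for sentence-level detection of quit-vaping intentions, the study compares the model's outputs with annotations by laypeople and clinical experts. Using different prompting strategies, such as zero-shot, one-shot, few-shot, and chain-of-thought prompting, we developed 8 prompts with varying levels of detail to explain the task to GPT-4 and evaluated the strategies against each other. These preliminary findings emphasize the potential of GPT-4 in social media data analysis, especially in identifying subtle user intentions that may elude human detection.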

Towards Compositional Interpretability for XAI

2406.17583v1 by Sean Tull, Robin Lorenz, Stephen Clark, Ilyas Khan, Bob Coecke

Artificial intelligence (AI) is currently based largely on black-box machine learning models which lack interpretability. The field of eXplainable AI (XAI) strives to address this major concern, being critical in high-stakes areas such as the finance, legal and health sectors. We present an approach to defining AI models and their interpretability based on category theory. For this we employ the notion of a compositional model, which sees a model in terms of formal string diagrams which capture its abstract structure together with its concrete implementation. This comprehensive view incorporates deterministic, probabilistic and quantum models. We compare a wide range of AI models as compositional models, including linear and rule-based models, (recurrent) neural networks, transformers, VAEs, and causal and DisCoCirc models. Next we give a definition of interpretation of a model in terms of its compositional structure, demonstrating how to analyse the interpretability of a model, and using this to clarify common themes in XAI. We find that what makes the standard 'intrinsically interpretable' models so transparent is brought out most clearly diagrammatically. This leads us to the more general notion of compositionally-interpretable (CI) models, which additionally include, for instance, causal, conceptual space, and DisCoCirc models. We next demonstrate the explainability benefits of CI models. Firstly, their compositional structure may allow the computation of other quantities of interest, and may facilitate inference from the model to the modelled phenomenon by matching its structure. Secondly, they allow for diagrammatic explanations for their behaviour, based on influence constraints, diagram surgery and rewrite explanations. Finally, we discuss many future directions for the approach, raising the question of how to learn such meaningfully structured models in practice.

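Summary: Artificial intelligence (AI) currently relies largely on black-box machine learning models that lack interpretability. The field of eXplainable AI (XAI) strives to address this major concern, which is critical in high-stakes areas such as the finance, legal, and health sectors. We present an approach to defining AI models and their interpretability based on category theory. To do so, we employ the notion of a compositional model, which views a model in terms of formal string diagrams that capture its abstract structure together with its concrete implementation. This comprehensive view covers deterministic, probabilistic, and quantum models. We compare a wide range of AI models as compositional models, including linear and rule-based models, (recurrent) neural networks, transformers, VAEs, and causal and DisCoCirc models. Next, we define the interpretation of a model in terms of its compositional structure, show how to analyse a model's interpretability, and use this to clarify common themes in XAI. We find that what makes the standard 'intrinsically interpretable' models so transparent is brought out most clearly diagrammatically. This leads to the more general notion of compositionally-interpretable (CI) models, which additionally include, for instance, causal, conceptual space, and DisCoCirc models. We then demonstrate the explainability benefits of CI models. First, their compositional structure may allow other quantities of interest to be computed and may facilitate inference from the model to the modelled phenomenon by matching its structure. Second, they allow diagrammatic explanations of their behaviour, based on influence constraints, diagram surgery, and rewrite explanations. Finally, we discuss many future directions for the approach and raise the question of how to learn such meaningfully structured models in practice.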

Slicing Through Bias: Explaining Performance Gaps in Medical Image Analysis using Slice Discovery Methods

2406.12142v2 by Vincent Olesen, Nina Weng, Aasa Feragen, Eike Petersen

Machine learning models have achieved high overall accuracy in medical image analysis. However, performance disparities on specific patient groups pose challenges to their clinical utility, safety, and fairness. This can affect known patient groups - such as those based on sex, age, or disease subtype - as well as previously unknown and unlabeled groups. Furthermore, the root cause of such observed performance disparities is often challenging to uncover, hindering mitigation efforts. In this paper, to address these issues, we leverage Slice Discovery Methods (SDMs) to identify interpretable underperforming subsets of data and formulate hypotheses regarding the cause of observed performance disparities. We introduce a novel SDM and apply it in a case study on the classification of pneumothorax and atelectasis from chest x-rays. Our study demonstrates the effectiveness of SDMs in hypothesis formulation and yields an explanation of previously observed but unexplained performance disparities between male and female patients in widely used chest X-ray datasets and models. Our findings indicate shortcut learning in both classification tasks, through the presence of chest drains and ECG wires, respectively. Sex-based differences in the prevalence of these shortcut features appear to cause the observed classification performance gap, representing a previously underappreciated interaction between shortcut learning and model fairness analyses.

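Summary: Machine learning models have achieved high overall accuracy in medical image analysis. However, performance disparities on specific patient groups pose challenges to their clinical utility, safety, and fairness. This can affect known patient groups (such as those based on sex, age, or disease subtype) as well as previously unknown and unlabeled groups. Moreover, the root cause of such observed disparities is often hard to uncover, which hinders mitigation efforts. To address these issues, this paper uses Slice Discovery Methods (SDMs) to identify interpretable underperforming subsets of data and to formulate hypotheses about the cause of observed performance disparities. We introduce a new SDM and apply it in a case study on the classification of pneumothorax and atelectasis from chest X-rays. Our study demonstrates the effectiveness of SDMs in hypothesis formulation and explains previously observed but unexplained performance disparities between male and female patients in widely used chest X-ray datasets and models. Our findings indicate shortcut learning in both classification tasks, through the presence of chest drains and ECG wires, respectively. Sex-based differences in the prevalence of these shortcut features appear to cause the observed classification performance gap, which represents a previously underappreciated interaction between shortcut learning and model fairness analyses.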
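
To make the idea of an underperforming slice concrete, here is a minimal sketch that ranks slices defined by known metadata attributes. It is not the SDM introduced in the paper, which also targets unknown and unlabeled groups. The records, attribute names (`sex`, `chest_drain`), and the `margin` parameter are all hypothetical toy values chosen for illustration.

```python
from collections import defaultdict


def slice_accuracies(records, attributes):
    """Accuracy for every (attribute, value) slice of the records."""
    hits, counts = defaultdict(int), defaultdict(int)
    for record in records:
        for attribute in attributes:
            key = (attribute, record[attribute])
            counts[key] += 1
            hits[key] += int(record["label"] == record["pred"])
    return {key: hits[key] / counts[key] for key in counts}


def underperforming_slices(records, attributes, margin=0.1):
    """Slices whose accuracy is more than `margin` below overall accuracy."""
    overall = sum(r["label"] == r["pred"] for r in records) / len(records)
    accuracies = slice_accuracies(records, attributes)
    flagged = [(k, a) for k, a in accuracies.items() if a < overall - margin]
    return sorted(flagged, key=lambda item: (item[1], str(item[0])))


# Toy, hand-made records (not data from the paper).
records = [
    {"sex": "F", "chest_drain": False, "label": 1, "pred": 0},
    {"sex": "F", "chest_drain": False, "label": 1, "pred": 0},
    {"sex": "F", "chest_drain": True, "label": 1, "pred": 1},
    {"sex": "F", "chest_drain": False, "label": 0, "pred": 0},
    {"sex": "M", "chest_drain": True, "label": 1, "pred": 1},
    {"sex": "M", "chest_drain": True, "label": 1, "pred": 1},
    {"sex": "M", "chest_drain": False, "label": 0, "pred": 0},
    {"sex": "M", "chest_drain": True, "label": 1, "pred": 1},
]

for key, accuracy in underperforming_slices(records, ["sex", "chest_drain"]):
    print(key, accuracy)
```

In this toy data the gap for one sex coincides with the absence of a chest drain, which is the kind of pattern that would prompt a shortcut-learning hypothesis.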

Unlocking the Potential of Metaverse in Innovative and Immersive Digital Health

2406.07114v2 by Fatemeh Ebrahimzadeh, Ramin Safa

The concept of the Metaverse has attracted a lot of attention in various fields, and one of its important applications is health and treatment. The Metaverse has enormous potential to transform healthcare by changing patient care, medical education, and the way teaching/learning and research are done. The purpose of this research is to provide an introduction to the basic concepts and fundamental technologies of the Metaverse. This paper examines the pros and cons of the Metaverse in the healthcare context and analyzes its potential from the technology and AI perspective. In particular, the role of machine learning methods is discussed; we explain how machine learning algorithms can be applied to Metaverse-generated data to gain better insights in healthcare applications. Additionally, we examine future visions of the Metaverse in health delivery by examining emerging technologies such as blockchain and by addressing privacy concerns. The findings of this study contribute to a deeper understanding of the applications of the Metaverse in healthcare and its potential to revolutionize the delivery of medical services.

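Summary: The concept of the Metaverse has attracted wide attention across fields, and one of its important applications is healthcare. The Metaverse has enormous potential to transform healthcare by changing patient care, medical education, and the way teaching/learning and research are done. The purpose of this research is to introduce the basic concepts and fundamental technologies of the Metaverse. The paper examines the pros and cons of the Metaverse in the healthcare context and analyzes its potential from the technology and AI perspective. In particular, it discusses the role of machine learning methods and explains how machine learning algorithms can be applied to Metaverse-generated data to gain better insights in healthcare applications. It also examines future visions of the Metaverse in health delivery by looking at emerging technologies such as blockchain and by addressing privacy concerns. The findings contribute to a deeper understanding of the applications of the Metaverse in healthcare and its potential to revolutionize the delivery of medical services.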

AI-Driven Predictive Analytics Approach for Early Prognosis of Chronic Kidney Disease Using Ensemble Learning and Explainable AI

2406.06728v1 by K M Tawsik Jawad, Anusha Verma, Fathi Amsaad

Chronic Kidney Disease (CKD) is a widespread chronic disease with no known ultimate cure and high morbidity. Research demonstrates that progressive CKD is a heterogeneous disorder that significantly impacts kidney structure and function, eventually leading to kidney failure. Over time, chronic kidney disease has moved from a life-threatening disease affecting few people to a common disorder of varying severity. The goal of this research is to visualize dominating features, feature scores, and values exhibited for early prognosis and detection of CKD using ensemble learning and explainable AI. To that end, an AI-driven predictive analytics approach is proposed to aid clinical practitioners in prescribing lifestyle modifications for individual patients to reduce the rate of progression of this disease. Our dataset is collected on body vitals from individuals with CKD and healthy subjects to develop our proposed AI-driven solution accurately. In this regard, blood and urine test results are provided, and ensemble tree-based machine-learning models are applied to predict unseen cases of CKD. Our research findings are validated after lengthy consultations with nephrologists. Our experiments and interpretation results are compared with existing explainable AI applications in various healthcare domains, including CKD. The comparison shows that our developed AI models, particularly the Random Forest model, have identified more features as significant contributors than XgBoost. Interpretability (I), which measures the ratio of important to masked features, indicates that our XgBoost model achieved a higher score, specifically a Fidelity of 98%, in this metric and naturally in the FII index compared to competing models.

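Summary: Chronic Kidney Disease (CKD) is a widespread chronic disease with no known ultimate cure and high morbidity. Research shows that progressive CKD is a heterogeneous disorder that significantly affects kidney structure and function and eventually leads to kidney failure. Over time, chronic kidney disease has moved from a life-threatening disease affecting few people to a common disorder of varying severity. The goal of this research is to visualize dominating features, feature scores, and exhibited values for early prognosis and detection of CKD using ensemble learning and explainable AI. To that end, an AI-driven predictive analytics approach is proposed to help clinicians prescribe lifestyle modifications for individual patients and slow the progression of the disease. The dataset is collected from the body vitals of individuals with CKD and healthy subjects in order to develop the proposed AI-driven solution accurately. Blood and urine test results are provided, and ensemble tree-based machine learning models are applied to predict unseen cases of CKD. The research findings were validated after lengthy consultations with nephrologists. The experiments and interpretation results are compared with existing explainable AI applications in various healthcare domains, including CKD. The comparison shows that the developed AI models, particularly the Random Forest model, identified more features as significant contributors than XgBoost. Interpretability (I), which measures the ratio of important to masked features, indicates that the XgBoost model achieved a higher score on this metric, specifically a Fidelity of 98%, and naturally on the FII index compared with competing models.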

Explainable AI for Mental Disorder Detection via Social Media: A survey and outlook

2406.05984v1 by Yusif Ibrahimov, Tarique Anwar, Tommy Yuan

Mental health constitutes a complex and pervasive global challenge, affecting millions of lives and often leading to severe consequences. In this paper, we conduct a thorough survey to explore the intersection of data science, artificial intelligence, and mental healthcare, focusing on the recent developments of mental disorder detection through online social media (OSM). A significant portion of the population actively engages in OSM platforms, creating a vast repository of personal data that holds immense potential for mental health analytics. The paper navigates through traditional diagnostic methods, state-of-the-art data- and AI-driven research studies, and the emergence of explainable AI (XAI) models for mental healthcare. We review state-of-the-art machine learning methods, particularly those based on modern deep learning, while emphasising the need for explainability in healthcare AI models. The experimental design section provides insights into prevalent practices, including available datasets and evaluation approaches. We also identify key issues and challenges in the field and propose promising future research directions. As mental health decisions demand transparency, interpretability, and ethical considerations, this paper contributes to the ongoing discourse on advancing XAI in mental healthcare through social media. The comprehensive overview presented here aims to guide researchers, practitioners, and policymakers in developing the area of mental disorder detection.

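Summary: Mental health is a complex and pervasive global challenge that affects millions of lives and often leads to severe consequences. In this paper, we conduct a thorough survey of the intersection of data science, artificial intelligence, and mental healthcare, focusing on recent developments in mental disorder detection through online social media (OSM). A significant portion of the population actively engages with OSM platforms, creating a vast repository of personal data with immense potential for mental health analytics. The paper covers traditional diagnostic methods, state-of-the-art data- and AI-driven research, and the emergence of explainable AI (XAI) models for mental healthcare. We review state-of-the-art machine learning methods, particularly those based on modern deep learning, while emphasizing the need for explainability in healthcare AI models. The experimental design section provides insights into prevalent practices, including available datasets and evaluation approaches. We also identify key issues and challenges in the field and propose promising future research directions. Because mental health decisions demand transparency, interpretability, and ethical consideration, this paper contributes to the ongoing discourse on advancing XAI in mental healthcare through social media. The comprehensive overview presented here aims to guide researchers, practitioners, and policymakers in developing the area of mental disorder detection.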

Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance

2406.05746v1 by Zhan Zhang, Qin Zhang, Yang Jiao, Lin Lu, Lin Ma, Aihua Liu, Xiao Liu, Juan Zhao, Yajun Xue, Bing Wei, Mingxia Zhang, Ru Gao, Hong Zhao, Jie Lu, Fan Li, Yang Zhang, Yiming Wang, Lei Zhang, Fengwei Tian, Jie Hu, Xin Gou

AI-aided clinical diagnosis is desired in medical care. Existing deep learning models lack explainability and mainly focus on image analysis. The recently developed Dynamic Uncertain Causality Graph (DUCG) approach is causality-driven, explainable, and invariant across different application scenarios, without problems of data collection, labeling, fitting, privacy, bias, generalization, high cost and high energy consumption. Through close collaboration between clinical experts and DUCG technicians, 46 DUCG models covering 54 chief complaints were constructed. Over 1,000 diseases can be diagnosed without triage. Before being applied in the real world, the 46 DUCG models were retrospectively verified by third-party hospitals. The verified diagnostic precisions were no less than 95%, in which the diagnostic precision for every disease including uncommon ones was no less than 80%. After verification, the 46 DUCG models were applied in the real world in China. Over one million real diagnosis cases have been performed, with only 17 incorrect diagnoses identified. Due to DUCG's transparency, the mistakes causing the incorrect diagnoses were found and corrected. The diagnostic abilities of the clinicians who applied DUCG frequently were improved significantly. Following the introduction to the earlier presented DUCG methodology, the recommendation algorithm for potential medical checks is presented and the key idea of DUCG is extracted.

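Summary: AI-aided clinical diagnosis is desired in medical care. Existing deep learning models lack explainability and mainly focus on image analysis. The recently developed Dynamic Uncertain Causality Graph (DUCG) approach is causality-driven, explainable, and invariant across different application scenarios, without problems of data collection, labeling, fitting, privacy, bias, generalization, high cost, and high energy consumption. Through close collaboration between clinical experts and DUCG technicians, 46 DUCG models covering 54 chief complaints were constructed. More than 1,000 diseases can be diagnosed without triage. Before being applied in the real world, the 46 DUCG models were retrospectively verified by third-party hospitals. The verified diagnostic precision was no less than 95%, and the diagnostic precision for every disease, including uncommon ones, was no less than 80%. After verification, the 46 DUCG models were applied in the real world in China. More than one million real diagnosis cases have been performed, with only 17 incorrect diagnoses identified. Thanks to DUCG's transparency, the mistakes that caused the incorrect diagnoses were found and corrected. The diagnostic abilities of clinicians who applied DUCG frequently improved significantly. Following an introduction to the previously presented DUCG methodology, the recommendation algorithm for potential medical checks is presented and the key idea of DUCG is extracted.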

Advancing Histopathology-Based Breast Cancer Diagnosis: Insights into Multi-Modality and Explainability

2406.12897v1 by Faseela Abdullakutty, Younes Akbari, Somaya Al-Maadeed, Ahmed Bouridane, Rifat Hamoudi

It is imperative that breast cancer is detected accurately and promptly to improve patient outcomes. Diagnostic methodologies have traditionally relied on unimodal approaches; however, medical data analytics is integrating diverse data sources beyond conventional imaging. Using multi-modal techniques, integrating both image and non-image data, marks a transformative advancement in breast cancer diagnosis. The purpose of this review is to explore the burgeoning field of multimodal techniques, particularly the fusion of histopathology images with non-image data. Further, Explainable AI (XAI) will be used to elucidate the decision-making processes of complex algorithms, emphasizing the necessity of explainability in diagnostic processes. This review utilizes multi-modal data and emphasizes explainability to enhance diagnostic accuracy, clinician confidence, and patient engagement, ultimately fostering more personalized treatment strategies for breast cancer, while also identifying research gaps in multi-modality and explainability, guiding future studies, and contributing to the strategic direction of the field.

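Summary: Detecting breast cancer accurately and promptly is essential for improving patient outcomes. Diagnostic methods have traditionally relied on unimodal approaches; however, medical data analytics is integrating diverse data sources beyond conventional imaging. The use of multi-modal techniques that integrate image and non-image data marks a transformative advance in breast cancer diagnosis. The purpose of this review is to explore the emerging field of multimodal techniques, particularly the fusion of histopathology images with non-image data. In addition, Explainable AI (XAI) is used to elucidate the decision-making processes of complex algorithms, emphasizing the necessity of explainability in diagnostic processes. The review draws on multi-modal data and emphasizes explainability to improve diagnostic accuracy, clinician confidence, and patient engagement, ultimately fostering more personalized treatment strategies for breast cancer, while also identifying research gaps in multi-modality and explainability, guiding future studies, and contributing to the strategic direction of the field.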

Revisiting Attention Weights as Interpretations of Message-Passing Neural Networks

2406.04612v1 by Yong-Min Shin, Siqing Li, Xin Cao, Won-Yong Shin

The self-attention mechanism has been adopted in several widely-used message-passing neural networks (MPNNs) (e.g., GATs), which adaptively controls the amount of information that flows along the edges of the underlying graph. This usage of attention has made such models a baseline for studies on explainable AI (XAI) since interpretations via attention have been popularized in various domains (e.g., natural language processing and computer vision). However, existing studies often use naive calculations to derive attribution scores from attention, and do not take the precise and careful calculation of edge attribution into consideration. In our study, we aim to fill the gap between the widespread usage of attention-enabled MPNNs and their potential in largely under-explored explainability, a topic that has been actively investigated in other areas. To this end, as the first attempt, we formalize the problem of edge attribution from attention weights in GNNs. Then, we propose GATT, an edge attribution calculation method built upon the computation tree. Through comprehensive experiments, we demonstrate the effectiveness of our proposed method when evaluating attributions from GATs. Conversely, we empirically validate that simply averaging attention weights over graph attention layers is insufficient to interpret the GAT model's behavior. Code is publicly available at https://github.com/jordan7186/GAtt/tree/main.

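Summary: The self-attention mechanism has been adopted in several widely used message-passing neural networks (MPNNs), such as GATs, where it adaptively controls the amount of information flowing along the edges of the underlying graph. This use of attention has made such models a baseline for explainable AI (XAI) studies, since interpretation via attention has been popularized in various domains (for example, natural language processing and computer vision). However, existing studies often use naive calculations to derive attribution scores from attention and do not consider the precise and careful calculation of edge attribution. In our study, we aim to close the gap between the widespread use of attention-enabled MPNNs and their largely under-explored potential for explainability, a topic that has been actively investigated in other areas. To this end, as a first attempt, we formalize the problem of edge attribution from attention weights in GNNs. We then propose GATT, an edge attribution calculation method built on the computation tree. Through comprehensive experiments, we demonstrate the effectiveness of the proposed method when evaluating attributions from GATs. Conversely, we empirically validate that simply averaging attention weights over graph attention layers is insufficient to interpret the GAT model's behavior. Code is publicly available at https://github.com/jordan7186/GAtt/tree/main.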
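
For orientation, the naive baseline that the abstract argues against, averaging each edge's attention weight over layers, can be sketched as follows. This is not GATT and does not use the computation tree; the two-layer attention dictionaries and the function name are hypothetical.

```python
def average_attention(layer_attention):
    """Naive edge attribution: mean attention weight of each edge over layers.

    layer_attention: list with one dict per layer, mapping (src, dst) -> weight.
    Edges missing from a layer are treated as having weight 0.0.
    """
    edges = set().union(*layer_attention)
    n_layers = len(layer_attention)
    return {
        edge: sum(layer.get(edge, 0.0) for layer in layer_attention) / n_layers
        for edge in edges
    }


# Hypothetical attention weights for two incoming edges of node "c".
layer_1 = {("a", "c"): 0.75, ("b", "c"): 0.25}
layer_2 = {("a", "c"): 0.25, ("b", "c"): 0.75}

print(average_attention([layer_1, layer_2]))
```

Both edges receive the same score of 0.5 here even though each layer treats them very differently, which illustrates one way plain averaging can discard layer-specific information.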

Using Explainable AI for EEG-based Reduced Montage Neonatal Seizure Detection

2406.16908v3 by Dinuka Sandun Udayantha, Kavindu Weerasinghe, Nima Wickramasinghe, Akila Abeyratne, Kithmin Wickremasinghe, Jithangi Wanigasinghe, Anjula De Silva, Chamira U. S. Edussooriya

The neonatal period is the most vulnerable time for the development of seizures. Seizures in the immature brain lead to detrimental consequences and therefore require early diagnosis. The gold standard for neonatal seizure detection currently relies on continuous video-EEG monitoring, which involves recording multi-channel electroencephalogram (EEG) alongside real-time video monitoring within a neonatal intensive care unit (NICU). However, video-EEG monitoring technology requires clinical expertise and is often limited to technologically advanced and resourceful settings. Cost-effective new techniques could help the medical fraternity make an accurate diagnosis and advocate treatment without delay. In this work, a novel explainable deep learning model to automate the neonatal seizure detection process with a reduced EEG montage is proposed, which employs convolutional nets, graph attention layers, and fully connected layers. Beyond its ability to detect seizures in real-time with a reduced montage, this model offers the unique advantage of real-time interpretability. By evaluating the performance on the Zenodo dataset with 10-fold cross-validation, the presented model achieves an absolute improvement of 8.31% and 42.86% in area under curve (AUC) and recall, respectively.

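Summary: The neonatal period is the most vulnerable time for the development of seizures. Seizures in the immature brain have detrimental consequences and therefore require early diagnosis. The current gold standard for neonatal seizure detection relies on continuous video-EEG monitoring, which involves recording multi-channel electroencephalogram (EEG) alongside real-time video monitoring in a neonatal intensive care unit (NICU). However, video-EEG monitoring requires clinical expertise and is often limited to technologically advanced, well-resourced settings. Cost-effective new techniques could help the medical community make an accurate diagnosis and recommend treatment without delay. This work proposes a novel explainable deep learning model that automates neonatal seizure detection with a reduced EEG montage, using convolutional networks, graph attention layers, and fully connected layers. Beyond detecting seizures in real time with a reduced montage, the model offers the distinctive advantage of real-time interpretability. Evaluated on the Zenodo dataset with 10-fold cross-validation, the presented model achieves absolute improvements of 8.31% and 42.86% in area under the curve (AUC) and recall, respectively.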
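
The evaluation terms used above (AUC, recall, 10-fold cross-validation) can be illustrated with a small, dependency-free sketch. This is not the authors' evaluation code: the labels, scores, and function names are illustrative assumptions, and the fold assignment is a simple interleaved split rather than whatever protocol the paper used.

```python
def recall(y_true, y_pred):
    """Fraction of true positives among all actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Toy labels and scores, not results from the paper.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]

print(auc(y_true, scores))     # 0.75
print(recall(y_true, y_pred))  # 0.5
```

An "absolute improvement" in such a metric is usually read as a difference in percentage points between two models, as opposed to a relative (ratio) improvement.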

Breast Cancer Diagnosis: A Comprehensive Exploration of Explainable Artificial Intelligence (XAI) Techniques

2406.00532v1 by Samita Bai, Sidra Nasir, Rizwan Ahmed Khan, Sheeraz Arif, Alexandre Meyer, Hubert Konik

Breast cancer (BC) stands as one of the most common malignancies affecting women worldwide, necessitating advancements in diagnostic methodologies for better clinical outcomes. This article provides a comprehensive exploration of the application of Explainable Artificial Intelligence (XAI) techniques in the detection and diagnosis of breast cancer. As Artificial Intelligence (AI) technologies continue to permeate the healthcare sector, particularly in oncology, the need for transparent and interpretable models becomes imperative to enhance clinical decision-making and patient care. This review discusses the integration of various XAI approaches, such as SHAP, LIME, Grad-CAM, and others, with machine learning and deep learning models utilized in breast cancer detection and classification. By investigating the modalities of breast cancer datasets, including mammograms, ultrasounds and their processing with AI, the paper highlights how XAI can lead to more accurate diagnoses and personalized treatment plans. It also examines the challenges in implementing these techniques and the importance of developing standardized metrics for evaluating XAI's effectiveness in clinical settings. Through detailed analysis and discussion, this article aims to highlight the potential of XAI in bridging the gap between complex AI models and practical healthcare applications, thereby fostering trust and understanding among medical professionals and improving patient outcomes.

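Summary: Breast cancer (BC) is one of the most common malignancies affecting women worldwide, which calls for advances in diagnostic methods to improve clinical outcomes. This article provides a comprehensive exploration of the application of Explainable Artificial Intelligence (XAI) techniques in the detection and diagnosis of breast cancer. As Artificial Intelligence (AI) technologies continue to permeate healthcare, particularly oncology, transparent and interpretable models become imperative for enhancing clinical decision-making and patient care. The review discusses the integration of various XAI approaches, such as SHAP, LIME, Grad-CAM, and others, with the machine learning and deep learning models used in breast cancer detection and classification. By investigating the modalities of breast cancer datasets, including mammograms and ultrasounds, and their processing with AI, the paper highlights how XAI can lead to more accurate diagnoses and personalized treatment plans. It also examines the challenges of implementing these techniques and the importance of developing standardized metrics for evaluating XAI's effectiveness in clinical settings. Through detailed analysis and discussion, the article aims to highlight the potential of XAI to bridge the gap between complex AI models and practical healthcare applications, thereby fostering trust and understanding among medical professionals and improving patient outcomes.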

Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion Recognition

2406.01624v2 by Alaa Nfissi, Wassim Bouachir, Nizar Bouguila, Brian Mishara

Speech emotion recognition (SER) has gained significant attention due to its several application fields, such as mental health, education, and human-computer interaction. However, the accuracy of SER systems is hindered by high-dimensional feature sets that may contain irrelevant and redundant information. To overcome this challenge, this study proposes an iterative feature boosting approach for SER that emphasizes feature relevance and explainability to enhance machine learning model performance. Our approach involves meticulous feature selection and analysis to build efficient SER systems. In addressing our main problem through model explainability, we employ a feature evaluation loop with Shapley values to iteratively refine feature sets. This process strikes a balance between model performance and transparency, which enables a comprehensive understanding of the model's predictions. The proposed approach offers several advantages, including the identification and removal of irrelevant and redundant features, leading to a more effective model. Additionally, it promotes explainability, facilitating comprehension of the model's predictions and the identification of crucial features for emotion determination. The effectiveness of the proposed method is validated on the SER benchmarks of the Toronto emotional speech set (TESS), Berlin Database of Emotional Speech (EMO-DB), Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and Surrey Audio-Visual Expressed Emotion (SAVEE) datasets, outperforming state-of-the-art methods. To the best of our knowledge, this is the first work to incorporate model explainability into an SER framework. The source code of this paper is publicly available at https://github.com/alaaNfissi/Unveiling-Hidden-Factors-Explainable-AI-for-Feature-Boosting-in-Speech-Emotion-Recognition.

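Summary: Speech emotion recognition (SER) has attracted significant attention because of its many application fields, such as mental health, education, and human-computer interaction. However, the accuracy of SER systems is hindered by high-dimensional feature sets that may contain irrelevant and redundant information. To overcome this challenge, the study proposes an iterative feature boosting approach for SER that emphasizes feature relevance and explainability to improve machine learning model performance. The approach involves careful feature selection and analysis to build efficient SER systems. To address the core problem through model explainability, a feature evaluation loop with Shapley values is used to iteratively refine feature sets. This process balances model performance and transparency, enabling a comprehensive understanding of the model's predictions. The proposed approach offers several advantages, including the identification and removal of irrelevant and redundant features, which leads to a more effective model. It also promotes explainability, making it easier to understand the model's predictions and to identify the features that are crucial for determining emotion. The method is validated on the SER benchmarks of the Toronto emotional speech set (TESS), Berlin Database of Emotional Speech (EMO-DB), Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and Surrey Audio-Visual Expressed Emotion (SAVEE) datasets, where it outperforms state-of-the-art methods. To the best of the authors' knowledge, this is the first work to incorporate model explainability into an SER framework. The source code is publicly available at https://github.com/alaaNfissi/Unveiling-Hidden-Factors-Explainable-AI-for-Feature-Boosting-in-Speech-Emotion-Recognition.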
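
A feature evaluation loop driven by Shapley values can be sketched as below. This is a toy illustration under stated assumptions, not the authors' pipeline (their code is at the repository linked above): the feature names, the additive `toy_score` function standing in for model performance, and the `threshold` stopping rule are all hypothetical. Shapley values are computed exactly by enumerating subsets, which is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of each feature for a set function `value`."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                base = frozenset(subset)
                total += weight * (value(base | {feature}) - value(base))
        phi[feature] = total
    return phi


def refine_features(features, value, threshold):
    """Repeatedly drop the weakest feature while its Shapley value is below threshold."""
    features = list(features)
    while True:
        phi = shapley_values(features, value)
        weakest = min(features, key=lambda f: phi[f])
        if phi[weakest] >= threshold or len(features) == 1:
            return features, phi
        features.remove(weakest)


# Hypothetical per-feature contributions to a validation score.
CONTRIBUTION = {"pitch": 0.5, "energy": 0.25, "noise": 0.0}


def toy_score(subset):
    """Stand-in for 'train a model on this subset and measure performance'."""
    return sum(CONTRIBUTION[f] for f in subset)


kept, phi = refine_features(["pitch", "energy", "noise"], toy_score, threshold=0.05)
print(kept)  # ['pitch', 'energy']
```

In practice the value function would retrain or re-evaluate a model per subset, so approximate Shapley estimators are typically used instead of full enumeration.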

The Explanation Necessity for Healthcare AI

2406.00216v1 by Michail Mamalakis, Héloïse de Vareilles, Graham Murray, Pietro Lio, John Suckling

Explainability is often critical to the acceptable implementation of artificial intelligence (AI). Nowhere is this more important than healthcare where decision-making directly impacts patients and trust in AI systems is essential. This trust is often built on the explanations and interpretations the AI provides. Despite significant advancements in AI interpretability, there remains the need for clear guidelines on when and to what extent explanations are necessary in the medical context. We propose a novel categorization system with four distinct classes of explanation necessity, guiding the level of explanation required: patient or sample (local) level, cohort or dataset (global) level, or both levels. We introduce a mathematical formulation that distinguishes these categories and offers a practical framework for researchers to determine the necessity and depth of explanations required in medical AI applications. Three key factors are considered: the robustness of the evaluation protocol, the variability of expert observations, and the representation dimensionality of the application. In this perspective, we address the question: When does an AI medical application need to be explained, and at what level of detail?

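Summary: Explainability is often critical to the acceptable implementation of artificial intelligence (AI). This is especially true in healthcare, where decision-making directly affects patients and trust in AI systems is essential. That trust is often built on the explanations and interpretations the AI provides. Despite significant advances in AI interpretability, clear guidelines are still needed on when and to what extent explanations are necessary in the medical context. We propose a novel categorization system with four distinct classes of explanation necessity that guide the level of explanation required: patient or sample (local) level, cohort or dataset (global) level, or both levels. We introduce a mathematical formulation that distinguishes these categories and offers researchers a practical framework for determining the necessity and depth of explanations required in medical AI applications. Three key factors are considered: the robustness of the evaluation protocol, the variability of expert observations, and the representation dimensionality of the application. From this perspective, we address the question: when does an AI medical application need to be explained, and at what level of detail?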

Interdisciplinary Expertise to Advance Equitable Explainable AI

2406.18563v1 by Chloe R. Bennett, Heather Cole-Lewis, Stephanie Farquhar, Naama Haamel, Boris Babenko, Oran Lang, Mat Fleck, Ilana Traynis, Charles Lau, Ivor Horn, Courtney Lyles

The field of artificial intelligence (AI) is rapidly influencing health and healthcare, but bias and poor performance persist for populations who face widespread structural oppression. Previous work has clearly outlined the need for more rigorous attention to data representativeness and model performance to advance equity and reduce bias. However, there is an opportunity to also improve the explainability of AI by leveraging best practices of social epidemiology and health equity to help us develop hypotheses for associations found. In this paper, we focus on explainable AI (XAI) and describe a framework for interdisciplinary expert panel review to discuss and critically assess AI model explanations from multiple perspectives and identify areas of bias and directions for future research. We emphasize the importance of the interdisciplinary expert panel to produce more accurate, equitable interpretations which are historically and contextually informed. Interdisciplinary panel discussions can help reduce bias, identify potential confounders, and identify opportunities for additional research where there are gaps in the literature. In turn, these insights can suggest opportunities for AI model improvement.

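Summary: The field of artificial intelligence (AI) is rapidly influencing health and healthcare, but bias and poor performance persist for populations that face widespread structural oppression. Previous work has clearly outlined the need for more rigorous attention to data representativeness and model performance in order to advance equity and reduce bias. There is also an opportunity, however, to improve the explainability of AI by drawing on best practices from social epidemiology and health equity to help develop hypotheses for the associations found. In this paper, we focus on explainable AI (XAI) and describe a framework for interdisciplinary expert panel review to discuss and critically assess AI model explanations from multiple perspectives and to identify areas of bias and directions for future research. We emphasize the importance of the interdisciplinary expert panel for producing more accurate, equitable interpretations that are historically and contextually informed. Interdisciplinary panel discussions can help reduce bias, identify potential confounders, and identify opportunities for additional research where the literature has gaps. In turn, these insights can suggest opportunities for improving AI models.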

"It depends": Configuring AI to Improve Clinical Usefulness Across Contexts

2407.11978v1 by Hubert D. Zając, Jorge M. N. Ribeiro, Silvia Ingala, Simona Gentile, Ruth Wanjohi, Samuel N. Gitau, Jonathan F. Carlsen, Michael B. Nielsen, Tariq O. Andersen

Artificial Intelligence (AI) systems repeatedly match or outperform radiologists in lab experiments. However, real-world implementations of radiological AI-based systems are found to provide little to no clinical value. This paper explores how to design AI for clinical usefulness in different contexts. We conducted 19 design sessions and design interventions with 13 radiologists from 7 clinical sites in Denmark and Kenya, based on three iterations of a functional AI-based prototype. Ten sociotechnical dependencies were identified as crucial for the design of AI in radiology. We conceptualised four technical dimensions that must be configured to the intended clinical context of use: AI functionality, AI medical focus, AI decision threshold, and AI Explainability. We present four design recommendations on how to address dependencies pertaining to the medical knowledge, clinic type, user expertise level, patient context, and user situation that condition the configuration of these technical dimensions.

摘要:人工智慧(AI)在實驗室實驗中不斷地與放射科醫師匹敵或表現得更出色。然而,發現放射科 AI 為基礎系統的實際執行幾乎沒有提供臨床價值。本文探討如何為 AI 設計在不同情境中臨床上的效用。我們根據功能性 AI 為基礎原型的三次迭代,在丹麥和肯亞的 7 個臨床場域與 13 位放射科醫師進行了 19 次設計會議和設計介入。十個社會技術依賴關係被認為對於放射科中 AI 的設計至關重要。我們概念化了四個技術面向,必須根據預期的臨床使用情境進行設定:AI 功能、AI 醫療重點、AI 決策門檻,以及 AI 可解釋性。我們提出四項設計建議,說明如何處理與醫療知識、診所類型、使用者專業知識等級、患者情境,以及影響這些技術面向設定的使用者情境相關的依賴關係。

Improving Health Professionals' Onboarding with AI and XAI for Trustworthy Human-AI Collaborative Decision Making

2405.16424v1 by Min Hun Lee, Silvana Xin Yi Choo, Shamala D/O Thilarajah

With advanced AI/ML, there has been growing research on explainable AI (XAI) and on how humans interact with AI and XAI for effective human-AI collaborative decision-making. However, we still lack an understanding of how AI systems and XAI should first be presented to users without technical backgrounds. In this paper, we present the findings of semi-structured interviews with health professionals (n=12) and students (n=4) majoring in medicine and health to study how to improve onboarding with AI and XAI. For the interviews, we built upon human-AI interaction guidelines to create onboarding materials for an AI system for stroke rehabilitation assessment and its AI explanations, and introduced them to the participants. Our findings reveal that, beyond presenting traditional performance metrics on AI, participants desired benchmark information, the practical benefits of AI, and interaction trials to better contextualize AI performance and to refine the objectives and performance of AI. Based on these findings, we highlight directions for improving onboarding with AI and XAI and human-AI collaborative decision-making.

摘要:隨著先進的 AI/ML,對可解釋 AI (XAI) 的研究不斷增加,以及關於人類如何與 AI 和 XAI 互動以進行有效的人工智慧協作決策制定。然而,我們仍然缺乏對 AI 系統和 XAI 應如何首先呈現給沒有技術背景的用戶的了解。在本文中,我們展示了與醫療專業人員 (n=12) 和主修醫學和健康的學生 (n=4) 進行半結構化訪談的結果,以研究如何改善 AI 和 XAI 的入門。對於訪談,我們建立在人機互動準則之上,為中風康復評估和 AI 解釋的 AI 系統創建入門材料,並將它們介紹給參與者。我們的研究結果表明,除了呈現傳統的 AI 性能指標外,參與者還希望基准信息、AI 的實際好處以及交互試驗,以更好地將 AI 性能情境化,並完善 AI 的目標和性能。根據這些發現,我們強調了改進 AI 和 XAI 以及人機協作決策制定的入門方向。

Exploring Nutritional Impact on Alzheimer's Mortality: An Explainable AI Approach

2405.17502v1 by Ziming Liu, Longjian Liu, Robert E. Heidel, Xiaopeng Zhao

This article uses machine learning (ML) and explainable artificial intelligence (XAI) techniques to investigate the relationship between nutritional status and mortality rates associated with Alzheimer's disease (AD). The Third National Health and Nutrition Examination Survey (NHANES III) database is employed for analysis. The random forest model is selected as the base model for XAI analysis, and the Shapley Additive Explanations (SHAP) method is used to assess feature importance. The results highlight significant nutritional factors such as serum vitamin B12 and glycated hemoglobin. The study demonstrates the effectiveness of random forests in predicting AD mortality compared to other diseases. This research provides insights into the impact of nutrition on AD and contributes to a deeper understanding of disease progression.

摘要:本文使用機器學習 (ML) 和可解釋人工智慧 (XAI) 技術來探討營養狀況與阿茲海默症 (AD) 相關的死亡率之間的關係。採用第三次全國健康與營養檢查調查 (NHANES III) 資料庫進行分析。選擇隨機森林模型作為 XAI 分析的基礎模型,並使用 Shapley Additive Explanations (SHAP) 方法來評估特徵重要性。結果突顯了重要的營養因素,例如血清維生素 B12 和糖化血紅蛋白。該研究證明了隨機森林在預測 AD 死亡率方面相較於其他疾病的有效性。本研究提供了營養對 AD 的影響的見解,並有助於更深入地了解疾病的進展。
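
The abstract above ranks features with SHAP on a random forest. As a minimal sketch of the idea underneath SHAP, the block below computes exact Shapley values by enumerating every feature coalition for a toy scoring function. It is not the authors' pipeline and does not use the `shap` library; `toy_risk`, its coefficients, the feature ordering, and the all-zero baseline are hypothetical choices made only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance x.

    Features outside a coalition are replaced by their baseline value.
    The cost is exponential in the number of features, so this only
    suits toy problems.
    """
    n = len(x)
    features = list(range(n))

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return model(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model; inputs are [vitamin_b12, hba1c, age] on arbitrary scales.
def toy_risk(z):
    b12, hba1c, age = z
    return 0.5 * hba1c - 0.3 * b12 + 0.1 * age + 0.2 * hba1c * age


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, x, baseline)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split evenly between the two features involved. Practical tools approximate this computation because full enumeration does not scale.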

Explainable AI Enhances Glaucoma Referrals, Yet the Human-AI Team Still Falls Short of the AI Alone

2407.11974v1 by Catalina Gomez, Ruolin Wang, Katharina Breininger, Corinne Casey, Chris Bradley, Mitchell Pavlak, Alex Pham, Jithin Yohannan, Mathias Unberath

Primary care providers are vital for initial triage and referrals to specialty care. In glaucoma, asymptomatic and fast progression can lead to vision loss, necessitating timely referrals to specialists. However, primary eye care providers may not identify urgent cases, potentially delaying care. Artificial Intelligence (AI) offering explanations could enhance their referral decisions. We investigate how various AI explanations help providers distinguish between patients needing immediate or non-urgent specialist referrals. We built explainable AI algorithms to predict glaucoma surgery needs from routine eyecare data as a proxy for identifying high-risk patients. We incorporated intrinsic and post-hoc explainability and conducted an online study with optometrists to assess human-AI team performance, measuring referral accuracy and analyzing interactions with AI, including agreement rates, task time, and user experience perceptions. AI support enhanced referral accuracy among 87 participants (59.9%/50.8% with/without AI), though human-AI teams underperformed compared to AI alone. Participants believed they included AI advice more when using the intrinsic model, and perceived it as more useful and promising. Without explanations, deviations from AI recommendations increased. AI support did not increase workload, confidence, or trust, but reduced challenges. On a separate test set, our black-box and intrinsic models achieved an accuracy of 77% and 71%, respectively, in predicting surgical outcomes. We identify opportunities of human-AI teaming for glaucoma management in primary eye care, noting that while AI enhances referral accuracy, the human-AI team still shows a performance gap compared to AI alone, even with explanations. Human involvement remains essential in medical decision making, underscoring the need for future research to optimize collaboration, ensuring positive experiences and safe AI use.

摘要:初級保健提供者對於最初的分流和轉診到專科照護至關重要。在青光眼的情況下,無症狀且快速惡化可能導致視力喪失,因此需要及時轉診給專家。然而,初級眼科保健提供者可能無法識別緊急情況,可能會延誤照護。提供解釋的人工智慧 (AI) 可以加強他們的轉診決策。我們研究各種 AI 解釋如何幫助提供者區分需要立即或非緊急專科轉診的患者。我們建立了解釋性 AI 演算法,以從例行眼科護理資料預測青光眼手術需求,作為識別高風險患者的代理。我們納入了內在和事後解釋性,並與驗光師進行了一項線上研究,以評估人機團隊的表現,衡量轉診準確度並分析與 AI 的互動,包括同意率、任務時間和使用者體驗感知。在 87 名參與者中,AI 支援提高了轉診準確度(使用 AI/未使用的比例為 59.9%/50.8%),儘管人機團隊的表現不如單獨使用 AI。參與者認為他們在使用內在模型時更多地納入了 AI 建議,並認為它更有用且更有希望。沒有解釋,AI 建議的偏差會增加。AI 支援並未增加工作量、信心和信任,但減少了挑戰。在一個單獨的測試集中,我們的黑盒子和內在模型在預測手術結果方面分別達到了 77% 和 71% 的準確度。我們找出在初級眼科保健中,人機團隊合作管理青光眼的機會,並注意到雖然 AI 提高了轉診準確度,但即使有解釋,它也顯示出與單獨使用 AI 相比的效能差距。人類參與在醫療決策中仍然至關重要,這強調了未來研究優化協作、確保正面經驗和安全使用 AI 的必要性。

Decoding Decision Reasoning: A Counterfactual-Powered Model for Knowledge Discovery

2406.18552v1 by Yingying Fang, Zihao Jin, Xiaodan Xing, Simon Walsh, Guang Yang

In medical imaging, particularly in early disease detection and prognosis tasks, discerning the rationale behind an AI model's predictions is crucial for evaluating the reliability of its decisions. Conventional explanation methods face challenges in identifying discernible decisive features in medical image classifications, where discriminative features are subtle or not immediately apparent. To bridge this gap, we propose an explainable model that is equipped with both decision reasoning and feature identification capabilities. Our approach not only detects influential image patterns but also uncovers the decisive features that drive the model's final predictions. By implementing our method, we can efficiently identify and visualise class-specific features leveraged by the data-driven model, providing insights into the decision-making processes of deep learning models. We validated our model in the demanding realm of medical prognosis tasks, demonstrating its efficacy and potential in enhancing the reliability of AI in healthcare and in discovering new knowledge in diseases where prognostic understanding is limited.

摘要:在醫學影像中,特別是在早期疾病檢測和預後任務中,辨別 AI 模型預測背後的原理對於評估其決策的可靠性至關重要。傳統的解釋方法在識別醫學影像分類中可識別的決定性特徵時面臨挑戰,其中區別性特徵很微妙或並不明顯。為了彌合這一差距,我們提出了一個可解釋的模型,該模型具備決策推理和特徵識別能力。我們的做法不僅檢測有影響力的影像模式,還揭示了推動模型最終預測的決定性特徵。通過實施我們的模型,我們可以有效識別和視覺化由數據驅動模型利用的類特定特徵,從而深入了解深度學習模型的決策過程。我們在要求嚴格的醫學預後任務領域驗證了我們的模型,展示了其在提高 AI 在醫療保健中的可靠性和發現預後理解受限疾病的新知識方面的功效和潛力。

The Role of Emotions in Informational Support Question-Response Pairs in Online Health Communities: A Multimodal Deep Learning Approach

2405.13099v1 by Mohsen Jozani, Jason A. Williams, Ahmed Aleroud, Sarbottam Bhagat

This study explores the relationship between informational support seeking questions, responses, and helpfulness ratings in online health communities. We created a labeled data set of question-response pairs and developed multimodal machine learning and deep learning models to reliably predict informational support questions and responses. We employed explainable AI to reveal the emotions embedded in informational support exchanges, demonstrating the importance of emotion in providing informational support. This complex interplay between emotional and informational support has not been previously researched. The study refines social support theory and lays the groundwork for the development of user decision aids. Further implications are discussed.

摘要:本研究探討線上健康社群中尋求資訊支持的問題、回應,以及有幫助的評分之間的關係。我們建立了一組標記的問答配對資料集,並開發了多模態機器學習和深度學習模型,以可靠地預測資訊支持問題和回應。我們採用可解釋的 AI 來揭示資訊支持交流中蘊含的情緒,證明情緒在提供資訊支持中的重要性。這種情緒支持和資訊支持之間的複雜交互作用以前並未被研究過。本研究改進了社會支持理論,並為使用者決策輔助工具的開發奠定了基礎。討論了進一步的影響。

ChatGPT in Classrooms: Transforming Challenges into Opportunities in Education

2405.10645v1 by Harris Bin Munawar, Nikolaos Misirlis

In the era of exponential technology growth, one unexpected guest has claimed a seat in classrooms worldwide: Artificial Intelligence. Generative AI, such as ChatGPT, promises a revolution in education, yet it is a double-edged sword. Its potential for personalized learning is offset by issues of cheating, inaccuracies, and educators struggling to incorporate it effectively into their lesson design. We are standing on the brink of this educational frontier, and it is clear that we need to navigate this terrain with great care. This is a major challenge that could undermine the integrity and value of our educational process. So, how can we turn these challenges into opportunities? When used inappropriately, AI tools can become the perfect tool for the cut-copy-paste mentality, and quickly begin to corrode critical thinking, creativity, and deep understanding, the most important skills in our rapidly changing world. Teachers feel that they are not equipped to leverage this technology, widening the digital divide among educators and institutions. Addressing these concerns calls for an in-depth research approach. We will employ empirical research, drawing on the Technology Acceptance Model, to assess the attitudes toward generative AI among educators and students. Understanding their perceptions, usage patterns, and hurdles is the first crucial step in creating an effective solution. The present study will be used as a process manual for future researchers to apply, running their own data, based on the steps explained here.

摘要:在科技飛速發展的時代,一位意外的訪客已在全球教室中佔有一席之地,那就是人工智慧。生成式 AI,例如 ChatGPT,承諾在教育領域掀起一場革命,但它卻是一把雙面刃。它在個人化學習方面的潛力,卻因作弊、不準確以及教育工作者難以將其有效融入教學設計等問題而抵銷。我們正站在這教育前沿的邊緣,顯然我們需要非常小心地探索這片領域。這是一個重大的挑戰,可能會損害我們教育過程的完整性和價值。那麼,我們如何將這些挑戰轉化為機遇?當不適當地使用時,AI 工具可能會成為複製貼上心態的完美工具,並迅速腐蝕批判性思維、創造力和深入理解,這些都是我們快速變化的世界中最重要的技能。教師們覺得他們沒有能力利用這項技術,這擴大了教育工作者和機構之間的數位鴻溝。解決這些問題需要深入的研究方法。我們將採用實證研究,借鑑技術接受模型,來評估教育工作者和學生對生成式 AI 的態度。了解他們的看法、使用模式和障礙是創造有效解決方案的第一個關鍵步驟。本研究將作為未來研究人員應用的流程手冊,根據此處說明的步驟運行他們自己的數據

Evaluating the Explainable AI Method Grad-CAM for Breath Classification on Newborn Time Series Data

2405.07590v1 by Camelia Oprea, Mike Grüne, Mateusz Buglowski, Lena Olivier, Thorsten Orlikowsky, Stefan Kowalewski, Mark Schoberer, André Stollenwerk

With the digitalization of health care systems, artificial intelligence is becoming more present in medicine. Machine learning in particular shows great potential for complex tasks such as time series classification, usually at the cost of transparency and comprehensibility. This leads to a lack of trust by humans and thus hinders its active usage. Explainable artificial intelligence tries to close this gap by providing insight into the decision-making process; however, the actual usefulness of its different methods is unclear. This paper proposes a user-study-based evaluation of the explanation method Grad-CAM, applied to a neural network for the classification of breaths in time series neonatal ventilation data. We present the perceived usefulness of the explainability method as rated by different stakeholders, exposing the difficulty of achieving actual transparency and the wish of many participants for more in-depth explanations.

摘要:隨著醫療保健系統的數位化,人工智慧在醫學領域中變得更加普及。特別是機器學習在時間序列分類等複雜任務中展現出極大的潛力,但通常是以透明度和可理解性為代價。這導致人類缺乏信任,從而阻礙了其積極使用。可解釋的人工智慧試圖通過提供對決策過程的洞察來彌補這一差距,但其不同方法的實際效用尚不清楚。本文提出了一個基於使用者研究的評估,其中包含了 Grad-CAM 解釋方法,並將其應用於神經網路以分類時間序列新生兒呼吸數據中的呼吸。我們展示了不同利益相關者對可解釋性方法的感知效用,揭示了實現實際透明度的難度,以及許多參與者希望獲得更深入的解釋。
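
Grad-CAM, the explanation method evaluated above, weights each feature map of a convolutional layer by the average gradient of the class score with respect to that map, sums the weighted maps, and keeps only the positive part. The following is a minimal sketch for a one-dimensional (time series) layer, assuming the activations and gradients have already been extracted from a network. The numbers are hypothetical, and the authors' network and data are not reproduced here.

```python
def grad_cam_1d(activations, gradients):
    """Grad-CAM relevance per time step for a 1D convolutional layer.

    activations[k][t]: output of channel k at time step t.
    gradients[k][t]:   d(class score) / d(activations[k][t]).
    """
    n_steps = len(activations[0])
    # Channel weight = gradient averaged over time (global average pooling).
    weights = [sum(g) / len(g) for g in gradients]
    cam = []
    for t in range(n_steps):
        score = sum(w * a[t] for w, a in zip(weights, activations))
        cam.append(max(score, 0.0))  # ReLU keeps only positive evidence
    peak = max(cam)
    # Normalise to [0, 1] for display; an all-zero map is returned unchanged.
    return [c / peak for c in cam] if peak > 0 else cam


# Hypothetical values: two channels, four time steps.
activations = [[0.0, 1.0, 2.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
gradients = [[0.5, 0.5, 0.5, 0.5], [-1.0, -1.0, -1.0, -1.0]]
cam = grad_cam_1d(activations, gradients)
```

In this toy case the second channel has a negative weight, so the time steps where it is active are suppressed and the map highlights only the last two steps.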

XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare

2405.06270v3 by Fatemeh Nazary, Yashar Deldjoo, Tommaso Di Noia, Eugenio di Sciascio

The integration of Large Language Models (LLMs) into healthcare diagnostics offers a promising avenue for clinical decision-making. This study outlines the development of a novel method for zero-shot/few-shot in-context learning (ICL) by integrating medical domain knowledge using a multi-layered structured prompt. We also explore the efficacy of two communication styles between the user and LLMs: the Numerical Conversational (NC) style, which processes data incrementally, and the Natural Language Single-Turn (NL-ST) style, which employs long narrative prompts. Our study systematically evaluates the diagnostic accuracy and risk factors, including gender bias and false negative rates, using a dataset of 920 patient records in various few-shot scenarios. Results indicate that traditional clinical machine learning (ML) models generally outperform LLMs in zero-shot and few-shot settings. However, the performance gap narrows significantly when employing few-shot examples alongside effective explainable AI (XAI) methods as sources of domain knowledge. Moreover, with sufficient time and an increased number of examples, the conversational style (NC) nearly matches the performance of ML models. Most notably, LLMs demonstrate comparable or superior cost-sensitive accuracy relative to ML models. This research confirms that, with appropriate domain knowledge and tailored communication strategies, LLMs can significantly enhance diagnostic processes. The findings highlight the importance of optimizing the number of training examples and communication styles to improve accuracy and reduce biases in LLM applications.

摘要:大型語言模型 (LLM) 與醫療診斷整合 為臨床決策提供了一個有前景的途徑。本研究概述了一種新穎方法的開發,用於零次學習/少量學習情境學習 (ICL),方法是使用多層結構化提示整合醫療領域知識。我們還探討了使用者與 LLM 之間兩種溝通方式的功效:數值對話 (NC) 方式,它會逐步處理資料,以及自然語言單回合 (NL-ST) 方式,它會使用長篇敘事提示。 我們的研究系統性地評估了診斷準確性和風險因子,包括性別偏見和假陰性率,使用了一個包含 920 個患者記錄的資料集,採用各種少量學習情境。結果表明,傳統的臨床機器學習 (ML) 模型通常在零次學習和少量學習設定中表現優於 LLM。然而,當使用少量學習範例以及有效的可解釋 AI (XAI) 方法作為領域知識來源時,效能差距會顯著縮小。此外,隨著時間充足和範例數量增加,對話方式 (NC) 幾乎可以媲美 ML 模型的效能。最值得注意的是,LLM 相對於 ML 模型展現出相當或更佳的成本敏感準確度。 本研究證實,透過適當的領域知識和量身打造的溝通策略,LLM 可以顯著增強診斷程序。這些發現突顯了最佳化訓練範例數量和溝通方式的重要性,以提高準確度並減少 LLM 應用中的偏差。

To Trust or Not to Trust: Towards a novel approach to measure trust for XAI systems

2405.05766v1 by Miquel Miró-Nicolau, Gabriel Moyà-Alcover, Antoni Jaume-i-Capó, Manuel González-Hidalgo, Maria Gemma Sempere Campello, Juan Antonio Palmer Sancho

The increasing reliance on Deep Learning models, combined with their inherent lack of transparency, has spurred the development of a novel field of study known as eXplainable AI (XAI). XAI methods seek to enhance the trust of end-users in automated systems by providing insights into the rationale behind their decisions. This paper presents a novel approach for measuring user trust in XAI systems, allowing their refinement. Our proposed metric combines both performance metrics and trust indicators from an objective perspective. To validate this novel methodology, we conducted a case study in a realistic medical scenario: the use of an XAI system for the detection of pneumonia from x-ray images.

摘要:隨著對深度學習模型依賴性的增加,加上其固有的透明度不足,促使一個新的研究領域發展,稱為可解釋 AI (XAI) 方法。這些方法旨在透過深入了解決策背後的原理,來提升最終使用者對自動化系統的信賴。本文提出了一種衡量使用者對 XAI 系統信賴度的新穎方法,允許對其進行改進。我們提出的指標結合了客觀觀點下的效能指標和信賴指標。為了驗證這個新穎的方法,我們在一個真實的醫療場景中進行了一個案例研究:使用 XAI 系統從 X 光影像中偵測肺炎。

Region-specific Risk Quantification for Interpretable Prognosis of COVID-19

2405.02815v1 by Zhusi Zhong, Jie Li, Zhuoqi Ma, Scott Collins, Harrison Bai, Paul Zhang, Terrance Healey, Xinbo Gao, Michael K. Atalay, Zhicheng Jiao

The COVID-19 pandemic has strained global public health, necessitating accurate diagnosis and intervention to control disease spread and reduce mortality rates. This paper introduces an interpretable deep survival prediction model designed specifically for improved understanding and trust in COVID-19 prognosis using chest X-ray (CXR) images. By integrating a large-scale pretrained image encoder, Risk-specific Grad-CAM, and anatomical region detection techniques, our approach produces regional interpretable outcomes that effectively capture essential disease features while focusing on rare but critical abnormal regions. Our model's predictive results provide enhanced clarity and transparency through risk area localization, enabling clinicians to make informed decisions regarding COVID-19 diagnosis with better understanding of prognostic insights. We evaluate the proposed method on a multi-center survival dataset and demonstrate its effectiveness via quantitative and qualitative assessments, achieving superior C-indexes (0.764 and 0.727) and time-dependent AUCs (0.799 and 0.691). These results suggest that our explainable deep survival prediction model surpasses traditional survival analysis methods in risk prediction, improving interpretability for clinical decision making and enhancing AI system trustworthiness.

摘要:COVID-19 疫情對全球公共衛生造成壓力,必須進行準確的診斷和干預,以控制疾病傳播並降低死亡率。本文介紹了一個可解釋的深度生存預測模型,專門設計用於透過胸部 X 光 (CXR) 影像改善對 COVID-19 預後的理解和信賴。透過整合大規模預訓練影像編碼器、風險特定 Grad-CAM 和解剖區域偵測技術,我們的做法產生區域可解釋的結果,有效捕捉必要的疾病特徵,同時專注於罕見但關鍵的異常區域。我們的模型預測結果透過風險區域定位提供增強的清晰度和透明度,讓臨床醫生能夠在更了解預後見解的情況下,就 COVID-19 診斷做出明智的決策。我們在多中心生存資料集上評估所提出的方法,並透過量化和質化評估證明其有效性,達到優異的 C 指數(0.764 和 0.727)和時間相關 AUC(0.799 和 0.691)。這些結果表明,我們可解釋的深度生存預測模型在風險預測方面超越傳統的生存分析方法,提升臨床決策的解釋性,並增強 AI 系統的信賴度。
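
The C-indexes reported above measure how often a model ranks pairs of patients in the right order. As a minimal sketch, the function below implements Harrell's concordance index for right-censored data in plain Python. The four-patient dataset is hypothetical, and the abstract does not state which C-index variant the authors used.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times[i]:  observed follow-up time.
    events[i]: 1 if the event occurred, 0 if the subject was censored.
    risks[i]:  predicted risk score (higher means an earlier expected event).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on a subject with an observed event
        for j in range(n):
            if times[i] < times[j]:  # j outlived i, so the pair is comparable
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: the third patient is censored at time 12.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c = concordance_index(times, events, risks)
```

Here five pairs are comparable and four are ranked correctly, giving a C-index of 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.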

Rad4XCNN: a new agnostic method for post-hoc global explanation of CNN-derived features by means of radiomics

2405.02334v1 by Francesco Prinzi, Carmelo Militello, Calogero Zarcaro, Tommaso Vincenzo Bartolotta, Salvatore Gaglio, Salvatore Vitabile

In recent years, artificial intelligence (AI) in clinical decision support systems (CDSS) has played a key role in harnessing machine learning and deep learning architectures. Despite their promising capabilities, the lack of transparency and explainability of AI models poses significant challenges, particularly in medical contexts where reliability is mandatory. Achieving transparency without compromising predictive accuracy remains a key challenge. This paper presents a novel method, namely Rad4XCNN, to enhance the predictive power of CNN-derived features with the interpretability inherent in radiomic features. Rad4XCNN diverges from conventional methods based on saliency maps by associating intelligible meaning with CNN-derived features by means of radiomics, offering new perspectives on explanation methods beyond visualization maps. Using a breast cancer classification task as a case study, we evaluated Rad4XCNN on ultrasound imaging datasets, including an online dataset and two in-house datasets for internal and external validation. Some key results are: i) CNN-derived features guarantee more robust accuracy when compared against ViT-derived and radiomic features; ii) conventional visualization map methods for explanation present several pitfalls; iii) Rad4XCNN does not sacrifice model accuracy for explainability; iv) Rad4XCNN provides global explanation insights enabling the physician to analyze the model outputs and findings. In addition, we highlight the importance of integrating interpretability into AI models for enhanced trust and adoption in clinical practice, emphasizing how our method can mitigate some concerns related to explainable AI methods.

摘要:在過去幾年,臨床決策支援系統 (CDSS) 中的人工智慧 (AI) 在利用機器學習和深度學習架構方面發揮了關鍵作用。儘管 AI 模型具有令人滿意的能力,但缺乏透明度和可解釋性,特別是在可靠性為必要考量的醫療背景下,這帶來了重大的挑戰。在不影響預測精準度的情況下實現透明度仍然是一項關鍵挑戰。本文提出了一種新方法,即 Rad4XCNN,以增強 CNN 衍生特徵的預測能力,同時具備放射特徵固有的可解釋性。Rad4XCNN 不同於基於顯著性圖的傳統方法,它通過放射組學將可理解的含義與 CNN 衍生特徵關聯起來,為超越視覺化圖表的解釋方法提供了新的觀點。我們以乳癌分類任務作為案例研究,在超音波影像資料集上評估 Rad4XCNN,包括一個線上資料集和兩個用於內部和外部驗證的內部資料集。一些關鍵結果如下:i) 與 ViT 衍生特徵和放射特徵相比,CNN 衍生特徵保證了更穩健的準確度;ii) 傳統的視覺化圖解釋方法存在一些缺陷;iii) Rad4XCNN 沒有犧牲模型準確度來換取其可解釋性;iv) Rad4XCNN 提供了全局解釋見解,使醫師能夠分析模型輸出和發現。此外,我們強調將可解釋性整合到 AI 模型中對於增強臨床實務中的信任和採用至關重要,並強調了我們的方法如何能緩解與可解釋 AI 方法相關的一些疑慮。

Attributing Responsibility in AI-Induced Incidents: A Computational Reflective Equilibrium Framework for Accountability

2404.16957v1 by Yunfei Ge, Quanyan Zhu

The pervasive integration of Artificial Intelligence (AI) has introduced complex challenges in the responsibility and accountability in the event of incidents involving AI-enabled systems. The interconnectivity of these systems, ethical concerns of AI-induced incidents, coupled with uncertainties in AI technology and the absence of corresponding regulations, have made traditional responsibility attribution challenging. To this end, this work proposes a Computational Reflective Equilibrium (CRE) approach to establish a coherent and ethically acceptable responsibility attribution framework for all stakeholders. The computational approach provides a structured analysis that overcomes the limitations of conceptual approaches in dealing with dynamic and multifaceted scenarios, showcasing the framework's explainability, coherence, and adaptivity properties in the responsibility attribution process. We examine the pivotal role of the initial activation level associated with claims in equilibrium computation. Using an AI-assisted medical decision-support system as a case study, we illustrate how different initializations lead to diverse responsibility distributions. The framework offers valuable insights into accountability in AI-induced incidents, facilitating the development of a sustainable and resilient system through continuous monitoring, revision, and reflection.

摘要:隨著人工智慧 (AI) 的普及整合,在涉及 AI 驅動系統的事故中,責任和義務歸屬產生了複雜的挑戰。這些系統的互連性、AI 引發事故的倫理問題,加上 AI 技術的不確定性和缺乏相應法規,使得傳統責任歸屬面臨挑戰。為此,本研究提出了一種計算反思均衡 (CRE) 方法,以建立一個連貫且在倫理上可接受的責任歸屬架構,適用於所有利害關係人。計算方法提供了結構化的分析,克服了概念方法在處理動態且多面向情境時的限制,展示了該架構在責任歸屬過程中具備的可解釋性、連貫性和適應性。我們探討了與均衡計算中索賠相關的初始啟動層級的關鍵作用。我們以 AI 輔助醫療決策支援系統為案例研究,說明不同的初始化如何導致不同的責任分配。該架構提供了對 AI 引發事故中問責制的寶貴見解,透過持續監控、修訂和反思,促進了永續且有韌性的系統發展。

Explainable AI for Fair Sepsis Mortality Predictive Model

2404.13139v1 by Chia-Hsuan Chang, Xiaoyang Wang, Christopher C. Yang

Artificial intelligence supports healthcare professionals with predictive modeling, greatly transforming clinical decision-making. This study addresses the crucial need for fairness and explainability in AI applications within healthcare to ensure equitable outcomes across diverse patient demographics. By focusing on the predictive modeling of sepsis-related mortality, we propose a method that learns a performance-optimized predictive model and then employs the transfer learning process to produce a model with better fairness. Our method also introduces a novel permutation-based feature importance algorithm aiming at elucidating the contribution of each feature in enhancing fairness on predictions. Unlike existing explainability methods concentrating on explaining feature contribution to predictive performance, our proposed method uniquely bridges the gap in understanding how each feature contributes to fairness. This advancement is pivotal, given sepsis's significant mortality rate and its role in one-third of hospital deaths. Our method not only aids in identifying and mitigating biases within the predictive model but also fosters trust among healthcare stakeholders by improving the transparency and fairness of model predictions, thereby contributing to more equitable and trustworthy healthcare delivery.

摘要:人工智慧透過預測模型協助醫療專業人員,大幅轉變了臨床決策制定。本研究探討了在醫療保健中使用人工智慧應用程式時公平性和可解釋性的關鍵需求,以確保在不同的患者人口統計資料中獲得公平的結果。透過專注於敗血症相關死亡率的預測模型,我們提出了一種方法,該方法會學習一個效能最佳化的預測模型,然後採用轉移學習過程來產生一個具有更好公平性的模型。我們的模型還引入了一種新穎的基於排列的特徵重要性演算法,旨在闡明每個特徵在增強預測公平性方面的貢獻。與現有的可解釋性方法專注於解釋特徵對預測效能的貢獻不同,我們提出的方法獨特地彌補了理解每個特徵如何有助於公平性的差距。這項進展至關重要,因為敗血症的死亡率很高,且在三分之一的醫院死亡中扮演著角色。我們的模型不僅有助於識別和減輕預測模型中的偏差,還能透過提高模型預測的透明度和公平性來培養醫療保健利益相關者之間的信任,進而有助於提供更公平且值得信賴的醫療保健服務。
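
The abstract above describes a permutation-based feature importance aimed at fairness rather than predictive performance. The authors' algorithm is not specified here, so the block below is only a generic sketch of the underlying idea: shuffle one feature column at a time and record how a fairness metric changes. The choice of metric (the gap in true-positive rate between two groups), the toy data, and the `predict` rule are all assumptions made for illustration.

```python
import random


def tpr_gap(y_true, y_pred, group):
    """Absolute difference in true-positive rate between groups 0 and 1."""
    rates = []
    for g in (0, 1):
        hits = [p for t, p, s in zip(y_true, y_pred, group) if s == g and t == 1]
        rates.append(sum(hits) / len(hits))
    return abs(rates[0] - rates[1])


def fairness_permutation_importance(predict, X, y, group, n_repeats=20, seed=0):
    """Mean change in the TPR gap when each feature column is shuffled.

    A positive value means shuffling widened the gap, so the intact
    feature was helping fairness; a negative value means the reverse.
    """
    rng = random.Random(seed)
    base = tpr_gap(y, [predict(row) for row in X], group)
    importances = []
    for col in range(len(X[0])):
        deltas = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            gap = tpr_gap(y, [predict(row) for row in X_perm], group)
            deltas.append(gap - base)
        importances.append(sum(deltas) / n_repeats)
    return base, importances


# Hypothetical toy data: columns are [lactate, group_proxy, noise].
X = [[2.0, 1.0, 0.3], [1.5, 1.0, 0.1], [2.5, 0.0, 0.7],
     [1.5, 0.0, 0.2], [0.5, 1.0, 0.9], [0.5, 0.0, 0.4]]
y = [1, 1, 1, 1, 0, 0]
group = [1, 1, 0, 0, 1, 0]


def predict(row):
    # Hypothetical rule that leans on the group proxy and ignores the noise column.
    return 1 if row[0] + row[1] >= 2.0 else 0


base_gap, importances = fairness_permutation_importance(predict, X, y, group)
```

Because the toy rule never reads the noise column, shuffling that column leaves the gap unchanged and its importance is exactly zero. The other two values depend on the random shuffles and are best read as rough indications only.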

Multi Class Depression Detection Through Tweets using Artificial Intelligence

2404.13104v1 by Muhammad Osama Nusrat, Waseem Shahzad, Saad Ahmed Jamal

Depression is a significant issue nowadays. According to the World Health Organization (WHO), in 2023, over 280 million individuals were grappling with depression. This is a huge number; if not taken seriously, it will increase rapidly. About 4.89 billion individuals are social media users. People express their feelings and emotions on platforms like Twitter, Facebook, Reddit, Instagram, etc. These platforms contain valuable information which can be used for research purposes. Considerable research has been conducted across various social media platforms. However, certain limitations persist in these endeavors. In particular, previous studies focused only on detecting depression and the intensity of depression in tweets. There were also inaccuracies in dataset labeling. In this research work, five types of depression (bipolar, major, psychotic, atypical, and postpartum) were predicted using tweets from the Twitter database based on lexicon labeling. Explainable AI was used to provide reasoning by highlighting the parts of tweets that represent the type of depression. Bidirectional Encoder Representations from Transformers (BERT) was used for feature extraction and training. Machine learning and deep learning methodologies were used to train the model. The BERT model presented the most promising results, achieving an overall accuracy of 0.96.

摘要:現今,憂鬱症是一個重要的議題。根據世界衛生組織 (WHO) 的資料,在 2023 年,超過 2.8 億人正在與憂鬱症搏鬥。這是一個龐大的數字;如果不認真看待,這些數字將會快速增加。大約有 48.9 億人是社群媒體使用者。人們在 Twitter、Facebook、Reddit、Instagram 等平台上表達自己的感受和情緒。這些平台包含有價值的資訊,可用於研究目的。已經在各種社群媒體平台上進行了大量的研究。然而,這些努力仍存在某些限制。特別是,先前的研究僅專注於偵測推文中的憂鬱症和憂鬱症的強度。此外,資料集標籤中存在不準確的情況。在這項研究工作中,使用基於詞彙標籤的 Twitter 資料庫中的推文預測了五種類型的憂鬱症(雙極型、重度、精神病型、非典型和產後)。可解釋的 AI 用於透過強調代表憂鬱症類型的推文部分來提供推理。從 Transformers(BERT)中提取的雙向編碼器表示用於特徵提取和訓練。機器學習和深度學習方法用於訓練模型。BERT 模型呈現出最有希望的結果,達到 0.96 的整體準確度。

COIN: Counterfactual inpainting for weakly supervised semantic segmentation for medical images

2404.12832v2 by Dmytro Shvetsov, Joonas Ariva, Marharyta Domnich, Raul Vicente, Dmytro Fishman

Deep learning is dramatically transforming the field of medical imaging and radiology, enabling the identification of pathologies in medical images, including computed tomography (CT) and X-ray scans. However, the performance of deep learning models, particularly in segmentation tasks, is often limited by the need for extensive annotated datasets. To address this challenge, the capabilities of weakly supervised semantic segmentation are explored through the lens of Explainable AI and the generation of counterfactual explanations. The scope of this research is the development of a novel counterfactual inpainting approach (COIN) that flips the predicted classification label from abnormal to normal by using a generative model. For instance, if the classifier deems an input medical image X as abnormal, indicating the presence of a pathology, the generative model aims to inpaint the abnormal region, thus reversing the classifier's original prediction label. The approach enables us to produce precise segmentations for pathologies without depending on pre-existing segmentation masks. Crucially, image-level labels are utilized, which are substantially easier to acquire than detailed segmentation masks. The effectiveness of the method is demonstrated by segmenting synthetic targets and actual kidney tumors from CT images acquired from Tartu University Hospital in Estonia. The findings indicate that COIN greatly surpasses established attribution methods, such as RISE, ScoreCAM, and LayerCAM, as well as an alternative counterfactual explanation method introduced by Singla et al. This evidence suggests that COIN is a promising approach for semantic segmentation of tumors in CT images, and presents a step forward in making deep learning applications more accessible and effective in healthcare, where annotated data is scarce.

摘要:深度学习正大幅轉變醫學影像和放射線學領域,能辨識醫學影像中的病理,包括電腦斷層掃描 (CT) 和 X 光掃描。然而,深度學習模型的效能,特別是在分割任務中,常常受到廣泛註解資料集需求的限制。為了應對此挑戰,透過可解釋 AI 和反事實解釋的產生,探索弱監督語意分割的能力。本研究的範圍是開發一種新的反事實內插方法 (COIN),該方法使用生成模型將預測的分類標籤從異常翻轉為正常。例如,如果分類器將輸入的醫學影像 X 視為異常,表示存在病理,則生成模型旨在內插異常區域,從而逆轉分類器的原始預測標籤。此方法使我們能夠產生病理的精確分割,而無需依賴於預先存在的分割遮罩。至關重要的是,利用影像層級標籤,這比建立詳細的分割遮罩容易取得。該方法的有效性透過分割合成目標和從愛沙尼亞塔爾圖大學醫院取得的 CT 影像中的實際腎臟腫瘤來證明。研究結果表明,COIN 遠遠超過已建立的歸因方法,例如 RISE、ScoreCAM 和 LayerCAM,以及 Singla 等人提出的另一種反事實解釋方法。此證據表明,COIN 是一種很有前途的 CT 影像中腫瘤語意分割方法,並在醫療保健中讓深度學習應用更易於取得和更有效率邁進一步,其中註解資料很稀少。
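
The COIN abstract describes inpainting the abnormal region so that the classifier's label flips, and obtaining segmentations without ground-truth masks. One plausible reading, which the abstract does not spell out, is that the segmentation is read off from the pixels the counterfactual changed. The sketch below illustrates only that final step under this assumption: the generative model is not implemented, the counterfactual image is supplied by hand, and the threshold of 0.2 is a hypothetical choice.

```python
def difference_mask(image, counterfactual, threshold=0.2):
    """Binary mask of pixels that the counterfactual changed by more than threshold."""
    return [
        [1 if abs(p - q) > threshold else 0 for p, q in zip(row, cf_row)]
        for row, cf_row in zip(image, counterfactual)
    ]


def dice(mask_a, mask_b):
    """Dice overlap between two binary masks of equal shape."""
    a = [v for row in mask_a for v in row]
    b = [v for row in mask_b for v in row]
    intersection = sum(x * y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * intersection / total if total else 1.0


# Hypothetical 3x3 image with a bright "lesion" in the middle row.
image = [[0.1, 0.1, 0.1],
         [0.1, 0.9, 0.8],
         [0.1, 0.1, 0.1]]
# Hand-written stand-in for the inpainted (normal-looking) counterfactual.
counterfactual = [[0.1, 0.1, 0.1],
                  [0.1, 0.15, 0.1],
                  [0.1, 0.1, 0.15]]
mask = difference_mask(image, counterfactual)

# Hypothetical reference annotation used only to score the overlap.
reference = [[0, 0, 0],
             [0, 1, 1],
             [0, 0, 1]]
score = dice(mask, reference)
```

The mask marks the two lesion pixels, and the Dice score against the hypothetical reference is 0.8 because the reference contains one extra pixel.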

Hybrid Intelligence for Digital Humanities

2406.15374v1 by Victor de Boer, Lise Stork

In this paper, we explore the synergies between Digital Humanities (DH) as a discipline and Hybrid Intelligence (HI) as a research paradigm. In DH research, the use of digital methods, and specifically that of Artificial Intelligence, is subject to a set of requirements and constraints. We argue that these are well supported by the capabilities and goals of HI. Our contribution includes the identification of five such DH requirements: successful AI systems need to be able to 1) collaborate with the (human) scholar; 2) support data criticism; 3) support tool criticism; 4) be aware of and cater to various perspectives; and 5) support distant and close reading. We take the CARE principles of Hybrid Intelligence (collaborative, adaptive, responsible and explainable) as a theoretical framework and map these to the DH requirements. In this mapping, we include example research projects. Finally, we address how insights from DH can be applied to HI and discuss open challenges for the combination of the two disciplines.

摘要:在本文中,我們探討數位人文學科 (DH) 作為一門學科與混合智能 (HI) 作為一個研究典範之間的協同作用。在 DH 研究中,數位方法的使用,特別是人工智慧的使用,受到一系列要求和限制。我們認為這些要求和限制獲得 HI 的能力和目標的充分支持。我們的貢獻包括找出五個這樣的 DH 要求:成功的 AI 系統需要能夠 1) 與(人類)學者合作;2) 支援資料批評;3) 支援工具批評;4) 察覺並迎合各種觀點;5) 支援遠距和近距離閱讀。我們將混合智能的 CARE 原則(協作、適應、負責和可解釋)作為理論架構,並將這些原則對應到 DH 要求。在此對應中,我們納入範例研究專案。最後,我們探討如何將 DH 的見解應用於 HI,並討論結合這兩個學科的開放挑戰。

Ethical Framework for Responsible Foundational Models in Medical Imaging

2406.11868v1 by Abhijit Das, Debesh Jha, Jasmer Sanjotra, Onkar Susladkar, Suramyaa Sarkar, Ashish Rauniyar, Nikhil Tomar, Vanshali Sharma, Ulas Bagci

Foundational models (FMs) have tremendous potential to revolutionize medical imaging. However, their deployment in real-world clinical settings demands extensive ethical considerations. This paper aims to highlight the ethical concerns related to FMs and propose a framework to guide their responsible development and implementation within medicine. We meticulously examine ethical issues such as privacy of patient data, bias mitigation, algorithmic transparency, explainability and accountability. The proposed framework is designed to prioritize patient welfare, mitigate potential risks, and foster trust in AI-assisted healthcare.

摘要:基礎模型 (FM) 具有徹底改變醫學影像的巨大潛力。然而,它們在現實世界臨床環境中的部署需要廣泛的倫理考量。本文旨在強調與 FM 相關的倫理問題,並提出一個框架來指導它們在醫學中的負責任開發和實施。我們仔細審查了倫理問題,例如患者數據隱私、偏差緩解、演算法透明度、可解釋性和問責制。所提出的框架旨在優先考慮患者福利、減輕潛在風險,並培養對 AI 輔助醫療保健的信任。

Advancements in Radiomics and Artificial Intelligence for Thyroid Cancer Diagnosis

2404.07239v1 by Milad Yousefi, Shadi Farabi Maleki, Ali Jafarizadeh, Mahya Ahmadpour Youshanlui, Aida Jafari, Siamak Pedrammehr, Roohallah Alizadehsani, Ryszard Tadeusiewicz, Pawel Plawiak

Thyroid cancer is an increasing global health concern that requires advanced diagnostic methods. The application of AI and radiomics to thyroid cancer diagnosis is examined in this review. A review of multiple databases was conducted in compliance with PRISMA guidelines until October 2023. A combination of keywords led to the discovery of an English academic publication on thyroid cancer and related subjects. 267 papers were returned from the original search after 109 duplicates were removed. Relevant studies were selected according to predetermined criteria after 124 articles were eliminated based on an examination of their abstract and title. After the comprehensive analysis, an additional six studies were excluded. Among the 28 included studies, radiomics analysis, which incorporates ultrasound (US) images, demonstrated its effectiveness in diagnosing thyroid cancer. Results varied, with some studies presenting new strategies that outperformed the status quo. The literature has emphasized various challenges faced by AI models, including interpretability issues, dataset constraints, and operator dependence. The synthesized findings of the 28 included studies point to the need for standardization efforts and prospective multicenter studies to address these concerns. Furthermore, approaches to overcome these obstacles were identified, such as advances in explainable AI technology and personalized medicine techniques. The review focuses on how AI and radiomics could transform the diagnosis and treatment of thyroid cancer. Despite challenges, future research on multidisciplinary cooperation, clinical applicability validation, and algorithm improvement holds the potential to improve patient outcomes and diagnostic precision in the treatment of thyroid cancer.

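Summary: Thyroid cancer is a growing global health problem that calls for advanced diagnostic methods. This review examines the use of artificial intelligence and radiomics in thyroid cancer diagnosis. Multiple databases were reviewed through October 2023 in accordance with the PRISMA guidelines. Combined keywords were used to find English-language academic publications on thyroid cancer and related topics. The original search returned 267 papers after 109 duplicates were removed. Relevant studies were then selected against predetermined criteria after 124 articles were eliminated on the basis of title and abstract. Six further studies were excluded after full analysis. Across the 28 included studies, radiomics analysis combined with ultrasound (US) imaging proved effective for diagnosing thyroid cancer. Results were mixed, with some studies proposing new strategies that outperform the status quo. The literature highlights challenges facing AI models, including interpretability problems, dataset limitations, and operator dependence. The combined findings of the 28 studies indicate that standardization work and prospective multicenter studies are needed to resolve these problems. Ways to overcome these obstacles were also identified, such as advances in explainable AI and personalized medicine. The review focuses on how AI and radiomics could transform the diagnosis and treatment of thyroid cancer. Despite the challenges, future research on multidisciplinary collaboration, validation of clinical applicability, and algorithm improvement can still improve patient outcomes and diagnostic precision in thyroid cancer care.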

Predictive Modeling for Breast Cancer Classification in the Context of Bangladeshi Patients: A Supervised Machine Learning Approach with Explainable AI

2404.04686v1 by Taminul Islam, Md. Alif Sheakh, Mst. Sazia Tahosin, Most. Hasna Hena, Shopnil Akash, Yousef A. Bin Jardan, Gezahign Fentahun Wondmie, Hiba-Allah Nafidi, Mohammed Bourhia

Breast cancer has rapidly increased in prevalence in recent years, making it one of the leading causes of mortality worldwide. Among all cancers, it is by far the most common. Diagnosing this illness manually requires significant time and expertise. Because detection is time-consuming, machine-based predictions can help prevent the disease from spreading further. Machine learning and Explainable AI are crucial in classification as they not only provide accurate predictions but also offer insights into how the model arrives at its decisions, aiding in the understanding and trustworthiness of the classification results. In this study, we evaluate and compare the classification accuracy, precision, recall, and F1 scores of five supervised machine learning methods (decision tree, random forest, logistic regression, naive Bayes, and XGBoost) on a primary dataset of 500 patients from Dhaka Medical College Hospital. Additionally, this study applied SHAP analysis to the XGBoost model to interpret the model's predictions and understand the impact of each feature on the model's output. We compared the classification accuracy of the algorithms with one another and with results reported in other literature in this field. After final evaluation, this study found that XGBoost achieved the best model accuracy, at 97%.

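Summary: The prevalence of breast cancer has risen rapidly in recent years, making it one of the leading causes of death worldwide. It is by far the most common of all cancers. Diagnosing the disease manually takes considerable time and expertise. Because detection is time-consuming, predictions from machine learning models can help prevent further spread. Machine learning and explainable AI are essential for classification because they provide accurate predictions and also reveal how the model reaches its decisions, which helps users understand and trust the results. In this study, we evaluate and compare the classification accuracy, precision, recall, and F1 scores of five machine learning methods on a primary dataset (500 patients from Dhaka Medical College Hospital). Five supervised techniques, namely decision tree, random forest, logistic regression, naive Bayes, and XGBoost, were used to obtain the best results on our dataset. The study also applied SHAP analysis to the XGBoost model to interpret its predictions and understand how each feature affects the model's output. We compared the classification accuracy of the algorithms and contrasted it with other literature in the field. After final evaluation, XGBoost achieved the best model accuracy, 97%.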
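
The four metrics compared in the entry above can all be derived from confusion-matrix counts. The following is a minimal sketch rather than the authors' code: the function name `binary_metrics`, the label encoding (1 = malignant, 0 = benign), and the toy labels are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics((tp + tn) / len(pairs), precision, recall, f1)


# Hypothetical toy labels (1 = malignant, 0 = benign); not the study's data.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters in a screening setting, because a model can score high accuracy on an imbalanced dataset while still missing many positive cases.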

Enhancing Breast Cancer Diagnosis in Mammography: Evaluation and Integration of Convolutional Neural Networks and Explainable AI

2404.03892v3 by Maryam Ahmed, Tooba Bibi, Rizwan Ahmed Khan, Sidra Nasir

Deep learning (DL) models for diagnosing breast cancer from mammographic images often operate as "black boxes", making it difficult for healthcare professionals to trust and understand their decision-making processes. The study presents an integrated framework combining Convolutional Neural Networks (CNNs) and Explainable Artificial Intelligence (XAI) for the enhanced diagnosis of breast cancer using the CBIS-DDSM dataset. The methodology encompasses an elaborate data preprocessing pipeline and advanced data augmentation techniques to counteract dataset limitations, together with transfer learning using pre-trained networks such as VGG-16, Inception-V3 and ResNet. A focal point of our study is the evaluation of XAI's effectiveness in interpreting model predictions, highlighted by utilizing the Hausdorff measure to quantitatively assess the alignment between AI-generated explanations and expert annotations. This approach is critical for XAI in promoting trustworthiness and ethical fairness in AI-assisted diagnostics. The findings from our research illustrate the effective collaboration between CNNs and XAI in advancing diagnostic methods for breast cancer, thereby facilitating a more seamless integration of advanced AI technologies within clinical settings. By enhancing the interpretability of AI-driven decisions, this work lays the groundwork for improved collaboration between AI systems and medical practitioners, ultimately enriching patient care. Furthermore, the implications of our research extend well beyond the current methodologies. It encourages further research into how to combine multimodal data and improve AI explanations to meet the needs of clinical practice.

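Summary: Deep learning (DL) models for diagnosing breast cancer from mammography images usually operate as "black boxes", which makes it hard for healthcare professionals to trust and understand their decision-making. This study proposes an integrated framework that combines convolutional neural networks (CNNs) and explainable artificial intelligence (XAI) to enhance breast cancer diagnosis using the CBIS-DDSM dataset. The method includes a detailed data preprocessing pipeline and advanced data augmentation to counter dataset limitations, and it applies transfer learning with pre-trained networks such as VGG-16, Inception-V3, and ResNet. The study centers on evaluating how effectively XAI interprets model predictions, using the Hausdorff measure to quantify the agreement between AI-generated explanations and expert annotations. This approach is essential for XAI to promote trustworthiness and ethical fairness in AI-assisted diagnosis. The findings illustrate the effective collaboration between CNNs and XAI in advancing breast cancer diagnostic methods, which supports smoother integration of advanced AI technologies into clinical settings. By improving the interpretability of AI-driven decisions, this work lays the groundwork for better collaboration between AI systems and medical practitioners, ultimately enriching patient care. The implications of the research also extend well beyond current techniques. It encourages further research into combining multimodal data and improving AI explanations to meet the needs of clinical practice.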
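
One way to quantify the agreement between an explanation and an expert annotation, as the entry above describes, is the symmetric Hausdorff distance between the two regions. The sketch below assumes that the "Hausdorff measure" refers to this distance and that the explanation heatmap has already been thresholded into a binary mask. Both points are assumptions, and the helper names are hypothetical.

```python
import math


def mask_to_points(mask):
    """Return (row, col) coordinates of all non-zero pixels in a 2D mask."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff_distance(mask_a, mask_b):
    """Symmetric Hausdorff distance (in pixels) between two binary masks."""
    a, b = mask_to_points(mask_a), mask_to_points(mask_b)
    if not a or not b:
        raise ValueError("both masks must contain at least one non-zero pixel")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical 4x4 masks: a thresholded explanation and an expert annotation.
explanation = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],  # one stray pixel outside the annotated region
]
annotation = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
distance = hausdorff_distance(explanation, annotation)
print(distance)
```

A smaller distance means the explanation stays closer to the annotated region. Because the measure is driven by the single worst-placed pixel, it penalizes stray highlights that overlap-based scores would largely ignore. This brute-force version is quadratic in the number of pixels, so it suits small masks only.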

Advancing Multimodal Data Fusion in Pain Recognition: A Strategy Leveraging Statistical Correlation and Human-Centered Perspectives

2404.00320v2 by Xingrui Gu, Zhixuan Wang, Irisa Jin, Zekun Wu

This research presents a novel multimodal data fusion methodology for pain behavior recognition, integrating statistical correlation analysis with human-centered insights. Our approach introduces two key innovations: 1) integrating data-driven statistical relevance weights into the fusion strategy to effectively utilize complementary information from heterogeneous modalities, and 2) incorporating human-centric movement characteristics into multimodal representation learning for detailed modeling of pain behaviors. Validated across various deep learning architectures, our method demonstrates superior performance and broad applicability. We propose a customizable framework that aligns each modality with a suitable classifier based on statistical significance, advancing personalized and effective multimodal fusion. Furthermore, our methodology provides explainable analysis of multimodal data, contributing to interpretable and explainable AI in healthcare. By highlighting the importance of data diversity and modality-specific representations, we enhance traditional fusion techniques and set new standards for recognizing complex pain behaviors. Our findings have significant implications for promoting patient-centered healthcare interventions and supporting explainable clinical decision-making.

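Summary: This research proposes a novel multimodal data fusion method for pain behavior recognition that combines statistical correlation analysis with human-centered insights. The approach introduces two key innovations: 1) integrating data-driven statistical relevance weights into the fusion strategy to make effective use of complementary information from heterogeneous modalities, and 2) incorporating human-centric movement characteristics into multimodal representation learning to model pain behaviors in detail. Validated across a range of deep learning architectures, the method shows superior performance and broad applicability. We propose a customizable framework that matches each modality with a suitable classifier based on statistical significance, advancing personalized and effective multimodal fusion. The method also provides explainable analysis of multimodal data, contributing to interpretable and explainable AI in healthcare. By emphasizing the importance of data diversity and modality-specific representations, we enhance traditional fusion techniques and set new standards for recognizing complex pain behaviors. The findings have significant implications for promoting patient-centered healthcare interventions and supporting explainable clinical decision-making.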

Addressing Social Misattributions of Large Language Models: An HCXAI-based Approach

2403.17873v1 by Andrea Ferrario, Alberto Termine, Alessandro Facchini

Human-centered explainable AI (HCXAI) advocates for the integration of social aspects into AI explanations. Central to the HCXAI discourse is the Social Transparency (ST) framework, which aims to make the socio-organizational context of AI systems accessible to their users. In this work, we suggest extending the ST framework to address the risks of social misattributions in Large Language Models (LLMs), particularly in sensitive areas like mental health. In fact, LLMs, which are remarkably capable of simulating roles and personas, may lead to mismatches between designers' intentions and users' perceptions of social attributes, risking the promotion of emotional manipulation and dangerous behaviors, cases of epistemic injustice, and unwarranted trust. To address these issues, we propose enhancing the ST framework with a fifth 'W-question' to clarify the specific social attributions assigned to LLMs by their designers and users. This addition aims to bridge the gap between LLM capabilities and user perceptions, promoting the ethically responsible development and use of LLM-based technology.

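Summary: Human-centered explainable AI (HCXAI) advocates integrating social aspects into AI explanations. At the core of the HCXAI discourse is the Social Transparency (ST) framework, which aims to make the socio-organizational context of AI systems understandable to users. In this work, we propose extending the ST framework to address the risk of social misattribution in large language models (LLMs), especially in sensitive areas such as mental health. LLMs are remarkably good at simulating roles and personas, which can create a mismatch between designers' intentions and users' perceptions of social attributes, with the risk of promoting emotional manipulation and dangerous behavior, epistemic injustice, and unwarranted trust. To address these problems, we propose enhancing the ST framework with a fifth "W-question" that clarifies the specific social attributions that designers and users assign to LLMs. This addition aims to bridge the gap between LLM capabilities and user perceptions and to promote the ethically responsible development and use of LLM-based technology.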

Clinical Domain Knowledge-Derived Template Improves Post Hoc AI Explanations in Pneumothorax Classification

2403.18871v1 by Han Yuan, Chuan Hong, Pengtao Jiang, Gangming Zhao, Nguyen Tuan Anh Tran, Xinxing Xu, Yet Yen Yan, Nan Liu

Background: Pneumothorax is an acute thoracic disease caused by abnormal air collection between the lungs and chest wall. To address the opaqueness often associated with deep learning (DL) models, explainable artificial intelligence (XAI) methods have been introduced to outline regions related to pneumothorax diagnoses made by DL models. However, these explanations sometimes diverge from actual lesion areas, highlighting the need for further improvement. Method: We propose a template-guided approach to incorporate the clinical knowledge of pneumothorax into model explanations generated by XAI methods, thereby enhancing the quality of these explanations. Utilizing one lesion delineation created by radiologists, our approach first generates a template that represents potential areas of pneumothorax occurrence. This template is then superimposed on model explanations to filter out extraneous explanations that fall outside the template's boundaries. To validate its efficacy, we carried out a comparative analysis of three XAI methods with and without our template guidance when explaining two DL models in two real-world datasets. Results: The proposed approach consistently improved baseline XAI methods across twelve benchmark scenarios built on three XAI methods, two DL models, and two datasets. The average incremental percentages, calculated by the performance improvements over the baseline performance, were 97.8% in Intersection over Union (IoU) and 94.1% in Dice Similarity Coefficient (DSC) when comparing model explanations and ground-truth lesion areas. Conclusions: In the context of pneumothorax diagnoses, we proposed a template-guided approach for improving AI explanations. We anticipate that our template guidance will forge a fresh approach to elucidating AI models by integrating clinical domain expertise.

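Summary: Background: Pneumothorax is an acute thoracic disease caused by abnormal air collection between the lungs and the chest wall. To address the opacity that often accompanies deep learning (DL) models, explainable artificial intelligence (XAI) methods have been introduced to outline the regions related to pneumothorax diagnoses made by DL models. However, these explanations sometimes diverge from the actual lesion areas, which highlights the need for further improvement. Method: We propose a template-guided approach that incorporates clinical knowledge of pneumothorax into the model explanations produced by XAI methods, improving the quality of those explanations. Using a lesion delineation created by radiologists, the approach first generates a template that represents the areas where pneumothorax may occur. The template is then superimposed on the model explanations to filter out extraneous explanations that fall outside its boundaries. To validate its efficacy, we compared three XAI methods with and without template guidance when explaining two DL models on two real-world datasets. Results: The proposed approach consistently improved the baseline XAI methods across twelve benchmark scenarios built on three XAI methods, two DL models, and two datasets. When model explanations were compared with ground-truth lesion areas, the average incremental percentages, calculated as the performance improvement over baseline performance, were 97.8% in Intersection over Union (IoU) and 94.1% in Dice Similarity Coefficient (DSC). Conclusions: In the context of pneumothorax diagnosis, we proposed a template-guided approach for improving AI explanations. We expect that template guidance will establish a new way of elucidating AI models by integrating clinical domain expertise.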
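
The filtering step and the two overlap scores named in the entry above can be sketched in a few lines. This is an illustrative reading, not the authors' implementation: it takes the template as given (the entry does not detail how it is built from the radiologist delineation), treats all inputs as binary masks, and uses hypothetical helper names and toy data.

```python
def apply_template(explanation, template):
    """Keep only explanation pixels that fall inside the template."""
    return [[int(bool(e) and bool(t)) for e, t in zip(e_row, t_row)]
            for e_row, t_row in zip(explanation, template)]


def _counts(a, b):
    """Return (intersection size, size of a, size of b) for two binary masks."""
    inter = sum(bool(x) and bool(y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    size_a = sum(bool(x) for row in a for x in row)
    size_b = sum(bool(y) for row in b for y in row)
    return inter, size_a, size_b


def iou(a, b):
    """Intersection over Union; defined here as 0.0 when both masks are empty."""
    inter, size_a, size_b = _counts(a, b)
    union = size_a + size_b - inter
    return inter / union if union else 0.0


def dice(a, b):
    """Dice Similarity Coefficient; defined here as 0.0 when both masks are empty."""
    inter, size_a, size_b = _counts(a, b)
    total = size_a + size_b
    return 2 * inter / total if total else 0.0


def relative_gain(baseline, improved):
    """Improvement expressed as a fraction of the baseline score."""
    return (improved - baseline) / baseline


# Hypothetical 3x4 masks; not data from the paper.
explanation = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
template = [[1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 0]]
lesion = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]

filtered = apply_template(explanation, template)
print(iou(explanation, lesion), iou(filtered, lesion))
print(dice(explanation, lesion), dice(filtered, lesion))
print(relative_gain(iou(explanation, lesion), iou(filtered, lesion)))
```

In this toy case the template removes the one highlighted pixel that lies outside the plausible region, so both scores rise without the explanation method itself changing.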

Enhancing Neural Machine Translation of Low-Resource Languages: Corpus Development, Human Evaluation and Explainable AI Architectures

2403.01580v1 by Séamus Lankford

In the current machine translation (MT) landscape, the Transformer architecture stands out as the gold standard, especially for high-resource language pairs. This research delves into its efficacy for low-resource language pairs including both the English↔Irish and English↔Marathi language pairs. Notably, the study identifies the optimal hyperparameters and subword model type to significantly improve the translation quality of Transformer models for low-resource language pairs. The scarcity of parallel datasets for low-resource languages can hinder MT development. To address this, gaHealth was developed, the first bilingual corpus of health data for the Irish language. Focusing on the health domain, models developed using this in-domain dataset exhibited very significant improvements in BLEU score when compared with models from the LoResMT2021 Shared Task. A subsequent human evaluation using the multidimensional quality metrics error taxonomy showcased the superior performance of the Transformer system in reducing both accuracy and fluency errors compared to an RNN-based counterpart. Furthermore, this thesis introduces adaptNMT and adaptMLLM, two open-source applications streamlined for the development, fine-tuning, and deployment of neural machine translation models. These tools considerably simplify the setup and evaluation process, making MT more accessible to both developers and translators. Notably, adaptNMT, grounded in the OpenNMT ecosystem, promotes eco-friendly natural language processing research by highlighting the environmental footprint of model development. Fine-tuning of MLLMs by adaptMLLM demonstrated advancements in translation performance for two low-resource language pairs: English↔Irish and English↔Marathi, compared to baselines from the LoResMT2021 Shared Task.

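Summary: In the current machine translation (MT) field, the Transformer architecture stands out as the gold standard, especially for high-resource language pairs. This research examines its effectiveness for low-resource language pairs, including English↔Irish and English↔Marathi. Notably, the study identifies the optimal hyperparameters and subword model type for significantly improving the translation quality of Transformer models on low-resource language pairs. The scarcity of parallel datasets for low-resource languages hinders MT development. To address this, gaHealth was developed, the first bilingual corpus of health data for the Irish language. Models built with this in-domain health dataset showed very significant improvements in BLEU score compared with models from the LoResMT2021 Shared Task. A subsequent human evaluation using the multidimensional quality metrics error taxonomy showed that the Transformer system reduced both accuracy and fluency errors more effectively than an RNN-based counterpart. The thesis also introduces adaptNMT and adaptMLLM, two open-source applications that streamline the development, fine-tuning, and deployment of neural machine translation models. These tools greatly simplify setup and evaluation, making MT more accessible to developers and translators. Notably, adaptNMT, which is based on the OpenNMT ecosystem, promotes eco-friendly natural language processing research by highlighting the environmental footprint of model development. Compared with baselines from the LoResMT2021 Shared Task, fine-tuning MLLMs with adaptMLLM improved translation performance for the two low-resource language pairs, English↔Irish and English↔Marathi.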

Cause and Effect: Can Large Language Models Truly Understand Causality?

2402.18139v3 by Swagata Ashwani, Kshiteesh Hegde, Nishith Reddy Mannuru, Mayank Jindal, Dushyant Singh Sengar, Krishna Chaitanya Rao Kathala, Dishant Banga, Vinija Jain, Aman Chadha

With the rise of Large Language Models (LLMs), it has become crucial to understand their capabilities and limitations in deciphering and explaining the complex web of causal relationships that language entails. Current methods use either explicit or implicit causal reasoning, yet there is a strong need for a unified approach combining both to tackle a wide array of causal relationships more effectively. This research proposes a novel architecture called the Context Aware Reasoning Enhancement with Counterfactual Analysis (CARE CA) framework to enhance causal reasoning and explainability. The proposed framework incorporates an explicit causal detection module with ConceptNet and counterfactual statements, as well as implicit causal detection through LLMs. Our framework goes one step further with a layer of counterfactual explanations to accentuate LLMs' understanding of causality. The knowledge from ConceptNet enhances the performance of multiple causal reasoning tasks such as causal discovery, causal identification and counterfactual reasoning. The counterfactual sentences add explicit knowledge of "not caused by" scenarios. By combining these powerful modules, our model aims to provide a deeper understanding of causal relationships, enabling enhanced interpretability. Evaluation on benchmark datasets shows improved performance across all metrics, such as accuracy, precision, recall, and F1 scores. We also introduce CausalNet, a new dataset accompanied by our code, to facilitate further research in this domain.

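Summary: With the rise of large language models (LLMs), it has become essential to understand their capabilities and limitations in deciphering and explaining the complex web of causal relationships embedded in language. Current techniques use either explicit or implicit causal reasoning, but a unified approach that combines both is strongly needed to handle a wide range of causal relationships more effectively. This research proposes a new architecture, the Context Aware Reasoning Enhancement with Counterfactual Analysis (CARE CA) framework, to enhance causal reasoning and explainability. The framework combines an explicit causal detection module that uses ConceptNet and counterfactual statements with implicit causal detection through LLMs. It goes a step further by adding a layer of counterfactual explanations to strengthen LLMs' understanding of causality. Knowledge from ConceptNet improves performance on several causal reasoning tasks, such as causal discovery, causal identification, and counterfactual reasoning. The counterfactual sentences add explicit knowledge of "not caused by" scenarios. By combining these modules, the model aims to provide a deeper understanding of causal relationships and better interpretability. Evaluation on benchmark datasets shows improvement on all metrics, such as accuracy, precision, recall, and F1 scores. We also introduce CausalNet, a new dataset released with our code, to support further research in this area.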

Artificial Intelligence and Diabetes Mellitus: An Inside Look Through the Retina

2402.18600v1 by Yasin Sadeghi Bazargani, Majid Mirzaei, Navid Sobhi, Mirsaeed Abdollahi, Ali Jafarizadeh, Siamak Pedrammehr, Roohallah Alizadehsani, Ru San Tan, Sheikh Mohammed Shariful Islam, U. Rajendra Acharya

Diabetes mellitus (DM) predisposes patients to vascular complications. Retinal images and vasculature reflect the body's micro- and macrovascular health. They can be used to diagnose DM complications, including diabetic retinopathy (DR), neuropathy, nephropathy, and atherosclerotic cardiovascular disease, as well as forecast the risk of cardiovascular events. Artificial intelligence (AI)-enabled systems developed for high-throughput detection of DR using digitized retinal images have become clinically adopted. Beyond DR screening, AI integration also holds immense potential to address challenges associated with the holistic care of the patient with DM. In this work, we aim to comprehensively review the literature for studies on AI applications based on retinal images related to DM diagnosis, prognostication, and management. We will describe the findings of holistic AI-assisted diabetes care, including but not limited to DR screening, and discuss barriers to implementing such systems, including issues concerning ethics, data privacy, equitable access, and explainability. With the ability to evaluate the patient's health status with respect to DM complications and to prognosticate the risk of future cardiovascular complications, AI-assisted retinal image analysis has the potential to become a central tool for modern personalized medicine in patients with DM.

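Summary: Diabetes mellitus (DM) makes patients prone to vascular complications. Retinal images and vasculature reflect the body's microvascular and macrovascular health. They can be used to diagnose complications of diabetes, including diabetic retinopathy (DR), neuropathy, nephropathy, and atherosclerotic cardiovascular disease, and to predict the risk of cardiovascular events. AI-enabled systems developed for high-throughput DR detection from digitized retinal images have been adopted clinically. Beyond DR screening, AI integration also has great potential to address challenges in the holistic care of patients with diabetes. In this work, we aim to comprehensively review the literature on retinal-image-based AI applications related to the diagnosis, prognosis, and management of diabetes. We describe findings on holistic AI-assisted diabetes care, including but not limited to DR screening, and discuss barriers to implementing such systems, including issues of ethics, data privacy, equitable access, and explainability. Because it can assess a patient's health status with respect to diabetes complications and prognosticate the risk of future cardiovascular complications, AI-assisted retinal image analysis has the potential to become a central tool for modern personalized medicine in patients with diabetes.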

Multi-stakeholder Perspective on Responsible Artificial Intelligence and Acceptability in Education

2402.15027v2 by A. J. Karran, P. Charland, J-T. Martineau, A. Ortiz de Guinea Lopez de Arana, AM. Lesage, S. Senecal, P-M. Leger

This study investigates the acceptability of different artificial intelligence (AI) applications in education from a multi-stakeholder perspective, including students, teachers, and parents. Acknowledging the transformative potential of AI in education, it addresses concerns related to data privacy, AI agency, transparency, explainability and the ethical deployment of AI. Through a vignette methodology, participants were presented with four scenarios where AI's agency, transparency, explainability, and privacy were manipulated. After each scenario, participants completed a survey that captured their perceptions of AI's global utility, individual usefulness, justice, confidence, risk, and intention to use each scenario's AI if available. The survey, which yielded a final sample of 1198 multi-stakeholder participants, was distributed through a partner institution and social media campaigns and focused on individual responses to four AI use cases. A mediation analysis of the data indicated that acceptance of and trust in AI vary significantly across stakeholder groups. We found that the key mediators between high and low levels of AI's agency, transparency, and explainability, as well as the intention to use the different educational AI, included perceived global utility, justice, and confidence. The study highlights that the acceptance of AI in education is a nuanced and multifaceted issue that requires careful consideration of specific AI applications and their characteristics, in addition to the diverse stakeholders' perceptions.

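Summary: This study examines the acceptability of different artificial intelligence (AI) applications in education from the perspective of multiple stakeholders, including students, teachers, and parents. Recognizing AI's transformative potential in education, it addresses concerns about data privacy, AI agency, transparency, explainability, and the ethical deployment of AI. Using a vignette methodology, participants were shown four scenarios in which AI's agency, transparency, explainability, and privacy were manipulated. After each scenario, participants completed a survey that captured their perceptions of the AI's global utility, individual usefulness, justice, confidence, and risk, and their intention to use that scenario's AI if it were available. The data collection, distributed through a partner institution and social media campaigns, produced a final sample of 1198 multi-stakeholder participants and focused on individual responses to four AI use cases. A mediation analysis showed that acceptance of and trust in AI differ significantly across stakeholder groups. We found that the key mediators between high and low levels of AI agency, transparency, and explainability and the intention to use the different educational AI included perceived global utility, justice, and confidence. The study emphasizes that acceptance of AI in education is a nuanced, multifaceted issue that requires careful consideration of specific AI applications and their characteristics, in addition to the perceptions of the different stakeholders.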

Deciphering Heartbeat Signatures: A Vision Transformer Approach to Explainable Atrial Fibrillation Detection from ECG Signals

2402.09474v2 by Aruna Mohan, Danne Elbers, Or Zilbershot, Fatemeh Afghah, David Vorchheimer

Remote patient monitoring based on wearable single-lead electrocardiogram (ECG) devices has significant potential for enabling the early detection of heart disease, especially in combination with artificial intelligence (AI) approaches for automated heart disease detection. There have been prior studies applying AI approaches based on deep learning for heart disease detection. However, these models are yet to be widely accepted as a reliable aid for clinical diagnostics, in part due to the current black-box perception surrounding many AI algorithms. In particular, there is a need to identify the key features of the ECG signal that contribute toward making an accurate diagnosis, thereby enhancing the interpretability of the model. In the present study, we develop a vision transformer approach to identify atrial fibrillation based on single-lead ECG data. A residual network (ResNet) approach is also developed for comparison with the vision transformer approach. These models are applied to the Chapman-Shaoxing dataset to classify atrial fibrillation, as well as another common arrhythmia, sinus bradycardia, and normal sinus rhythm heartbeats. The models enable the identification of the key regions of the heartbeat that determine the resulting classification, and highlight the importance of P-waves and T-waves, as well as heartbeat duration and signal amplitude, in distinguishing normal sinus rhythm from atrial fibrillation and sinus bradycardia.

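Summary: Remote patient monitoring based on wearable single-lead electrocardiogram (ECG) devices has significant potential for early detection of heart disease, especially when combined with artificial intelligence (AI) methods for automated heart disease detection. Earlier studies have applied deep-learning-based AI methods to heart disease detection. However, these models are not yet widely accepted as a reliable aid for clinical diagnosis, partly because many AI algorithms are currently perceived as black boxes. In particular, there is a need to identify the key features of the ECG signal that contribute to an accurate diagnosis, which would improve model interpretability. In this study, we develop a vision transformer approach to identify atrial fibrillation from single-lead ECG data. A residual network (ResNet) approach is also developed for comparison with the vision transformer. The models are applied to the Chapman-Shaoxing dataset to classify atrial fibrillation, another common arrhythmia (sinus bradycardia), and normal sinus rhythm heartbeats. The models can identify the key regions of the heartbeat that determine the resulting classification, and they highlight the importance of P-waves and T-waves, as well as heartbeat duration and signal amplitude, in distinguishing normal sinus rhythm from atrial fibrillation and sinus bradycardia.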

Illuminate: A novel approach for depression detection with explainable analysis and proactive therapy using prompt engineering

2402.05127v1 by Aryan Agrawal

This paper introduces a novel paradigm for depression detection and treatment using advanced Large Language Models (LLMs): Generative Pre-trained Transformer 4 (GPT-4), Llama 2 chat, and Gemini. These LLMs are fine-tuned with specialized prompts to diagnose, explain, and suggest therapeutic interventions for depression. A unique few-shot prompting method enhances the models' ability to analyze and explain depressive symptoms based on the DSM-5 criteria. In the interaction phase, the models engage in empathetic dialogue management, drawing from resources like PsychDB and a Cognitive Behavioral Therapy (CBT) Guide, fostering supportive interactions with individuals experiencing major depressive disorders. Additionally, the research introduces the Illuminate Database, enriched with various CBT modules, aiding in personalized therapy recommendations. The study evaluates LLM performance using metrics such as F1 scores, Precision, Recall, Cosine similarity, and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) across different test sets, demonstrating their effectiveness. This comprehensive approach blends cutting-edge AI with established psychological methods, offering new possibilities in mental health care and showcasing the potential of LLMs in revolutionizing depression diagnosis and treatment strategies.

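Summary: This paper introduces a new paradigm for depression detection and treatment using advanced large language models (LLMs): Generative Pre-trained Transformer 4 (GPT-4), Llama 2 chat, and Gemini. These LLMs are fine-tuned with specialized prompts to diagnose depression, explain it, and suggest therapeutic interventions. A unique few-shot prompting method improves the models' ability to analyze and explain depressive symptoms according to the DSM-5 criteria. In the interaction phase, the models carry out empathetic dialogue management, drawing on resources such as PsychDB and a Cognitive Behavioral Therapy (CBT) guide, to foster supportive interactions with people experiencing major depressive disorder. The research also introduces the Illuminate Database, which contains a variety of CBT modules and supports personalized therapy recommendations. The study evaluates LLM performance on different test sets using metrics such as F1 score, precision, recall, cosine similarity, and Recall-Oriented Understudy for Gisting Evaluation (ROUGE), demonstrating their effectiveness. This comprehensive approach combines cutting-edge AI with established psychological methods, offering new possibilities for mental health care and showing the potential of LLMs to transform depression diagnosis and treatment strategies.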
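
Two of the text-overlap metrics listed in the entry above, ROUGE and cosine similarity, can be illustrated with a simplified sketch. It assumes whitespace tokenization, unigram overlap only (ROUGE-1 recall), and bag-of-words vectors for the cosine score. The study may well have used a ROUGE library and embedding-based similarity instead, and the example sentences are invented.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def rouge1_recall(reference, candidate):
    """Share of reference unigrams that also appear in the candidate."""
    ref, cand = Counter(tokenize(reference)), Counter(tokenize(candidate))
    overlap = sum((ref & cand).values())  # clipped unigram matches
    total = sum(ref.values())
    return overlap / total if total else 0.0


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented example texts; not drawn from the study's test sets.
reference = "low mood and loss of interest"
candidate = "loss of interest and fatigue"
recall = rouge1_recall(reference, candidate)
similarity = cosine_similarity(reference, candidate)
print(recall, similarity)
```

Both scores reward shared wording rather than shared meaning, which is one reason generated explanations are usually judged on several metrics together.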

Information That Matters: Exploring Information Needs of People Affected by Algorithmic Decisions

2401.13324v6 by Timothée Schmude, Laura Koesten, Torsten Möller, Sebastian Tschiatschek

Every AI system that makes decisions about people has a group of stakeholders that are personally affected by these decisions. However, explanations of AI systems rarely address the information needs of this stakeholder group, who often are AI novices. This creates a gap between conveyed information and information that matters to those who are impacted by the system's decisions, such as domain experts and decision subjects. To address this, we present the "XAI Novice Question Bank," an extension of the XAI Question Bank containing a catalog of information needs from AI novices in two use cases: employment prediction and health monitoring. The catalog covers the categories of data, system context, system usage, and system specifications. We gathered information needs through task-based interviews where participants asked questions about two AI systems to decide on their adoption and received verbal explanations in response. Our analysis showed that participants' confidence increased after receiving explanations but that their understanding faced challenges. These included difficulties in locating information and in assessing their own understanding, as well as attempts to outsource understanding. Additionally, participants' prior perceptions of the systems' risks and benefits influenced their information needs. Participants who perceived high risks sought explanations about the intentions behind a system's deployment, while those who perceived low risks rather asked about the system's operation. Our work aims to support the inclusion of AI novices in explainability efforts by highlighting their information needs, aims, and challenges. We summarize our findings as five key implications that can inform the design of future explanations for lay stakeholder audiences.

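Summary: Every AI system that makes decisions about people has a group of stakeholders who are personally affected by those decisions. However, explanations of AI systems rarely meet the information needs of this group, who are usually AI novices. This creates a gap between the information conveyed and the information that matters to the people affected by the system's decisions, such as domain experts and decision subjects. To address this, we present the "XAI Novice Question Bank", an extension of the XAI Question Bank that contains a catalog of AI novices' information needs in two use cases: employment prediction and health monitoring. The catalog covers the categories of data, system context, system usage, and system specifications. We collected information needs through task-based interviews in which participants asked questions about two AI systems to decide whether to adopt them and received verbal explanations in response. Our analysis showed that participants' confidence rose after receiving explanations but that their understanding faced challenges. These included difficulty locating information and assessing their own understanding, as well as attempts to outsource understanding. In addition, participants' prior perceptions of the systems' risks and benefits influenced their information needs. Participants who perceived high risk sought explanations of the intentions behind a system's deployment, while those who perceived low risk asked about the system's operation. Our work aims to support the inclusion of AI novices in explainability efforts by highlighting their information needs, goals, and challenges. We summarize our findings as five key implications that can inform the design of future explanations for lay stakeholder audiences.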

Evaluating Large Language Models on the GMAT: Implications for the Future of Business Education

2401.02985v1 by Vahid Ashrafimoghari, Necdet Gürkan, Jordan W. Suchow

The rapid evolution of artificial intelligence (AI), especially in the domain of Large Language Models (LLMs) and generative AI, has opened new avenues for application across various fields, yet its role in business education remains underexplored. This study introduces the first benchmark to assess the performance of seven major LLMs, OpenAI's models (GPT-3.5 Turbo, GPT-4, and GPT-4 Turbo), Google's models (PaLM 2, Gemini 1.0 Pro), and Anthropic's models (Claude 2 and Claude 2.1), on the GMAT, which is a key exam in the admission process for graduate business programs. Our analysis shows that most LLMs outperform human candidates, with GPT-4 Turbo not only outperforming the other models but also surpassing the average scores of graduate students at top business schools. Through a case study, this research examines GPT-4 Turbo's ability to explain answers, evaluate responses, identify errors, tailor instructions, and generate alternative scenarios. The latest LLM versions, GPT-4 Turbo, Claude 2.1, and Gemini 1.0 Pro, show marked improvements in reasoning tasks compared to their predecessors, underscoring their potential for complex problem-solving. While AI's promise in education, assessment, and tutoring is clear, challenges remain. Our study not only sheds light on LLMs' academic potential but also emphasizes the need for careful development and application of AI in education. As AI technology advances, it is imperative to establish frameworks and protocols for AI interaction, verify the accuracy of AI-generated content, ensure worldwide access for diverse learners, and create an educational environment where AI supports human expertise. This research sets the stage for further exploration into the responsible use of AI to enrich educational experiences and improve exam preparation and assessment methods.

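Summary: The rapid evolution of artificial intelligence (AI), especially in large language models (LLMs) and generative AI, has opened new avenues for application in many fields, but its role in business education remains underexplored. This study introduces the first benchmark for evaluating the performance of seven major LLMs on the GMAT, a key exam in the admission process for graduate business programs: OpenAI's models (GPT-3.5 Turbo, GPT-4, and GPT-4 Turbo), Google's models (PaLM 2, Gemini 1.0 Pro), and Anthropic's models (Claude 2 and Claude 2.1). Our analysis shows that most LLMs outperform human candidates, and that GPT-4 Turbo not only outperforms the other models but also exceeds the average scores of graduate students at top business schools. Through a case study, the research examines GPT-4 Turbo's ability to explain answers, evaluate responses, identify errors, tailor instructions, and generate alternative scenarios. Compared with their predecessors, the latest LLM versions, GPT-4 Turbo, Claude 2.1, and Gemini 1.0 Pro, show marked progress on reasoning tasks, which underscores their potential for solving complex problems. Although AI's promise in education, assessment, and tutoring is clear, challenges remain. The study sheds light on the academic potential of LLMs and also stresses the need for careful development and application of AI in education. As AI technology advances, it is essential to establish frameworks and protocols for AI interaction, verify the accuracy of AI-generated content, ensure access for diverse learners worldwide, and create an educational environment in which AI supports human expertise. This research lays the groundwork for further exploration of the responsible use of AI to enrich educational experiences and improve exam preparation and assessment methods.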

XAI for In-hospital Mortality Prediction via Multimodal ICU Data

2312.17624v1 by Xingqiao Li, Jindong Gu, Zhiyong Wang, Yancheng Yuan, Bo Du, Fengxiang He

Predicting in-hospital mortality for intensive care unit (ICU) patients is key to final clinical outcomes. AI has shown superior accuracy but suffers from a lack of explainability. To address this issue, this paper proposes an eXplainable Multimodal Mortality Predictor (X-MMP), an approach toward an efficient, explainable AI solution for predicting in-hospital mortality via multimodal ICU data. We employ multimodal learning in our framework, which can receive heterogeneous inputs from clinical data and make decisions. Furthermore, we introduce an explainable method, namely Layer-Wise Propagation to Transformer, as a proper extension of the LRP method to Transformers, producing explanations over multimodal inputs and revealing the salient features attributed to prediction. Moreover, the contribution of each modality to clinical outcomes can be visualized, assisting clinicians in understanding the reasoning behind decision-making. We construct a multimodal dataset based on MIMIC-III and MIMIC-III Waveform Database Matched Subset. Comprehensive experiments on benchmark datasets demonstrate that our proposed framework can achieve reasonable interpretation with competitive prediction accuracy. In particular, our framework can be easily transferred to other clinical tasks, which facilitates the discovery of crucial factors in healthcare research.

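Summary: Predicting in-hospital mortality for intensive care unit (ICU) patients is key to final clinical outcomes. AI has shown excellent accuracy but lacks explainability. To address this, the paper proposes an eXplainable Multimodal Mortality Predictor (X-MMP), an efficient and explainable AI approach to predicting in-hospital mortality from multimodal ICU data. The framework uses multimodal learning, so it can receive heterogeneous inputs from clinical data and make decisions. We also introduce an explainability method, Layer-Wise Propagation to Transformer, as a proper extension of the LRP method to Transformers. It produces explanations over multimodal inputs and reveals the salient features attributed to a prediction. The contribution of each modality to clinical outcomes can also be visualized, helping clinicians understand the reasoning behind decisions. We construct a multimodal dataset based on MIMIC-III and the MIMIC-III Waveform Database Matched Subset. Comprehensive experiments on benchmark datasets show that the proposed framework achieves reasonable interpretation with competitive prediction accuracy. In particular, the framework can be easily transferred to other clinical tasks, which helps uncover crucial factors in healthcare research.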

Joining Forces for Pathology Diagnostics with AI Assistance: The EMPAIA Initiative

2401.09450v2 by Norman Zerbe, Lars Ole Schwen, Christian Geißler, Katja Wiesemann, Tom Bisson, Peter Boor, Rita Carvalho, Michael Franz, Christoph Jansen, Tim-Rasmus Kiehl, Björn Lindequist, Nora Charlotte Pohlan, Sarah Schmell, Klaus Strohmenger, Falk Zakrzewski, Markus Plass, Michael Takla, Tobias Küster, André Homeyer, Peter Hufnagl

Over the past decade, artificial intelligence (AI) methods in pathology have advanced substantially. However, integration into routine clinical practice has been slow due to numerous challenges, including technical and regulatory hurdles in translating research results into clinical diagnostic products and the lack of standardized interfaces. The open and vendor-neutral EMPAIA initiative addresses these challenges. Here, we provide an overview of EMPAIA's achievements and lessons learned. EMPAIA integrates various stakeholders of the pathology AI ecosystem, i.e., pathologists, computer scientists, and industry. In close collaboration, we developed technical interoperability standards, recommendations for AI testing and product development, and explainability methods. We implemented the modular and open-source EMPAIA platform and successfully integrated 14 AI-based image analysis apps from 8 different vendors, demonstrating how different apps can use a single standardized interface. We prioritized requirements and evaluated the use of AI in real clinical settings with 14 different pathology laboratories in Europe and Asia. In addition to technical developments, we created a forum for all stakeholders to share information and experiences on digital pathology and AI. Commercial, clinical, and academic stakeholders can now adopt EMPAIA's common open-source interfaces, providing a unique opportunity for large-scale standardization and streamlining of processes. Further efforts are needed to effectively and broadly establish AI assistance in routine laboratory use. To this end, a sustainable infrastructure, the non-profit association EMPAIA International, has been established to continue standardization and support broad implementation and advocacy for an AI-assisted digital pathology future.

Robust Stochastic Graph Generator for Counterfactual Explanations

2312.11747v2 by Mario Alfonso Prado-Romero, Bardh Prenkaj, Giovanni Stilo

Counterfactual Explanation (CE) techniques have garnered attention as a means of providing insights to users engaging with AI systems. While CE has been extensively researched in domains such as medical imaging and autonomous vehicles, Graph Counterfactual Explanation (GCE) methods have been comparatively under-explored. GCEs generate a new graph that is similar to the original one but yields a different outcome under the underlying predictive model. Among GCE techniques, those rooted in generative mechanisms have received relatively limited investigation, despite impressive results in other domains such as artistic styles and natural language modelling. The preference for generative explainers stems from their capacity to generate counterfactual instances during inference, leveraging autonomously acquired perturbations of the input graph. Motivated by these considerations, our study introduces RSGG-CE, a novel Robust Stochastic Graph Generator for Counterfactual Explanations, which produces counterfactual examples from the learned latent space while considering a partially ordered generation sequence. Furthermore, we undertake quantitative and qualitative analyses to compare RSGG-CE's performance against state-of-the-art (SoA) generative explainers, highlighting its increased ability to engender plausible counterfactual candidates.

Evaluating the Utility of Model Explanations for Model Development

2312.06032v1 by Shawn Im, Jacob Andreas, Yilun Zhou

One of the motivations for explainable AI is to allow humans to make better and more informed decisions regarding the use and deployment of AI models. But careful evaluations are needed to assess whether this expectation has been fulfilled. Current evaluations mainly focus on algorithmic properties of explanations, and those that involve human subjects often employ subjective questions to test humans' perception of explanation usefulness, without being grounded in objective metrics and measurements. In this work, we evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development. We conduct a mixed-methods user study involving image data to evaluate saliency maps generated by SmoothGrad, GradCAM, and an oracle explanation on two tasks: model selection and counterfactual simulation. To our surprise, we did not find evidence of significant improvement on these tasks when users were provided with any of the saliency maps, even the synthetic oracle explanation designed to be simple to understand and highly indicative of the answer. Nonetheless, explanations did help users describe the models more accurately. These findings suggest caution regarding the usefulness of saliency-based explanations and their potential for misunderstanding.

Building Trustworthy NeuroSymbolic AI Systems: Consistency, Reliability, Explainability, and Safety

2312.06798v1 by Manas Gaur, Amit Sheth

Explainability and Safety engender Trust. These require a model to exhibit consistency and reliability. To achieve these, it is necessary to use and analyze data and knowledge with statistical and symbolic AI methods relevant to the AI application - neither alone will do. Consequently, we argue and seek to demonstrate that the NeuroSymbolic AI approach is better suited for making AI a trusted AI system. We present the CREST framework that shows how Consistency, Reliability, user-level Explainability, and Safety are built on NeuroSymbolic methods that use data and knowledge to support requirements for critical applications such as health and well-being. This article focuses on Large Language Models (LLMs) as the chosen AI system within the CREST framework. LLMs have garnered substantial attention from researchers due to their versatility in handling a broad array of natural language processing (NLP) scenarios. For example, ChatGPT and Google's MedPaLM have emerged as highly promising platforms for providing information in general and health-related queries, respectively. Nevertheless, these models remain black boxes despite incorporating human feedback and instruction-guided tuning. For instance, ChatGPT can generate unsafe responses despite instituting safety guardrails. CREST presents a plausible approach harnessing procedural and graph-based knowledge within a NeuroSymbolic framework to shed light on the challenges associated with LLMs.

Class-Discriminative Attention Maps for Vision Transformers

2312.02364v3 by Lennart Brocki, Jakub Binda, Neo Christopher Chung

Importance estimators are explainability methods that quantify feature importance for deep neural networks (DNN). In vision transformers (ViT), the self-attention mechanism naturally leads to attention maps, which are sometimes interpreted as importance scores that indicate which input features ViT models are focusing on. However, attention maps do not account for signals from downstream tasks. To generate explanations that are sensitive to downstream tasks, we have developed class-discriminative attention maps (CDAM), a gradient-based extension that estimates feature importance with respect to a known class or a latent concept. CDAM scales attention scores by how relevant the corresponding tokens are for the predictions of a classifier head. In addition to targeting the supervised classifier, CDAM can explain an arbitrary concept shared by selected samples by measuring similarity in the latent space of ViT. Additionally, we introduce Smooth CDAM and Integrated CDAM, which average a series of CDAMs with slightly altered tokens. Our quantitative benchmarks include correctness, compactness, and class sensitivity, in comparison to 7 other importance estimators. Vanilla, Smooth, and Integrated CDAM excel across all three benchmarks. In particular, our results suggest that existing importance estimators may not provide sufficient class-sensitivity. We demonstrate the utility of CDAM in medical images by training and explaining malignancy and biomarker prediction models based on lung Computed Tomography (CT) scans. Overall, CDAM is shown to be highly class-discriminative and semantically relevant, while providing compact explanations.
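
The idea of scaling attention by class relevance can be made concrete with a deliberately simplified toy model. The sketch below is not the authors' ViT implementation: it assumes a hypothetical model that pools token vectors with fixed attention weights and feeds the result to a linear classifier head, in which case gradient times input for token i reduces to its attention weight multiplied by the dot product of the class weights with the token.

```python
def class_scores(tokens, attention, class_weights):
    """Gradient-times-input score per token for one class of a toy linear head.

    Toy model (an assumption of this sketch):
        pooled  = sum_i attention[i] * tokens[i]
        logit_c = class_weights . pooled
    so d(logit_c)/d(tokens[i]) = attention[i] * class_weights.
    """
    scores = []
    for token, a in zip(tokens, attention):
        # How strongly this token points in the direction of the class.
        relevance = sum(w * t for w, t in zip(class_weights, token))
        # Attention is scaled by class relevance, so the sign can flip per class.
        scores.append(a * relevance)
    return scores


# Hypothetical tokens, attention weights, and two class weight vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attention = [0.2, 0.3, 0.5]
w_class_a = [1.0, -1.0]
w_class_b = [-1.0, 1.0]

scores_a = class_scores(tokens, attention, w_class_a)
scores_b = class_scores(tokens, attention, w_class_b)
print(scores_a)
print(scores_b)
```

The attention weights are identical for both classes, yet the resulting scores differ in sign and magnitude. That difference is what a plain attention map cannot show.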

Deployment of a Robust and Explainable Mortality Prediction Model: The COVID-19 Pandemic and Beyond

2311.17133v1 by Jacob R. Epifano, Stephen Glass, Ravi P. Ramachandran, Sharad Patel, Aaron J. Masino, Ghulam Rasool

This study investigated the performance, explainability, and robustness of deployed artificial intelligence (AI) models in predicting mortality during the COVID-19 pandemic and beyond. In this first study of its kind, we found that Bayesian Neural Networks (BNNs) and intelligent training techniques allowed our models to maintain performance amidst significant data shifts. Our results emphasize the importance of developing robust AI models capable of matching or surpassing clinician predictions, even under challenging conditions. Our exploration of model explainability revealed that stochastic models generate more diverse and personalized explanations, highlighting the need for AI models that provide detailed and individualized insights in real-world clinical settings. Furthermore, we underscored the importance of quantifying uncertainty in AI models, which enables clinicians to make better-informed decisions based on reliable predictions. Our study advocates for prioritizing implementation science in AI research for healthcare and ensuring that AI solutions are practical, beneficial, and sustainable in real-world clinical environments. By addressing unique challenges and complexities in healthcare settings, researchers can develop AI models that effectively improve clinical practice and patient outcomes.

Variational Autoencoders for Feature Exploration and Malignancy Prediction of Lung Lesions

2311.15719v1 by Benjamin Keel, Aaron Quyn, David Jayne, Samuel D. Relton

Lung cancer is responsible for 21% of cancer deaths in the UK, and five-year survival rates are heavily influenced by the stage at which the cancer is identified. Recent studies have demonstrated the capability of AI methods for accurate and early diagnosis of lung cancer from routine scans. However, this evidence has not translated into clinical practice, with one barrier being a lack of interpretable models. This study investigates the application of Variational Autoencoders (VAEs), a type of generative AI model, to lung cancer lesions. The proposed models were trained on lesions extracted from 3D CT scans in the LIDC-IDRI public dataset. Latent vector representations of 2D slices produced by the VAEs were explored through clustering to justify their quality and were used in an MLP classifier model for lung cancer diagnosis; the best model achieved state-of-the-art metrics of AUC 0.98 and 93.1% accuracy. Cluster analysis shows that the VAE latent space separates the dataset of malignant and benign lesions based on meaningful feature components including tumour size, shape, patient and malignancy class. We also include a comparative analysis of the standard Gaussian VAE (GVAE) and the more recent Dirichlet VAE (DirVAE), which replaces the prior with a Dirichlet distribution to encourage a more explainable latent space with disentangled feature representation. Finally, we demonstrate the potential for latent space traversals corresponding to clinically meaningful feature changes.

MRxaI: Black-Box Explainability for Image Classifiers in a Medical Setting

2311.14471v1 by Nathan Blake, Hana Chockler, David A. Kelly, Santiago Calderon Pena, Akchunya Chanchal

Existing tools for explaining the output of image classifiers can be divided into white-box tools, which rely on access to the model internals, and black-box tools, which are agnostic to the model. As the usage of AI in the medical domain grows, so too does the usage of explainability tools. Existing work on medical image explanations focuses on white-box tools, such as Grad-CAM. However, there are clear advantages to switching to a black-box tool, including the ability to use it with any classifier and the wide selection of black-box tools available. On standard images, black-box tools are as precise as white-box tools. In this paper we compare the performance of several black-box methods against Grad-CAM on a brain cancer MRI dataset. We demonstrate that most black-box tools are not suitable for explaining medical image classifications and present a detailed analysis of the reasons for their shortcomings. We also show that one black-box tool, the causal explainability-based ReX, performs as well as Grad-CAM.

Moderating Model Marketplaces: Platform Governance Puzzles for AI Intermediaries

2311.12573v3 by Robert Gorwa, Michael Veale

The AI development community is increasingly making use of hosting intermediaries such as Hugging Face, which provide easy access to user-uploaded models and training data. These model marketplaces lower technical deployment barriers for hundreds of thousands of users, yet can be used in numerous potentially harmful and illegal ways. In this article, we explain ways in which AI systems, which can both 'contain' content and be open-ended tools, present one of the trickiest platform governance challenges seen to date. We provide case studies of several incidents across three illustrative platforms (Hugging Face, GitHub and Civitai) to examine how model marketplaces moderate models. Building on this analysis, we outline important (yet limited) practices that industry has been developing to respond to moderation demands: licensing, access and use restrictions, automated content moderation, and open policy development. While the policy challenge at hand is a considerable one, we conclude with some ideas as to how platforms could better mobilize resources to act as a careful, fair, and proportionate regulatory access point.

Ovarian Cancer Data Analysis using Deep Learning: A Systematic Review from the Perspectives of Key Features of Data Analysis and AI Assurance

2311.11932v1 by Muta Tah Hira, Mohammad A. Razzaque, Mosharraf Sarker

Background and objectives: Machine learning or deep learning (ML/DL)-based autonomous data analysis tools can assist clinicians and cancer researchers in discovering patterns and relationships in complex data sets. Many DL-based analyses of ovarian cancer (OC) data have recently been published. These analyses are highly diverse in the aspects of cancer they address (e.g., subdomain(s) and cancer type) and in their data analysis features. However, a comprehensive understanding of these analyses in terms of these features and AI assurance (AIA) is currently lacking. This systematic review aims to fill this gap by examining the existing literature and identifying important aspects of OC data analysis using DL, explicitly focusing on the key features and AI assurance perspectives.

Methods: The PRISMA framework was used to conduct comprehensive searches in three journal databases. Only studies published between 2015 and 2023 in peer-reviewed journals were included in the analysis.

Results: A total of 96 DL-driven analyses were examined. The findings reveal several important insights regarding DL-driven ovarian cancer data analysis:
- Most studies (71%, 68/96) focused on detection and diagnosis, while no study addressed the prediction and prevention of OC.
- The analyses were predominantly based on samples from a non-diverse population (75%, 72/96), limited to a geographic location or country.
- Only a small proportion of studies (33%, 32/96) performed integrated analyses, most of which used homogeneous data (clinical or omics).
- Notably, a mere 8.3% (8/96) of the studies validated their models using external and diverse data sets, highlighting the need for enhanced model validation.
- The inclusion of AIA in cancer data analysis is at a very early stage; only 2.1% (2/96) of the studies explicitly addressed AIA through explainability.
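
The percentages in the results follow directly from the stated counts out of 96 studies. As a quick arithmetic check, the sketch below recomputes each share; the dictionary keys are shorthand labels chosen for this example, not terms from the review.

```python
TOTAL_STUDIES = 96

# (count of studies, percentage as stated in the abstract)
reported = {
    "detection and diagnosis": (68, 71.0),
    "non-diverse population": (72, 75.0),
    "integrated analyses": (32, 33.0),
    "external validation": (8, 8.3),
    "explicit AI assurance": (2, 2.1),
}


def share(count, total=TOTAL_STUDIES):
    """Return count as a percentage of total."""
    return 100.0 * count / total


computed = {name: share(count) for name, (count, _) in reported.items()}
for name, (count, stated) in reported.items():
    print(f"{name}: {count}/{TOTAL_STUDIES} = {computed[name]:.1f}% "
          f"(abstract states {stated}%)")
```

Each recomputed value agrees with the stated figure to within rounding.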

Representing visual classification as a linear combination of words

2311.10933v1 by Shobhit Agarwal, Yevgeniy R. Semenov, William Lotter

Explainability is a longstanding challenge in deep learning, especially in high-stakes domains like healthcare. Common explainability methods highlight image regions that drive an AI model's decision. Humans, however, heavily rely on language to convey explanations of not only "where" but "what". Additionally, most explainability approaches focus on explaining individual AI predictions, rather than describing the features used by an AI model in general. The latter would be especially useful for model and dataset auditing, and potentially even knowledge generation as AI is increasingly being used in novel tasks. Here, we present an explainability strategy that uses a vision-language model to identify language-based descriptors of a visual classification task. By leveraging a pre-trained joint embedding space between images and text, our approach estimates a new classification task as a linear combination of words, resulting in a weight for each word that indicates its alignment with the vision-based classifier. We assess our approach using two medical imaging classification tasks, where we find that the resulting descriptors largely align with clinical knowledge despite a lack of domain-specific language training. However, our approach also identifies the potential for 'shortcut connections' in the public datasets used. Towards a functional measure of explainability, we perform a pilot reader study where we find that the AI-identified words can enable non-expert humans to perform a specialized medical task at a non-trivial level. Altogether, our results emphasize the potential of using multimodal foundational models to deliver intuitive, language-based explanations of visual tasks.
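
To make "a classification task as a linear combination of words" concrete, the sketch below fits least-squares weights so that a weighted sum of word embeddings approximates a classifier direction in the same embedding space. It is a minimal illustration under stated assumptions, not the authors' pipeline: the words, the 4-dimensional embeddings, and the classifier vector are invented, and a real setup would take them from a pre-trained vision-language model.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def word_weights(word_embeddings, classifier_vector):
    """Least-squares weights w with sum_k w_k * e_k close to the classifier."""
    words = list(word_embeddings)
    e = [word_embeddings[w] for w in words]
    # Normal equations: (E E^T) w = E c
    gram = [[sum(x * y for x, y in zip(ei, ej)) for ej in e] for ei in e]
    rhs = [sum(x * y for x, y in zip(ei, classifier_vector)) for ei in e]
    return dict(zip(words, solve_linear_system(gram, rhs)))


# Hypothetical word embeddings and classifier direction.
embeddings = {
    "irregular border": [1.0, 0.0, 0.0, 1.0],
    "uniform colour": [0.0, 1.0, 0.0, 1.0],
    "ruler": [0.0, 0.0, 1.0, 0.0],
}
classifier = [2.0, -1.0, 0.5, 1.0]

weights = word_weights(embeddings, classifier)
for word, weight in weights.items():
    print(f"{word}: {weight:+.2f}")
```

A large positive or negative weight marks a word whose embedding aligns with, or opposes, the classifier. A non-zero weight on a clinically irrelevant word such as the made-up "ruler" entry is the kind of signal that could hint at a dataset shortcut.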

Towards objective and systematic evaluation of bias in artificial intelligence for medical imaging

2311.02115v2 by Emma A. M. Stanley, Raissa Souza, Anthony Winder, Vedant Gulve, Kimberly Amador, Matthias Wilms, Nils D. Forkert

Artificial intelligence (AI) models trained using medical images for clinical tasks often exhibit bias in the form of disparities in performance between subgroups. Since not all sources of biases in real-world medical imaging data are easily identifiable, it is challenging to comprehensively assess how those biases are encoded in models, and how capable bias mitigation methods are at ameliorating performance disparities. In this article, we introduce a novel analysis framework for systematically and objectively investigating the impact of biases in medical images on AI models. We developed and tested this framework for conducting controlled in silico trials to assess bias in medical imaging AI using a tool for generating synthetic magnetic resonance images with known disease effects and sources of bias. The feasibility is showcased by using three counterfactual bias scenarios to measure the impact of simulated bias effects on a convolutional neural network (CNN) classifier and the efficacy of three bias mitigation strategies. The analysis revealed that the simulated biases resulted in expected subgroup performance disparities when the CNN was trained on the synthetic datasets. Moreover, reweighing was identified as the most successful bias mitigation strategy for this setup, and we demonstrated how explainable AI methods can aid in investigating the manifestation of bias in the model using this framework. Developing fair AI models is a considerable challenge given that many and often unknown sources of biases can be present in medical imaging datasets. In this work, we present a novel methodology to objectively study the impact of biases and mitigation strategies on deep learning pipelines, which can support the development of clinical AI that is robust and responsible.
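
The abstract names reweighing as the most successful mitigation strategy but does not give its formula. The sketch below assumes the commonly used scheme in which each sample is weighted by P(group) * P(label) / P(group, label), so that group and label become statistically independent in the weighted data; whether the paper uses exactly this variant is an assumption, and the groups and labels are invented.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Per-sample weights w(g, y) = P(g) * P(y) / P(g, y)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] / n) * (label_counts[y] / n) / (joint_counts[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(group, groups, labels, weights):
    """Weighted fraction of positive labels within one subgroup."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(w for g, y, w in zip(groups, labels, weights)
                   if g == group and y == 1)
    return positive / total


# Hypothetical data: disease label 1 is over-represented in subgroup A.
groups = ["A", "A", "A", "B", "B", "B", "B", "B"]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
weights = reweighing_weights(groups, labels)

rate_a = weighted_positive_rate("A", groups, labels, weights)
rate_b = weighted_positive_rate("B", groups, labels, weights)
print(weights)
print(rate_a, rate_b)
```

After reweighing, both subgroups have the same weighted positive rate (the overall rate of 3/8), so a classifier trained with these sample weights gets less incentive to use subgroup membership as a proxy for the label.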

Predicting recovery following stroke: deep learning, multimodal data and feature selection using explainable AI

2310.19174v1 by Adam White, Margarita Saranti, Artur d'Avila Garcez, Thomas M. H. Hope, Cathy J. Price, Howard Bowman

Machine learning offers great potential for automated prediction of post-stroke symptoms and their response to rehabilitation. Major challenges for this endeavour include the very high dimensionality of neuroimaging data, the relatively small size of the datasets available for learning, and how to effectively combine neuroimaging and tabular data (e.g. demographic information and clinical characteristics). This paper evaluates several solutions based on two strategies. The first is to use 2D images that summarise MRI scans. The second is to select key features that improve classification accuracy. Additionally, we introduce the novel approach of training a convolutional neural network (CNN) on images that combine regions-of-interest extracted from MRIs with symbolic representations of tabular data. We evaluate a series of CNN architectures (both 2D and 3D) that are trained on different representations of MRI and tabular data, to predict whether a composite measure of post-stroke spoken picture description ability is in the aphasic or non-aphasic range. MRI and tabular data were acquired from 758 English-speaking stroke survivors who participated in the PLORAS study. The classification accuracy for a baseline logistic regression was 0.678 for lesion size alone, rising to 0.757 and 0.813 when initial symptom severity and recovery time were successively added. The highest classification accuracy, 0.854, was observed when 8 regions-of-interest were extracted from each MRI scan and combined with lesion size, initial severity and recovery time in a 2D Residual Neural Network. Our findings demonstrate how imaging and tabular data can be combined for high post-stroke classification accuracy, even when the dataset is small in machine learning terms. We conclude by proposing how the current models could be improved to achieve even higher levels of accuracy using images from hospital scanners.

Trainable Noise Model as an XAI evaluation method: application on Sobol for remote sensing image segmentation

2310.01828v2 by Hossein Shreim, Abdul Karim Gizzini, Ali J. Ghandour

eXplainable Artificial Intelligence (XAI) has emerged as an essential requirement when dealing with mission-critical applications, ensuring transparency and interpretability of the employed black-box AI models. The significance of XAI spans various domains, from healthcare to finance, where understanding the decision-making process of deep learning algorithms is essential. Most AI-based computer vision models are black boxes; hence, providing explainability of deep neural networks in image processing is crucial for their wide adoption and deployment in medical image analysis, autonomous driving, and remote sensing applications. Recently, several XAI methods for image classification tasks have been introduced. In contrast, image segmentation has received comparatively less attention in the context of explainability, although it is a fundamental task in computer vision applications, especially in remote sensing. Only a few studies propose gradient-based XAI algorithms for image segmentation. This paper adapts the recent gradient-free Sobol XAI method for semantic segmentation. To measure the performance of the Sobol method for segmentation, we propose a quantitative XAI evaluation method based on a learnable noise model. The main objective of this model is to induce noise on the explanation maps, where higher induced noise signifies lower accuracy and vice versa. A benchmark analysis is conducted to evaluate and compare the performance of three XAI methods (Seg-Grad-CAM, Seg-Grad-CAM++ and Seg-Sobol) using the proposed noise-based evaluation technique. This constitutes the first attempt to run and evaluate XAI methods using high-resolution satellite images.

Creating Trustworthy LLMs: Dealing with Hallucinations in Healthcare AI

2311.01463v1 by Muhammad Aurangzeb Ahmad, Ilker Yaramis, Taposh Dutta Roy

Large language models (LLMs) have proliferated across multiple domains in a short period of time. There is, however, hesitation in the medical and healthcare domain towards their adoption because of issues like factuality, coherence, and hallucinations. Given the high-stakes nature of healthcare, many researchers have even cautioned against their usage until these issues are resolved. The key to the implementation and deployment of LLMs in healthcare is to make these models trustworthy, transparent (as much as possible) and explainable. In this paper we describe the key elements in creating reliable, trustworthy, and unbiased models as a necessary condition for their adoption in healthcare. Specifically, we focus on the quantification, validation, and mitigation of hallucinations in the context of healthcare. Lastly, we discuss what the future of LLMs in healthcare may look like.

When to Trust AI: Advances and Challenges for Certification of Neural Networks

2309.11196v1 by Marta Kwiatkowska, Xiyue Zhang

Artificial intelligence (AI) has been advancing at a fast pace and it is now poised for deployment in a wide range of applications, such as autonomous systems, medical diagnosis and natural language processing. Early adoption of AI technology for real-world applications has not been without problems, particularly for neural networks, which may be unstable and susceptible to adversarial examples. In the longer term, appropriate safety assurance techniques need to be developed to reduce potential harm due to avoidable system failures and ensure trustworthiness. Focusing on certification and explainability, this paper provides an overview of techniques that have been developed to ensure safety of AI decisions and discusses future challenges.
