Medical explainable AI
Publish Date | Title | Authors | Homepage | Code |
---|---|---|---|---|
2025-02-21 | A Knowledge Distillation-Based Approach to Enhance Transparency of Classifier Models | Yuchen Jiang et.al. | 2502.15959v1 | link |
2025-02-21 | ML-Driven Approaches to Combat Medicare Fraud: Advances in Class Imbalance Solutions, Feature Engineering, Adaptive Learning, and Business Impact | Dorsa Farahmandazad et.al. | 2502.15898v1 | null |
2025-02-20 | Utilizing AI and Machine Learning for Predictive Analysis of Post-Treatment Cancer Recurrence | Muhammad Umer Qayyum et.al. | 2502.15825v1 | null |
2025-02-19 | Towards a perturbation-based explanation for medical AI as differentiable programs | Takeshi Abe et.al. | 2502.14001v1 | null |
2025-02-14 | 3D ReX: Causal Explanations in 3D Neuroimaging Classification | Melane Navaratnarajah et.al. | 2502.12181v1 | null |
2025-02-13 | Data2Concept2Text: An Explainable Multilingual Framework for Data Analysis Narration | Flavio Bertini et.al. | 2502.09218v1 | null |
2025-02-10 | Foundation Model of Electronic Medical Records for Adaptive Risk Estimation | Pawel Renc et.al. | 2502.06124v1 | null |
2025-01-27 | An Explainable Disease Surveillance System for Early Prediction of Multiple Chronic Diseases | Shaheer Ahmad Khan et.al. | 2501.15969v1 | null |
2025-01-23 | Ensuring Medical AI Safety: Explainable AI-Driven Detection and Mitigation of Spurious Model Behavior and Associated Data | Frederik Pahde et.al. | 2501.13818v1 | link |
2025-01-19 | Enhanced Suicidal Ideation Detection from Social Media Using a CNN-BiLSTM Hybrid Model | Mohaiminul Islam Bhuiyan et.al. | 2501.11094v1 | null |
2025-01-17 | SEANN: A Domain-Informed Neural Network for Epidemiological Insights | Jean-Baptiste Guimbaud et.al. | 2501.10273v1 | null |
2025-01-16 | Artificial Intelligence-Driven Clinical Decision Support Systems | Muhammet Alkan et.al. | 2501.09628v2 | null |
2025-01-12 | MedGrad E-CLIP: Enhancing Trust and Transparency in AI-Driven Skin Lesion Diagnosis | Sadia Kamal et.al. | 2501.06887v1 | null |
2025-01-06 | Explaining Humour Style Classifications: An XAI Approach to Understanding Computational Humour Analysis | Mary Ogbuka Kenneth et.al. | 2501.02891v1 | null |
2024-12-28 | The Emotional Spectrum of LLMs: Leveraging Empathy and Emotion-Based Markers for Mental Health Support | Alessandro De Grandi et.al. | 2412.20068v1 | null |
2024-12-27 | A Review on the Integration of Artificial Intelligence and Medical Imaging in IVF Ovarian Stimulation | Jana Zakall et.al. | 2412.19688v1 | null |
2024-12-23 | Enhancing Cancer Diagnosis with Explainable & Trustworthy Deep Learning Models | Badaru I. Olumuyiwa et.al. | 2412.17527v1 | null |
2024-12-20 | Towards Interpretable Radiology Report Generation via Concept Bottlenecks using a Multi-Agentic RAG | Hasan Md Tusfiqur Alam et.al. | 2412.16086v2 | link |
2024-12-20 | Critique of Impure Reason: Unveiling the reasoning behaviour of medical Large Language Models | Shamus Sim et.al. | 2412.15748v1 | null |
2024-12-18 | Cognition Chain for Explainable Psychological Stress Detection on Social Media | Xin Wang et.al. | 2412.14009v1 | null |
2024-11-30 | 2-Factor Retrieval for Improved Human-AI Decision Making in Radiology | Jim Solomon et.al. | 2412.00372v1 | null |
2024-11-28 | Mapping Public Perception of Artificial Intelligence: Expectations, Risk-Benefit Tradeoffs, and Value As Determinants for Societal Acceptance | Philipp Brauner et.al. | 2411.19356v1 | null |
2024-11-26 | Explainable AI for Classifying UTI Risk Groups Using a Real-World Linked EHR and Pathology Lab Dataset | Yujie Dai et.al. | 2411.17645v2 | null |
2024-11-18 | Exploring the Requirements of Clinicians for Explainable AI Decision Support Systems in Intensive Care | Jeffrey N. Clark et.al. | 2411.11774v1 | null |
2024-11-15 | Artificial Intelligence in Pediatric Echocardiography: Exploring Challenges, Opportunities, and Clinical Applications with Explainable AI and Federated Learning | Mohammed Yaseen Jabarulla et.al. | 2411.10255v1 | null |
2024-11-01 | Enhancing Osteoporosis Detection: An Explainable Multi-Modal Learning Framework with Feature Fusion and Variable Clustering | Mehdi Hosseini Chagahi et.al. | 2411.00916v2 | null |
2024-10-25 | A Review of Deep Learning Approaches for Non-Invasive Cognitive Impairment Detection | Muath Alsuhaibani et.al. | 2410.19898v1 | null |
2024-10-23 | An Ontology-Enabled Approach For User-Centered and Knowledge-Enabled Explanations of AI Systems | Shruthi Chari et.al. | 2410.17504v1 | link |
2024-10-22 | Contrasting Attitudes Towards Current and Future AI Applications for Computerised Interpretation of ECG: A Clinical Stakeholder Interview Study | Lukas Hughes-Noehrer et.al. | 2410.16879v1 | null |
2024-10-19 | Pathologist-like explainable AI for interpretable Gleason grading in prostate cancer | Gesa Mittmann et.al. | 2410.15012v1 | null |
2024-10-15 | Explainable AI Methods for Multi-Omics Analysis: A Survey | Ahmad Hussein et.al. | 2410.11910v1 | null |
2024-10-14 | Study on the Helpfulness of Explainable Artificial Intelligence | Tobias Labarta et.al. | 2410.11896v1 | link |
2024-10-12 | Use of What-if Scenarios to Help Explain Artificial Intelligence Models for Neonatal Health | Abdullah Mamun et.al. | 2410.09635v1 | link |
2024-10-10 | Artificial intelligence techniques in inherited retinal diseases: A review | Han Trinh et.al. | 2410.09105v1 | null |
2024-10-07 | CasiMedicos-Arg: A Medical Question Answering Dataset Annotated with Explanatory Argumentative Structures | Ekaterina Sviridova et.al. | 2410.05235v2 | link |
2024-10-01 | Explainable Diagnosis Prediction through Neuro-Symbolic Integration | Qiuhao Lu et.al. | 2410.01855v2 | null |
2024-10-01 | Easydiagnos: a framework for accurate feature selection for automatic diagnosis in smart healthcare | Prasenjit Maji et.al. | 2410.00366v1 | null |
2024-09-20 | Dermatologist-like explainable AI enhances melanoma diagnosis accuracy: eye-tracking study | Tirtha Chanda et.al. | 2409.13476v1 | null |
2024-09-19 | Explainable AI for Autism Diagnosis: Identifying Critical Brain Regions Using fMRI Data | Suryansh Vidya et.al. | 2409.15374v1 | null |
2024-09-19 | Improving Prototypical Parts Abstraction for Case-Based Reasoning Explanations Designed for the Kidney Stone Type Recognition | Daniel Flores-Araiza et.al. | 2409.12883v1 | null |
2024-09-18 | Towards Interpretable End-Stage Renal Disease (ESRD) Prediction: Utilizing Administrative Claims Data with Explainable AI Techniques | Yubo Li et.al. | 2409.12087v3 | null |
2024-09-13 | Contextual Evaluation of Large Language Models for Classifying Tropical and Infectious Diseases | Mercy Asiedu et.al. | 2409.09201v3 | null |
2024-09-09 | Explainable AI: Definition and attributes of a good explanation for health AI | Evangelia Kyrimi et.al. | 2409.15338v1 | null |
2024-08-30 | Exploring the Effect of Explanation Content and Format on User Comprehension and Trust in Healthcare | Antonio Rago et.al. | 2408.17401v2 | null |
2024-08-29 | A Survey for Large Language Models in Biomedicine | Chong Wang et.al. | 2409.00133v1 | null |
2024-08-27 | Aligning XAI with EU Regulations for Smart Biomedical Devices: A Methodology for Compliance Analysis | Francesco Sovrano et.al. | 2408.15121v1 | null |
2024-08-24 | Towards Case-based Interpretability for Medical Federated Learning | Laura Latorre et.al. | 2408.13626v1 | null |
2024-08-22 | AI in radiological imaging of soft-tissue and bone tumours: a systematic review evaluating against CLAIM and FUTURE-AI guidelines | Douwe J. Spaanderman et.al. | 2408.12491v1 | null |
2024-08-14 | Evaluating Explainable AI Methods in Deep Learning Models for Early Detection of Cerebral Palsy | Kimji N. Pellano et.al. | 2409.00001v1 | null |
2024-08-06 | MicroXercise: A Micro-Level Comparative and Explainable System for Remote Physical Therapy | Hanchen David Wang et.al. | 2408.11837v1 | null |
2024-08-05 | The Literature Review Network: An Explainable Artificial Intelligence for Systematic Literature Reviews, Meta-analyses, and Method Development | Joshua Morriss et.al. | 2408.05239v1 | null |
2024-08-05 | Enhancing Medical Learning and Reasoning Systems: A Boxology-Based Comparative Analysis of Design Patterns | Chi Him Ng et.al. | 2408.02709v1 | null |
2024-08-05 | Bayesian Kolmogorov Arnold Networks (Bayesian_KANs): A Probabilistic Approach to Enhance Accuracy and Interpretability | Masoud Muhammed Hassan et.al. | 2408.02706v1 | null |
2024-07-26 | MLtoGAI: Semantic Web based with Machine Learning for Enhanced Disease Prediction and Personalized Recommendations using Generative AI | Shyam Dongre et.al. | 2407.20284v1 | null |
2024-07-25 | Introducing δ-XAI: a novel sensitivity-based method for local AI explanations | Alessandro De Carlo et.al. | 2407.18343v2 | null |
2024-07-24 | Enhanced Deep Learning Methodologies and MRI Selection Techniques for Dementia Diagnosis in the Elderly Population | Nikolaos Ntampakis et.al. | 2407.17324v2 | null |
2024-07-24 | Using Large Language Models to Compare Explainable Models for Smart Home Human Activity Recognition | Michele Fiori et.al. | 2408.06352v1 | null |
2024-07-21 | Explainable AI-based Intrusion Detection System for Industry 5.0: An Overview of the Literature, associated Challenges, the existing Solutions, and Potential Research Directions | Naseem Khan et.al. | 2408.03335v1 | null |
2024-07-18 | A Comparative Study on Automatic Coding of Medical Letters with Explainability | Jamie Glen et.al. | 2407.13638v1 | link |
2024-07-09 | Explainable AI for Enhancing Efficiency of DL-based Channel Estimation | Abdul Karim Gizzini et.al. | 2407.07009v1 | null |
2024-07-07 | Explainable AI: Comparative Analysis of Normal and Dilated ResNet Models for Fundus Disease Classification | P. N. Karthikayan et.al. | 2407.05440v2 | null |
2024-07-03 | A Survey on Trustworthiness in Foundation Models for Medical Image Analysis | Congzhen Shi et.al. | 2407.15851v2 | null |
2024-07-01 | The Impact of an XAI-Augmented Approach on Binary Classification with Scarce Data | Ximing Wen et.al. | 2407.06206v1 | null |
2024-06-28 | Can GPT-4 Help Detect Quit Vaping Intentions? An Exploration of Automatic Data Annotation Approach | Sai Krishna Revanth Vuruma et.al. | 2407.00167v1 | null |
2024-06-25 | Towards Compositional Interpretability for XAI | Sean Tull et.al. | 2406.17583v1 | null |
2024-06-17 | Slicing Through Bias: Explaining Performance Gaps in Medical Image Analysis using Slice Discovery Methods | Vincent Olesen et.al. | 2406.12142v2 | link |
2024-06-11 | Unlocking the Potential of Metaverse in Innovative and Immersive Digital Health | Fatemeh Ebrahimzadeh et.al. | 2406.07114v2 | null |
2024-06-10 | AI-Driven Predictive Analytics Approach for Early Prognosis of Chronic Kidney Disease Using Ensemble Learning and Explainable AI | K M Tawsik Jawad et.al. | 2406.06728v2 | null |
2024-06-10 | Explainable AI for Mental Disorder Detection via Social Media: A survey and outlook | Yusif Ibrahimov et.al. | 2406.05984v1 | null |
2024-06-09 | Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance | Zhan Zhang et.al. | 2406.05746v1 | null |
2024-06-07 | Advancing Histopathology-Based Breast Cancer Diagnosis: Insights into Multi-Modality and Explainability | Faseela Abdullakutty et.al. | 2406.12897v1 | null |
2024-06-04 | Using Explainable AI for EEG-based Reduced Montage Neonatal Seizure Detection | Dinuka Sandun Udayantha et.al. | 2406.16908v3 | link |
2024-06-01 | Breast Cancer Diagnosis: A Comprehensive Exploration of Explainable Artificial Intelligence (XAI) Techniques | Samita Bai et.al. | 2406.00532v1 | null |
2024-06-01 | Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion Recognition | Alaa Nfissi et.al. | 2406.01624v2 | link |
2024-05-31 | The Explanation Necessity for Healthcare AI | Michail Mamalakis et.al. | 2406.00216v1 | null |
2024-05-29 | Interdisciplinary Expertise to Advance Equitable Explainable AI | Chloe R. Bennett et.al. | 2406.18563v1 | null |
2024-05-27 | "It depends": Configuring AI to Improve Clinical Usefulness Across Contexts | Hubert D. Zając et.al. | 2407.11978v1 | null |
2024-05-26 | Improving Health Professionals' Onboarding with AI and XAI for Trustworthy Human-AI Collaborative Decision Making | Min Hun Lee et.al. | 2405.16424v1 | null |
2024-05-26 | Exploring Nutritional Impact on Alzheimer's Mortality: An Explainable AI Approach | Ziming Liu et.al. | 2405.17502v1 | null |
2024-05-24 | Explainable AI Enhances Glaucoma Referrals, Yet the Human-AI Team Still Falls Short of the AI Alone | Catalina Gomez et.al. | 2407.11974v1 | null |
2024-05-23 | Decoding Decision Reasoning: A Counterfactual-Powered Model for Knowledge Discovery | Yingying Fang et.al. | 2406.18552v1 | null |
2024-05-21 | The Role of Emotions in Informational Support Question-Response Pairs in Online Health Communities: A Multimodal Deep Learning Approach | Mohsen Jozani et.al. | 2405.13099v1 | null |
2024-05-17 | ChatGPT in Classrooms: Transforming Challenges into Opportunities in Education | Harris Bin Munawar et.al. | 2405.10645v1 | null |
2024-05-13 | Evaluating the Explainable AI Method Grad-CAM for Breath Classification on Newborn Time Series Data | Camelia Oprea et.al. | 2405.07590v1 | null |
2024-05-10 | XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare | Fatemeh Nazary et.al. | 2405.06270v3 | null |
2024-05-09 | To Trust or Not to Trust: Towards a novel approach to measure trust for XAI systems | Miquel Miró-Nicolau et.al. | 2405.05766v1 | null |
2024-05-05 | Region-specific Risk Quantification for Interpretable Prognosis of COVID-19 | Zhusi Zhong et.al. | 2405.02815v1 | link |
2024-04-26 | Rad4XCNN: a new agnostic method for post-hoc global explanation of CNN-derived features by means of radiomics | Francesco Prinzi et.al. | 2405.02334v2 | null |
2024-04-25 | Attributing Responsibility in AI-Induced Incidents: A Computational Reflective Equilibrium Framework for Accountability | Yunfei Ge et.al. | 2404.16957v1 | null |
2024-04-19 | Explainable AI for Fair Sepsis Mortality Predictive Model | Chia-Hsuan Chang et.al. | 2404.13139v1 | null |
2024-04-19 | Multi Class Depression Detection Through Tweets using Artificial Intelligence | Muhammad Osama Nusrat et.al. | 2404.13104v1 | link |
2024-04-19 | COIN: Counterfactual inpainting for weakly supervised semantic segmentation for medical images | Dmytro Shvetsov et.al. | 2404.12832v2 | link |
2024-04-15 | Hybrid Intelligence for Digital Humanities | Victor de Boer et.al. | 2406.15374v1 | null |
2024-04-14 | Ethical Framework for Responsible Foundational Models in Medical Imaging | Abhijit Das et.al. | 2406.11868v1 | null |
2024-04-09 | Advancements in Radiomics and Artificial Intelligence for Thyroid Cancer Diagnosis | Milad Yousefi et.al. | 2404.07239v1 | null |
2024-04-06 | Predictive Modeling for Breast Cancer Classification in the Context of Bangladeshi Patients: A Supervised Machine Learning Approach with Explainable AI | Taminul Islam et.al. | 2404.04686v1 | null |
2024-04-05 | Enhancing Breast Cancer Diagnosis in Mammography: Evaluation and Integration of Convolutional Neural Networks and Explainable AI | Maryam Ahmed et.al. | 2404.03892v3 | null |
2024-03-30 | Advancing Multimodal Data Fusion in Pain Recognition: A Strategy Leveraging Statistical Correlation and Human-Centered Perspectives | Xingrui Gu et.al. | 2404.00320v2 | null |
2024-03-26 | Addressing Social Misattributions of Large Language Models: An HCXAI-based Approach | Andrea Ferrario et.al. | 2403.17873v1 | null |
2024-03-26 | Clinical Domain Knowledge-Derived Template Improves Post Hoc AI Explanations in Pneumothorax Classification | Han Yuan et.al. | 2403.18871v1 | link |
Abstracts
A Knowledge Distillation-Based Approach to Enhance Transparency of Classifier Models
2502.15959v1 by Yuchen Jiang, Xinyuan Zhao, Yihang Wu, Ahmad Chaddad
With the rapid development of artificial intelligence (AI), especially in the medical field, the need for its explainability has grown. In medical image analysis, a high degree of transparency and model interpretability can help clinicians better understand and trust the decision-making process of AI models. In this study, we propose a Knowledge Distillation (KD)-based approach that aims to enhance the transparency of AI models in medical image analysis. The first step is to train a traditional CNN as the teacher model and then use KD to simplify the CNN architecture, retaining most of the features of the data set while reducing the number of network layers. The approach then performs a hierarchical analysis of the student model's feature maps to identify key features and decision-making processes, which leads to intuitive visual explanations. We selected three public medical data sets (brain tumor, eye disease, and Alzheimer's disease) to test our method. The results show that, even with fewer layers, our model achieves remarkable results on the test set and reduces the time required for the interpretability analysis.
摘要:隨著人工智慧 (AI) 的快速發展,特別是在醫療領域中,對於其可解釋性的需求也日益增長。在醫學影像分析中,高度的透明度和模型可解釋性可以幫助臨床醫生更了解並信賴 AI 模型的決策過程。在本研究中,我們提出了一個基於知識蒸餾 (KD) 的方法,旨在增強 AI 模型在醫學影像分析中的透明度。第一步是使用傳統的 CNN 來獲得一個教師模型,然後使用 KD 來簡化 CNN 架構,保留資料集的大部分特徵,並減少網路層數。它還使用學生模型的特徵圖來執行階層分析,以識別關鍵特徵和決策過程。這會產生直觀的視覺解釋。我們選擇了三個公開的醫學資料集(腦瘤、眼疾和阿茲海默症)來測試我們的模型。結果顯示,即使在減少層數的情況下,我們的模型在測試集中也提供了顯著的結果,並減少了可解釋性分析所需的時間。
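The abstract above does not state which distillation objective the authors use. As a rough illustration only, the following sketch shows the commonly used soft-target formulation of knowledge distillation, in which a student is trained to match the teacher's temperature-softened outputs as well as the true labels. The function names, the `temperature` and `alpha` values, and the example logits are all assumptions made for this sketch, not details from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of soft-target KL divergence and hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Standard cross-entropy against the ground-truth class
    ce = -math.log(softmax(student_logits)[true_label])
    # The T^2 factor keeps the soft-target term on a comparable scale
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce

# Hypothetical logits for a three-class problem (illustrative values only)
teacher = [6.0, 1.0, -2.0]
good_student = [5.0, 1.5, -1.0]
poor_student = [0.0, 3.0, 1.0]
loss_good = distillation_loss(good_student, teacher, true_label=0)
loss_poor = distillation_loss(poor_student, teacher, true_label=0)
```

A student whose logits track the teacher's receives a lower loss than one that disagrees with both the teacher and the label, which is the signal used to train the smaller network.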
ML-Driven Approaches to Combat Medicare Fraud: Advances in Class Imbalance Solutions, Feature Engineering, Adaptive Learning, and Business Impact
2502.15898v1 by Dorsa Farahmandazad, Kasra Danesh
Medicare fraud poses a substantial challenge to healthcare systems, resulting in significant financial losses and undermining the quality of care provided to legitimate beneficiaries. This study investigates the use of machine learning (ML) to enhance Medicare fraud detection, addressing key challenges such as class imbalance, high-dimensional data, and evolving fraud patterns. A dataset comprising inpatient claims, outpatient claims, and beneficiary details was used to train and evaluate five ML models: Random Forest, KNN, LDA, Decision Tree, and AdaBoost. Data preprocessing included SMOTE resampling to address the class imbalance, feature selection for dimensionality reduction, and aggregation of diagnostic and procedural codes. Random Forest emerged as the best-performing model, achieving a training accuracy of 99.2%, a validation accuracy of 98.8%, and an F1-score of 98.4%. The Decision Tree also performed well, achieving a validation accuracy of 96.3%. KNN and AdaBoost demonstrated moderate performance, with validation accuracies of 79.2% and 81.1%, respectively, while LDA struggled with a validation accuracy of 63.3% and a low recall of 16.6%. The results highlight the importance of advanced resampling techniques, feature engineering, and adaptive learning in detecting Medicare fraud effectively. This study underscores the potential of machine learning in addressing the complexities of fraud detection. Future work should explore explainable AI and hybrid models to improve interpretability and performance, ensuring scalable and reliable fraud detection systems that protect healthcare resources and beneficiaries.
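The SMOTE resampling step mentioned above can be illustrated with a simplified sketch. It is not the authors' pipeline: it only shows the core idea of SMOTE, which creates synthetic minority-class samples by interpolating between a minority point and one of its nearest minority neighbours. The function name `smote`, the parameter defaults, and the toy data are assumptions for illustration. In practice a maintained implementation such as the `SMOTE` class in imbalanced-learn would normally be used instead.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority class with two features (hypothetical values)
minority_class = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority_class, n_new=6)
```

Because every synthetic point lies on a segment between two real minority samples, the new data stays inside the region already occupied by the minority class rather than duplicating existing rows.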
Utilizing AI and Machine Learning for Predictive Analysis of Post-Treatment Cancer Recurrence
2502.15825v1 by Muhammad Umer Qayyum, Muhammad Fahad, Nasrullah Abbasi
In oncology, recurrence after treatment is a major challenge that affects patients' survival and quality of life. Conventionally, prediction of cancer relapse has relied on clinical observation supported by statistical models, which largely fail to capture the complex, multifactorial nature of tumor recurrence. This research explores how AI and ML models may increase the accuracy and reliability of recurrence prediction in cancer. By analyzing large volumes of data on genetics, clinical manifestations, and treatment, AI and ML create new opportunities not only for personalized medicine but also for proactive patient management. The paper describes various supervised and unsupervised AI and ML techniques for pattern identification and outcome prediction in cancer patients. It also reviews the clinical implications, including how early interventions could take place and how treatment plans could be designed.
摘要:在腫瘤學中,治療後的復發是主要的挑戰之一, 與患者的存活率和生活品質相關。傳統上, 癌症復發的預測一直依賴於統計模型支持的臨床觀察,這幾乎無法解釋腫瘤復發的複雜多因素性質。本研究探討了 AI 和 ML 模型如何提高癌症復發預測的準確性和可靠性。因此,AI 和 ML 不僅為個人化醫療創造了新的機會,還通過分析大量遺傳學、臨床表現和治療數據來創造了對患者的積極管理。本文描述了各種 AI 和 ML 技術,用於癌症患者的模式識別和結果預測,使用監督式和非監督式學習。臨床意義提供了回顧早期干預如何發生和治療計劃設計的機會。
Towards a perturbation-based explanation for medical AI as differentiable programs
2502.14001v1 by Takeshi Abe, Yoshiyuki Asai
Recent advances in machine learning algorithms have reached a point where medical devices can be equipped with artificial intelligence (AI) models for diagnostic support and routine automation in clinical settings. In medicine and healthcare, there is a particular demand for sufficient and objective explainability of the outcomes generated by AI models. However, AI models are generally considered black boxes due to their complexity, and the computational process leading to their response is often opaque. Although several methods have been proposed to explain the behavior of models by evaluating the importance of each feature in discrimination and prediction, they may suffer from biases and opacities arising from the scale and sampling protocol of the dataset used for training or testing. To overcome the shortcomings of existing methods, we explore an alternative approach that provides an objective explanation of AI models, one that can be defined independently of the learning process and does not require additional data. As a preliminary study in this direction, this work examines the numerical availability of the Jacobian matrix of deep learning models, which measures how stably a model responds to small perturbations added to the input. The indicator, if available, is calculated from a trained AI model for a given target input. This is a first step towards a perturbation-based explanation, which will assist medical practitioners in understanding and interpreting the response of the AI model in its clinical application.
摘要:機器學習演算法的最新進展已達到一個階段,醫療裝置可以配備人工智慧 (AI) 模型,以在臨床環境中提供診斷支援和例行自動化。在醫學和保健領域,對於 AI 模型產生的結果有足夠且客觀的可解釋性有特別的需求。然而,由於 AI 模型的複雜性,它們通常被視為黑盒子,而導致其反應的運算過程通常是不透明的。儘管已經提出多種方法來解釋模型的行為,方法是評估每個特徵在判別和預測中的重要性,但它們可能會受到訓練或測試所用資料集的規模和抽樣協定的偏差和不透明性的影響。為了克服現有方法的缺點,我們探索一種替代方法,以提供 AI 模型的客觀解釋,這種方法可以獨立於學習過程定義,而且不需要額外的資料。作為這個研究方向的初步研究,這項工作探討了深度學習模型的雅可比矩陣的數值可用性,它衡量了模型對輸入中新增的小擾動的穩定反應程度。如果可用,指標會從訓練好的 AI 模型計算得出,以取得給定的目標輸入。這是基於擾動的解釋的第一步,它將協助醫療從業人員了解和詮釋 AI 模型在其臨床應用中的反應。
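To make the Jacobian-based indicator described above more concrete, here is a minimal sketch under stated assumptions. The paper treats models as differentiable programs. This sketch instead approximates the Jacobian with central finite differences so that it runs with the standard library alone. The model `tiny_model`, its coefficients, the input `x0`, and the choice of the Frobenius norm as a summary are all hypothetical and are not taken from the paper.

```python
import math

def tiny_model(x):
    """Hypothetical two-input, two-output model standing in for a trained network."""
    h = math.tanh(0.5 * x[0] - 1.0 * x[1])
    return [2.0 * h + 0.1 * x[0], -1.0 * h + 0.3 * x[1]]

def jacobian(f, x, eps=1e-6):
    """Central-difference estimate of J[i][j] = d f_i / d x_j at input x."""
    n_out = len(f(x))
    J = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_plus = list(x)
        x_plus[j] += eps
        x_minus = list(x)
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n_out):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J

def frobenius_norm(J):
    """Single-number summary: a larger value means a stronger response to perturbation."""
    return math.sqrt(sum(v * v for row in J for v in row))

x0 = [0.2, 0.1]  # target input at which the model is explained
J = jacobian(tiny_model, x0)
sensitivity = frobenius_norm(J)
```

Each entry of `J` indicates how much one output changes per unit change in one input near `x0`, using only the trained model and the target input, which matches the abstract's point that no additional data is required.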
3D ReX: Causal Explanations in 3D Neuroimaging Classification
2502.12181v1 by Melane Navaratnarajah, Sophie A. Martin, David A. Kelly, Nathan Blake, Hana Chockler
Explainability remains a significant problem for AI models in medical imaging, making it challenging for clinicians to trust AI-driven predictions. We introduce 3D ReX, the first causality-based post-hoc explainability tool for 3D models. 3D ReX uses the theory of actual causality to generate responsibility maps which highlight the regions most crucial to the model's decision. We test 3D ReX on a stroke detection model, providing insight into the spatial distribution of features relevant to stroke.
摘要:解釋性仍然是醫療影像中 AI 模型的一大問題,這使得臨床醫生難以信任 AI 驅動的預測。 我們引入了 3D ReX,這是第一個用於 3D 模型的基於因果關係的事後解釋性工具。3D ReX 使用實際因果關係理論來生成責任圖,該圖突出了對模型決策至關重要的區域。我們在中風檢測模型上測試了 3D ReX,提供了與中風相關特徵的空間分佈的見解。
Data2Concept2Text: An Explainable Multilingual Framework for Data Analysis Narration
2502.09218v1 by Flavio Bertini, Alessandro Dal Palù, Federica Zaglio, Francesco Fabiano, Andrea Formisano
This paper presents a complete explainable system that interprets a set of data, abstracts the underlying features, and describes them in a natural language of choice. The system relies on two crucial stages: (i) identifying emerging properties from data and transforming them into abstract concepts, and (ii) converting these concepts into natural language. Despite the impressive natural language generation capabilities demonstrated by Large Language Models, their statistical nature and the intricacy of their internal mechanism still force us to employ these techniques as black boxes, forgoing trustworthiness. An explainable pipeline for data interpretation would facilitate its use in safety-critical settings such as medical information processing and would give non-experts and visually impaired people access to narrated information. To this end, we believe that the fields of knowledge representation and automated reasoning research could present a valid alternative. Expanding on prior research that tackled the first stage (i), we focus on the second stage, named Concept2Text. Being explainable, data translation is easily modeled through logic-based rules, once again emphasizing the role of declarative programming in achieving AI explainability. This paper explores a Prolog/CLP-based rewriting system that interprets concepts (articulated in terms of classes and relations, plus common knowledge) derived from a generic ontology and generates natural language text. Its main features include hierarchical tree rewritings, modular multilingual generation, support for equivalent variants across semantic, grammar, and lexical levels, and a transparent rule-based system. We outline the architecture and demonstrate its flexibility through examples that generate numerous diverse and equivalent rewritings based on the input concept.
Foundation Model of Electronic Medical Records for Adaptive Risk Estimation
2502.06124v1 by Pawel Renc, Michal K. Grzeszczyk, Nassim Oufattole, Deirdre Goode, Yugang Jia, Szymon Bieganski, Matthew B. A. McDermott, Jaroslaw Was, Anthony E. Samir, Jonathan W. Cunningham, David W. Bates, Arkadiusz Sitek
We developed the Enhanced Transformer for Health Outcome Simulation (ETHOS), an AI model that tokenizes patient health timelines (PHTs) from EHRs. ETHOS predicts future PHTs using transformer-based architectures. The Adaptive Risk Estimation System (ARES) employs ETHOS to compute dynamic and personalized risk probabilities for clinician-defined critical events. ARES incorporates a personalized explainability module that identifies key clinical factors influencing risk estimates for individual patients. ARES was evaluated on the MIMIC-IV v2.2 dataset in emergency department (ED) settings, benchmarking its performance against traditional early warning systems and machine learning models. We processed 299,721 unique patients from MIMIC-IV into 285,622 PHTs, with 60% including hospital admissions. The dataset contained over 357 million tokens. ETHOS outperformed benchmark models in predicting hospital admissions, ICU admissions, and prolonged hospital stays, achieving superior AUC scores. ETHOS-based risk estimates demonstrated robustness across demographic subgroups with strong model reliability, confirmed via calibration curves. The personalized explainability module provides insights into patient-specific factors contributing to risk. ARES, powered by ETHOS, advances predictive healthcare AI by providing dynamic, real-time, and personalized risk estimation with patient-specific explainability to enhance clinician trust. Its adaptability and superior accuracy position it as a transformative tool for clinical decision-making, potentially improving patient outcomes and resource allocation in emergency and inpatient settings. We release the full code at github.com/ipolharvard/ethos-ares to facilitate future research.
摘要:我們開發了增強型健康結果模擬轉換器 (ETHOS), 一種從電子健康紀錄 (EHR) 中將患者健康時間軸 (PHT) 標記化的 AI 模型。ETHOS 使用基於轉換器的架構預測未來的 PHT。自適應風險評估系統 (ARES) 使用 ETHOS 計算由臨床醫生定義的危急事件的動態且個人化的風險機率。ARES 結合了個人化的可解釋性模組,可找出影響個別患者風險評估的主要臨床因素。ARES 在急診部門 (ED) 設定中針對 MIMIC-IV v2.2 資料集進行評估,並將其效能與傳統的預警系統和機器學習模型進行基準測試。我們將 299,721 位 MIMIC-IV 的獨特患者處理成 285,622 個 PHT,其中 60% 包含住院記錄。該資料集包含超過 3.57 億個標記。ETHOS 在預測住院、加護病房 (ICU) 住院和延長住院時間方面表現優於基準模型,並獲得了較高的 AUC 分數。基於 ETHOS 的風險評估顯示出跨人口統計子群的穩健性,並通過校準曲線確認了強大的模型可靠性。個人化的可解釋性模組提供了對導致風險的患者特定因素的見解。由 ETHOS 驅動的 ARES 透過提供動態、即時且個人化的風險評估,以及患者特定的可解釋性來增強臨床醫生的信任,從而推動了預測性醫療保健 AI 的發展。其適應性和卓越的準確性使其成為臨床決策制定的一種變革性工具,有可能改善緊急和住院環境中的患者結果和資源分配。我們在 github.com/ipolharvard/ethos-ares 上釋出完整程式碼,以利未來的研究。
An Explainable Disease Surveillance System for Early Prediction of Multiple Chronic Diseases
2501.15969v1 by Shaheer Ahmad Khan, Muhammad Usamah Shahid, Ahmad Abdullah, Ibrahim Hashmat, Muddassar Farooq
This study addresses a critical gap in the healthcare system by developing a clinically meaningful, practical, and explainable disease surveillance system for multiple chronic diseases, utilizing routine EHR data from multiple U.S. practices integrated with CureMD's EMR/EHR system. Unlike traditional systems--using AI models that rely on features from patients' labs--our approach focuses on routinely available data, such as medical history, vitals, diagnoses, and medications, to preemptively assess the risks of chronic diseases in the next year. We trained three distinct models for each chronic disease: prediction models that forecast the risk of a disease 3, 6, and 12 months before a potential diagnosis. We developed Random Forest models, which were internally validated using F1 scores and AUROC as performance metrics and further evaluated by a panel of expert physicians for clinical relevance based on inferences grounded in medical knowledge. Additionally, we discuss our implementation of integrating these models into a practical EMR system. Beyond using Shapley attributes and surrogate models for explainability, we also introduce a new rule-engineering framework to enhance the intrinsic explainability of Random Forests.
摘要:本研究透過開發一個臨床有意義、實用且可解釋的多重慢性疾病疾病監測系統,來解決醫療保健系統中的重大缺口,利用整合 CureMD 的 EMR/EHR 系統,來自多個美國實務的例行 EHR 資料。與傳統系統不同的是,我們的做法著重在例行可得的資料,例如病歷、生命徵象、診斷和藥物,以預先評估未來一年慢性疾病的風險,而非仰賴病患實驗室特徵的 AI 模型。我們針對每種慢性疾病訓練了三個不同的模型:預測模型,用以預測在潛在診斷前 3、6 和 12 個月的疾病風險。我們開發了隨機森林模型,並使用 F1 分數和 AUROC 作為效能指標,進行內部驗證,並進一步由專家醫師小組根據植基於醫學知識的推論,評估其臨床相關性。此外,我們討論了將這些模型整合到實用 EMR 系統中的實作方式。除了使用 Shapley 屬性和代理模型來解釋外,我們還引進了一個新的規則工程架構,以增強隨機森林的內在可解釋性。
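The abstract above mentions Shapley-based attributions without giving details. As a hedged illustration of what a Shapley attribution computes, the sketch below evaluates exact Shapley values for a small hypothetical risk model by enumerating all feature coalitions, with absent features replaced by a reference value. The model `risk_score`, its coefficients, and the patient and reference vectors are invented for this example. Exact enumeration is only feasible for a handful of features, so real systems usually rely on approximations.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def risk_score(features):
    """Hypothetical risk model: linear terms plus one interaction."""
    age, bmi, smoker = features
    return 0.02 * age + 0.05 * bmi + 0.5 * smoker + 0.01 * bmi * smoker

patient = [60.0, 30.0, 1.0]    # hypothetical patient (age, BMI, smoker flag)
reference = [40.0, 25.0, 0.0]  # hypothetical reference patient
phi = shapley_values(risk_score, patient, reference)
```

The attributions sum to the difference between the patient's score and the reference score, so each value can be read as that feature's share of the extra predicted risk.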
Ensuring Medical AI Safety: Explainable AI-Driven Detection and Mitigation of Spurious Model Behavior and Associated Data
2501.13818v1 by Frederik Pahde, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek
Deep neural networks are increasingly employed in high-stakes medical applications, despite their tendency for shortcut learning in the presence of spurious correlations, which can have potentially fatal consequences in practice. Detecting and mitigating shortcut behavior is a challenging task that often requires significant labeling efforts from domain experts. To alleviate this problem, we introduce a semi-automated framework for the identification of spurious behavior from both data and model perspective by leveraging insights from eXplainable Artificial Intelligence (XAI). This allows the retrieval of spurious data points and the detection of model circuits that encode the associated prediction rules. Moreover, we demonstrate how these shortcut encodings can be used for XAI-based sample- and pixel-level data annotation, providing valuable information for bias mitigation methods to unlearn the undesired shortcut behavior. We show the applicability of our framework using four medical datasets across two modalities, featuring controlled and real-world spurious correlations caused by data artifacts. We successfully identify and mitigate these biases in VGG16, ResNet50, and contemporary Vision Transformer models, ultimately increasing their robustness and applicability for real-world medical tasks.
摘要:深度神经网络越来越多地用于高风险医疗应用中,尽管它们在存在虚假相关性的情况下倾向于捷径学习,这在实践中可能产生致命的后果。检测和缓解捷径行为是一项艰巨的任务,通常需要领域专家的大量标记工作。为了缓解这个问题,我们引入了一个半自动框架,用于从数据和模型的角度识别虚假行为,方法是利用可解释人工智能 (XAI) 的见解。这允许检索虚假数据点并检测对关联预测规则进行编码的模型电路。此外,我们演示了如何使用这些捷径编码进行基于 XAI 的样本和像素级数据注释,为偏差缓解方法提供有价值的信息,以消除不需要的捷径行为。我们使用跨越两种方式的四个医学数据集展示了我们框架的适用性,这些数据集具有由数据伪像引起的受控和真实世界虚假相关性。我们成功地识别并减轻了 VGG16、ResNet50 和当代 Vision Transformer 模型中的这些偏差,最终提高了它们的鲁棒性和在真实世界医疗任务中的适用性。
Enhanced Suicidal Ideation Detection from Social Media Using a CNN-BiLSTM Hybrid Model
2501.11094v1 by Mohaiminul Islam Bhuiyan, Nur Shazwani Kamarudin, Nur Hafieza Ismail
Suicidal ideation detection is crucial for preventing suicides, a leading cause of death worldwide. Many individuals express suicidal thoughts on social media, offering a vital opportunity for early detection through advanced machine learning techniques. The identification of suicidal ideation in social media text is improved by utilising a hybrid framework that integrates Convolutional Neural Networks (CNN) and Bidirectional Long Short-Term Memory (BiLSTM), enhanced with an attention mechanism. To enhance the interpretability of the model's predictions, Explainable AI (XAI) methods are applied, with a particular focus on SHapley Additive exPlanations (SHAP). At first, the model reached an accuracy of 92.81%. By applying fine-tuning and early stopping techniques, the accuracy improved to 94.29%. The SHAP analysis revealed key features influencing the model's predictions, such as terms related to mental health struggles. This level of transparency boosts the model's credibility while helping mental health professionals understand and trust the predictions. This work highlights the potential for improving the accuracy and interpretability of detecting suicidal tendencies, making a valuable contribution to the progress of mental health monitoring systems. It emphasizes the significance of blending powerful machine learning methods with explainability to develop reliable and impactful mental health solutions.
摘要:自殺意念偵測對於預防自殺至關重要,而自殺是全球主要的死亡原因。許多人在社群媒體上表達自殺念頭,這提供了透過進階機器學習技術進行早期偵測的重要機會。透過整合卷積神經網路 (CNN) 和雙向長短期記憶 (BiLSTM) 的混合架構,並加入注意力機制,可以提升在社群媒體文字中辨識自殺意念的能力。為了加強模型預測的可解釋性,我們採用可解釋人工智慧 (XAI) 方法,特別著重於 SHapley 加法解釋 (SHAP)。一開始,模型成功達到 92.81% 的準確度。透過套用微調和早期停止技術,準確度提升至 94.29%。SHAP 分析揭露了影響模型預測的關鍵特徵,例如與心理健康困境相關的詞彙。這種透明度提升了模型的可信度,同時協助心理健康專業人員理解和信賴預測結果。這項工作突顯了提升偵測自殺傾向的準確度和可解釋性的潛力,為心理健康監控系統的進展做出寶貴的貢獻。它強調了將強大的機器學習方法與可解釋性相結合以開發可靠且有影響力的心理健康解決方案的重要性。
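The SHAP analysis mentioned in this abstract attributes a prediction to individual input features using Shapley values. As a rough illustration only (this is neither the authors' pipeline nor the `shap` library), the sketch below computes exact Shapley values for a toy scoring function by averaging each token's marginal contribution over every ordering. The token names, weights, and interaction term are all invented for the example.

```python
from itertools import permutations

# Hypothetical per-token weights for a toy "risk score"; not from the paper.
TOKEN_WEIGHTS = {"hopeless": 2.0, "tired": 0.5, "weekend": -0.25}


def toy_score(present):
    """Toy model: sum of token weights plus one assumed interaction term."""
    score = sum(TOKEN_WEIGHTS[t] for t in present)
    # Assumed interaction: two distress terms together add extra risk.
    if "hopeless" in present and "tired" in present:
        score += 1.0
    return score


def shapley_values(features, value_fn):
    """Exact Shapley values by enumerating every feature ordering."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value_fn(present)
            present.add(f)
            # Marginal contribution of f given the features already present.
            totals[f] += value_fn(present) - before
    return {f: total / len(orderings) for f, total in totals.items()}


phi = shapley_values(list(TOKEN_WEIGHTS), toy_score)
```

The attributions sum to the difference between the score with all tokens present and the score with none, which is the efficiency property that makes Shapley values attractive for explanation. Exhaustive enumeration is only feasible for a handful of features; practical tools rely on approximations.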
SEANN: A Domain-Informed Neural Network for Epidemiological Insights
2501.10273v1 by Jean-Baptiste Guimbaud, Marc Plantevit, Léa Maître, Rémy Cazabet
In epidemiology, traditional statistical methods such as logistic regression, linear regression, and other parametric models are commonly employed to investigate associations between predictors and health outcomes. However, non-parametric machine learning techniques, such as deep neural networks (DNNs), coupled with explainable AI (XAI) tools, offer new opportunities for this task. Despite their potential, these methods face challenges due to the limited availability of high-quality, high-quantity data in this field. To address these challenges, we introduce SEANN, a novel approach for informed DNNs that leverages a prevalent form of domain-specific knowledge: Pooled Effect Sizes (PES). PESs are commonly found in published Meta-Analysis studies, in different forms, and represent a quantitative form of a scientific consensus. By direct integration within the learning procedure using a custom loss, we experimentally demonstrate significant improvements in the generalizability of predictive performances and the scientific plausibility of extracted relationships compared to a domain-knowledge agnostic neural network in a scarce and noisy data setting.
摘要:在流行病學中,傳統的統計方法,例如邏輯迴歸、線性迴歸和其他參數模型通常用於調查預測因子與健康結果之間的關聯。然而,非參數機器學習技術,例如深度神經網路 (DNN),結合可解釋的 AI (XAI) 工具,為這項任務提供了新的機會。儘管這些方法具有潛力,但由於該領域缺乏高品質、高數量資料,因此這些方法面臨挑戰。為了應對這些挑戰,我們引入了 SEANN,這是一種新穎的方法,用於獲取知識的 DNN,它利用了一種流行的領域特定知識形式:彙總效應量 (PES)。PES 通常以不同的形式出現在已發表的 Meta 分析研究中,並代表科學共識的量化形式。通過使用自訂損失函數直接整合在學習程序中,我們以實驗方式證明了預測效能的概括性以及與從缺乏領域知識的神經網路中提取的關係相比,科學合理性的顯著提升,且是在稀少且有雜訊的資料設定中。
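The idea of integrating pooled effect sizes through a custom loss can be pictured with a deliberately simplified sketch. The abstract does not give the form of the SEANN loss, so everything below is an assumption: a linear model stands in for the DNN, and the penalty is a plain squared distance between selected weights and hypothetical PES values.

```python
def informed_loss(weights, rows, targets, pooled_effects, lam=1.0):
    """Mean squared error plus a penalty pulling selected weights toward
    pooled effect sizes (PES).

    pooled_effects maps a feature index to a hypothetical PES value.
    lam is an assumed hyperparameter trading data fit against prior knowledge.
    """
    mse = 0.0
    for x, y in zip(rows, targets):
        pred = sum(w * xi for w, xi in zip(weights, x))
        mse += (pred - y) ** 2
    mse /= len(rows)
    # Only features that have a published PES are constrained.
    penalty = sum((weights[j] - pes) ** 2 for j, pes in pooled_effects.items())
    return mse + lam * penalty


# Toy data: two features, with a hypothetical PES of 0.5 for feature 0.
rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [0.5, 0.2, 0.7]
pes = {0: 0.5}

aligned = informed_loss([0.5, 0.2], rows, targets, pes)
misaligned = informed_loss([0.9, 0.2], rows, targets, pes)
```

With scarce or noisy data, a term of this kind keeps the learned relationship for a constrained feature close to the meta-analytic estimate, while unconstrained features are fitted from the data alone.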
Artificial Intelligence-Driven Clinical Decision Support Systems
2501.09628v2 by Muhammet Alkan, Idris Zakariyya, Samuel Leighton, Kaushik Bhargav Sivangi, Christos Anagnostopoulos, Fani Deligianni
As artificial intelligence (AI) becomes increasingly embedded in healthcare delivery, this chapter explores the critical aspects of developing reliable and ethical Clinical Decision Support Systems (CDSS). Beginning with the fundamental transition from traditional statistical models to sophisticated machine learning approaches, this work examines rigorous validation strategies and performance assessment methods, including the crucial role of model calibration and decision curve analysis. The chapter emphasizes that creating trustworthy AI systems in healthcare requires more than just technical accuracy; it demands careful consideration of fairness, explainability, and privacy. The challenge of ensuring equitable healthcare delivery through AI is stressed, discussing methods to identify and mitigate bias in clinical predictive models. The chapter then delves into explainability as a cornerstone of human-centered CDSS. This focus reflects the understanding that healthcare professionals must not only trust AI recommendations but also comprehend their underlying reasoning. The discussion then advances to an analysis of privacy vulnerabilities in medical AI systems, from data leakage in deep learning models to sophisticated attacks against model explanations. The text explores privacy-preservation strategies such as differential privacy and federated learning, while acknowledging the inherent trade-offs between privacy protection and model performance. This progression, from technical validation to ethical considerations, reflects the multifaceted challenges of developing AI systems that can be seamlessly and reliably integrated into daily clinical practice while maintaining the highest standards of patient care and data protection.
摘要:隨著人工智慧(AI)在醫療保健服務中日益普及,本章探討了開發可靠且符合道德的臨床決策支援系統 (CDSS) 的關鍵面向。從傳統統計模型轉變到複雜機器學習方法的基本原理開始,這項工作探討了嚴謹的驗證策略和效能評估方法,包括模型校準和決策曲線分析的關鍵角色。本章強調,在醫療保健中建立值得信賴的 AI 系統不僅需要技術準確性;它需要仔細考量公平性、可解釋性和隱私。本章強調了透過 AI 確保公平醫療保健服務的挑戰,並討論了識別和減輕臨床預測模型中偏差的方法。接著,本章深入探討可解釋性作為以人為中心的 CDSS 的基石。這種關注反映了對醫療保健專業人員不僅必須信任 AI 建議,還必須理解其背後推理的理解。討論進展到對醫療 AI 系統中隱私漏洞的分析,從深度學習模型中的資料外洩到針對模型解釋的複雜攻擊。本文探討了隱私保護策略,例如差分隱私和聯合學習,同時承認隱私保護和模型效能之間的固有權衡。從技術驗證到道德考量,這種進展反映了開發 AI 系統的多方面挑戰,這些系統可以無縫且可靠地整合到日常臨床實務中,同時維持最高標準的患者照護和資料保護。
MedGrad E-CLIP: Enhancing Trust and Transparency in AI-Driven Skin Lesion Diagnosis
2501.06887v1 by Sadia Kamal, Tim Oates
As deep learning models gain traction in medical data, ensuring transparent and trustworthy decision-making is essential. In skin cancer diagnosis, while advancements in lesion detection and classification have improved accuracy, the black-box nature of these methods poses challenges in understanding their decision processes, leading to trust issues among physicians. This study leverages the CLIP (Contrastive Language-Image Pretraining) model, trained on different skin lesion datasets, to capture meaningful relationships between visual features and diagnostic criteria terms. To further enhance transparency, we propose a method called MedGrad E-CLIP, which builds on gradient-based E-CLIP by incorporating a weighted entropy mechanism designed for complex medical imaging like skin lesions. This approach highlights critical image regions linked to specific diagnostic descriptions. The developed integrated pipeline not only classifies skin lesions by matching corresponding descriptions but also adds an essential layer of explainability developed especially for medical data. By visually explaining how different features in an image relate to diagnostic criteria, this approach demonstrates the potential of advanced vision-language models in medical image analysis, ultimately improving transparency, robustness, and trust in AI-driven diagnostic systems.
摘要:随着深度学习模型在医学数据中获得关注,确保透明且值得信赖的决策至关重要。在皮肤癌诊断中,虽然病灶检测和分类的进步提高了准确性,但这些方法的黑盒性质对理解其决策过程构成了挑战,导致医生之间的信任问题。本研究利用在不同皮肤病变数据集上训练的 CLIP(对比语言图像预训练)模型,以捕捉视觉特征和诊断标准术语之间的有意义关系。为了进一步提高透明度,我们提出了一种名为 MedGrad E-CLIP 的方法,该方法通过结合专为皮肤病变等复杂医学影像设计的加权熵机制,建立在基于梯度的 E-CLIP 之上。此方法突出了与特定诊断描述相关联的关键图像区域。开发的集成管道不仅通过匹配相应的描述对皮肤病变进行分类,还添加了一层专门为医学数据开发的基本可解释性。通过直观地解释图像中不同特征与诊断标准的关系,这种方法展示了高级视觉语言模型在医学图像分析中的潜力,最终提高了透明度、稳健性和对人工智能驱动的诊断系统的信任。
Explaining Humour Style Classifications: An XAI Approach to Understanding Computational Humour Analysis
2501.02891v1 by Mary Ogbuka Kenneth, Foaad Khosmood, Abbas Edalat
Humour styles can have either a negative or a positive impact on well-being. Given the importance of these styles to mental health, significant research has been conducted on their automatic identification. However, the automated machine learning models used for this purpose are black boxes, making their prediction decisions opaque. Clarity and transparency are vital in the field of mental health. This paper presents an explainable AI (XAI) framework for understanding humour style classification, building upon previous work in computational humour analysis. Using the best-performing single model (ALI+XGBoost) from prior research, we apply comprehensive XAI techniques to analyse how linguistic, emotional, and semantic features contribute to humour style classification decisions. Our analysis reveals distinct patterns in how different humour styles are characterised and misclassified, with particular emphasis on the challenges in distinguishing affiliative humour from other styles. Through detailed examination of feature importance, error patterns, and misclassification cases, we identify key factors influencing model decisions, including emotional ambiguity, context misinterpretation, and target identification. The framework demonstrates significant utility in understanding model behaviour, achieving interpretable insights into the complex interplay of features that define different humour styles. Our findings contribute to both the theoretical understanding of computational humour analysis and practical applications in mental health, content moderation, and digital humanities research.
摘要:幽默風格對幸福感可能產生負面或正面的影響。 鑑於這些風格對心理健康的重要性,已經對其自動識別進行了大量研究。然而,用於此目的的自動機器學習模型是黑盒子,使得其預測決策不透明。清晰度和透明度在心理健康領域至關重要。本文提出了一個可解釋的 AI (XAI) 框架,用於理解幽默風格分類,建立在計算幽默分析的先前工作之上。使用先前研究中表現最好的單一模型 (ALI+XGBoost),我們應用全面的 XAI 技術來分析語言、情緒和語義特徵如何影響幽默風格分類決策。我們的分析揭示了不同幽默風格如何被表徵和錯誤分類的不同模式,特別強調了區分聯屬幽默與其他風格的挑戰。通過仔細檢查特徵重要性、錯誤模式和錯誤分類案例,我們確定了影響模型決策的關鍵因素,包括情緒模糊、情境誤解和目標識別。該框架展示了在理解模型行為方面的顯著效用,實現了對定義不同幽默風格的特徵之間複雜相互作用的可解釋見解。我們的發現有助於計算幽默分析的理論理解和心理健康、內容審核和數字人文研究中的實際應用。
The Emotional Spectrum of LLMs: Leveraging Empathy and Emotion-Based Markers for Mental Health Support
2412.20068v1 by Alessandro De Grandi, Federico Ravenda, Andrea Raballo, Fabio Crestani
The increasing demand for mental health services has highlighted the need for innovative solutions, particularly in the realm of psychological conversational AI, where the availability of sensitive data is scarce. In this work, we explored the development of a system tailored for mental health support with a novel approach to psychological assessment based on explainable emotional profiles in combination with empathetic conversational models, offering a promising tool for augmenting traditional care, particularly where immediate expertise is unavailable. Our work can be divided into two main parts, intrinsically connected to each other. First, we present RACLETTE, a conversational system that demonstrates superior emotional accuracy compared to state-of-the-art benchmarks in both understanding users' emotional states and generating empathetic responses during conversations, while progressively building an emotional profile of the user through their interactions. Second, we show how the emotional profiles of a user can be used as interpretable markers for mental health assessment. These profiles can be compared with characteristic emotional patterns associated with different mental disorders, providing a novel approach to preliminary screening and support.
摘要:隨著對心理健康服務需求的增加,凸顯了創新解決方案的需求,特別是在心理對話式人工智慧領域,那裡缺乏敏感資料。在這項工作中,我們探索了開發一個針對心理健康支持的系統,採用一種基於可解釋的情緒特徵的新方法進行心理評估,結合同理心對話模式,提供了一個有前途的工具,用於擴充傳統照護,特別是在無法立即獲得專業知識的情況下。我們的工作可以分為兩個主要部分,彼此內在相關。首先,我們展示了 RACLETTE,一個對話系統,與最先進的基準相比,在理解使用者情緒狀態和在對話中產生同理心回應方面表現出優越的情緒準確性,同時透過他們的互動逐漸建立使用者的情緒特徵。其次,我們展示了使用者的情緒特徵如何可用作心理健康評估的可解釋標記。這些特徵可以與與不同心理疾病相關的典型情緒模式進行比較,提供了一種初步篩選和支持的新方法。
A Review on the Integration of Artificial Intelligence and Medical Imaging in IVF Ovarian Stimulation
2412.19688v1 by Jana Zakall, Birgit Pohn, Antonia Graf, Daniel Kovatchki, Arezoo Borji, Ragib Shahriar Islam, Hossam Haick, Heinz Strohmer, Sepideh Hatamikia
Artificial intelligence (AI) has emerged as a powerful tool to enhance decision-making and optimize treatment protocols in in vitro fertilization (IVF). In particular, AI shows significant promise in supporting decision-making during the ovarian stimulation phase of the IVF process. This review evaluates studies focused on the applications of AI combined with medical imaging in ovarian stimulation, examining methodologies, outcomes, and current limitations. Our analysis of 13 studies on this topic reveals that while AI algorithms demonstrated notable potential in predicting optimal hormonal dosages, trigger timing, and oocyte retrieval outcomes, the medical imaging data utilized predominantly came from two-dimensional (2D) ultrasound, which mainly involved basic quantifications, such as follicle size and number, with limited use of direct feature extraction or advanced image analysis techniques. This points to an underexplored opportunity where advanced image analysis approaches, such as deep learning, and more diverse imaging modalities, like three-dimensional (3D) ultrasound, could unlock deeper insights. Additionally, the lack of explainable AI (XAI) in most studies raises concerns about the transparency and traceability of AI-driven decisions, which are key factors for clinical adoption and trust. Furthermore, many studies relied on single-center designs and small datasets, which limit the generalizability of their findings. This review highlights the need for integrating advanced imaging analysis techniques with explainable AI methodologies, as well as the importance of leveraging multicenter collaborations and larger datasets. Addressing these gaps has the potential to enhance ovarian stimulation management, paving the way for efficient, personalized, and data-driven treatment pathways that improve IVF outcomes.
摘要:人工智慧(AI)已成為增強體外受精(IVF)決策制定和優化治療方案的強大工具。特別是,AI 在支持 IVF 過程中卵巢刺激階段的決策制定方面顯示出顯著的前景。本綜述評估了專注於 AI 結合卵巢刺激中的醫學影像應用、檢驗方法、結果和當前限制的研究。我們對 13 項關於此主題的研究分析顯示,雖然 AI 演算法在預測最佳荷爾蒙劑量、觸發時機和卵子取出結果方面表現出顯著的潛力,但所利用的醫學影像數據主要來自於二次元(2D)超音波,而二次元超音波主要涉及基本量化,例如濾泡大小和數量,且有限使用直接特徵提取或進階影像分析技術。這指向一個尚未探索的機會,例如深度學習等進階影像分析方法,以及更多元的影像模式,例如三維(3D)超音波,可以解鎖更深入的見解。此外,大多數研究缺乏可解釋 AI(XAI),這引起了人們對 AI 驅動決策的透明度和可追溯性的擔憂,而透明度和可追溯性是臨床採用和信任的關鍵因素。此外,許多研究依賴於單中心設計和小型數據集,這限制了其發現的普遍性。本綜述強調了將進階影像分析技術與可解釋 AI 方法整合起來的必要性,以及利用多中心合作和大型數據集的重要性。解決這些差距有可能增強卵巢刺激管理,為有效、個人化和數據驅動的治療途徑鋪平道路,進而改善 IVF 結果。
Enhancing Cancer Diagnosis with Explainable & Trustworthy Deep Learning Models
2412.17527v1 by Badaru I. Olumuyiwa, The Anh Han, Zia U. Shamszaman
This research presents an innovative approach to cancer diagnosis and prediction using explainable Artificial Intelligence (XAI) and deep learning techniques. With cancer causing nearly 10 million deaths globally in 2020, early and accurate diagnosis is crucial. Traditional methods often face challenges in cost, accuracy, and efficiency. Our study develops an AI model that provides precise outcomes and clear insights into its decision-making process, addressing the "black box" problem of deep learning models. By employing XAI techniques, we enhance interpretability and transparency, building trust among healthcare professionals and patients. Our approach leverages neural networks to analyse extensive datasets, identifying patterns for cancer detection. This model has the potential to revolutionise diagnosis by improving accuracy, accessibility, and clarity in medical decision-making, possibly leading to earlier detection and more personalised treatment strategies. Furthermore, it could democratise access to high-quality diagnostics, particularly in resource-limited settings, contributing to global health equity. The model's applications extend beyond cancer diagnosis, potentially transforming various aspects of medical decision-making and saving millions of lives worldwide.
摘要:本研究提出了一個創新的癌症診斷和預測方法,使用可解釋的人工智慧 (XAI) 和深度學習技術。由於癌症在 2020 年造成全球近 1,000 萬人死亡,因此早期準確的診斷至關重要。傳統方法通常面臨成本、準確性和效率方面的挑戰。我們的研究開發了一個 AI 模型,它提供精確的結果並清楚地了解其決策過程,解決了深度學習模型的「黑箱」問題。通過採用 XAI 技術,我們增強了解釋性和透明度,在醫療專業人員和患者之間建立信任。我們的做法利用神經網路分析廣泛的數據集,識別癌症檢測模式。這個模型有可能通過提高醫療決策的準確性、可及性和清晰度來革新診斷,可能導致更早的檢測和更個性化的治療策略。此外,它可以使更多人獲得高品質的診斷,特別是在資源有限的環境中,有助於全球健康公平。該模型的應用範圍不僅限於癌症診斷,還可能轉變醫療決策的各個方面,並拯救全球數百萬人的生命。
Towards Interpretable Radiology Report Generation via Concept Bottlenecks using a Multi-Agentic RAG
2412.16086v2 by Hasan Md Tusfiqur Alam, Devansh Srivastav, Md Abdul Kadir, Daniel Sonntag
Deep learning has advanced medical image classification, but interpretability challenges hinder its clinical adoption. This study enhances interpretability in Chest X-ray (CXR) classification by using concept bottleneck models (CBMs) and a multi-agent Retrieval-Augmented Generation (RAG) system for report generation. By modeling relationships between visual features and clinical concepts, we create interpretable concept vectors that guide a multi-agent RAG system to generate radiology reports, enhancing clinical relevance, explainability, and transparency. Evaluation of the generated reports using an LLM-as-a-judge confirmed the interpretability and clinical utility of our model's outputs. On the COVID-QU dataset, our model achieved 81% classification accuracy and demonstrated robust report generation performance, with five key metrics ranging between 84% and 90%. This interpretable multi-agent framework bridges the gap between high-performance AI and the explainability required for reliable AI-driven CXR analysis in clinical settings. Our code is available at https://github.com/tifat58/IRR-with-CBM-RAG.git.
摘要:深度學習已提升醫學影像分類,但可解釋性挑戰阻礙其臨床應用。本研究透過使用概念瓶頸模型 (CBM) 和多代理檢索增強生成 (RAG) 系統進行報告生成,來增強胸部 X 光 (CXR) 分類的可解釋性。透過建模視覺特徵與臨床概念之間的關係,我們建立可解釋的概念向量,引導多代理 RAG 系統生成放射報告,增強臨床相關性、可解釋性和透明度。使用 LLM 作為評審員對生成報告進行評估,確認了我們模型輸出的可解釋性和臨床效用。在 COVID-QU 資料集上,我們的模型達到了 81% 的分類準確率,並展示了穩健的報告生成效能,五項關鍵指標介於 84% 至 90% 之間。這個可解釋的多代理架構彌合了高性能 AI 與臨床環境中可靠的 AI 驅動 CXR 分析所需的解釋性之間的差距。我們的程式碼可於 https://github.com/tifat58/IRR-with-CBM-RAG.git 取得。
Critique of Impure Reason: Unveiling the reasoning behaviour of medical Large Language Models
2412.15748v1 by Shamus Sim, Tyrone Chen
Background: Despite the current ubiquity of Large Language Models (LLMs) across the medical domain, there is a surprising lack of studies which address their reasoning behaviour. We emphasise the importance of understanding reasoning behaviour as opposed to high-level prediction accuracies, since it is equivalent to explainable AI (XAI) in this context. In particular, achieving XAI in medical LLMs used in the clinical domain will have a significant impact across the healthcare sector. Results: Therefore, we define the concept of reasoning behaviour in the specific context of medical LLMs. We then categorise and discuss the current state of the art of methods which evaluate reasoning behaviour in medical LLMs. Finally, we propose theoretical frameworks which can empower medical professionals or machine learning engineers to gain insight into the low-level reasoning operations of these previously obscure models. Conclusion: The subsequent increased transparency and trust in medical machine learning models by clinicians as well as patients will accelerate the integration, application, and further development of medical AI for the healthcare system as a whole.
摘要:背景:儘管大型語言模型 (LLM) 目前在醫療領域無所不在,但令人驚訝的是,探討其推理行為的研究卻相當缺乏。我們強調了解推理行為而非高層級的預測準確度非常重要,因為在這種情況下,這等同於可解釋 AI (XAI)。尤其是在臨床領域中使用的醫療 LLM 中實現 XAI,將對整個醫療保健產業產生重大影響。結果:因此,我們在醫療 LLM 的特定背景下定義了推理行為的概念。接著我們分類並探討當前評估醫療 LLM 中推理行為的方法的最新技術。最後,我們提出理論架構,讓醫療專業人員或機器學習工程師得以深入了解這些先前模糊模型的低層級推理運算。結論:臨床醫生和患者對醫療機器學習模型的透明度和信任度隨之提升,將加速醫療 AI 在整個醫療保健系統中的整合、應用和進一步發展。
Cognition Chain for Explainable Psychological Stress Detection on Social Media
2412.14009v1 by Xin Wang, Boyan Gao, Yi Dai, Lei Cao, Liang Zhao, Yibo Yang, David Clifton
Stress is a pervasive global health issue that can lead to severe mental health problems. Early detection enables timely intervention and prevention of stress-related disorders. Current early detection models perform "black box" inference and suffer from limited explainability and trust, which blocks real-world clinical application. Thanks to the generative properties introduced by Large Language Models (LLMs), the decisions and predictions from such models are semi-interpretable through the corresponding description. However, existing LLMs are mostly trained for general purposes without the guidance of psychological cognitive theory. To this end, we first highlight the importance of prior theory with the observation that performance is boosted by a chain of thought tailored for stress detection. This method, termed Cognition Chain, explicates the generation of stress through a step-by-step cognitive perspective based on cognitive appraisal theory with a progress pipeline: Stimulus $\rightarrow$ Evaluation $\rightarrow$ Reaction $\rightarrow$ Stress State, guiding LLMs to provide comprehensive reasoning explanations. We further study the benefits brought by the proposed Cognition Chain format by utilising it as a synthetic dataset generation template for LLM instruction-tuning and introduce CogInstruct, an instruction-tuning dataset for stress detection. This dataset is developed using a three-stage self-reflective annotation pipeline that enables LLMs to autonomously generate and refine instructional data. By instruction-tuning Llama3 with CogInstruct, we develop CogLLM, an explainable stress detection model. Evaluations demonstrate that CogLLM achieves outstanding performance while enhancing explainability. Our work contributes a novel approach by integrating cognitive theories into LLM reasoning processes, offering a promising direction for future explainable AI research.
摘要:壓力是一個普遍的全球性健康問題,可能會導致嚴重的精神健康問題。早期發現提供及時的干預和預防壓力相關疾病。目前的早期發現模型執行「黑盒子」推論,存在可解釋性和信任度有限的問題,阻礙了現實世界的臨床應用。多虧了大型語言模型 (LLM) 引入的生成屬性,此類模型的決策和預測通過對應描述具有半可解釋性。然而,現有的 LLM 主要針對一般用途進行訓練,沒有心理認知理論的指導。為此,我們首先強調先驗理論的重要性,並觀察到針對壓力檢測量身定制的思想鏈提升了性能。這種方法稱為認知鏈,通過基於認知評估理論的循序漸進的認知視角闡明了壓力的產生,並具有進度管道:刺激 $\rightarrow$ 評估 $\rightarrow$ 反應 $\rightarrow$ 壓力狀態,指導 LLM 提供全面的推理解釋。我們進一步通過將其用作 LLM 指令調整的合成數據集生成模板來研究所提出的認知鏈格式帶來的優點,並介紹 CogInstruct,這是一個針對壓力檢測的指令調整數據集。這個數據集是使用一個三階段的自省標註管道開發的,使 LLM 能夠自主生成和優化指令數據。通過使用 CogInstruct 對 Llama3 進行指令調整,我們開發了 CogLLM,這是一個可解釋的壓力檢測模型。評估表明,CogLLM 在提高可解釋性的同時實現了出色的性能。我們的研究通過將認知理論整合到 LLM 推理過程中,提出了一種新穎的方法,為未來的可解釋人工智能研究提供了一個有希望的方向。
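The four-stage pipeline named in this abstract (Stimulus, Evaluation, Reaction, Stress State) lends itself to a structured prompt. The sketch below is only a guess at what such a template might look like: the stage names come from the abstract, while the function name, instruction wording, and output label set are hypothetical and not taken from CogInstruct.

```python
# Stage names as listed in the abstract; everything else is illustrative.
COGNITION_CHAIN_STEPS = ("Stimulus", "Evaluation", "Reaction", "Stress State")


def build_cognition_chain_prompt(post):
    """Build a step-by-step prompt that walks through the four stages in order."""
    lines = [
        "Analyse the following social media post for psychological stress.",
        f"Post: {post}",
        "Reason step by step through these stages:",
    ]
    for i, step in enumerate(COGNITION_CHAIN_STEPS, start=1):
        lines.append(f"{i}. {step}:")
    # Hypothetical label set; the paper's actual output format is not given here.
    lines.append("Finish with a final label: stressed or not stressed.")
    return "\n".join(lines)


prompt = build_cognition_chain_prompt("Deadlines keep piling up and I cannot sleep.")
```

Keeping the stages in a fixed order means the model's free-text reasoning can be read back stage by stage, which is where the explainability benefit would come from.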
2-Factor Retrieval for Improved Human-AI Decision Making in Radiology
2412.00372v1 by Jim Solomon, Laleh Jalilian, Alexander Vilesov, Meryl Mathew, Tristan Grogan, Arash Bedayat, Achuta Kadambi
Human-machine teaming in medical AI requires us to understand to what degree a trained clinician should weigh AI predictions. While previous work has shown the potential of AI assistance in improving clinical predictions, existing clinical decision support systems either provide no explainability of their predictions or use techniques like saliency and Shapley values, which do not allow for physician-based verification. To address this gap, this study compares previously used explainable AI techniques with a newly proposed technique termed '2-factor retrieval (2FR)', which is a combination of interface design and search retrieval that returns similarly labeled data without processing this data. This results in a 2-factor security blanket where: (a) correct images need to be retrieved by the AI; and (b) humans should associate the retrieved images with the current pathology under test. We find that when tested on chest X-ray diagnoses, 2FR leads to increases in clinician accuracy, with particular improvements when clinicians are radiologists and have low confidence in their decision. Our results highlight the importance of understanding how different modes of human-AI decision making may impact clinician accuracy in clinical decision support systems.
摘要:人機協作在醫療 AI 中,需要我們理解受過訓練的臨床醫生在多大程度上應重視 AI 預測。雖然先前的研究顯示 AI 輔助在改善臨床預測方面的潛力,但現有的臨床決策支援系統,要不就沒有提供預測的可解釋性,要不就是使用像顯著性和 Shapley 值之類的技術,這些技術不允許基於醫生的驗證。為了解決這個差距,本研究將先前使用的可解釋 AI 技術與一種新提出的稱為「2 因子檢索 (2FR)」的技術進行比較,後者是一種介面設計和搜尋檢索的組合,它會傳回標籤相似的資料,而不會處理這些資料。這會產生一個 2 因子安全機制,其中:(a) 正確的影像需要由 AI 檢索;(b) 人類應將檢索的影像與正在測試中的病理聯想起來。我們發現,當在胸部 X 光診斷上進行測試時,2FR 會提高臨床醫生的準確度,特別是在臨床醫生是放射科醫生且對其決策信心不足時,會有顯著的改善。我們的結果強調了理解人機決策的不同模式如何影響臨床醫生在臨床決策支援系統中的準確性的重要性。
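The retrieval half of 2FR, returning similarly labeled data for the clinician to inspect, can be pictured with a small sketch. The abstract does not say how similarity is computed, so the embedding vectors, the cosine measure, and the case identifiers below are assumptions made purely for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_exemplars(query_vec, predicted_label, bank, k=2):
    """Return ids of the k labelled cases most similar to the query among
    those sharing the AI's predicted label.

    Each bank item is (case_id, label, vector). The retrieved cases are
    shown unmodified, leaving the comparison to the clinician.
    """
    candidates = [
        (cosine(query_vec, vec), case_id)
        for case_id, label, vec in bank
        if label == predicted_label
    ]
    candidates.sort(reverse=True)
    return [case_id for _, case_id in candidates[:k]]


# Hypothetical reference bank of labelled chest X-ray embeddings.
bank = [
    ("case-a", "pneumonia", [1.0, 0.0]),
    ("case-b", "pneumonia", [0.6, 0.8]),
    ("case-c", "normal", [1.0, 0.1]),
    ("case-d", "pneumonia", [0.0, 1.0]),
]
shown = retrieve_exemplars([1.0, 0.1], "pneumonia", bank, k=2)
```

The two factors then map onto two independent checks: the system must retrieve the right exemplars, and the clinician must agree that they resemble the image under review.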
Mapping Public Perception of Artificial Intelligence: Expectations, Risk-Benefit Tradeoffs, and Value As Determinants for Societal Acceptance
2411.19356v1 by Philipp Brauner, Felix Glawe, Gian Luca Liehner, Luisa Vervier, Martina Ziefle
Understanding public perception of artificial intelligence (AI) and the tradeoffs between potential risks and benefits is crucial, as these perceptions might shape policy decisions, influence innovation trajectories for successful market strategies, and determine individual and societal acceptance of AI technologies. Using a representative sample of 1100 participants from Germany, this study examines mental models of AI. Participants quantitatively evaluated 71 statements about AI's future capabilities (e.g., autonomous driving, medical care, art, politics, warfare, and societal divides), assessing the expected likelihood of occurrence, perceived risks, benefits, and overall value. We present rankings of these projections alongside visual mappings illustrating public risk-benefit tradeoffs. While many scenarios were deemed likely, participants often associated them with high risks, limited benefits, and low overall value. Across all scenarios, 96.4% ($r^2=96.4\%$) of the variance in value assessment can be explained by perceived risks ($\beta=-.504$) and perceived benefits ($\beta=+.710$), with no significant relation to expected likelihood. Demographics and personality traits influenced perceptions of risks, benefits, and overall evaluations, underscoring the importance of increasing AI literacy and tailoring public information to diverse user needs. These findings provide actionable insights for researchers, developers, and policymakers by highlighting critical public concerns and individual factors essential to align AI development with individual values.
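The reported coefficients describe a two-predictor linear model of perceived value. As a worked illustration, and assuming the betas are standardized coefficients applied to z-scored risk and benefit ratings with no intercept (the abstract does not state this explicitly), the predicted value for a scenario can be computed as follows.

```python
BETA_RISK = -0.504     # coefficient for perceived risk, as reported in the abstract
BETA_BENEFIT = 0.710   # coefficient for perceived benefit, as reported in the abstract


def predicted_value_z(risk_z, benefit_z):
    """Predicted value score in standard-deviation units (assumed z-scored inputs)."""
    return BETA_RISK * risk_z + BETA_BENEFIT * benefit_z


# A scenario rated one standard deviation above average on both risk and benefit.
balanced = predicted_value_z(1.0, 1.0)
# A scenario seen as risky but only averagely beneficial.
risky = predicted_value_z(1.0, 0.0)
```

Under these assumptions, a scenario that is equally elevated on risk and benefit still receives a slightly positive value score, because the benefit coefficient is larger in magnitude than the risk coefficient.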
Explainable AI for Classifying UTI Risk Groups Using a Real-World Linked EHR and Pathology Lab Dataset
2411.17645v2 by Yujie Dai, Brian Sullivan, Axel Montout, Amy Dillon, Chris Waller, Peter Acs, Rachel Denholm, Philip Williams, Alastair D Hay, Raul Santos-Rodriguez, Andrew Dowsey
The use of machine learning and AI on electronic health records (EHRs) holds substantial potential for clinical insight. However, this approach faces challenges due to data heterogeneity, sparsity, temporal misalignment, and limited labeled outcomes. In this context, we leverage a linked EHR dataset of approximately one million de-identified individuals from Bristol, North Somerset, and South Gloucestershire, UK, to characterize urinary tract infections (UTIs). We implemented a data pre-processing and curation pipeline that transforms the raw EHR data into a structured format suitable for developing predictive models focused on data fairness, accountability and transparency. Given the limited availability and biases of ground truth UTI outcomes, we introduce a UTI risk estimation framework informed by clinical expertise to estimate UTI risk across individual patient timelines. Pairwise XGBoost models are trained using this framework to differentiate UTI risk categories with explainable AI techniques applied to identify key predictors and support interpretability. Our findings reveal differences in clinical and demographic predictors across risk groups. While this study highlights the potential of AI-driven insights to support UTI clinical decision-making, further investigation of patient sub-strata and extensive validation are needed to ensure robustness and applicability in clinical practice.
摘要:電子健康紀錄 (EHR) 中機器學習和 AI 的使用對於臨床見解具有相當大的潛力。然而,由於資料異質性、稀疏性、時間錯位和標籤結果有限,此方法面臨挑戰。在此背景下,我們利用來自英國布里斯托、北薩默塞特和南格洛斯特郡約一百萬名去識別個人連結的 EHR 資料集,來描述尿路感染 (UTI)。我們實施了將原始 EHR 資料轉換為結構化格式的資料前處理和整理管線,適合開發專注於資料公平性、問責制和透明度的預測模型。鑑於 UTI 真實結果的可用性有限和偏差,我們引入了由臨床專業知識告知的 UTI 風險評估架構,以估計個別患者時間軸上的 UTI 風險。成對的 XGBoost 模型使用此架構進行訓練,以區分 UTI 風險類別,並應用可解釋的 AI 技術來識別關鍵預測因子並支持可解釋性。我們的研究結果揭示了不同風險群組在臨床和人口統計預測因子上的差異。雖然這項研究強調了 AI 驅動見解在支援 UTI 臨床決策制定方面的潛力,但仍需要進一步調查患者子群體和廣泛驗證,以確保在臨床實務中的穩健性和適用性。
Exploring the Requirements of Clinicians for Explainable AI Decision Support Systems in Intensive Care
2411.11774v1 by Jeffrey N. Clark, Matthew Wragg, Emily Nielsen, Miquel Perello-Nieto, Nawid Keshtmand, Michael Ambler, Shiv Sharma, Christopher P. Bourdeaux, Amberly Brigden, Raul Santos-Rodriguez
There is a growing need to understand how digital systems can support clinical decision-making, particularly as artificial intelligence (AI) models become increasingly complex and less human-interpretable. This complexity raises concerns about trustworthiness, impacting safe and effective adoption of such technologies. Improved understanding of decision-making processes and requirements for explanations coming from decision support tools is a vital component in providing effective explainable solutions. This is particularly relevant in the data-intensive, fast-paced environments of intensive care units (ICUs). To explore these issues, group interviews were conducted with seven ICU clinicians, representing various roles and experience levels. Thematic analysis revealed three core themes: (T1) ICU decision-making relies on a wide range of factors, (T2) the complexity of patient state is challenging for shared decision-making, and (T3) requirements and capabilities of AI decision support systems. We include design recommendations from clinical input, providing insights to inform future AI systems for intensive care.
摘要:隨著人工智慧 (AI) 模型變得越來越複雜,且越來越難以被人理解,了解數位系統如何支援臨床決策的需求也日益增加。這種複雜性引發了對可信度的疑慮,影響了此類技術的安全且有效採用。改善對決策制定流程的理解,以及對決策支援工具所提供說明的要求,是提供有效可解釋解決方案的重要組成部分。這在資料密集、快節奏的加護病房 (ICU) 環境中特別相關。為了探討這些問題,對七位 ICU 臨床醫師進行了小組訪談,這些醫師代表了不同的角色和經驗層級。主題分析揭露了三個核心主題:(T1) ICU 決策制定依賴於廣泛的因素,(T2) 病患狀態的複雜性對共同決策制定構成挑戰,以及 (T3) AI 決策支援系統的要求和能力。我們納入了臨床輸入的設計建議,提供見解以提供資訊給未來用於加護的 AI 系統。
Artificial Intelligence in Pediatric Echocardiography: Exploring Challenges, Opportunities, and Clinical Applications with Explainable AI and Federated Learning
2411.10255v1 by Mohammed Yaseen Jabarulla, Theodor Uden, Thomas Jack, Philipp Beerbaum, Steffen Oeltze-Jafra
Pediatric heart diseases present a broad spectrum of congenital and acquired diseases. More complex congenital malformations require a differentiated and multimodal decision-making process, usually including echocardiography as a central imaging method. Artificial intelligence (AI) offers considerable promise for clinicians by facilitating automated interpretation of pediatric echocardiography data. However, adapting AI technologies for pediatric echocardiography analysis has challenges such as limited public data availability, data privacy, and AI model transparency. Recently, researchers have focused on disruptive technologies, such as federated learning (FL) and explainable AI (XAI), to improve automatic diagnostic and decision support workflows. This study offers a comprehensive overview of the limitations and opportunities of AI in pediatric echocardiography, emphasizing the synergistic workflow and role of XAI and FL, identifying research gaps, and exploring potential future developments. Additionally, three relevant clinical use cases demonstrate the functionality of XAI and FL with a focus on (i) view recognition, (ii) disease classification, (iii) segmentation of cardiac structures, and (iv) quantitative assessment of cardiac function.
摘要:小兒心臟疾病呈現先天性與後天性疾病的廣泛光譜。較複雜的先天性畸形需要一個差異化且多模式的決策過程,通常包括超音波檢查作為主要的影像方法。人工智慧 (AI) 為臨床醫生提供了相當大的希望,因為它可以促進小兒超音波檢查資料的自動化解讀。然而,將人工智慧技術應用於小兒超音波檢查分析有許多挑戰,例如有限的公開資料可用性、資料隱私和人工智慧模型透明度。最近,研究人員專注於破壞性技術,例如聯合學習 (FL) 和可解釋人工智慧 (XAI),以改善自動診斷和決策支援工作流程。本研究提供了人工智慧在小兒超音波檢查中的限制和機會的全面概述,強調了 XAI 和 FL 的協同工作流程和角色,找出研究差距並探討潛在的未來發展。此外,三個相關的臨床使用案例展示了 XAI 和 FL 的功能,重點在於 (i) 檢視辨識、(ii) 疾病分類、(iii) 心臟結構分割和 (iv) 心臟功能的量化評估。
Enhancing Osteoporosis Detection: An Explainable Multi-Modal Learning Framework with Feature Fusion and Variable Clustering
2411.00916v2 by Mehdi Hosseini Chagahi, Saeed Mohammadi Dashtaki, Niloufar Delfan, Nadia Mohammadi, Alireza Samari, Behzad Moshiri, Md. Jalil Piran, Oliver Faust
Osteoporosis is a common condition that increases fracture risk, especially in older adults. Early diagnosis is vital for preventing fractures, reducing treatment costs, and preserving mobility. However, healthcare providers face challenges like limited labeled data and difficulties in processing medical images. This study presents a novel multi-modal learning framework that integrates clinical and imaging data to improve diagnostic accuracy and model interpretability. The model utilizes three pre-trained networks (VGG19, InceptionV3, and ResNet50) to extract deep features from X-ray images. These features are transformed using PCA to reduce dimensionality and focus on the most relevant components. A clustering-based selection process identifies the most representative components, which are then combined with preprocessed clinical data and processed through a fully connected network (FCN) for final classification. A feature importance plot highlights key variables, showing that Medical History, BMI, and Height were the main contributors, emphasizing the significance of patient-specific data. While imaging features were valuable, they had lower importance, indicating that clinical data are crucial for accurate predictions. This framework promotes precise and interpretable predictions, enhancing transparency and building trust in AI-driven diagnoses for clinical integration.
摘要:骨質疏鬆症是一種常見的疾病,會增加骨折的風險,特別是老年人。早期診斷對於預防骨折、降低治療成本和維持行動能力至關重要。然而,醫療保健提供者面臨著標記數據有限和處理醫學影像困難等挑戰。本研究提出了一個新穎的多模式學習框架,該框架整合了臨床和影像數據,以提高診斷準確性和模型可解釋性。該模型利用三個預訓練的網路,VGG19、InceptionV3 和 ResNet50,從 X 射線影像中提取深度特徵。這些特徵使用 PCA 轉換以降低維度並專注於最相關的組成部分。基於聚類的選擇過程識別出最具代表性的組成部分,然後將這些組成部分與預處理的臨床數據結合,並通過全連接網路 (FCN) 進行最終分類。特徵重要性圖突出了關鍵變數,表明病史、BMI 和身高是主要貢獻因素,強調了患者特定數據的重要性。雖然影像特徵很有價值,但它們的重要性較低,這表明臨床數據對於準確預測至關重要。此框架促进了準確且可解釋的預測,提高了透明度,並建立了對 AI 驅動診斷在臨床整合中的信任。
A Review of Deep Learning Approaches for Non-Invasive Cognitive Impairment Detection
2410.19898v1 by Muath Alsuhaibani, Ali Pourramezan Fard, Jian Sun, Farida Far Poor, Peter S. Pressman, Mohammad H. Mahoor
This review paper explores recent advances in deep learning approaches for non-invasive cognitive impairment detection. We examine various non-invasive indicators of cognitive decline, including speech and language, facial, and motoric mobility. The paper provides an overview of relevant datasets, feature-extracting techniques, and deep-learning architectures applied to this domain. We have analyzed the performance of different methods across modalities and observed that speech and language-based methods generally achieved the highest detection performance. Studies combining acoustic and linguistic features tended to outperform those using a single modality. Facial analysis methods showed promise for visual modalities but were less extensively studied. Most papers focused on binary classification (impaired vs. non-impaired), with fewer addressing multi-class or regression tasks. Transfer learning and pre-trained language models emerged as popular and effective techniques, especially for linguistic analysis. Despite significant progress, several challenges remain, including data standardization and accessibility, model explainability, longitudinal analysis limitations, and clinical adaptation. Lastly, we propose future research directions, such as investigating language-agnostic speech analysis methods, developing multi-modal diagnostic systems, and addressing ethical considerations in AI-assisted healthcare. By synthesizing current trends and identifying key obstacles, this review aims to guide further development of deep learning-based cognitive impairment detection systems to improve early diagnosis and ultimately patient outcomes.
摘要:本篇評論探討了深度學習方法在非侵入式認知功能障礙檢測上的最新進展。我們檢視了各種非侵入式的認知衰退指標,包括語言和語言、面部和運動機能。本文概述了與此領域相關的資料集、特徵提取技術和深度學習架構。我們分析了不同方法在不同方式上的表現,並觀察到基於語言和語言的方法通常能達到最高的檢測表現。結合聲學和語言特徵的研究往往優於使用單一方式的研究。面部分析方法顯示出視覺方式的潛力,但研究較少。大多數論文專注於二元分類(受損與未受損),較少探討多類或回歸任務。遷移學習和預訓練語言模型已成為流行且有效的技術,特別是對於語言分析。儘管取得了重大進展,但仍存在一些挑戰,包括資料標準化和可及性、模型可解釋性、縱向分析限制和臨床適應性。最後,我們提出了未來的研究方向,例如調查與語言無關的語音分析方法、開發多模式診斷系統,以及解決人工智慧輔助醫療保健中的倫理考量。透過綜合目前的趨勢和找出關鍵障礙,本篇評論旨在引導深度學習為基礎的認知功能障礙檢測系統的進一步發展,以改善早期診斷,並最終改善患者的治療結果。
An Ontology-Enabled Approach For User-Centered and Knowledge-Enabled Explanations of AI Systems
2410.17504v1 by Shruthi Chari
Explainable Artificial Intelligence (AI) focuses on helping humans understand the working of AI systems or their decisions and has been a cornerstone of AI for decades. Recent research in explainability has focused on explaining the workings of AI models or model explainability. There have also been several position statements and review papers detailing the needs of end-users for user-centered explainability but fewer implementations. Hence, this thesis seeks to bridge some gaps between model and user-centered explainability. We create an explanation ontology (EO) to represent literature-derived explanation types via their supporting components. We implement a knowledge-augmented question-answering (QA) pipeline to support contextual explanations in a clinical setting. Finally, we are implementing a system to combine explanations from different AI methods and data modalities. Within the EO, we can represent fifteen different explanation types, and we have tested these representations in six exemplar use cases. We find that knowledge augmentations improve the performance of base large language models in the contextualized QA, and the performance is variable across disease groups. In the same setting, clinicians also indicated that they prefer to see actionability as one of the main foci in explanations. In our explanations combination method, we plan to use similarity metrics to determine the similarity of explanations in a chronic disease detection setting. Overall, through this thesis, we design methods that can support knowledge-enabled explanations across different use cases, accounting for the methods in today's AI era that can generate the supporting components of these explanations and domain knowledge sources that can enhance them.
摘要:可解釋人工智慧(AI)專注於協助人類了解 AI 系統運作或其決策,數十年來一直是 AI 的基石。最近的可解釋性研究專注於解釋 AI 模型或模型可解釋性的運作。也有幾份立場聲明和評論論文詳細說明了最終使用者對以使用者為中心的可解釋性的需求,但實作較少。因此,本論文旨在彌補模型和以使用者為中心的可解釋性之間的一些差距。我們建立一個解釋本體(EO)以透過其支援元件來表示從文獻中衍生的解釋類型。我們實作一個知識增強的問答(QA)管線,以在臨床環境中支援情境解釋。最後,我們正在實作一個系統,以結合來自不同 AI 方法和資料模式的解釋。在 EO 中,我們可以表示 15 種不同的解釋類型,並且我們已在六個範例使用案例中測試這些表示。我們發現,知識增強改善了基礎大型語言模型在情境化 QA 中的效能,並且效能因疾病群組而異。在相同的環境中,臨床醫生也表示他們希望將可操作性視為解釋中的主要焦點之一。在我們的解釋組合方法中,我們計畫使用相似性指標來確定慢性病偵測環境中解釋的相似性。總體而言,透過本論文,我們設計了可以在不同使用案例中支援知識啟用解釋的方法,考量到當今 AI 時代中可以產生這些解釋的支援元件和可以增強這些解釋的領域知識來源的方法。
Contrasting Attitudes Towards Current and Future AI Applications for Computerised Interpretation of ECG: A Clinical Stakeholder Interview Study
2410.16879v1 by Lukas Hughes-Noehrer, Leda Channer, Gabriel Strain, Gregory Yates, Richard Body, Caroline Jay
Objectives: To investigate clinicians' attitudes towards current automated interpretation of ECG and novel AI technologies and their perception of computer-assisted interpretation. Materials and Methods: We conducted a series of interviews with clinicians in the UK. Our study: (i) explores the potential for AI, specifically future 'human-like' computing approaches, to facilitate ECG interpretation and support clinical decision making, and (ii) elicits their opinions about the importance of explainability and trustworthiness of AI algorithms. Results: We performed inductive thematic analysis on interview transcriptions from 23 clinicians and identified the following themes: (i) a lack of trust in current systems, (ii) positive attitudes towards future AI applications and requirements for these, (iii) the relationship between the accuracy and explainability of algorithms, and (iv) opinions on education, possible deskilling, and the impact of AI on clinical competencies. Discussion: Clinicians do not trust current computerised methods, but welcome future 'AI' technologies. Where clinicians trust future AI interpretation to be accurate, they are less concerned that it is explainable. They also preferred ECG interpretation that demonstrated the results of the algorithm visually. Whilst clinicians do not fear job losses, they are concerned about deskilling and the need to educate the workforce to use AI responsibly. Conclusion: Clinicians are positive about the future application of AI in clinical decision-making. Accuracy is a key factor of uptake and visualisations are preferred over current computerised methods. This is viewed as a potential means of training and upskilling, in contrast to the deskilling that automation might be perceived to bring.
Pathologist-like explainable AI for interpretable Gleason grading in prostate cancer
2410.15012v1 by Gesa Mittmann, Sara Laiouar-Pedari, Hendrik A. Mehrtens, Sarah Haggenmüller, Tabea-Clara Bucher, Tirtha Chanda, Nadine T. Gaisa, Mathias Wagner, Gilbert Georg Klamminger, Tilman T. Rau, Christina Neppl, Eva Maria Compérat, Andreas Gocht, Monika Hämmerle, Niels J. Rupp, Jula Westhoff, Irene Krücken, Maximillian Seidl, Christian M. Schürch, Marcus Bauer, Wiebke Solass, Yu Chun Tam, Florian Weber, Rainer Grobholz, Jaroslaw Augustyniak, Thomas Kalinski, Christian Hörner, Kirsten D. Mertz, Constanze Döring, Andreas Erbersdobler, Gabriele Deubler, Felix Bremmer, Ulrich Sommer, Michael Brodhun, Jon Griffin, Maria Sarah L. Lenon, Kiril Trpkov, Liang Cheng, Fei Chen, Angelique Levi, Guoping Cai, Tri Q. Nguyen, Ali Amin, Alessia Cimadamore, Ahmed Shabaik, Varsha Manucha, Nazeel Ahmad, Nidia Messias, Francesca Sanguedolce, Diana Taheri, Ezra Baraban, Liwei Jia, Rajal B. Shah, Farshid Siadat, Nicole Swarbrick, Kyung Park, Oudai Hassan, Siamak Sakhaie, Michelle R. Downes, Hiroshi Miyamoto, Sean R. Williamson, Tim Holland-Letz, Carolin V. Schneider, Jakob Nikolas Kather, Yuri Tolkach, Titus J. Brinker
The aggressiveness of prostate cancer, the most common cancer in men worldwide, is primarily assessed based on histopathological data using the Gleason scoring system. While artificial intelligence (AI) has shown promise in accurately predicting Gleason scores, these predictions often lack inherent explainability, potentially leading to distrust in human-machine interactions. To address this issue, we introduce a novel dataset of 1,015 tissue microarray core images, annotated by an international group of 54 pathologists. The annotations provide detailed localized pattern descriptions for Gleason grading in line with international guidelines. Utilizing this dataset, we develop an inherently explainable AI system based on a U-Net architecture that provides predictions leveraging pathologists' terminology. This approach circumvents post-hoc explainability methods while maintaining or exceeding the performance of methods trained directly for Gleason pattern segmentation (Dice score: 0.713 ± 0.003 trained on explanations vs. 0.691 ± 0.010 trained on Gleason patterns). By employing soft labels during training, we capture the intrinsic uncertainty in the data, yielding strong results in Gleason pattern segmentation even in the context of high interobserver variability. With the release of this dataset, we aim to encourage further research into segmentation in medical tasks with high levels of subjectivity and to advance the understanding of pathologists' reasoning processes.
摘要:前列腺癌是全球男性最常見的癌症,其惡性程度主要根據 Gleason 評分系統使用組織病理學數據進行評估。雖然人工智慧 (AI) 在準確預測 Gleason 評分方面已展現潛力,但這些預測通常缺乏內在的可解釋性,可能會導致對人機互動的不信任。為了解決這個問題,我們引進了一個由 54 位病理學家組成的國際團隊註解的 1,015 個組織微陣列核心影像的新穎資料集。這些註解提供了詳細的局部模式描述,用於符合國際準則的 Gleason 分級。利用這個資料集,我們開發了一個基於 U-Net 架構的內在可解釋 AI 系統,該系統提供了利用病理學家術語進行預測。這種方法規避了事後可解釋性方法,同時維持或超越了直接訓練用於 Gleason 模式分割的方法的效能(Dice 分數:0.713 ± 0.003,訓練於解釋,相對於 0.691 ± 0.010,訓練於 Gleason 模式)。透過在訓練期間採用軟標籤,我們捕捉了資料中的內在不確定性,即使在觀察者間變異性高的情況下,也能在 Gleason 模式分割中產生強大的結果。透過釋出這個資料集,我們旨在鼓勵進一步研究主觀性高的醫療任務中的分割,並增進對病理學家推理過程的理解。
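The Dice score and soft labels mentioned in the abstract above can be illustrated with a minimal sketch. It assumes flattened masks and a simple per-pixel average over annotators; the function names `soft_label` and `dice` are hypothetical and this is not the authors' training code.

```python
from typing import List, Sequence


def soft_label(annotations: Sequence[Sequence[int]]) -> List[float]:
    """Average several binary annotator masks (flattened) into per-pixel soft labels."""
    n = len(annotations)
    return [sum(column) / n for column in zip(*annotations)]


def dice(pred: Sequence[float], target: Sequence[float], eps: float = 1e-8) -> float:
    """Dice coefficient 2*|A∩B| / (|A| + |B|), generalized to soft masks in [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps keeps the ratio defined when both masks are empty.
    return (2.0 * intersection + eps) / (total + eps)


# Three hypothetical annotators disagree on a four-pixel region.
annotator_masks = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
target = soft_label(annotator_masks)
score = dice([1, 1, 0, 0], target)
```

Averaging the masks keeps the disagreement between annotators in the training target instead of forcing a single hard label per pixel, which is one plausible reading of "soft labels" here.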
Explainable AI Methods for Multi-Omics Analysis: A Survey
2410.11910v1 by Ahmad Hussein, Mukesh Prasad, Ali Braytee
Advancements in high-throughput technologies have led to a shift from traditional hypothesis-driven methodologies to data-driven approaches. Multi-omics refers to the integrative analysis of data derived from multiple 'omes', such as genomics, proteomics, transcriptomics, metabolomics, and microbiomics. This approach enables a comprehensive understanding of biological systems by capturing different layers of biological information. Deep learning methods are increasingly utilized to integrate multi-omics data, offering insights into molecular interactions and enhancing research into complex diseases. However, these models, with their numerous interconnected layers and nonlinear relationships, often function as black boxes, lacking transparency in decision-making processes. To overcome this challenge, explainable artificial intelligence (xAI) methods are crucial for creating transparent models that allow clinicians to interpret and work with complex data more effectively. This review explores how xAI can improve the interpretability of deep learning models in multi-omics research, highlighting its potential to provide clinicians with clear insights, thereby facilitating the effective application of such models in clinical settings.
摘要:高通量技術的進步導致從傳統的假設驅動方法轉變為資料驅動的方法。多組學是指整合分析來自多個「組學」的資料,例如基因組學、蛋白質組學、轉錄組學、代謝組學和微生物組學。此方法透過擷取生物資訊的不同層面,能全面了解生物系統。深度學習方法愈來愈常被用於整合多組學資料,提供分子交互作用的洞察力,並加強對複雜疾病的研究。然而,這些模型具有許多相互連接的層級和非線性關係,通常會像黑盒子一樣運作,缺乏決策過程的透明度。為了克服此挑戰,可解釋人工智慧 (xAI) 方法對於建立透明模型至關重要,讓臨床醫生可以更有效地解釋和處理複雜資料。此評論探討 xAI 如何能改善多組學研究中深度學習模型的可解釋性,強調其提供臨床醫生明確見解的潛力,進而促進此類模型在臨床環境中的有效應用。
Study on the Helpfulness of Explainable Artificial Intelligence
2410.11896v1 by Tobias Labarta, Elizaveta Kulicheva, Ronja Froelian, Christian Geißler, Xenia Melman, Julian von Klitzing
Explainable Artificial Intelligence (XAI) is essential for building advanced machine learning-powered applications, especially in critical domains such as medical diagnostics or autonomous driving. Legal, business, and ethical requirements motivate using effective XAI, but the increasing number of different methods makes it challenging to pick the right ones. Further, as explanations are highly context-dependent, measuring the effectiveness of XAI methods without users can only reveal a limited amount of information and excludes human factors such as the user's ability to understand an explanation. We propose to evaluate XAI methods via the user's ability to successfully perform a proxy task, designed so that good performance indicates that the explanation provides helpful information. In other words, we address the helpfulness of XAI for human decision-making. Further, a user study on state-of-the-art methods was conducted, showing differences in their ability to generate trust and skepticism and in how well they help users judge correctly whether an AI decision is right. Based on the results, we highly recommend using and extending this approach for more objective-based human-centered user studies to measure XAI performance in an end-to-end fashion.
摘要:可解釋人工智慧 (XAI) 對於建構先進的機器學習驅動應用程式至關重要,特別是在醫療診斷或自動駕駛等關鍵領域。法律、商業和倫理要求促使使用有效的 XAI,但數量日益增加的不同方法使得挑選正確的方法具有挑戰性。此外,由於解釋高度依賴於背景,在沒有使用者的情況下衡量 XAI 方法的有效性只能揭示有限的資訊,排除人類因素,例如理解它的能力。我們建議透過使用者成功執行代理任務的能力來評估 XAI 方法,設計使得良好的執行表現是解釋提供有用資訊的指標。換句話說,我們探討 XAI 對人類決策制定的幫助。此外,對最先進的方法進行使用者研究,顯示出它們在產生信任和懷疑的能力以及正確判斷 AI 決策是否正確的能力方面存在差異。根據結果,我們強烈建議使用和擴充這種方法,以進行更多以目標為基礎的人為中心使用者研究,以終端到終端的方式衡量 XAI 效能。
Use of What-if Scenarios to Help Explain Artificial Intelligence Models for Neonatal Health
2410.09635v1 by Abdullah Mamun, Lawrence D. Devoe, Mark I. Evans, David W. Britt, Judith Klein-Seetharaman, Hassan Ghasemzadeh
Early detection of intrapartum risk enables interventions to potentially prevent or mitigate adverse labor outcomes such as cerebral palsy. Currently, there is no accurate automated system to predict such events to assist with clinical decision-making. To fill this gap, we propose "Artificial Intelligence (AI) for Modeling and Explaining Neonatal Health" (AIMEN), a deep learning framework that not only predicts adverse labor outcomes from maternal, fetal, obstetrical, and intrapartum risk factors but also provides the model's reasoning behind the predictions made. The latter can provide insights into what modifications in the input variables of the model could have changed the predicted outcome. We address the challenges of imbalance and small datasets by synthesizing additional training data using Adaptive Synthetic Sampling (ADASYN) and Conditional Tabular Generative Adversarial Networks (CTGAN). AIMEN uses an ensemble of fully-connected neural networks as the backbone for its classification with the data augmentation supported by either ADASYN or CTGAN. AIMEN, supported by CTGAN, outperforms AIMEN supported by ADASYN in classification. AIMEN can predict a high risk for adverse labor outcomes with an average F1 score of 0.784. It also provides counterfactual explanations that can be achieved by changing 2 to 3 attributes on average. Resources available: https://github.com/ab9mamun/AIMEN.
摘要:產程中風險的早期偵測有助於進行干預措施,以預防或減輕不利的生產結果,例如腦性麻痺。目前,沒有準確的自動化系統可以預測此類事件,以協助臨床決策。為了填補這一空白,我們提出「用於建模和解釋新生兒健康的人工智慧」(AIMEN),這是一個深度學習架構,它不僅可以根據孕產婦、胎兒、產科和產程風險因素預測不利的生產結果,還能提供模型做出預測背後的原因。後者可以提供見解,說明模型輸入變數中的哪些修改可能會改變預測結果。我們透過使用適應性合成抽樣 (ADASYN) 和條件表格生成對抗網路 (CTGAN) 來合成額外的訓練資料,以解決不平衡和小型資料集的挑戰。AIMEN 使用全連接神經網路的集合作為其分類的骨幹,並透過 ADASYN 或 CTGAN 支援資料擴充。由 CTGAN 支援的 AIMEN 在分類方面優於由 ADASYN 支援的 AIMEN。AIMEN 可以預測不利的生產結果的高風險,平均 F1 分數為 0.784。它還提供反事實解釋,可透過平均變更 2 至 3 個屬性來達成。可用資源:https://github.com/ab9mamun/AIMEN。
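The idea of a counterfactual explanation that changes only a few attributes can be sketched as a brute-force search. Everything below is illustrative: the toy scoring model, the attribute names (`factor_a`, `factor_b`, `factor_c`), and the search strategy are assumptions, not the method used in AIMEN.

```python
from itertools import combinations, product
from typing import Callable, Dict, List, Optional

Instance = Dict[str, int]


def find_counterfactual(
    x: Instance,
    predict: Callable[[Instance], int],
    candidates: Dict[str, List[int]],
    max_changes: int = 3,
) -> Optional[Instance]:
    """Return the smallest set of attribute changes (at most max_changes) that flips predict(x)."""
    original = predict(x)
    for k in range(1, max_changes + 1):
        for subset in combinations(candidates, k):
            for values in product(*(candidates[name] for name in subset)):
                # Skip assignments that leave any chosen attribute unchanged.
                if any(x[name] == value for name, value in zip(subset, values)):
                    continue
                x_cf = dict(x)
                x_cf.update(zip(subset, values))
                if predict(x_cf) != original:
                    return dict(zip(subset, values))
    return None


def toy_risk_model(x: Instance) -> int:
    """Hypothetical stand-in classifier: integer-weighted score with a fixed cut-off."""
    score = 4 * x["factor_a"] + 3 * x["factor_b"] + 2 * x["factor_c"]
    return int(score >= 5)


patient = {"factor_a": 1, "factor_b": 1, "factor_c": 1}
options = {"factor_a": [0, 1], "factor_b": [0, 1], "factor_c": [0, 1]}
changes = find_counterfactual(patient, toy_risk_model, options)
```

Searching by increasing number of changes returns the sparsest counterfactual first, which matches the abstract's emphasis on explanations that alter only two or three attributes. Exhaustive search is only practical for a handful of attributes.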
Artificial intelligence techniques in inherited retinal diseases: A review
2410.09105v1 by Han Trinh, Jordan Vice, Jason Charng, Zahra Tajbakhsh, Khyber Alam, Fred K. Chen, Ajmal Mian
Inherited retinal diseases (IRDs) are a diverse group of genetic disorders that lead to progressive vision loss and are a major cause of blindness in working-age adults. The complexity and heterogeneity of IRDs pose significant challenges in diagnosis, prognosis, and management. Recent advancements in artificial intelligence (AI) offer promising solutions to these challenges. However, the rapid development of AI techniques and their varied applications have led to fragmented knowledge in this field. This review consolidates existing studies, identifies gaps, and provides an overview of AI's potential in diagnosing and managing IRDs. It aims to structure pathways for advancing clinical applications by exploring AI techniques like machine learning and deep learning, particularly in disease detection, progression prediction, and personalized treatment planning. Special focus is placed on the effectiveness of convolutional neural networks in these areas. Additionally, the integration of explainable AI is discussed, emphasizing its importance in clinical settings to improve transparency and trust in AI-based systems. The review addresses the need to bridge existing gaps in focused studies on AI's role in IRDs, offering a structured analysis of current AI techniques and outlining future research directions. It concludes with an overview of the challenges and opportunities in deploying AI for IRDs, highlighting the need for interdisciplinary collaboration and the continuous development of robust, interpretable AI models to advance clinical applications.
摘要:遺傳性視網膜疾病 (IRD) 是一組多樣化的遺傳疾病, 會導致視力逐漸喪失,是工作年齡成人失明的主要原因。IRD 的複雜性和異質性對診斷、預後和管理提出了重大挑戰。最近人工智能 (AI) 的進步為這些挑戰提供了有希望的解決方案。 然而,AI 技術的快速發展及其多種應用導致了該領域的知識分散。本綜述整合了現有研究,找出差距,並概述了 AI 在診斷和管理 IRD 中的潛力。它旨在通過探索機器學習和深度學習等 AI 技術,特別是在疾病檢測、進程預測和個性化治療計劃中,為推進臨床應用構建途徑。特別關注這些領域中卷積神經網路的有效性。此外,討論了可解釋 AI 的整合,強調了其在臨床環境中提高透明度和對基於 AI 的系統的信任的重要性。該綜述解決了彌合 AI 在 IRD 中作用的重點研究中現有差距的必要性,提供了對當前 AI 技術的結構化分析,並概述了未來的研究方向。最後概述了在 IRD 中部署 AI 的挑戰和機遇,強調了跨學科合作和持續開發強大、可解釋的 AI 模型以推進臨床應用的必要性。
CasiMedicos-Arg: A Medical Question Answering Dataset Annotated with Explanatory Argumentative Structures
2410.05235v2 by Ekaterina Sviridova, Anar Yeginbergen, Ainara Estarrona, Elena Cabrio, Serena Villata, Rodrigo Agerri
Explaining Artificial Intelligence (AI) decisions is a major challenge nowadays in AI, in particular when applied to sensitive scenarios like medicine and law. However, the need to explain the rationale behind decisions is also a central issue for human-based deliberation, as it is important to justify why a certain decision has been taken. Resident medical doctors, for instance, are required not only to provide a (possibly correct) diagnosis, but also to explain how they reached a certain conclusion. Developing new tools to help residents train their explanation skills is therefore a central objective of AI in education. In this paper, we follow this direction, and we present, to the best of our knowledge, the first multilingual dataset for Medical Question Answering where correct and incorrect diagnoses for a clinical case are enriched with a natural language explanation written by doctors. These explanations have been manually annotated with argument components (i.e., premise, claim) and argument relations (i.e., attack, support), resulting in the Multilingual CasiMedicos-Arg dataset which consists of 558 clinical cases in four languages (English, Spanish, French, Italian) with explanations, where we annotated 5021 claims, 2313 premises, 2431 support relations, and 1106 attack relations. We conclude by showing how competitive baselines perform over this challenging dataset for the argument mining task.
摘要:解釋人工智慧 (AI) 的決策是現在 AI 的一項重大挑戰,特別是應用於像醫學和法律等敏感情境時。然而,解釋決策背後理由的需求也是基於人類的考量的一個主要問題,因為有必要證明為什麼做出某個決策。例如,住院醫師不僅需要提供(可能是正確的)診斷,還需要解釋他們如何達成某個結論。因此,開發新的工具來幫助住院醫師訓練他們的解釋技巧是教育中 AI 的一項核心目標。在本文中,我們遵循這個方向,並且根據我們的了解,提出第一個多語言醫學問答資料集,其中臨床病例的正確和不正確診斷都附有由醫生撰寫的自然語言解釋。這些解釋已使用論證組成(即前提、主張)和論證關係(即攻擊、支持)進行手動註解,產生多語言 CasiMedicos-Arg 資料集,其中包含 558 個具有解釋的四種語言(英語、西班牙語、法語、義大利語)的臨床病例,我們註解了 5021 個主張、2313 個前提、2431 個支持關係和 1106 個攻擊關係。我們最後展示了競爭基準如何針對論證探勘任務執行此具挑戰性的資料集。
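The annotation scheme described above (claims and premises linked by support and attack relations) could be represented roughly as follows. This is a sketch under assumptions: the class names, field names, and example texts are hypothetical and do not reflect the dataset's actual file format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Component:
    id: str
    kind: str  # "claim" or "premise"
    text: str


@dataclass(frozen=True)
class Relation:
    kind: str    # "support" or "attack"
    source: str  # id of the component that supports or attacks
    target: str  # id of the component being supported or attacked


def annotation_counts(components: List[Component], relations: List[Relation]) -> Dict[str, int]:
    """Count components and relations by kind, as in the dataset statistics."""
    counts = Counter(c.kind for c in components)
    counts.update(r.kind for r in relations)
    return dict(counts)


# Placeholder explanation for one clinical case (texts are invented for illustration).
components = [
    Component("c1", "claim", "Option A is the correct diagnosis."),
    Component("p1", "premise", "The reported finding is typical of option A."),
    Component("c2", "claim", "Option B is unlikely."),
]
relations = [Relation("support", "p1", "c1"), Relation("attack", "c1", "c2")]
summary = annotation_counts(components, relations)
```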
Explainable Diagnosis Prediction through Neuro-Symbolic Integration
2410.01855v2 by Qiuhao Lu, Rui Li, Elham Sagheb, Andrew Wen, Jinlian Wang, Liwei Wang, Jungwei W. Fan, Hongfang Liu
Diagnosis prediction is a critical task in healthcare, where timely and accurate identification of medical conditions can significantly impact patient outcomes. Traditional machine learning and deep learning models have achieved notable success in this domain but often lack interpretability, which is a crucial requirement in clinical settings. In this study, we explore the use of neuro-symbolic methods, specifically Logical Neural Networks (LNNs), to develop explainable models for diagnosis prediction. Essentially, we design and implement LNN-based models that integrate domain-specific knowledge through logical rules with learnable thresholds. Our models, particularly $M_{\text{multi-pathway}}$ and $M_{\text{comprehensive}}$, demonstrate superior performance over traditional models such as Logistic Regression, SVM, and Random Forest, achieving higher accuracy (up to 80.52%) and AUROC scores (up to 0.8457) in the case study of diabetes prediction. The learned weights and thresholds within the LNN models provide direct insights into feature contributions, enhancing interpretability without compromising predictive power. These findings highlight the potential of neuro-symbolic approaches in bridging the gap between accuracy and explainability in healthcare AI applications. By offering transparent and adaptable diagnostic models, our work contributes to the advancement of precision medicine and supports the development of equitable healthcare solutions. Future research will focus on extending these methods to larger and more diverse datasets to further validate their applicability across different medical conditions and populations.
摘要:診斷預測是醫療保健中的關鍵任務,及時且準確地識別醫療狀況會顯著影響患者的結果。傳統的機器學習和深度學習模型已在這個領域取得顯著成功,但通常缺乏可解釋性,這在臨床環境中是一項關鍵要求。在本研究中,我們探討了神經符號方法的應用,特別是邏輯神經網路 (LNN),以開發用於診斷預測的可解釋模型。基本上,我們設計並實作了基於 LNN 的模型,這些模型透過具有可學習閾值的邏輯規則整合領域特定知識。我們的模型,特別是 $M_{\text{multi-pathway}}$ 和 $M_{\text{comprehensive}}$,表現出優於傳統模型(例如邏輯迴歸、SVM 和隨機森林)的優異效能,在糖尿病預測的案例研究中達到了更高的準確度(高達 80.52%)和 AUROC 分數(高達 0.8457)。LNN 模型中學習到的權重和閾值提供了對特徵貢獻的直接見解,增強了可解釋性,同時不影響預測能力。這些發現突顯了神經符號方法在彌合醫療保健 AI 應用中準確性和可解釋性差距方面的潛力。透過提供透明且適應性強的診斷模型,我們的研究有助於推進精準醫療,並支援公平醫療保健解決方案的開發。未來的研究將專注於將這些方法擴展到更大且更多樣化的資料集,以進一步驗證其在不同醫療狀況和人群中的適用性。
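A logical rule with learnable thresholds and weights can be sketched with a weighted Łukasiewicz-style conjunction, a form commonly used in real-valued logic. The snippet below is an assumption-laden illustration: the sigmoid threshold predicate, the feature names, and all numeric values are hypothetical and are not taken from the paper's models.

```python
import math
from typing import Sequence


def threshold_predicate(value: float, threshold: float, sharpness: float = 1.0) -> float:
    """Soft truth value in (0, 1) of the statement 'value > threshold'."""
    return 1.0 / (1.0 + math.exp(-sharpness * (value - threshold)))


def weighted_and(truths: Sequence[float], weights: Sequence[float], bias: float = 1.0) -> float:
    """Weighted conjunction: clamp(bias - sum_i w_i * (1 - t_i)) to [0, 1]."""
    raw = bias - sum(w * (1.0 - t) for w, t in zip(weights, truths))
    return max(0.0, min(1.0, raw))


# Hypothetical rule: high_glucose AND high_bmi -> elevated risk.
glucose_high = threshold_predicate(150.0, threshold=125.0, sharpness=0.1)
bmi_high = threshold_predicate(31.0, threshold=30.0, sharpness=0.5)
risk = weighted_and([glucose_high, bmi_high], weights=[1.0, 0.5])
```

In a trained model the thresholds and weights would be learned parameters. Reading them back (for example, a weight near zero means the rule largely ignores that input) is what gives this kind of model its direct interpretability.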
Easydiagnos: a framework for accurate feature selection for automatic diagnosis in smart healthcare
2410.00366v1 by Prasenjit Maji, Amit Kumar Mondal, Hemanta Kumar Mondal, Saraju P. Mohanty
The rapid advancements in artificial intelligence (AI) have revolutionized smart healthcare, driving innovations in wearable technologies, continuous monitoring devices, and intelligent diagnostic systems. However, security, explainability, robustness, and performance optimization challenges remain critical barriers to widespread adoption in clinical environments. This research presents an innovative algorithmic method using the Adaptive Feature Evaluator (AFE) algorithm to improve feature selection in healthcare datasets and overcome these problems. By integrating Genetic Algorithms (GA), Explainable Artificial Intelligence (XAI), and Permutation Combination Techniques (PCT), AFE optimizes Clinical Decision Support Systems (CDSS), thereby enhancing predictive accuracy and interpretability. The proposed method is validated across three diverse healthcare datasets using six distinct machine learning algorithms, demonstrating its robustness and superiority over conventional feature selection techniques. The results underscore the transformative potential of AFE in smart healthcare, enabling personalized and transparent patient care. Notably, the AFE algorithm, when combined with a Multi-layer Perceptron (MLP), achieved an accuracy of up to 98.5%, highlighting its capability to improve clinical decision-making processes in real-world healthcare applications.
摘要:人工智慧 (AI) 的快速進展徹底改變了智慧醫療保健,推動了可穿戴技術、持續監控裝置和智慧診斷系統的創新。然而,安全性、可解釋性、穩健性和效能最佳化挑戰仍然是臨床環境中廣泛採用的關鍵障礙。本研究提出一個創新的演算法方法,使用自適應特徵評估器 (AFE) 演算法來改善醫療保健資料集中的特徵選取並克服問題。AFE 整合了遺傳演算法 (GA)、可解釋人工智慧 (XAI) 和排列組合技術 (PCT),該演算法最佳化了臨床決策支援系統 (CDSS),從而提高了預測準確性和可解釋性。所提出的方法使用六種不同的機器學習演算法驗證了三個不同的醫療保健資料集,證明了其穩健性和優於傳統特徵選取技術。結果強調了 AFE 在智慧醫療保健中的轉變潛力,實現了個人化和透明的患者照護。值得注意的是,AFE 演算法與多層感知器 (MLP) 結合使用時,準確度高達 98.5%,突顯了其改善實際醫療保健應用中臨床決策制定流程的能力。
Dermatologist-like explainable AI enhances melanoma diagnosis accuracy: eye-tracking study
2409.13476v1 by Tirtha Chanda, Sarah Haggenmueller, Tabea-Clara Bucher, Tim Holland-Letz, Harald Kittler, Philipp Tschandl, Markus V. Heppt, Carola Berking, Jochen S. Utikal, Bastian Schilling, Claudia Buerger, Cristian Navarrete-Dechent, Matthias Goebeler, Jakob Nikolas Kather, Carolin V. Schneider, Benjamin Durani, Hendrike Durani, Martin Jansen, Juliane Wacker, Joerg Wacker, Reader Study Consortium, Titus J. Brinker
Artificial intelligence (AI) systems have substantially improved dermatologists' diagnostic accuracy for melanoma, with explainable AI (XAI) systems further enhancing clinicians' confidence and trust in AI-driven decisions. Despite these advancements, there remains a critical need for objective evaluation of how dermatologists engage with both AI and XAI tools. In this study, 76 dermatologists participated in a reader study, diagnosing 16 dermoscopic images of melanomas and nevi using an XAI system that provides detailed, domain-specific explanations. Eye-tracking technology was employed to assess their interactions. Diagnostic performance was compared with that of a standard AI system lacking explanatory features. Our findings reveal that XAI systems improved balanced diagnostic accuracy by 2.8 percentage points relative to standard AI. Moreover, diagnostic disagreements with AI/XAI systems and complex lesions were associated with elevated cognitive load, as evidenced by increased ocular fixations. These insights have significant implications for clinical practice, the design of AI tools for visual tasks, and the broader development of XAI in medical diagnostics.
摘要:人工智慧 (AI) 系統已大幅改善皮膚科醫師對黑色素瘤的診斷準確度,而可解釋 AI (XAI) 系統進一步提升臨床醫師對 AI 驅動決策的信心與信賴。儘管有這些進展,對於皮膚科醫師如何使用 AI 和 XAI 工具,仍有客觀評估的迫切需求。在這項研究中,76 位皮膚科醫師參與了一項讀者研究,使用 XAI 系統診斷 16 張黑色素瘤和痣的皮膚鏡影像,該系統提供詳細的領域特定說明。採用眼球追蹤技術來評估他們的互動。將診斷表現與缺乏說明功能的標準 AI 系統進行比較。我們的研究結果顯示,XAI 系統相較於標準 AI,將平衡診斷準確度提升了 2.8 個百分點。此外,與 AI/XAI 系統的診斷分歧和複雜的病灶與認知負擔升高有關,這由增加的眼睛注視次數所證實。這些見解對臨床實務、視覺任務 AI 工具的設計和醫學診斷中 XAI 的廣泛發展具有重大意義。
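Balanced diagnostic accuracy, the metric reported above, is the mean of sensitivity and specificity. A minimal sketch follows; the labels are made up for illustration and are not the study's data.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = melanoma, 0 = nevus)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0


# Illustrative reader: 3 of 4 melanomas and 2 of 4 nevi diagnosed correctly.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
reads = [1, 1, 1, 0, 0, 0, 1, 1]
result = balanced_accuracy(truth, reads)
```

Unlike plain accuracy, this metric weights both classes equally regardless of how many images each contributes. A gain of 2.8 percentage points means a difference of 0.028 on this scale.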
Explainable AI for Autism Diagnosis: Identifying Critical Brain Regions Using fMRI Data
2409.15374v1 by Suryansh Vidya, Kush Gupta, Amir Aly, Andy Wills, Emmanuel Ifeachor, Rohit Shankar
Early diagnosis and intervention for Autism Spectrum Disorder (ASD) have been shown to significantly improve the quality of life of autistic individuals. However, diagnostic methods for ASD rely on assessments based on clinical presentation that are prone to bias and can make it challenging to arrive at an early diagnosis. There is a need for objective biomarkers of ASD which can help improve diagnostic accuracy. Deep learning (DL) has achieved outstanding performance in diagnosing diseases and conditions from medical imaging data. Extensive research has been conducted on creating models that classify ASD using resting-state functional Magnetic Resonance Imaging (fMRI) data. However, existing models lack interpretability. This research aims to improve the accuracy and interpretability of ASD diagnosis by creating a DL model that can not only accurately classify ASD but also provide explainable insights into its working. The dataset used is a preprocessed version of the Autism Brain Imaging Data Exchange (ABIDE) with 884 samples. Our findings show a model that can accurately classify ASD and highlight critical brain regions differing between ASD and typical controls, with potential implications for early diagnosis and understanding of the neural basis of ASD. These findings are validated by studies in the literature that use different datasets and modalities, confirming that the model actually learned characteristics of ASD and not just the dataset. This study advances the field of explainable AI in medical imaging by providing a robust and interpretable model, thereby contributing to a future with objective and reliable ASD diagnostics.
摘要:自閉症譜系障礙 (ASD) 的早期診斷和介入已被證實能顯著改善自閉症患者的生活品質。然而,ASD 的診斷方法依賴於基於臨床表現的評估,容易產生偏見,且可能難以做出早期診斷。有必要找出 ASD 的客觀生物標記,以幫助提高診斷準確性。深度學習 (DL) 在從醫學影像資料診斷疾病和病症方面取得傑出的表現。已經針對建立使用靜態功能性磁振造影 (fMRI) 資料對 ASD 進行分類的模型進行廣泛的研究。然而,現有的模型缺乏可解釋性。本研究旨在透過建立一個不僅能準確分類 ASD,還能提供可解釋見解說明其運作原理的 DL 模型,來改善 ASD 診斷的準確性和可解釋性。所使用的資料集是自閉症大腦影像資料交換 (ABIDE) 的預處理版本,包含 884 個樣本。我們的研究結果顯示,該模型能準確分類 ASD,並強調 ASD 與典型對照組之間存在差異的關鍵腦區,對於 ASD 的早期診斷和神經基礎的理解具有潛在的意義。這些研究結果已由使用不同資料集和方式的文獻研究驗證,證實該模型實際上學習了 ASD 的特徵,而不僅僅是資料集。本研究透過提供一個強健且可解釋的模型,推動了醫學影像中可解釋 AI 的領域,從而為未來提供客觀且可靠的 ASD 診斷做出貢獻。
Improving Prototypical Parts Abstraction for Case-Based Reasoning Explanations Designed for the Kidney Stone Type Recognition
2409.12883v1 by Daniel Flores-Araiza, Francisco Lopez-Tiro, Clément Larose, Salvador Hinojosa, Andres Mendez-Vazquez, Miguel Gonzalez-Mendoza, Gilberto Ochoa-Ruiz, Christian Daul
The in-vivo identification of kidney stone types during a ureteroscopy would be a major medical advance in urology, as it could reduce the time of the tedious renal calculi extraction process while diminishing infection risks. Furthermore, such an automated procedure would make it possible to prescribe anti-recurrence treatments immediately. Nowadays, only a few experienced urologists are able to recognize the kidney stone types in the images of the videos displayed on a screen during the endoscopy. Thus, several deep learning (DL) models have recently been proposed to automatically recognize the kidney stone types using ureteroscopic images. However, these DL models are of a black-box nature, which limits their applicability in clinical settings. This contribution proposes a case-based reasoning DL model which uses prototypical parts (PPs) and generates local and global descriptors. The PPs encode for each class (i.e., kidney stone type) visual feature information (hue, saturation, intensity and textures) similar to that used by biologists. The PPs are optimally generated thanks to a new loss function used during the model training. Moreover, the local and global descriptors of PPs make it possible to explain the decisions ("what" information, "where in the images") in a way that is understandable for biologists and urologists. The proposed DL model has been tested on a database including images of the six most widespread kidney stone types. The overall average classification accuracy was 90.37%. When comparing this result with those of the eight other state-of-the-art DL models for kidney stone recognition, it can be seen that the valuable gain in explainability was not reached at the expense of accuracy, which was even slightly increased with respect to that (88.2%) of the best method of the literature. These promising and interpretable results also encourage urologists to put their trust in AI-based solutions.
Towards Interpretable End-Stage Renal Disease (ESRD) Prediction: Utilizing Administrative Claims Data with Explainable AI Techniques
2409.12087v3 by Yubo Li, Saba Al-Sayouri, Rema Padman
This study explores the potential of utilizing administrative claims data, combined with advanced machine learning and deep learning techniques, to predict the progression of Chronic Kidney Disease (CKD) to End-Stage Renal Disease (ESRD). We analyze a comprehensive, 10-year dataset provided by a major health insurance organization to develop prediction models for multiple observation windows using traditional machine learning methods such as Random Forest and XGBoost as well as deep learning approaches such as Long Short-Term Memory (LSTM) networks. Our findings demonstrate that the LSTM model, particularly with a 24-month observation window, exhibits superior performance in predicting ESRD progression, outperforming existing models in the literature. We further apply SHapley Additive exPlanations (SHAP) analysis to enhance interpretability, providing insights into the impact of individual features on predictions at the individual patient level. This study underscores the value of leveraging administrative claims data for CKD management and predicting ESRD progression.
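The SHAP analysis mentioned in the abstract above attributes a single prediction to individual features using Shapley values. As a rough illustration only, the sketch below computes exact Shapley values by enumerating every feature coalition. The scoring function `toy_linear_score`, its weights, the inputs, and the baseline are all hypothetical; this is not the paper's LSTM or its data, and the SHAP library itself uses faster approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of features, so this is
    only practical for a handful of inputs.
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values


def toy_linear_score(features):
    """Hypothetical linear score over three made-up features."""
    return 0.5 * features[0] + 2.0 * features[1] - 1.0 * features[2]


x = [4.0, 1.0, 3.0]
baseline = [2.0, 0.0, 1.0]
phi = shapley_values(toy_linear_score, x, baseline)
print(phi)
```

For a linear model each attribution reduces to the weight times the feature's offset from the baseline, and the attributions sum to the difference between the prediction for `x` and the prediction for the baseline.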
Contextual Evaluation of Large Language Models for Classifying Tropical and Infectious Diseases
2409.09201v3 by Mercy Asiedu, Nenad Tomasev, Chintan Ghate, Tiya Tiyasirichokchai, Awa Dieng, Oluwatosin Akande, Geoffrey Siwo, Steve Adudans, Sylvanus Aitkins, Odianosen Ehiakhamen, Eric Ndombi, Katherine Heller
While large language models (LLMs) have shown promise for medical question answering, there is limited work focused on tropical and infectious disease-specific exploration. We build on an open-source tropical and infectious diseases (TRINDs) dataset, expanding it to include demographic and semantic clinical and consumer augmentations, yielding 11000+ prompts. We evaluate LLM performance on these, comparing generalist and medical LLMs, as well as LLM outcomes to human experts. We demonstrate through systematic experimentation the benefit of contextual information such as demographics, location, gender, and risk factors for optimal LLM response. Finally, we develop a prototype of TRINDs-LM, a research tool that provides a playground to navigate how context impacts LLM outputs for health.
Explainable AI: Definition and attributes of a good explanation for health AI
2409.15338v1 by Evangelia Kyrimi, Scott McLachlan, Jared M Wohlgemut, Zane B Perkins, David A. Lagnado, William Marsh, the ExAIDSS Expert Group
Proposals of artificial intelligence (AI) solutions based on increasingly complex and accurate predictive models are becoming ubiquitous across many disciplines. As the complexity of these models grows, transparency and users' understanding often diminish. This suggests that accurate prediction alone is insufficient for making an AI-based solution truly useful. In the development of healthcare systems, this introduces new issues related to accountability and safety. Understanding how and why an AI system makes a recommendation may require complex explanations of its inner workings and reasoning processes. Although research on explainable AI (XAI) has significantly increased in recent years and there is high demand for XAI in medicine, defining what constitutes a good explanation remains ad hoc, and providing adequate explanations continues to be challenging. To fully realize the potential of AI, it is critical to address two fundamental questions about explanations for safety-critical AI applications, such as health-AI: (1) What is an explanation in health-AI? and (2) What are the attributes of a good explanation in health-AI? In this study, we examined published literature and gathered expert opinions through a two-round Delphi study. The research outputs include (1) a definition of what constitutes an explanation in health-AI and (2) a comprehensive list of attributes that characterize a good explanation in health-AI.
Exploring the Effect of Explanation Content and Format on User Comprehension and Trust in Healthcare
2408.17401v2 by Antonio Rago, Bence Palfi, Purin Sukpanichnant, Hannibal Nabli, Kavyesh Vivek, Olga Kostopoulou, James Kinross, Francesca Toni
AI-driven tools for healthcare are widely acknowledged as potentially beneficial to health practitioners and patients, e.g. the QCancer regression tool for cancer risk prediction. However, for these tools to be trusted, they need to be supplemented with explanations. We examine how explanations' content and format affect user comprehension and trust when explaining QCancer's predictions. Regarding content, we deploy SHAP and Occlusion-1. Regarding format, we present SHAP explanations, conventionally, as charts (SC) and Occlusion-1 explanations as charts (OC) as well as text (OT), to which their simpler nature lends itself. We conduct experiments with two sets of stakeholders: the general public (representing patients) and medical students (representing healthcare practitioners). Our experiments showed higher subjective comprehension and trust for Occlusion-1 over SHAP explanations based on content. However, when controlling for format, only OT outperformed SC, suggesting this trend is driven by preferences for text. Other findings corroborated that explanation format, rather than content, is often the critical factor.
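The abstract above contrasts SHAP with Occlusion-1 and notes that the latter lends itself to a textual format. A minimal sketch of the idea follows, assuming Occlusion-1 scores a feature as the change in prediction when that single feature is replaced by a baseline value. The model `toy_logistic_risk`, its coefficients, the feature names, and the baseline are hypothetical and unrelated to QCancer.

```python
from math import exp


def occlusion_1(predict, x, baseline):
    """Score each feature by the prediction drop when it alone is occluded."""
    full = predict(x)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline[i]  # replace one feature, keep the rest
        scores.append(full - predict(occluded))
    return scores


def toy_logistic_risk(features):
    """Hypothetical logistic risk model; not the QCancer model."""
    z = -2.0 + 0.03 * features[0] + 1.2 * features[1]
    return 1.0 / (1.0 + exp(-z))


names = ["age", "symptom present"]  # made-up feature names
x = [60.0, 1.0]
baseline = [0.0, 0.0]
scores = occlusion_1(toy_logistic_risk, x, baseline)

# Text rendering, in the spirit of the "OT" format described above.
for name, score in zip(names, scores):
    direction = "raises" if score > 0 else "lowers"
    print(f"{name} {direction} the predicted risk by {abs(score):.3f}")
```

Because each score is a single difference in predicted risk, it can be read aloud as one sentence per feature, which is the property the study links to the text format.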
A Survey for Large Language Models in Biomedicine
2409.00133v1 by Chong Wang, Mengyao Li, Junjun He, Zhongruo Wang, Erfan Darzi, Zan Chen, Jin Ye, Tianbin Li, Yanzhou Su, Jing Ke, Kaili Qu, Shuxin Li, Yi Yu, Pietro Liò, Tianyun Wang, Yu Guang Wang, Yiqing Shen
Recent breakthroughs in large language models (LLMs) offer unprecedented natural language understanding and generation capabilities. However, existing surveys on LLMs in biomedicine often focus on specific applications or model architectures, lacking a comprehensive analysis that integrates the latest advancements across various biomedical domains. This review, based on an analysis of 484 publications sourced from databases including PubMed, Web of Science, and arXiv, provides an in-depth examination of the current landscape, applications, challenges, and prospects of LLMs in biomedicine, distinguishing itself by focusing on the practical implications of these models in real-world biomedical contexts. Firstly, we explore the capabilities of LLMs in zero-shot learning across a broad spectrum of biomedical tasks, including diagnostic assistance, drug discovery, and personalized medicine, among others, with insights drawn from 137 key studies. Then, we discuss adaptation strategies of LLMs, including fine-tuning methods for both uni-modal and multi-modal LLMs to enhance their performance in specialized biomedical contexts where zero-shot learning falls short, such as medical question answering and efficient processing of biomedical literature. Finally, we discuss the challenges that LLMs face in the biomedicine domain, including data privacy concerns, limited model interpretability, issues with dataset quality, and ethics due to the sensitive nature of biomedical data, the need for highly reliable model outputs, and the ethical implications of deploying AI in healthcare. To address these challenges, we also identify future research directions of LLMs in biomedicine, including federated learning methods to preserve data privacy and integrating explainable AI methodologies to enhance the transparency of LLMs.
Aligning XAI with EU Regulations for Smart Biomedical Devices: A Methodology for Compliance Analysis
2408.15121v1 by Francesco Sovrano, Michael Lognoul, Giulia Vilone
Significant investment and development have gone into integrating Artificial Intelligence (AI) in medical and healthcare applications, leading to advanced control systems in medical technology. However, the opacity of AI systems raises concerns about essential characteristics needed in such sensitive applications, like transparency and trustworthiness. Our study addresses these concerns by investigating a process for selecting the most adequate Explainable AI (XAI) methods to comply with the explanation requirements of key EU regulations in the context of smart bioelectronics for medical devices. The adopted methodology starts with categorising smart devices by their control mechanisms (open-loop, closed-loop, and semi-closed-loop systems) and delving into their technology. Then, we analyse these regulations to define their explainability requirements for the various devices and related goals. Simultaneously, we classify XAI methods by their explanatory objectives. This allows for matching legal explainability requirements with XAI explanatory goals and determining the suitable XAI algorithms for achieving them. Our findings provide a nuanced understanding of which XAI algorithms align better with EU regulations for different types of medical devices. We demonstrate this through practical case studies on different neural implants, from chronic disease management to advanced prosthetics. This study fills a crucial gap in aligning XAI applications in bioelectronics with stringent provisions of EU regulations. It provides a practical framework for developers and researchers, ensuring their AI innovations advance healthcare technology and adhere to legal and ethical standards.
Towards Case-based Interpretability for Medical Federated Learning
2408.13626v1 by Laura Latorre, Liliana Petrychenko, Regina Beets-Tan, Taisiya Kopytova, Wilson Silva
We explore deep generative models to generate case-based explanations in a medical federated learning setting. Explaining AI model decisions through case-based interpretability is paramount to increasing trust and allowing widespread adoption of AI in clinical practice. However, medical AI training paradigms are shifting towards federated learning settings in order to comply with data protection regulations. In a federated scenario, past data is inaccessible to the current user. Thus, we use a deep generative model to generate synthetic examples that protect privacy and explain decisions. Our proof-of-concept focuses on pleural effusion diagnosis and uses publicly available Chest X-ray data.
AI in radiological imaging of soft-tissue and bone tumours: a systematic review evaluating against CLAIM and FUTURE-AI guidelines
2408.12491v1 by Douwe J. Spaanderman, Matthew Marzetti, Xinyi Wan, Andrew F. Scarsbrook, Philip Robinson, Edwin H. G. Oei, Jacob J. Visser, Robert Hemke, Kirsten van Langevelde, David F. Hanff, Geert J. L. H. van Leenders, Cornelis Verhoef, Dirk J. Grünhagen, Wiro J. Niessen, Stefan Klein, Martijn P. A. Starmans
Soft-tissue and bone tumours (STBT) are rare, diagnostically challenging lesions with variable clinical behaviours and treatment approaches. This systematic review provides an overview of Artificial Intelligence (AI) methods using radiological imaging for diagnosis and prognosis of these tumours, highlighting challenges in clinical translation, and evaluating study alignment with the Checklist for AI in Medical Imaging (CLAIM) and the FUTURE-AI international consensus guidelines for trustworthy and deployable AI to promote the clinical translation of AI methods. The review covered literature from several bibliographic databases, including papers published before 17/07/2024. Original research in peer-reviewed journals focused on radiology-based AI for diagnosing or prognosing primary STBT was included. Exclusion criteria were animal, cadaveric, or laboratory studies, and non-English papers. Abstracts were screened by two of three independent reviewers for eligibility. Eligible papers were assessed against guidelines by one of three independent reviewers. The search identified 15,015 abstracts, from which 325 articles were included for evaluation. Most studies performed moderately on CLAIM, averaging a score of 28.9 ± 7.5 out of 53, but poorly on FUTURE-AI, averaging 5.1 ± 2.1 out of 30. Imaging-AI tools for STBT remain at the proof-of-concept stage, indicating significant room for improvement. Future efforts by AI developers should focus on design (e.g. define unmet clinical need, intended clinical setting and how AI would be integrated in clinical workflow), development (e.g. build on previous work, explainability), evaluation (e.g. evaluating and addressing biases, evaluating AI against best practices), and data reproducibility and availability (making documented code and data publicly available). Following these recommendations could improve clinical translation of AI methods.
Evaluating Explainable AI Methods in Deep Learning Models for Early Detection of Cerebral Palsy
2409.00001v1 by Kimji N. Pellano, Inga Strümke, Daniel Groos, Lars Adde, Espen Alexander F. Ihlen
Early detection of Cerebral Palsy (CP) is crucial for effective intervention and monitoring. This paper tests the reliability and applicability of Explainable AI (XAI) methods using a deep learning method that predicts CP by analyzing skeletal data extracted from video recordings of infant movements. Specifically, we use XAI evaluation metrics, namely faithfulness and stability, to quantitatively assess the reliability of Class Activation Mapping (CAM) and Gradient-weighted Class Activation Mapping (Grad-CAM) in this specific medical application. We utilize a unique dataset of infant movements and apply skeleton data perturbations without distorting the original dynamics of the infant movements. Our CP prediction model utilizes an ensemble approach, so we evaluate the XAI metric performance for both the overall ensemble and the individual models. Our findings indicate that both XAI methods effectively identify key body points influencing CP predictions and that the explanations are robust against minor data perturbations. Grad-CAM significantly outperforms CAM in the RISv metric, which measures stability in terms of velocity. In contrast, CAM performs better in the RISb metric, which relates to bone stability, and the RRS metric, which assesses internal representation robustness. Individual models within the ensemble show varied results, and neither CAM nor Grad-CAM consistently outperforms the other, with the ensemble approach providing a representation of outcomes from its constituent models.
MicroXercise: A Micro-Level Comparative and Explainable System for Remote Physical Therapy
2408.11837v1 by Hanchen David Wang, Nibraas Khan, Anna Chen, Nilanjan Sarkar, Pamela Wisniewski, Meiyi Ma
Recent global estimates suggest that as many as 2.41 billion individuals have health conditions that would benefit from rehabilitation services. Home-based Physical Therapy (PT) faces significant challenges in providing interactive feedback and meaningful observation for therapists and patients. To fill this gap, we present MicroXercise, which integrates micro-motion analysis with wearable sensors, providing therapists and patients with a comprehensive feedback interface, including video, text, and scores. Crucially, it employs multi-dimensional Dynamic Time Warping (DTW) and attribution-based explainable methods to analyze the existing deep learning neural networks in monitoring exercises, focusing on a high granularity of exercise. This synergistic approach is pivotal, providing output matching the input size to precisely highlight critical subtleties and movements in PT, thus transforming complex AI analysis into clear, actionable feedback. By highlighting these micro-motions in different metrics, such as stability and range of motion, MicroXercise significantly enhances the understanding and relevance of feedback for end-users. Comparative performance metrics underscore its effectiveness over traditional methods, such as a 39% and 42% improvement in Feature Mutual Information (FMI) and Continuity. MicroXercise is a step ahead in home-based physical therapy, providing a technologically advanced and intuitively helpful solution to enhance patient care and outcomes.
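The abstract above relies on multi-dimensional Dynamic Time Warping (DTW) to compare exercise recordings. The sketch below shows the textbook dynamic-programming form of DTW over sequences of multi-dimensional points, using Euclidean distance between points. It is an illustration under those assumptions, not MicroXercise's implementation, and the `reference` and `attempt` sequences are invented.

```python
from math import dist, inf


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of equal-dimension points."""
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] and b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])  # Euclidean point distance
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Invented two-channel sensor traces.
reference = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
slow_attempt = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
off_attempt = [(0.0, 0.0), (1.0, 2.0), (2.0, 2.0)]

print(dtw_distance(reference, slow_attempt))
print(dtw_distance(reference, off_attempt))
```

A repetition performed more slowly but along the same path aligns at zero cost, while a deviation in one channel shows up as a positive cost, which is why DTW suits comparing exercises performed at different speeds.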
The Literature Review Network: An Explainable Artificial Intelligence for Systematic Literature Reviews, Meta-analyses, and Method Development
2408.05239v1 by Joshua Morriss, Tod Brindle, Jessica Bah Rösman, Daniel Reibsamen, Andreas Enz
Systematic literature reviews are the highest quality of evidence in research. However, the review process is hindered by significant resource and data constraints. The Literature Review Network (LRN) is the first of its kind explainable AI platform adhering to PRISMA 2020 standards, designed to automate the entire literature review process. LRN was evaluated in the domain of surgical glove practices using 3 search strings developed by experts to query PubMed. A non-expert trained all LRN models. Performance was benchmarked against an expert manual review. Explainability and performance metrics assessed LRN's ability to replicate the experts' review. Concordance was measured with the Jaccard index and confusion matrices. Researchers were blinded to the other's results until study completion. Overlapping studies were integrated into an LRN-generated systematic review. LRN models demonstrated superior classification accuracy without expert training, achieving 84.78% and 85.71% accuracy. The highest performance model achieved high interrater reliability (k = 0.4953) and explainability metrics, linking 'reduce', 'accident', and 'sharp' with 'double-gloving'. Another LRN model covered 91.51% of the relevant literature despite diverging from the non-expert's judgments (k = 0.2174), with the terms 'latex', 'double' (gloves), and 'indication'. LRN outperformed the manual review (19,920 minutes over 11 months), reducing the entire process to 288.6 minutes over 5 days. This study demonstrates that explainable AI does not require expert training to successfully conduct PRISMA-compliant systematic literature reviews like an expert. LRN summarized the results of surgical glove studies and identified themes that were nearly identical to the clinical researchers' findings. Explainable AI can accurately expedite our understanding of clinical practices, potentially revolutionizing healthcare research.
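The abstract above measures concordance with the Jaccard index and reports agreement values as k. As a small illustration, the sketch below computes the Jaccard index between two sets of included studies and, assuming the reported k refers to Cohen's kappa, the kappa between two sets of include/exclude labels. The study IDs and labels are invented.

```python
def jaccard_index(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cohen_kappa(labels_a, labels_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented example: studies each reviewer included.
model_included = {"s1", "s2", "s3"}
expert_included = {"s2", "s3", "s4"}
overlap = jaccard_index(model_included, expert_included)

# Invented include (1) / exclude (0) decisions over eight abstracts.
model_labels = [1, 1, 0, 0, 1, 0, 1, 0]
expert_labels = [1, 0, 0, 0, 1, 0, 1, 1]
kappa = cohen_kappa(model_labels, expert_labels)
print(overlap, kappa)
```

In this toy case the two reviewers agree on six of eight abstracts (75%), but half of that agreement is expected by chance, so kappa comes out at 0.5.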
Enhancing Medical Learning and Reasoning Systems: A Boxology-Based Comparative Analysis of Design Patterns
2408.02709v1 by Chi Him Ng
This study analyzes hybrid AI systems' design patterns and their effectiveness in clinical decision-making using the boxology framework. It categorizes and compares various architectures combining machine learning and rule-based reasoning to provide insights into their structural foundations and healthcare applications. Addressing two main questions, how to categorize these systems against established design patterns and how to extract insights through comparative analysis, the study uses design patterns from software engineering to understand and optimize healthcare AI systems. Boxology helps identify commonalities and create reusable solutions, enhancing these systems' scalability, reliability, and performance. Five primary architectures are examined: REML, MLRB, RBML, RMLT, and PERML. Each has unique strengths and weaknesses, highlighting the need for tailored approaches in clinical tasks. REML excels in high-accuracy prediction for datasets with limited data; MLRB in handling large datasets and complex data integration; RBML in explainability and trustworthiness; RMLT in managing high-dimensional data; and PERML, though limited in analysis, shows promise in urgent care scenarios. The study introduces four new patterns, creates five abstract categorization patterns, and refines those five further to specific systems. These contributions enhance Boxology's taxonomical organization and offer novel approaches to integrating expert knowledge with machine learning. Boxology's structured, modular approach offers significant advantages in developing and analyzing hybrid AI systems, revealing commonalities, and promoting reusable solutions. In conclusion, this study underscores hybrid AI systems' crucial role in advancing healthcare and Boxology's potential to drive further innovation in AI integration, ultimately improving clinical decision support and patient outcomes.
Bayesian Kolmogorov Arnold Networks (Bayesian_KANs): A Probabilistic Approach to Enhance Accuracy and Interpretability
2408.02706v1 by Masoud Muhammed Hassan
Because of its strong predictive skills, deep learning has emerged as an essential tool in many industries, including healthcare. Traditional deep learning models, on the other hand, frequently lack interpretability and fail to take prediction uncertainty into account, two crucial components of clinical decision making. In order to produce explainable and uncertainty-aware predictions, this study presents a novel framework called Bayesian Kolmogorov Arnold Networks (BKANs), which combines the expressive capacity of Kolmogorov Arnold Networks with Bayesian inference. We employ BKANs on two medical datasets, which are widely used benchmarks for assessing machine learning models in medical diagnostics: the Pima Indians Diabetes dataset and the Cleveland Heart Disease dataset. Our method provides useful insights into prediction confidence and decision boundaries and outperforms traditional deep learning models in terms of prediction accuracy. Moreover, BKANs' capacity to represent aleatoric and epistemic uncertainty guarantees doctors receive more solid and trustworthy decision support. According to experimental results, our Bayesian strategy improves the interpretability of the model and considerably reduces overfitting, which is important for small and imbalanced medical datasets. We present possible extensions to apply BKANs to more complicated multimodal datasets and discuss the significance of these findings for future research in building reliable AI systems for healthcare. This work paves the way for a new paradigm in deep learning model deployment in vital sectors where transparency and reliability are crucial.
MLtoGAI: Semantic Web based with Machine Learning for Enhanced Disease Prediction and Personalized Recommendations using Generative AI
2407.20284v1 by Shyam Dongre, Ritesh Chandra, Sonali Agarwal
In modern healthcare, addressing the complexities of accurate disease prediction and personalized recommendations is both crucial and challenging. This research introduces MLtoGAI, which integrates Semantic Web technology with Machine Learning (ML) to enhance disease prediction and offer user-friendly explanations through ChatGPT. The system comprises three key components: a reusable disease ontology that incorporates detailed knowledge about various diseases, a diagnostic classification model that uses patient symptoms to detect specific diseases accurately, and the integration of Semantic Web Rule Language (SWRL) with ontology and ChatGPT to generate clear, personalized health advice. This approach significantly improves prediction accuracy and ensures results that are easy to understand, addressing the complexity of diseases and diverse symptoms. The MLtoGAI system demonstrates substantial advancements in accuracy and user satisfaction, contributing to developing more intelligent and accessible healthcare solutions. This innovative approach combines the strengths of ML algorithms with the ability to provide transparent, human-understandable explanations through ChatGPT, achieving significant improvements in prediction accuracy and user comprehension. By leveraging semantic technology and explainable AI, the system enhances the accuracy of disease prediction and ensures that the recommendations are relevant and easily understood by individual patients. Our research highlights the potential of integrating advanced technologies to overcome existing challenges in medical diagnostics, paving the way for future developments in intelligent healthcare systems. Additionally, the system is validated using 200 synthetic patient data records, ensuring robust performance and reliability.
Introducing δ-XAI: a novel sensitivity-based method for local AI explanations
2407.18343v2 by Alessandro De Carlo, Enea Parimbelli, Nicola Melillo, Giovanna Nicora
Explainable Artificial Intelligence (XAI) is central to the debate on integrating Artificial Intelligence (AI) and Machine Learning (ML) algorithms into clinical practice. High-performing AI/ML models, such as ensemble learners and deep neural networks, often lack interpretability, hampering clinicians' trust in their predictions. To address this, XAI techniques are being developed to describe AI/ML predictions in human-understandable terms. One promising direction is the adaptation of sensitivity analysis (SA) and global sensitivity analysis (GSA), which inherently rank model inputs by their impact on predictions. Here, we introduce a novel delta-XAI method that provides local explanations of ML model predictions by extending the delta index, a GSA metric. The delta-XAI index assesses the impact of each feature's value on the predicted output for individual instances in both regression and classification problems. We formalize the delta-XAI index and provide code for its implementation. The delta-XAI method was evaluated on simulated scenarios using linear regression models, with Shapley values serving as a benchmark. Results showed that the delta-XAI index is generally consistent with Shapley values, with notable discrepancies in models with highly impactful or extreme feature values. The delta-XAI index demonstrated higher sensitivity in detecting dominant features and handling extreme feature values. Qualitatively, the delta-XAI provides intuitive explanations by leveraging probability density functions, making feature rankings clearer and more explainable for practitioners. Overall, the delta-XAI method appears promising for robustly obtaining local explanations of ML model predictions. Further investigations in real-world clinical settings will be conducted to evaluate its impact on AI-assisted clinical workflows.
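The density-based idea behind the delta index can be illustrated with a small sketch. The global delta index is usually defined as half the expected L1 distance between the unconditional output density and the output density conditional on one input. The function below is a simplified local variant under stated assumptions, not the authors' formalization: it fixes one feature near a given value `x_star` and measures how far the output histogram shifts. The function name `local_density_shift`, the histogram estimator, the `bandwidth` window, and the synthetic linear model are all hypothetical choices made for illustration.

```python
import random

def local_density_shift(xs, ys, x_star, bandwidth, n_bins=20):
    """Half the L1 distance between the histogram of all outputs and the
    histogram of outputs whose feature value lies within `bandwidth` of x_star.
    The result lies in [0, 1]; larger means the feature value matters more."""
    lo, hi = min(ys), max(ys)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant output

    def histogram(values):
        counts = [0] * n_bins
        for v in values:
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    near = [y for x, y in zip(xs, ys) if abs(x - x_star) <= bandwidth]
    if not near:
        raise ValueError("no samples near x_star; widen the bandwidth")
    p_all, p_near = histogram(ys), histogram(near)
    return 0.5 * sum(abs(a - b) for a, b in zip(p_all, p_near))

# Synthetic data (illustrative only): y depends strongly on x1, weakly on x2.
random.seed(0)
x1 = [random.gauss(0, 1) for _ in range(5000)]
x2 = [random.gauss(0, 1) for _ in range(5000)]
y = [3.0 * a + 0.1 * b for a, b in zip(x1, x2)]

shift_x1 = local_density_shift(x1, y, x_star=2.0, bandwidth=0.2)
shift_x2 = local_density_shift(x2, y, x_star=2.0, bandwidth=0.2)
print(f"x1 shift: {shift_x1:.2f}, x2 shift: {shift_x2:.2f}")
```

In this toy setting the dominant feature produces a much larger shift than the weak one, which is the kind of ranking a sensitivity-based local explanation is meant to expose. A histogram is the crudest density estimator; a kernel density estimate would be a natural replacement.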
Enhanced Deep Learning Methodologies and MRI Selection Techniques for Dementia Diagnosis in the Elderly Population
2407.17324v2 by Nikolaos Ntampakis, Konstantinos Diamantaras, Ioanna Chouvarda, Vasileios Argyriou, Panagiotis Sarigianndis
Dementia, a debilitating neurological condition affecting millions worldwide, presents significant diagnostic challenges. In this work, we introduce a novel methodology for the classification of demented and non-demented elderly patients using 3D brain Magnetic Resonance Imaging (MRI) scans. Our approach features a unique technique for selectively processing MRI slices, focusing on the most relevant brain regions and excluding less informative sections. This methodology is complemented by a confidence-based classification committee composed of three custom deep learning models: Dem3D ResNet, Dem3D CNN, and Dem3D EfficientNet. These models work synergistically to enhance decision-making accuracy, leveraging their collective strengths. Tested on the Open Access Series of Imaging Studies (OASIS) dataset, our method achieved an accuracy of 94.12%, surpassing existing methodologies. Furthermore, validation on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset confirmed the robustness and generalizability of our approach. The use of explainable AI (XAI) techniques and comprehensive ablation studies further substantiates the effectiveness of our techniques, providing insights into the decision-making process and the importance of our methodology. This research offers a significant advancement in dementia diagnosis, providing a highly accurate and efficient tool for clinical applications.
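One way to read "confidence-based classification committee" is sketched below. The abstract does not state the actual voting rule, so this is an assumption: each model reports class probabilities and the committee adopts the prediction of whichever model is most confident. The function name `committee_predict` and the probability values are invented for illustration; only the three model names come from the abstract.

```python
def committee_predict(model_probs):
    """model_probs maps a model name to its list of class probabilities.
    Returns (winning model, predicted class index, confidence)."""
    best_name, best_class, best_conf = None, None, -1.0
    for name, probs in model_probs.items():
        conf = max(probs)
        if conf > best_conf:
            best_name, best_class, best_conf = name, probs.index(conf), conf
    return best_name, best_class, best_conf

# Made-up probabilities for one scan: [non-demented, demented].
scan_probs = {
    "Dem3D ResNet": [0.62, 0.38],
    "Dem3D CNN": [0.45, 0.55],
    "Dem3D EfficientNet": [0.09, 0.91],
}
winner, label, confidence = committee_predict(scan_probs)
print(winner, label, confidence)
```

Alternatives such as averaging the probabilities or weighting votes by validation accuracy would fit the same description; the abstract alone does not say which is used.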
Using Large Language Models to Compare Explainable Models for Smart Home Human Activity Recognition
2408.06352v1 by Michele Fiori, Gabriele Civitarese, Claudio Bettini
Recognizing daily activities with unobtrusive sensors in smart environments enables various healthcare applications. Monitoring how subjects perform activities at home and how these change over time can reveal early symptoms of health issues, such as cognitive decline. Most approaches in this field use deep learning models, which are often seen as black boxes mapping sensor data to activities. However, non-expert users like clinicians need to trust and understand these models' outputs. Thus, eXplainable AI (XAI) methods for Human Activity Recognition have emerged to provide intuitive natural language explanations from these models. Different XAI methods generate different explanations, and their effectiveness is typically evaluated through user surveys, which are often challenging in terms of cost and fairness. This paper proposes an automatic evaluation method using Large Language Models (LLMs) to identify, in a pool of candidates, the best XAI approach for non-expert users. Our preliminary results suggest that LLM evaluation aligns with user surveys.
Explainable AI-based Intrusion Detection System for Industry 5.0: An Overview of the Literature, associated Challenges, the existing Solutions, and Potential Research Directions
2408.03335v1 by Naseem Khan, Kashif Ahmad, Aref Al Tamimi, Mohammed M. Alani, Amine Bermak, Issa Khalil
Industry 5.0, which focuses on human and Artificial Intelligence (AI) collaboration for performing different tasks in manufacturing, involves a higher number of robots, Internet of Things (IoT) devices and interconnections, Augmented/Virtual Reality (AR/VR), and other smart devices. The heavy involvement of these devices and interconnections in various critical areas, such as economy, health, education and defense systems, poses several types of potential security flaws. AI itself has proven to be a very effective and powerful tool in different areas of cybersecurity, such as intrusion detection, malware detection, and phishing detection, among others. As in many application areas, cybersecurity professionals have been reluctant to accept black-box ML solutions for cybersecurity applications. This reluctance pushed forward the adoption of eXplainable Artificial Intelligence (XAI) as a tool that helps explain how decisions are made in ML-based systems. In this survey, we present a comprehensive study of different XAI-based intrusion detection systems for Industry 5.0, and we also examine the impact of explainability and interpretability on cybersecurity practices through the lens of Adversarial XIDS (Adv-XIDS) approaches. Furthermore, we analyze the possible opportunities and challenges in XAI cybersecurity systems for Industry 5.0 that motivate future research toward XAI-based solutions to be adopted by high-stakes Industry 5.0 applications. We believe this rigorous analysis will establish a foundational framework for subsequent research endeavors within the specified domain.
A Comparative Study on Automatic Coding of Medical Letters with Explainability
2407.13638v1 by Jamie Glen, Lifeng Han, Paul Rayson, Goran Nenadic
This study explores the use of Natural Language Processing (NLP) and machine learning (ML) techniques to automate the coding of medical letters, with visualised explainability and lightweight local computer settings. Currently in clinical settings, coding is a manual process that involves assigning codes to each condition, procedure, and medication in a patient's paperwork (e.g., SNOMED CT code 56265001 for heart disease). There is preliminary research on automatic coding in this field using state-of-the-art ML models; however, due to the complexity and size of the models, real-world deployment has not been achieved. To further facilitate automatic coding in practice, we explore some solutions in a local computer setting; in addition, we explore the function of explainability for the transparency of AI models. We used the publicly available MIMIC-III database and the HAN/HLAN network models for ICD code prediction. We also experimented with the mapping between the ICD and SNOMED CT knowledge bases. In our experiments, the models provided useful information for 97.98% of codes. The results of this investigation can shed some light on implementing automatic clinical coding in practice, such as in hospital settings, on the local computers used by clinicians. Project page: https://github.com/Glenj01/Medical-Coding
Explainable AI for Enhancing Efficiency of DL-based Channel Estimation
2407.07009v1 by Abdul Karim Gizzini, Yahia Medjahdi, Ali J. Ghandour, Laurent Clavier
The support of artificial intelligence (AI) based decision-making is a key element in future 6G networks, where the concept of native AI will be introduced. Moreover, AI is widely employed in different critical applications such as autonomous driving and medical diagnosis. In such applications, using AI as black-box models is risky and challenging. Hence, it is crucial to understand and trust the decisions taken by these models. Tackling this issue can be achieved by developing explainable AI (XAI) schemes that aim to explain the logic behind the black-box model behavior, and thus, ensure its efficient and safe deployment. Recently, we proposed a novel perturbation-based XAI-CHEST framework that is oriented toward channel estimation in wireless communications. The core idea of the XAI-CHEST framework is to identify the relevant model inputs by inducing high noise on the irrelevant ones. This manuscript provides the detailed theoretical foundations of the XAI-CHEST framework. In particular, we derive the analytical expressions of the XAI-CHEST loss functions and the noise threshold fine-tuning optimization problem. Hence the designed XAI-CHEST delivers a smart input feature selection methodology that can further improve the overall performance while optimizing the architecture of the employed model. Simulation results show that the XAI-CHEST framework provides valid interpretations, where it offers an improved bit error rate performance while reducing the required computational complexity in comparison to the classical DL-based channel estimation.
Explainable AI: Comparative Analysis of Normal and Dilated ResNet Models for Fundus Disease Classification
2407.05440v2 by P. N. Karthikayan, Yoga Sri Varshan V, Hitesh Gupta Kattamuri, Umarani Jayaraman
This paper presents dilated Residual Network (ResNet) models for disease classification from retinal fundus images. Dilated convolution filters are used to replace normal convolution filters in the higher layers of the ResNet model (dilated ResNet) in order to improve the receptive field compared to the normal ResNet model for disease classification. This study introduces computer-assisted diagnostic tools that employ deep learning, enhanced with explainable AI techniques. These techniques aim to make the tool's decision-making process transparent, thereby enabling medical professionals to understand and trust the AI's diagnostic decision. They are particularly relevant in today's healthcare landscape, where there is a growing demand for transparency in AI applications to ensure their reliability and ethical use. The dilated ResNet is used as a replacement for the normal ResNet to enhance the classification accuracy of retinal eye diseases and reduce the required computing time. The dataset used in this work is the Ocular Disease Intelligent Recognition (ODIR) dataset which is a structured ophthalmic database with eight classes covering most of the common retinal eye diseases. The evaluation metrics used in this work include precision, recall, accuracy, and F1 score. In this work, a comparative study has been made between normal ResNet models and dilated ResNet models on five variants namely ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152. The dilated ResNet model shows promising results as compared to normal ResNet with an average F1 score of 0.71, 0.70, 0.69, 0.67, and 0.70 respectively for the above respective variants in ODIR multiclass disease classification.
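The receptive-field argument for dilated convolutions can be made concrete with a short calculation. A kernel of size k with dilation d covers d(k - 1) + 1 input positions while keeping the same number of weights. The sketch below applies that formula to a stack of layers; the two layer stacks are illustrative assumptions and do not reproduce the ResNet variants used in the paper.

```python
def effective_kernel(kernel_size, dilation):
    """Input span covered by one dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1

def receptive_field(layers):
    """layers: list of (kernel_size, stride, dilation), ordered input to output.
    Returns the receptive field of one output unit along one axis."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent units
    for kernel_size, stride, dilation in layers:
        rf += (effective_kernel(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf

# Three 3x3 layers, stride 1: ordinary versus increasingly dilated (toy stacks).
normal = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]
print(receptive_field(normal), receptive_field(dilated))  # prints "7 15"
```

Both stacks use the same number of 3x3 weights, yet the dilated one sees roughly twice as much of the image along each axis, which is the stated motivation for replacing ordinary filters in the higher layers.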
A Survey on Trustworthiness in Foundation Models for Medical Image Analysis
2407.15851v2 by Congzhen Shi, Ryan Rezai, Jiaxi Yang, Qi Dou, Xiaoxiao Li
The rapid advancement of foundation models in medical imaging represents a significant leap toward enhancing diagnostic accuracy and personalized treatment. However, the deployment of foundation models in healthcare necessitates a rigorous examination of their trustworthiness, encompassing privacy, robustness, reliability, explainability, and fairness. The current body of survey literature on foundation models in medical imaging reveals considerable gaps, particularly in the area of trustworthiness. Additionally, existing surveys on the trustworthiness of foundation models do not adequately address their specific variations and applications within the medical imaging domain. This survey aims to fill that gap by presenting a novel taxonomy of foundation models used in medical imaging and analyzing the key motivations for ensuring their trustworthiness. We review current research on foundation models in major medical imaging applications, focusing on segmentation, medical report generation, medical question and answering (Q&A), and disease diagnosis. These areas are highlighted because they have seen a relatively mature and substantial number of foundation models compared to other applications. We focus on literature that discusses trustworthiness in medical image analysis manuscripts. We explore the complex challenges of building trustworthy foundation models for each application, summarizing current concerns and strategies for enhancing trustworthiness. Furthermore, we examine the potential of these models to revolutionize patient care. Our analysis underscores the imperative for advancing towards trustworthy AI in medical image analysis, advocating for a balanced approach that fosters innovation while ensuring ethical and equitable healthcare delivery.
The Impact of an XAI-Augmented Approach on Binary Classification with Scarce Data
2407.06206v1 by Ximing Wen, Rosina O. Weber, Anik Sen, Darryl Hannan, Steven C. Nesbit, Vincent Chan, Alberto Goffi, Michael Morris, John C. Hunninghake, Nicholas E. Villalobos, Edward Kim, Christopher J. MacLellan
Point-of-Care Ultrasound (POCUS) is the practice of clinicians conducting and interpreting ultrasound scans right at the patient's bedside. However, the expertise needed to interpret these images is considerable and may not always be present in emergency situations. This reality makes algorithms such as machine learning classifiers extremely valuable for augmenting human decisions. POCUS devices are becoming available at a reasonable cost and at roughly the size of a mobile phone. The challenge of turning POCUS devices into life-saving tools is that interpretation of ultrasound images requires specialist training and experience. Unfortunately, the difficulty of obtaining positive training images represents an important obstacle to building efficient and accurate classifiers. Hence, the problem we investigate is how to increase the accuracy of classifiers trained with scarce data. We hypothesize that training with only a few data instances may not suffice for classifiers to generalize, causing them to overfit. We use an Explainable AI-augmented approach to help the algorithm learn more from less and potentially help the classifier generalize better.
Can GPT-4 Help Detect Quit Vaping Intentions? An Exploration of Automatic Data Annotation Approach
2407.00167v1 by Sai Krishna Revanth Vuruma, Dezhi Wu, Saborny Sen Gupta, Lucas Aust, Valerie Lookingbill, Wyatt Bellamy, Yang Ren, Erin Kasson, Li-Shiun Chen, Patricia Cavazos-Rehg, Dian Hu, Ming Huang
In recent years, the United States has witnessed a significant surge in the popularity of vaping or e-cigarette use, leading to a notable rise in cases of e-cigarette and vaping use-associated lung injury (EVALI) that caused hospitalizations and fatalities during the EVALI outbreak in 2019, highlighting the urgency to comprehend vaping behaviors and develop effective strategies for cessation. Due to the ubiquity of social media platforms, over 4.7 billion users worldwide use them for connectivity, communications, news, and entertainment with a significant portion of the discourse related to health, thereby establishing social media data as an invaluable organic data resource for public health research. In this study, we extracted a sample dataset from one vaping sub-community on Reddit to analyze users' quit-vaping intentions. Leveraging OpenAI's latest large language model GPT-4 for sentence-level quit vaping intention detection, this study compares the outcomes of this model against layman and clinical expert annotations. Using different prompting strategies such as zero-shot, one-shot, few-shot and chain-of-thought prompting, we developed 8 prompts with varying levels of detail to explain the task to GPT-4 and also evaluated the performance of the strategies against each other. These preliminary findings emphasize the potential of GPT-4 in social media data analysis, especially in identifying users' subtle intentions that may elude human detection.
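The prompting strategies named in the abstract differ mainly in how many labelled examples the prompt carries and whether it asks for intermediate reasoning. The sketch below builds such prompts as plain strings; it does not call any model API. The task wording, the example sentences, and the helper `build_prompt` are hypothetical and are not the eight prompts used in the study.

```python
TASK = ("Decide whether the sentence expresses an intention to quit vaping. "
        "Answer 'yes' or 'no'.")

def build_prompt(sentence, examples=(), chain_of_thought=False):
    """Assemble a prompt: task description, optional labelled examples
    (zero-shot if none, one-shot or few-shot otherwise), optional
    step-by-step instruction, then the sentence to classify."""
    parts = [TASK]
    for text, label in examples:
        parts.append(f"Sentence: {text}\nAnswer: {label}")
    if chain_of_thought:
        parts.append("Think step by step before giving the final answer.")
    parts.append(f"Sentence: {sentence}\nAnswer:")
    return "\n\n".join(parts)

# Invented example sentences, for illustration only.
labelled = [
    ("I threw my device away last night.", "yes"),
    ("Which flavour should I try next?", "no"),
    ("Day three without a puff, wish me luck.", "yes"),
]
target = "Thinking about stopping after this pod runs out."

zero_shot = build_prompt(target)
one_shot = build_prompt(target, examples=labelled[:1])
few_shot = build_prompt(target, examples=labelled)
cot = build_prompt(target, examples=labelled, chain_of_thought=True)
print(few_shot)
```

Keeping the task description fixed and varying only the examples and the reasoning instruction makes it easier to attribute performance differences to the prompting strategy itself.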
Towards Compositional Interpretability for XAI
2406.17583v1 by Sean Tull, Robin Lorenz, Stephen Clark, Ilyas Khan, Bob Coecke
Artificial intelligence (AI) is currently based largely on black-box machine learning models which lack interpretability. The field of eXplainable AI (XAI) strives to address this major concern, being critical in high-stakes areas such as the finance, legal and health sectors. We present an approach to defining AI models and their interpretability based on category theory. For this we employ the notion of a compositional model, which sees a model in terms of formal string diagrams which capture its abstract structure together with its concrete implementation. This comprehensive view incorporates deterministic, probabilistic and quantum models. We compare a wide range of AI models as compositional models, including linear and rule-based models, (recurrent) neural networks, transformers, VAEs, and causal and DisCoCirc models. Next we give a definition of interpretation of a model in terms of its compositional structure, demonstrating how to analyse the interpretability of a model, and using this to clarify common themes in XAI. We find that what makes the standard 'intrinsically interpretable' models so transparent is brought out most clearly diagrammatically. This leads us to the more general notion of compositionally-interpretable (CI) models, which additionally include, for instance, causal, conceptual space, and DisCoCirc models. We next demonstrate the explainability benefits of CI models. Firstly, their compositional structure may allow the computation of other quantities of interest, and may facilitate inference from the model to the modelled phenomenon by matching its structure. Secondly, they allow for diagrammatic explanations for their behaviour, based on influence constraints, diagram surgery and rewrite explanations. Finally, we discuss many future directions for the approach, raising the question of how to learn such meaningfully structured models in practice.
Slicing Through Bias: Explaining Performance Gaps in Medical Image Analysis using Slice Discovery Methods
2406.12142v2 by Vincent Olesen, Nina Weng, Aasa Feragen, Eike Petersen
Machine learning models have achieved high overall accuracy in medical image analysis. However, performance disparities on specific patient groups pose challenges to their clinical utility, safety, and fairness. This can affect known patient groups - such as those based on sex, age, or disease subtype - as well as previously unknown and unlabeled groups. Furthermore, the root cause of such observed performance disparities is often challenging to uncover, hindering mitigation efforts. In this paper, to address these issues, we leverage Slice Discovery Methods (SDMs) to identify interpretable underperforming subsets of data and formulate hypotheses regarding the cause of observed performance disparities. We introduce a novel SDM and apply it in a case study on the classification of pneumothorax and atelectasis from chest x-rays. Our study demonstrates the effectiveness of SDMs in hypothesis formulation and yields an explanation of previously observed but unexplained performance disparities between male and female patients in widely used chest X-ray datasets and models. Our findings indicate shortcut learning in both classification tasks, through the presence of chest drains and ECG wires, respectively. Sex-based differences in the prevalence of these shortcut features appear to cause the observed classification performance gap, representing a previously underappreciated interaction between shortcut learning and model fairness analyses.
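The basic quantity behind slice analysis is the gap between a subgroup's accuracy and the overall accuracy. The sketch below is a deliberate simplification: it ranks slices defined by attributes that are already labelled, whereas slice discovery methods aim to find underperforming subsets without such labels. The function `slice_accuracy_gaps`, the attribute names, and the ten toy records are all invented for illustration and do not reflect the paper's data or its novel SDM.

```python
from collections import defaultdict

def slice_accuracy_gaps(records, attributes):
    """Return (slice, accuracy gap vs. overall, slice size), worst slice first.
    Each record is a dict with a 0/1 'correct' field plus attribute fields."""
    overall = sum(r["correct"] for r in records) / len(records)
    groups = defaultdict(list)
    for r in records:
        for attr in attributes:
            groups[(attr, r[attr])].append(r["correct"])
    gaps = [(key, sum(v) / len(v) - overall, len(v)) for key, v in groups.items()]
    return sorted(gaps, key=lambda g: g[1])

# Toy records (fabricated): (sex, chest drain visible, prediction correct).
rows = [("M", True, 1), ("M", True, 1), ("M", True, 1), ("F", True, 1),
        ("M", False, 1), ("F", False, 0), ("F", False, 0), ("M", False, 0),
        ("F", False, 1), ("F", True, 1)]
records = [{"sex": s, "chest_drain": d, "correct": c} for s, d, c in rows]

ranking = slice_accuracy_gaps(records, ["sex", "chest_drain"])
for key, gap, size in ranking:
    print(key, round(gap, 2), size)
```

In this toy data the slice without a chest drain underperforms more than the female slice, and female patients are less likely to have a drain, which mirrors the kind of interaction between a shortcut feature and a demographic gap that the abstract describes.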
Unlocking the Potential of Metaverse in Innovative and Immersive Digital Health
2406.07114v2 by Fatemeh Ebrahimzadeh, Ramin Safa
The concept of the Metaverse has attracted a lot of attention in various fields, and one of its important applications is health and treatment. The Metaverse has enormous potential to transform healthcare by changing patient care, medical education, and the way teaching/learning and research are done. The purpose of this research is to provide an introduction to the basic concepts and fundamental technologies of the Metaverse. This paper examines the pros and cons of the Metaverse in the healthcare context and analyzes its potential from the technology and AI perspective. In particular, the role of machine learning methods is discussed: we explain how machine learning algorithms can be applied to Metaverse-generated data to gain better insights in healthcare applications. Additionally, we examine future visions of the Metaverse in health delivery by looking at emerging technologies such as blockchain and by addressing privacy concerns. The findings of this study contribute to a deeper understanding of the applications of the Metaverse in healthcare and its potential to revolutionize the delivery of medical services.
AI-Driven Predictive Analytics Approach for Early Prognosis of Chronic Kidney Disease Using Ensemble Learning and Explainable AI
2406.06728v2 by K M Tawsik Jawad, Anusha Verma, Fathi Amsaad, Lamia Ashraf
Chronic Kidney Disease (CKD) is a widespread chronic disease with no known definitive cure and high morbidity. Research demonstrates that progressive CKD is a heterogeneous disorder that significantly impacts kidney structure and function, eventually leading to kidney failure. Over time, chronic kidney disease has moved from a life-threatening disease affecting few people to a common disorder of varying severity. The goal of this research is to visualize dominating features, feature scores, and values exhibited for early prognosis and detection of CKD using ensemble learning and explainable AI. To that end, an AI-driven predictive analytics approach is proposed to aid clinical practitioners in prescribing lifestyle modifications for individual patients to reduce the rate of progression of this disease. Our dataset is collected on body vitals from individuals with CKD and healthy subjects to develop our proposed AI-driven solution accurately. In this regard, blood and urine test results are provided, and ensemble tree-based machine-learning models are applied to predict unseen cases of CKD. Our research findings are validated after lengthy consultations with nephrologists. Our experiments and interpretation results are compared with existing explainable AI applications in various healthcare domains, including CKD. The comparison shows that our developed AI models, particularly the Random Forest model, have identified more features as significant contributors than XgBoost. Interpretability (I), which measures the ratio of important to masked features, indicates that our XgBoost model achieved a higher score, specifically a Fidelity of 98%, in this metric and naturally in the FII index compared to competing models.
Explainable AI for Mental Disorder Detection via Social Media: A survey and outlook
2406.05984v1 by Yusif Ibrahimov, Tarique Anwar, Tommy Yuan
Mental health constitutes a complex and pervasive global challenge, affecting millions of lives and often leading to severe consequences. In this paper, we conduct a thorough survey to explore the intersection of data science, artificial intelligence, and mental healthcare, focusing on the recent developments of mental disorder detection through online social media (OSM). A significant portion of the population actively engages in OSM platforms, creating a vast repository of personal data that holds immense potential for mental health analytics. The paper navigates through traditional diagnostic methods, state-of-the-art data- and AI-driven research studies, and the emergence of explainable AI (XAI) models for mental healthcare. We review state-of-the-art machine learning methods, particularly those based on modern deep learning, while emphasising the need for explainability in healthcare AI models. The experimental design section provides insights into prevalent practices, including available datasets and evaluation approaches. We also identify key issues and challenges in the field and propose promising future research directions. As mental health decisions demand transparency, interpretability, and ethical considerations, this paper contributes to the ongoing discourse on advancing XAI in mental healthcare through social media. The comprehensive overview presented here aims to guide researchers, practitioners, and policymakers in developing the area of mental disorder detection.
Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance
2406.05746v1 by Zhan Zhang, Qin Zhang, Yang Jiao, Lin Lu, Lin Ma, Aihua Liu, Xiao Liu, Juan Zhao, Yajun Xue, Bing Wei, Mingxia Zhang, Ru Gao, Hong Zhao, Jie Lu, Fan Li, Yang Zhang, Yiming Wang, Lei Zhang, Fengwei Tian, Jie Hu, Xin Gou
AI-aided clinical diagnosis is desired in medical care. Existing deep learning models lack explainability and mainly focus on image analysis. The recently developed Dynamic Uncertain Causality Graph (DUCG) approach is causality-driven, explainable, and invariant across different application scenarios, without problems of data collection, labeling, fitting, privacy, bias, generalization, high cost and high energy consumption. Through close collaboration between clinical experts and DUCG technicians, 46 DUCG models covering 54 chief complaints were constructed. Over 1,000 diseases can be diagnosed without triage. Before being applied in the real world, the 46 DUCG models were retrospectively verified by third-party hospitals. The verified diagnostic precisions were no less than 95%, and the diagnostic precision for every disease, including uncommon ones, was no less than 80%. After verification, the 46 DUCG models were applied in real-world settings in China. Over one million real diagnosis cases have been performed, with only 17 incorrect diagnoses identified. Due to DUCG's transparency, the mistakes causing the incorrect diagnoses were found and corrected. The diagnostic abilities of the clinicians who applied DUCG frequently improved significantly. Following an introduction to the previously presented DUCG methodology, the recommendation algorithm for potential medical checks is presented and the key idea of DUCG is extracted.
摘要:
Advancing Histopathology-Based Breast Cancer Diagnosis: Insights into Multi-Modality and Explainability
2406.12897v1 by Faseela Abdullakutty, Younes Akbari, Somaya Al-Maadeed, Ahmed Bouridane, Rifat Hamoudi
It is imperative that breast cancer is detected precisely and timely to improve patient outcomes. Diagnostic methodologies have traditionally relied on unimodal approaches; however, medical data analytics is integrating diverse data sources beyond conventional imaging. Using multi-modal techniques, integrating both image and non-image data, marks a transformative advancement in breast cancer diagnosis. The purpose of this review is to explore the burgeoning field of multimodal techniques, particularly the fusion of histopathology images with non-image data. Further, Explainable AI (XAI) will be used to elucidate the decision-making processes of complex algorithms, emphasizing the necessity of explainability in diagnostic processes. This review utilizes multi-modal data and emphasizes explainability to enhance diagnostic accuracy, clinician confidence, and patient engagement, ultimately fostering more personalized treatment strategies for breast cancer, while also identifying research gaps in multi-modality and explainability, guiding future studies, and contributing to the strategic direction of the field.
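Abstract: Detecting breast cancer precisely and in a timely manner is essential to improving patient outcomes. Diagnostic methods have traditionally relied on unimodal approaches; however, medical data analytics is now integrating diverse data sources beyond conventional imaging. Multimodal techniques that integrate image and non-image data mark a transformative advance in breast cancer diagnosis. The purpose of this review is to explore the emerging field of multimodal techniques, particularly the fusion of histopathology images with non-image data. In addition, explainable AI (XAI) will be used to elucidate the decision-making processes of complex algorithms, emphasizing the necessity of explainability in diagnostic processes. This review draws on multimodal data and emphasizes explainability to improve diagnostic accuracy, clinician confidence, and patient engagement, ultimately fostering more personalized treatment strategies for breast cancer, while also identifying research gaps in multimodality and explainability, guiding future studies, and contributing to the strategic direction of the field.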
Using Explainable AI for EEG-based Reduced Montage Neonatal Seizure Detection
2406.16908v3 by Dinuka Sandun Udayantha, Kavindu Weerasinghe, Nima Wickramasinghe, Akila Abeyratne, Kithmin Wickremasinghe, Jithangi Wanigasinghe, Anjula De Silva, Chamira U. S. Edussooriya
The neonatal period is the most vulnerable time for the development of seizures. Seizures in the immature brain lead to detrimental consequences and therefore require early diagnosis. The gold standard for neonatal seizure detection currently relies on continuous video-EEG monitoring, which involves recording a multi-channel electroencephalogram (EEG) alongside real-time video monitoring within a neonatal intensive care unit (NICU). However, video-EEG monitoring technology requires clinical expertise and is often limited to technologically advanced, well-resourced settings. Cost-effective new techniques could help the medical community make an accurate diagnosis and advocate treatment without delay. In this work, a novel explainable deep learning model is proposed to automate the neonatal seizure detection process with a reduced EEG montage; it employs convolutional nets, graph attention layers, and fully connected layers. Beyond its ability to detect seizures in real time with a reduced montage, this model offers the unique advantage of real-time interpretability. Evaluated on the Zenodo dataset with 10-fold cross-validation, the presented model achieves absolute improvements of 8.31% and 42.86% in area under the curve (AUC) and recall, respectively.
Abstract: The neonatal period is the most vulnerable time for the development of seizures. Seizures in the immature brain have detrimental consequences and therefore require early diagnosis. The current gold standard for neonatal seizure detection relies on continuous video-EEG monitoring, which involves recording a multi-channel electroencephalogram (EEG) alongside real-time video monitoring in a neonatal intensive care unit (NICU). However, video-EEG monitoring requires clinical expertise and is often limited to technologically advanced, well-resourced settings. Cost-effective new techniques could help the medical community diagnose accurately and advocate treatment without delay. This work proposes a novel explainable deep learning model that automates neonatal seizure detection with a reduced EEG montage, using convolutional networks, graph attention layers, and fully connected layers. Beyond detecting seizures in real time with a reduced montage, the model offers the unique advantage of real-time interpretability. Evaluated on the Zenodo dataset with 10-fold cross-validation, the proposed model achieves absolute improvements of 8.31% and 42.86% in area under the curve (AUC) and recall, respectively.
Breast Cancer Diagnosis: A Comprehensive Exploration of Explainable Artificial Intelligence (XAI) Techniques
2406.00532v1 by Samita Bai, Sidra Nasir, Rizwan Ahmed Khan, Sheeraz Arif, Alexandre Meyer, Hubert Konik
Breast cancer (BC) stands as one of the most common malignancies affecting women worldwide, necessitating advancements in diagnostic methodologies for better clinical outcomes. This article provides a comprehensive exploration of the application of Explainable Artificial Intelligence (XAI) techniques in the detection and diagnosis of breast cancer. As Artificial Intelligence (AI) technologies continue to permeate the healthcare sector, particularly in oncology, the need for transparent and interpretable models becomes imperative to enhance clinical decision-making and patient care. This review discusses the integration of various XAI approaches, such as SHAP, LIME, Grad-CAM, and others, with machine learning and deep learning models utilized in breast cancer detection and classification. By investigating the modalities of breast cancer datasets, including mammograms, ultrasounds and their processing with AI, the paper highlights how XAI can lead to more accurate diagnoses and personalized treatment plans. It also examines the challenges in implementing these techniques and the importance of developing standardized metrics for evaluating XAI's effectiveness in clinical settings. Through detailed analysis and discussion, this article aims to highlight the potential of XAI in bridging the gap between complex AI models and practical healthcare applications, thereby fostering trust and understanding among medical professionals and improving patient outcomes.
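Abstract: Breast cancer (BC) is one of the most common malignancies affecting women worldwide, which calls for advances in diagnostic methods to improve clinical outcomes. This article comprehensively explores the application of explainable artificial intelligence (XAI) techniques to the detection and diagnosis of breast cancer. As artificial intelligence (AI) technologies continue to permeate healthcare, particularly oncology, transparent and interpretable models become imperative for enhancing clinical decision-making and patient care. This review discusses the integration of various XAI approaches, such as SHAP, LIME, and Grad-CAM, with the machine learning and deep learning models used in breast cancer detection and classification. By examining the modalities of breast cancer datasets, including mammograms and ultrasounds, and their processing with AI, the paper highlights how XAI can lead to more accurate diagnoses and personalized treatment plans. It also examines the challenges of implementing these techniques and the importance of developing standardized metrics for evaluating XAI's effectiveness in clinical settings. Through detailed analysis and discussion, the article aims to highlight the potential of XAI to bridge the gap between complex AI models and practical healthcare applications, thereby fostering trust and understanding among medical professionals and improving patient outcomes.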
Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion Recognition
2406.01624v2 by Alaa Nfissi, Wassim Bouachir, Nizar Bouguila, Brian Mishara
Speech emotion recognition (SER) has gained significant attention due to its many application fields, such as mental health, education, and human-computer interaction. However, the accuracy of SER systems is hindered by high-dimensional feature sets that may contain irrelevant and redundant information. To overcome this challenge, this study proposes an iterative feature boosting approach for SER that emphasizes feature relevance and explainability to enhance machine learning model performance. Our approach involves meticulous feature selection and analysis to build efficient SER systems. In addressing our main problem through model explainability, we employ a feature evaluation loop with Shapley values to iteratively refine feature sets. This process strikes a balance between model performance and transparency, which enables a comprehensive understanding of the model's predictions. The proposed approach offers several advantages, including the identification and removal of irrelevant and redundant features, leading to a more effective model. Additionally, it promotes explainability, facilitating comprehension of the model's predictions and the identification of crucial features for emotion determination. The effectiveness of the proposed method is validated on the SER benchmarks of the Toronto emotional speech set (TESS), Berlin Database of Emotional Speech (EMO-DB), Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and Surrey Audio-Visual Expressed Emotion (SAVEE) datasets, outperforming state-of-the-art methods. To the best of our knowledge, this is the first work to incorporate model explainability into an SER framework. The source code of this paper is publicly available at https://github.com/alaaNfissi/Unveiling-Hidden-Factors-Explainable-AI-for-Feature-Boosting-in-Speech-Emotion-Recognition.
Abstract: Speech emotion recognition (SER) has attracted considerable attention because of its many application fields, such as mental health, education, and human-computer interaction. However, the accuracy of SER systems is hindered by high-dimensional feature sets that may contain irrelevant and redundant information. To overcome this challenge, this study proposes an iterative feature boosting approach for SER that emphasizes feature relevance and explainability to improve machine learning model performance. Our approach involves careful feature selection and analysis to build efficient SER systems. To address our core problem through model explainability, we use a feature evaluation loop with Shapley values to iteratively refine feature sets. This process balances model performance and transparency, enabling a comprehensive understanding of the model's predictions. The proposed approach offers several advantages, including identifying and removing irrelevant and redundant features, which yields a more effective model. It also promotes explainability, helping to understand the model's predictions and to identify the key features for emotion determination. The method's effectiveness is validated on the SER benchmarks of the Toronto emotional speech set (TESS), Berlin Database of Emotional Speech (EMO-DB), Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and Surrey Audio-Visual Expressed Emotion (SAVEE) datasets, where it outperforms existing methods. To the best of our knowledge, this is the first work to incorporate model explainability into an SER framework. The source code of this paper is publicly available at https://github.com/alaaNfissi/Unveiling-Hidden-Factors-Explainable-AI-for-Feature-Boosting-in-Speech-Emotion-Recognition.
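The feature evaluation loop with Shapley values mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it computes exact Shapley values over a tiny, made-up table of validation scores (the feature names `pitch`, `energy`, `noise` and every score in `SCORES` are hypothetical), and the helper `refine` is an assumed, simplified stand-in for an iterative refinement step. Real SER feature sets are far too large for exact enumeration, so approximations would be needed in practice.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of each feature for a set function `value`."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical validation accuracy for every subset of three made-up features.
SCORES = {
    frozenset(): 0.50,
    frozenset({"pitch"}): 0.70,
    frozenset({"energy"}): 0.60,
    frozenset({"noise"}): 0.50,
    frozenset({"pitch", "energy"}): 0.80,
    frozenset({"pitch", "noise"}): 0.70,
    frozenset({"energy", "noise"}): 0.60,
    frozenset({"pitch", "energy", "noise"}): 0.80,
}


def refine(features, value, threshold=1e-9):
    """Drop the lowest-valued feature while its Shapley value is <= threshold."""
    kept = list(features)
    while len(kept) > 1:
        phi = shapley_values(kept, value)
        worst = min(kept, key=phi.get)
        if phi[worst] > threshold:
            break
        kept.remove(worst)
    return kept


phi = shapley_values(["pitch", "energy", "noise"], SCORES.__getitem__)
kept = refine(["pitch", "energy", "noise"], SCORES.__getitem__)
print(phi, kept)
```

In this toy table `noise` never changes the score of any subset, so its Shapley value is zero and the loop removes it, while the two informative features are retained.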
The Explanation Necessity for Healthcare AI
2406.00216v1 by Michail Mamalakis, Héloïse de Vareilles, Graham Murray, Pietro Lio, John Suckling
Explainability is often critical to the acceptable implementation of artificial intelligence (AI). Nowhere is this more important than healthcare where decision-making directly impacts patients and trust in AI systems is essential. This trust is often built on the explanations and interpretations the AI provides. Despite significant advancements in AI interpretability, there remains the need for clear guidelines on when and to what extent explanations are necessary in the medical context. We propose a novel categorization system with four distinct classes of explanation necessity, guiding the level of explanation required: patient or sample (local) level, cohort or dataset (global) level, or both levels. We introduce a mathematical formulation that distinguishes these categories and offers a practical framework for researchers to determine the necessity and depth of explanations required in medical AI applications. Three key factors are considered: the robustness of the evaluation protocol, the variability of expert observations, and the representation dimensionality of the application. In this perspective, we address the question: When does an AI medical application need to be explained, and at what level of detail?
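Abstract: Explainability is often critical to the acceptable implementation of artificial intelligence (AI). This is especially true in healthcare, where decisions directly affect patients and trust in AI systems is essential. That trust is often built on the explanations and interpretations the AI provides. Despite significant advances in AI interpretability, clear guidelines are still needed on when and to what extent explanations are necessary in the medical context. We propose a novel categorization system with four distinct classes of explanation necessity, which guides the level of explanation required: the patient or sample (local) level, the cohort or dataset (global) level, or both. We introduce a mathematical formulation that distinguishes these categories and gives researchers a practical framework for determining the necessity and depth of the explanations required in medical AI applications. Three key factors are considered: the robustness of the evaluation protocol, the variability of expert observations, and the representation dimensionality of the application. From this perspective, we address the question: when does a medical AI application need to be explained, and at what level of detail?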
Interdisciplinary Expertise to Advance Equitable Explainable AI
2406.18563v1 by Chloe R. Bennett, Heather Cole-Lewis, Stephanie Farquhar, Naama Haamel, Boris Babenko, Oran Lang, Mat Fleck, Ilana Traynis, Charles Lau, Ivor Horn, Courtney Lyles
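The field of artificial intelligence (AI) is rapidly influencing health and healthcare, but bias and poor performance persist for populations who face widespread structural oppression. Previous work has clearly outlined the need for more rigorous attention to data representativeness and model performance to advance equity and reduce bias. However, there is an opportunity to also improve the explainability of AI by leveraging best practices of social epidemiology and health equity to help us develop hypotheses for associations found. In this paper, we focus on explainable AI (XAI) and describe a framework for interdisciplinary expert panel review to discuss and critically assess AI model explanations from multiple perspectives and identify areas of bias and directions for future research. We emphasize the importance of the interdisciplinary expert panel to produce more accurate, equitable interpretations which are historically and contextually informed. Interdisciplinary panel discussions can help reduce bias, identify potential confounders, and identify opportunities for additional research where there are gaps in the literature. In turn, these insights can suggest opportunities for AI model improvement.
Abstract: The field of artificial intelligence (AI) is rapidly influencing health and healthcare, but bias and poor performance persist for populations that face widespread structural oppression. Previous work has clearly shown the need for more rigorous attention to data representativeness and model performance in order to advance equity and reduce bias. However, there is also an opportunity to improve the explainability of AI by drawing on best practices from social epidemiology and health equity to help develop hypotheses for the associations found. In this paper, we focus on explainable AI (XAI) and describe a framework for interdisciplinary expert panel review that discusses and critically assesses AI model explanations from multiple perspectives and identifies areas of bias and directions for future research. We emphasize that an interdisciplinary expert panel is essential for producing more accurate, equitable interpretations that are historically and contextually informed. Interdisciplinary panel discussions can help reduce bias, identify potential confounders, and identify opportunities for additional research where the literature has gaps. In turn, these insights can suggest opportunities for improving AI models.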
"It depends": Configuring AI to Improve Clinical Usefulness Across Contexts
2407.11978v1 by Hubert D. Zając, Jorge M. N. Ribeiro, Silvia Ingala, Simona Gentile, Ruth Wanjohi, Samuel N. Gitau, Jonathan F. Carlsen, Michael B. Nielsen, Tariq O. Andersen
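Artificial Intelligence (AI) systems repeatedly match or outperform radiologists in lab experiments. However, real-world implementations of radiological AI-based systems are found to provide little to no clinical value. This paper explores how to design AI for clinical usefulness in different contexts. We conducted 19 design sessions and design interventions with 13 radiologists from 7 clinical sites in Denmark and Kenya, based on three iterations of a functional AI-based prototype. Ten sociotechnical dependencies were identified as crucial for the design of AI in radiology. We conceptualised four technical dimensions that must be configured to the intended clinical context of use: AI functionality, AI medical focus, AI decision threshold, and AI explainability. We present four design recommendations on how to address dependencies pertaining to the medical knowledge, clinic type, user expertise level, patient context, and user situation that condition the configuration of these technical dimensions.
Abstract: Artificial intelligence (AI) systems repeatedly match or outperform radiologists in laboratory experiments. However, real-world deployments of AI-based radiology systems have been found to provide little to no clinical value. This paper explores how to design AI for clinical usefulness in different contexts. Based on three iterations of a functional AI-based prototype, we conducted 19 design sessions and design interventions with 13 radiologists from 7 clinical sites in Denmark and Kenya. Ten sociotechnical dependencies were identified as crucial for the design of AI in radiology. We conceptualised four technical dimensions that must be configured to the intended clinical context of use: AI functionality, AI medical focus, AI decision threshold, and AI explainability. We present four design recommendations on how to address the dependencies related to medical knowledge, clinic type, user expertise level, patient context, and user situation that condition the configuration of these technical dimensions.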
Improving Health Professionals' Onboarding with AI and XAI for Trustworthy Human-AI Collaborative Decision Making
2405.16424v1 by Min Hun Lee, Silvana Xin Yi Choo, Shamala D/O Thilarajah
With advanced AI/ML, there has been growing research on explainable AI (XAI) and on how humans interact with AI and XAI for effective human-AI collaborative decision-making. However, we still lack an understanding of how AI systems and XAI should first be presented to users without technical backgrounds. In this paper, we present the findings of semi-structured interviews with health professionals (n=12) and students (n=4) majoring in medicine and health to study how to improve onboarding with AI and XAI. For the interviews, we built upon human-AI interaction guidelines to create onboarding materials for an AI system for stroke rehabilitation assessment and for AI explanations, and introduced them to the participants. Our findings reveal that, beyond traditional performance metrics on AI, participants desired benchmark information, the practical benefits of AI, and interaction trials to better contextualize AI performance and to refine the objectives and performance of AI. Based on these findings, we highlight directions for improving onboarding with AI and XAI and human-AI collaborative decision-making.
Abstract: With advanced AI/ML, research on explainable AI (XAI) has been growing, along with studies of how humans interact with AI and XAI for effective human-AI collaborative decision-making. However, we still lack an understanding of how AI systems and XAI should first be presented to users without technical backgrounds. In this paper, we present the results of semi-structured interviews with health professionals (n=12) and students (n=4) majoring in medicine and health, conducted to study how to improve onboarding with AI and XAI. For the interviews, we built on human-AI interaction guidelines to create onboarding materials for an AI system for stroke rehabilitation assessment and for AI explanations, and introduced them to the participants. Our findings show that, beyond traditional AI performance metrics, participants wanted benchmark information, the practical benefits of AI, and interaction trials to better contextualize AI performance and to refine the objectives and performance of AI. Based on these findings, we highlight directions for improving onboarding with AI and XAI and human-AI collaborative decision-making.
Exploring Nutritional Impact on Alzheimer's Mortality: An Explainable AI Approach
2405.17502v1 by Ziming Liu, Longjian Liu, Robert E. Heidel, Xiaopeng Zhao
This article uses machine learning (ML) and explainable artificial intelligence (XAI) techniques to investigate the relationship between nutritional status and mortality rates associated with Alzheimer's disease (AD). The Third National Health and Nutrition Examination Survey (NHANES III) database is employed for analysis. The random forest model is selected as the base model for XAI analysis, and the Shapley Additive Explanations (SHAP) method is used to assess feature importance. The results highlight significant nutritional factors such as serum vitamin B12 and glycated hemoglobin. The study demonstrates the effectiveness of random forests in predicting AD mortality compared to other diseases. This research provides insights into the impact of nutrition on AD and contributes to a deeper understanding of disease progression.
Abstract: This article uses machine learning (ML) and explainable artificial intelligence (XAI) techniques to investigate the relationship between nutritional status and mortality associated with Alzheimer's disease (AD). The Third National Health and Nutrition Examination Survey (NHANES III) database is used for the analysis. A random forest model is selected as the base model for XAI analysis, and the Shapley Additive Explanations (SHAP) method is used to assess feature importance. The results highlight significant nutritional factors such as serum vitamin B12 and glycated hemoglobin. The study demonstrates the effectiveness of random forests in predicting AD mortality compared to other diseases. This research offers insights into the impact of nutrition on AD and contributes to a deeper understanding of disease progression.
Explainable AI Enhances Glaucoma Referrals, Yet the Human-AI Team Still Falls Short of the AI Alone
2407.11974v1 by Catalina Gomez, Ruolin Wang, Katharina Breininger, Corinne Casey, Chris Bradley, Mitchell Pavlak, Alex Pham, Jithin Yohannan, Mathias Unberath
Primary care providers are vital for initial triage and referrals to specialty care. In glaucoma, asymptomatic and fast progression can lead to vision loss, necessitating timely referrals to specialists. However, primary eye care providers may not identify urgent cases, potentially delaying care. Artificial Intelligence (AI) offering explanations could enhance their referral decisions. We investigate how various AI explanations help providers distinguish between patients needing immediate or non-urgent specialist referrals. We built explainable AI algorithms to predict glaucoma surgery needs from routine eyecare data as a proxy for identifying high-risk patients. We incorporated intrinsic and post-hoc explainability and conducted an online study with optometrists to assess human-AI team performance, measuring referral accuracy and analyzing interactions with AI, including agreement rates, task time, and user experience perceptions. AI support enhanced referral accuracy among 87 participants (59.9%/50.8% with/without AI), though Human-AI teams underperformed compared to AI alone. Participants believed they included AI advice more when using the intrinsic model, and perceived it more useful and promising. Without explanations, deviations from AI recommendations increased. AI support did not increase workload, confidence, and trust, but reduced challenges. On a separate test set, our black-box and intrinsic models achieved an accuracy of 77% and 71%, respectively, in predicting surgical outcomes. We identify opportunities of human-AI teaming for glaucoma management in primary eye care, noting that while AI enhances referral accuracy, it also shows a performance gap compared to AI alone, even with explanations. Human involvement remains essential in medical decision making, underscoring the need for future research to optimize collaboration, ensuring positive experiences and safe AI use.
Decoding Decision Reasoning: A Counterfactual-Powered Model for Knowledge Discovery
2406.18552v1 by Yingying Fang, Zihao Jin, Xiaodan Xing, Simon Walsh, Guang Yang
In medical imaging, particularly in early disease detection and prognosis tasks, discerning the rationale behind an AI model's predictions is crucial for evaluating the reliability of its decisions. Conventional explanation methods face challenges in identifying discernible decisive features in medical image classifications, where discriminative features are subtle or not immediately apparent. To bridge this gap, we propose an explainable model that is equipped with both decision reasoning and feature identification capabilities. Our approach not only detects influential image patterns but also uncovers the decisive features that drive the model's final predictions. By implementing our method, we can efficiently identify and visualise class-specific features leveraged by the data-driven model, providing insights into the decision-making processes of deep learning models. We validated our model in the demanding realm of medical prognosis task, demonstrating its efficacy and potential in enhancing the reliability of AI in healthcare and in discovering new knowledge in diseases where prognostic understanding is limited.
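Abstract: In medical imaging, particularly in early disease detection and prognosis tasks, discerning the rationale behind an AI model's predictions is crucial for evaluating the reliability of its decisions. Conventional explanation methods struggle to identify discernible decisive features in medical image classification, where discriminative features are subtle or not immediately apparent. To bridge this gap, we propose an explainable model equipped with both decision reasoning and feature identification capabilities. Our approach not only detects influential image patterns but also uncovers the decisive features that drive the model's final predictions. By applying our method, we can efficiently identify and visualise the class-specific features used by the data-driven model, providing insight into the decision-making processes of deep learning models. We validated our model on demanding medical prognosis tasks, demonstrating its efficacy and its potential to improve the reliability of AI in healthcare and to discover new knowledge about diseases for which prognostic understanding is limited.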
The Role of Emotions in Informational Support Question-Response Pairs in Online Health Communities: A Multimodal Deep Learning Approach
2405.13099v1 by Mohsen Jozani, Jason A. Williams, Ahmed Aleroud, Sarbottam Bhagat
This study explores the relationship between informational support seeking questions, responses, and helpfulness ratings in online health communities. We created a labeled data set of question-response pairs and developed multimodal machine learning and deep learning models to reliably predict informational support questions and responses. We employed explainable AI to reveal the emotions embedded in informational support exchanges, demonstrating the importance of emotion in providing informational support. This complex interplay between emotional and informational support has not been previously researched. The study refines social support theory and lays the groundwork for the development of user decision aids. Further implications are discussed.
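Abstract: This study explores the relationship between informational support seeking questions, responses, and helpfulness ratings in online health communities. We created a labeled dataset of question-response pairs and developed multimodal machine learning and deep learning models to reliably predict informational support questions and responses. We used explainable AI to reveal the emotions embedded in informational support exchanges, demonstrating the importance of emotion in providing informational support. This complex interplay between emotional and informational support has not been researched before. The study refines social support theory and lays the groundwork for the development of user decision aids. Further implications are discussed.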
ChatGPT in Classrooms: Transforming Challenges into Opportunities in Education
2405.10645v1 by Harris Bin Munawar, Nikolaos Misirlis
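In the era of exponential technology growth, one unexpected guest has claimed a seat in classrooms worldwide: artificial intelligence. Generative AI, such as ChatGPT, promises a revolution in education, yet it is a double-edged sword. Its potential for personalized learning is offset by issues of cheating, inaccuracies, and educators struggling to incorporate it effectively into their lesson design. We are standing on the brink of this educational frontier, and it is clear that we need to navigate this terrain with great care. This is a major challenge that could undermine the integrity and value of our educational process. So, how can we turn these challenges into opportunities? When used inappropriately, AI tools can become the perfect tool for the cut-copy-paste mentality and quickly begin to corrode critical thinking, creativity, and deep understanding, the most important skills in our rapidly changing world. Teachers feel that they are not equipped to leverage this technology, which widens the digital divide among educators and institutions. Addressing these concerns calls for an in-depth research approach. We will employ empirical research, drawing on the Technology Acceptance Model, to assess the attitudes toward generative AI among educators and students. Understanding their perceptions, usage patterns, and hurdles is the first crucial step in creating an effective solution. The present study will serve as a process manual that future researchers can apply to their own data, based on the steps explained here.
Abstract: In an era of rapid technological growth, an unexpected guest has claimed a seat in classrooms worldwide: artificial intelligence. Generative AI, such as ChatGPT, promises a revolution in education, yet it is a double-edged sword. Its potential for personalized learning is offset by cheating, inaccuracies, and educators' difficulty incorporating it effectively into lesson design. We are standing on the brink of this educational frontier, and we clearly need to navigate it with great care. This is a major challenge that could undermine the integrity and value of our educational process. So how can we turn these challenges into opportunities? When used inappropriately, AI tools can become the perfect tool for a copy-paste mentality and quickly corrode critical thinking, creativity, and deep understanding, the most important skills in our rapidly changing world. Teachers feel that they are not equipped to use this technology, which widens the digital divide among educators and institutions. Addressing these concerns calls for an in-depth research approach. We will employ empirical research, drawing on the Technology Acceptance Model, to assess attitudes toward generative AI among educators and students. Understanding their perceptions, usage patterns, and hurdles is the first crucial step toward an effective solution. The present study will serve as a process manual that future researchers can apply to their own data, based on the steps explained here.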
Evaluating the Explainable AI Method Grad-CAM for Breath Classification on Newborn Time Series Data
2405.07590v1 by Camelia Oprea, Mike Grüne, Mateusz Buglowski, Lena Olivier, Thorsten Orlikowsky, Stefan Kowalewski, Mark Schoberer, André Stollenwerk
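With the digitalization of health care systems, artificial intelligence is becoming more present in medicine. Machine learning in particular shows great potential for complex tasks such as time series classification, usually at the cost of transparency and comprehensibility. This leads to a lack of trust by humans and thus hinders its active usage. Explainable artificial intelligence tries to close this gap by providing insight into the decision-making process; however, the actual usefulness of its different methods is unclear. This paper proposes a user-study-based evaluation of the explanation method Grad-CAM, applied to a neural network for the classification of breaths in time series neonatal ventilation data. We present the perceived usefulness of the explainability method by different stakeholders, exposing the difficulty of achieving actual transparency and the wish of many participants for more in-depth explanations.
Abstract: With the digitalization of health care systems, artificial intelligence is becoming more present in medicine. Machine learning in particular shows great potential for complex tasks such as time series classification, but usually at the cost of transparency and comprehensibility. This leads to a lack of human trust and thus hinders its active use. Explainable artificial intelligence tries to close this gap by providing insight into the decision-making process, but the actual usefulness of its different methods remains unclear. This paper proposes a user-study-based evaluation of the Grad-CAM explanation method, applied to a neural network that classifies breaths in time series neonatal ventilation data. We present how useful different stakeholders perceived the explainability method to be, revealing the difficulty of achieving actual transparency and the wish of many participants for more in-depth explanations.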
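For readers unfamiliar with Grad-CAM, the standard definition can be written for a one-dimensional (time series) convolutional layer as below. This is a sketch of the general method under assumed notation, not the specific network or configuration evaluated in the paper: $A^k_t$ denotes activation $k$ of the chosen layer at time step $t$, $T$ is the number of time steps in that layer, and $y^c$ is the pre-softmax score for class $c$.

```latex
\alpha^c_k = \frac{1}{T} \sum_{t=1}^{T} \frac{\partial y^c}{\partial A^k_t},
\qquad
L^c_t = \mathrm{ReLU}\!\left( \sum_{k} \alpha^c_k \, A^k_t \right)
```

The weights $\alpha^c_k$ average the gradient of the class score over time, and the map $L^c_t$ highlights the time steps whose activations push the score for class $c$ upward.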
XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare
2405.06270v3 by Fatemeh Nazary, Yashar Deldjoo, Tommaso Di Noia, Eugenio di Sciascio
The integration of Large Language Models (LLMs) into healthcare diagnostics offers a promising avenue for clinical decision-making. This study outlines the development of a novel method for zero-shot/few-shot in-context learning (ICL) by integrating medical domain knowledge using a multi-layered structured prompt. We also explore the efficacy of two communication styles between the user and LLMs: the Numerical Conversational (NC) style, which processes data incrementally, and the Natural Language Single-Turn (NL-ST) style, which employs long narrative prompts. Our study systematically evaluates the diagnostic accuracy and risk factors, including gender bias and false negative rates, using a dataset of 920 patient records in various few-shot scenarios. Results indicate that traditional clinical machine learning (ML) models generally outperform LLMs in zero-shot and few-shot settings. However, the performance gap narrows significantly when employing few-shot examples alongside effective explainable AI (XAI) methods as sources of domain knowledge. Moreover, with sufficient time and an increased number of examples, the conversational style (NC) nearly matches the performance of ML models. Most notably, LLMs demonstrate comparable or superior cost-sensitive accuracy relative to ML models. This research confirms that, with appropriate domain knowledge and tailored communication strategies, LLMs can significantly enhance diagnostic processes. The findings highlight the importance of optimizing the number of training examples and communication styles to improve accuracy and reduce biases in LLM applications.
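Abstract: Integrating Large Language Models (LLMs) into healthcare diagnostics offers a promising avenue for clinical decision-making. This study outlines the development of a novel method for zero-shot/few-shot in-context learning (ICL) that integrates medical domain knowledge using a multi-layered structured prompt. We also explore the efficacy of two communication styles between the user and LLMs: the Numerical Conversational (NC) style, which processes data incrementally, and the Natural Language Single-Turn (NL-ST) style, which uses long narrative prompts. Our study systematically evaluates diagnostic accuracy and risk factors, including gender bias and false negative rates, using a dataset of 920 patient records in various few-shot scenarios. Results indicate that traditional clinical machine learning (ML) models generally outperform LLMs in zero-shot and few-shot settings. However, the performance gap narrows significantly when few-shot examples are used alongside effective explainable AI (XAI) methods as sources of domain knowledge. Moreover, with sufficient time and more examples, the conversational style (NC) nearly matches the performance of ML models. Most notably, LLMs show comparable or superior cost-sensitive accuracy relative to ML models. This research confirms that, with appropriate domain knowledge and tailored communication strategies, LLMs can significantly enhance diagnostic processes. The findings highlight the importance of optimizing the number of training examples and the communication style to improve accuracy and reduce bias in LLM applications.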
To Trust or Not to Trust: Towards a novel approach to measure trust for XAI systems
2405.05766v1 by Miquel Miró-Nicolau, Gabriel Moyà-Alcover, Antoni Jaume-i-Capó, Manuel González-Hidalgo, Maria Gemma Sempere Campello, Juan Antonio Palmer Sancho
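The increasing reliance on Deep Learning models, combined with their inherent lack of transparency, has spurred the development of a novel field of study known as eXplainable AI (XAI) methods. These methods seek to enhance the trust of end-users in automated systems by providing insights into the rationale behind their decisions. This paper presents a novel approach for measuring user trust in XAI systems, allowing their refinement. Our proposed metric combines both performance metrics and trust indicators from an objective perspective. To validate this novel methodology, we conducted a case study in a realistic medical scenario: the use of an XAI system for the detection of pneumonia from x-ray images.
Abstract: The growing reliance on deep learning models, combined with their inherent lack of transparency, has spurred the development of a new field of study known as explainable AI (XAI) methods. These methods aim to increase end users' trust in automated systems by providing insight into the rationale behind their decisions. This paper presents a novel approach for measuring user trust in XAI systems, which allows them to be refined. Our proposed metric combines performance metrics and trust indicators from an objective perspective. To validate this methodology, we conducted a case study in a realistic medical scenario: the use of an XAI system to detect pneumonia from X-ray images.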
Region-specific Risk Quantification for Interpretable Prognosis of COVID-19
2405.02815v1 by Zhusi Zhong, Jie Li, Zhuoqi Ma, Scott Collins, Harrison Bai, Paul Zhang, Terrance Healey, Xinbo Gao, Michael K. Atalay, Zhicheng Jiao
The COVID-19 pandemic has strained global public health, necessitating accurate diagnosis and intervention to control disease spread and reduce mortality rates. This paper introduces an interpretable deep survival prediction model designed specifically for improved understanding and trust in COVID-19 prognosis using chest X-ray (CXR) images. By integrating a large-scale pretrained image encoder, Risk-specific Grad-CAM, and anatomical region detection techniques, our approach produces regional interpretable outcomes that effectively capture essential disease features while focusing on rare but critical abnormal regions. Our model's predictive results provide enhanced clarity and transparency through risk area localization, enabling clinicians to make informed decisions regarding COVID-19 diagnosis with better understanding of prognostic insights. We evaluate the proposed method on a multi-center survival dataset and demonstrate its effectiveness via quantitative and qualitative assessments, achieving superior C-indexes (0.764 and 0.727) and time-dependent AUCs (0.799 and 0.691). These results suggest that our explainable deep survival prediction model surpasses traditional survival analysis methods in risk prediction, improving interpretability for clinical decision making and enhancing AI system trustworthiness.
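Abstract: The COVID-19 pandemic has strained global public health, making accurate diagnosis and intervention necessary to control the spread of the disease and reduce mortality. This paper introduces an interpretable deep survival prediction model designed specifically to improve understanding of and trust in COVID-19 prognosis from chest X-ray (CXR) images. By integrating a large-scale pretrained image encoder, Risk-specific Grad-CAM, and anatomical region detection techniques, our approach produces regionally interpretable outcomes that effectively capture essential disease features while focusing on rare but critical abnormal regions. The model's predictions provide greater clarity and transparency through risk area localization, enabling clinicians to make informed decisions about COVID-19 diagnosis with a better understanding of prognostic insights. We evaluate the proposed method on a multi-center survival dataset and demonstrate its effectiveness through quantitative and qualitative assessments, achieving superior C-indexes (0.764 and 0.727) and time-dependent AUCs (0.799 and 0.691). These results suggest that our explainable deep survival prediction model surpasses traditional survival analysis methods in risk prediction, improving interpretability for clinical decision making and enhancing the trustworthiness of AI systems.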
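The C-index reported in this abstract measures how well predicted risk scores rank patients by survival time. The sketch below shows one common definition (Harrell's concordance index) on a made-up toy cohort; the paper does not state which variant or software it used, so the function `concordance_index` and the data are illustrative assumptions only. Pairs with tied survival times are ignored here for simplicity.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event and a
    shorter time than subject j. Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up time, event flag (0 = censored), predicted risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]

c_index = concordance_index(times, events, risks)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported values of 0.764 and 0.727.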
Rad4XCNN: a new agnostic method for post-hoc global explanation of CNN-derived features by means of radiomics
2405.02334v2 by Francesco Prinzi, Carmelo Militello, Calogero Zarcaro, Tommaso Vincenzo Bartolotta, Salvatore Gaglio, Salvatore Vitabile
In recent years, machine learning-based clinical decision support systems (CDSS) have played a key role in the analysis of several medical conditions. Despite their promising capabilities, the lack of transparency in AI models poses significant challenges, particularly in medical contexts where reliability is a mandatory aspect. However, it appears that explainability is inversely proportional to accuracy. For this reason, achieving transparency without compromising predictive accuracy remains a key challenge. This paper presents a novel method, namely Rad4XCNN, to enhance the predictive power of CNN-derived features with the inherent interpretability of radiomic features. Rad4XCNN diverges from conventional methods based on saliency maps, by associating intelligible meaning to CNN-derived features by means of Radiomics, offering new perspectives on explanation methods beyond visualization maps. Using a breast cancer classification task as a case study, we evaluated Rad4XCNN on ultrasound imaging datasets, including an online dataset and two in-house datasets for internal and external validation. Some key results are: i) CNN-derived features guarantee more robust accuracy when compared against ViT-derived and radiomic features; ii) conventional visualization map methods for explanation present several pitfalls; iii) Rad4XCNN does not sacrifice model accuracy for their explainability; iv) Rad4XCNN provides a global explanation enabling the physician to extract global insights and findings. Our method can mitigate some concerns related to the explainability-accuracy trade-off. This study highlighted the importance of proposing new methods for model explanation without affecting their accuracy.
Attributing Responsibility in AI-Induced Incidents: A Computational Reflective Equilibrium Framework for Accountability
2404.16957v1 by Yunfei Ge, Quanyan Zhu
The pervasive integration of Artificial Intelligence (AI) has introduced complex challenges in the responsibility and accountability in the event of incidents involving AI-enabled systems. The interconnectivity of these systems, ethical concerns of AI-induced incidents, coupled with uncertainties in AI technology and the absence of corresponding regulations, have made traditional responsibility attribution challenging. To this end, this work proposes a Computational Reflective Equilibrium (CRE) approach to establish a coherent and ethically acceptable responsibility attribution framework for all stakeholders. The computational approach provides a structured analysis that overcomes the limitations of conceptual approaches in dealing with dynamic and multifaceted scenarios, showcasing the framework's explainability, coherence, and adaptivity properties in the responsibility attribution process. We examine the pivotal role of the initial activation level associated with claims in equilibrium computation. Using an AI-assisted medical decision-support system as a case study, we illustrate how different initializations lead to diverse responsibility distributions. The framework offers valuable insights into accountability in AI-induced incidents, facilitating the development of a sustainable and resilient system through continuous monitoring, revision, and reflection.
Summary: The paper proposes a Computational Reflective Equilibrium (CRE) approach for attributing responsibility among stakeholders when AI-enabled systems are involved in incidents. Using an AI-assisted medical decision-support system as a case study, it shows that different initial activation levels of claims lead to different responsibility distributions.
Explainable AI for Fair Sepsis Mortality Predictive Model
2404.13139v1 by Chia-Hsuan Chang, Xiaoyang Wang, Christopher C. Yang
Artificial intelligence supports healthcare professionals with predictive modeling, greatly transforming clinical decision-making. This study addresses the crucial need for fairness and explainability in AI applications within healthcare to ensure equitable outcomes across diverse patient demographics. By focusing on the predictive modeling of sepsis-related mortality, we propose a method that learns a performance-optimized predictive model and then employs the transfer learning process to produce a model with better fairness. Our method also introduces a novel permutation-based feature importance algorithm aiming at elucidating the contribution of each feature in enhancing fairness on predictions. Unlike existing explainability methods concentrating on explaining feature contribution to predictive performance, our proposed method uniquely bridges the gap in understanding how each feature contributes to fairness. This advancement is pivotal, given sepsis's significant mortality rate and its role in one-third of hospital deaths. Our method not only aids in identifying and mitigating biases within the predictive model but also fosters trust among healthcare stakeholders by improving the transparency and fairness of model predictions, thereby contributing to more equitable and trustworthy healthcare delivery.
Summary: For sepsis mortality prediction, the method first learns a performance-optimized model and then applies transfer learning to obtain a fairer one. A permutation-based feature importance algorithm explains how each feature contributes to fairness, rather than to predictive performance.
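The idea of scoring features by their effect on fairness rather than accuracy can be illustrated with a generic permutation scheme. The sketch below is not the authors' algorithm, whose details the abstract does not give. It assumes a binary classifier, two demographic groups labeled 0 and 1 (both non-empty), and the demographic-parity gap as the fairness metric; all names are hypothetical.

```python
import random


def parity_gap(preds, groups):
    """Absolute difference in positive-prediction rate between groups 0 and 1."""
    rates = []
    for g in (0, 1):
        members = [p for p, gg in zip(preds, groups) if gg == g]
        rates.append(sum(members) / len(members))
    return abs(rates[0] - rates[1])


def fairness_permutation_importance(predict, rows, groups, n_repeats=20, seed=0):
    """Mean drop in the parity gap when each feature column is shuffled.

    A positive score means shuffling the feature narrows the gap, i.e. the
    feature contributes to the disparity; zero means it has no effect.
    """
    rng = random.Random(seed)
    baseline = parity_gap([predict(r) for r in rows], groups)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with only this column replaced.
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            gap = parity_gap([predict(r) for r in permuted], groups)
            drops.append(baseline - gap)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Toy data (hypothetical): feature 0 is a proxy for group membership,
# feature 1 is ignored by the model.
rows = [[1, 5], [1, 3], [1, 4], [0, 2], [0, 1], [0, 6]]
groups = [0, 0, 0, 1, 1, 1]
model = lambda r: int(r[0] >= 1)
baseline_gap, scores = fairness_permutation_importance(model, rows, groups)
```

Here the unused feature receives a score of exactly zero, while the proxy feature receives a non-negative score because the gap cannot exceed its baseline of 1.0.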
Multi Class Depression Detection Through Tweets using Artificial Intelligence
2404.13104v1 by Muhammad Osama Nusrat, Waseem Shahzad, Saad Ahmed Jamal
Depression is a significant issue nowadays. As per the World Health Organization (WHO), in 2023, over 280 million individuals are grappling with depression. This is a huge number; if not taken seriously, these numbers will increase rapidly. About 4.89 billion individuals are social media users. People express their feelings and emotions on platforms like Twitter, Facebook, Reddit, Instagram, etc. These platforms contain valuable information which can be used for research purposes. Considerable research has been conducted across various social media platforms. However, certain limitations persist in these endeavors. Particularly, previous studies were only focused on detecting depression and the intensity of depression in tweets. Also, there existed inaccuracies in dataset labeling. In this research work, five types of depression (Bipolar, major, psychotic, atypical, and postpartum) were predicted using tweets from the Twitter database based on lexicon labeling. Explainable AI was used to provide reasoning by highlighting the parts of tweets that represent type of depression. Bidirectional Encoder Representations from Transformers (BERT) was used for feature extraction and training. Machine learning and deep learning methodologies were used to train the model. The BERT model presented the most promising results, achieving an overall accuracy of 0.96.
Summary: Tweets labeled with a lexicon-based scheme are used to predict five types of depression (bipolar, major, psychotic, atypical, and postpartum). Explainable AI highlights the parts of each tweet that indicate the depression type, and the BERT model achieves an overall accuracy of 0.96.
COIN: Counterfactual inpainting for weakly supervised semantic segmentation for medical images
2404.12832v2 by Dmytro Shvetsov, Joonas Ariva, Marharyta Domnich, Raul Vicente, Dmytro Fishman
Deep learning is dramatically transforming the field of medical imaging and radiology, enabling the identification of pathologies in medical images, including computed tomography (CT) and X-ray scans. However, the performance of deep learning models, particularly in segmentation tasks, is often limited by the need for extensive annotated datasets. To address this challenge, the capabilities of weakly supervised semantic segmentation are explored through the lens of Explainable AI and the generation of counterfactual explanations. The scope of this research is development of a novel counterfactual inpainting approach (COIN) that flips the predicted classification label from abnormal to normal by using a generative model. For instance, if the classifier deems an input medical image X as abnormal, indicating the presence of a pathology, the generative model aims to inpaint the abnormal region, thus reversing the classifier's original prediction label. The approach enables us to produce precise segmentations for pathologies without depending on pre-existing segmentation masks. Crucially, image-level labels are utilized, which are substantially easier to acquire than creating detailed segmentation masks. The effectiveness of the method is demonstrated by segmenting synthetic targets and actual kidney tumors from CT images acquired from Tartu University Hospital in Estonia. The findings indicate that COIN greatly surpasses established attribution methods, such as RISE, ScoreCAM, and LayerCAM, as well as an alternative counterfactual explanation method introduced by Singla et al. This evidence suggests that COIN is a promising approach for semantic segmentation of tumors in CT images, and presents a step forward in making deep learning applications more accessible and effective in healthcare, where annotated data is scarce.
Summary: COIN is a counterfactual inpainting approach for weakly supervised semantic segmentation: a generative model inpaints the abnormal region so that the classifier's prediction flips from abnormal to normal, yielding segmentations from image-level labels alone. On synthetic targets and kidney tumors in CT images from Tartu University Hospital, it outperforms RISE, ScoreCAM, LayerCAM, and the counterfactual method of Singla et al.
Hybrid Intelligence for Digital Humanities
2406.15374v1 by Victor de Boer, Lise Stork
In this paper, we explore the synergies between Digital Humanities (DH) as a discipline and Hybrid Intelligence (HI) as a research paradigm. In DH research, the use of digital methods and specifically that of Artificial Intelligence is subject to a set of requirements and constraints. We argue that these are well-supported by the capabilities and goals of HI. Our contribution includes the identification of five such DH requirements: Successful AI systems need to be able to 1) collaborate with the (human) scholar; 2) support data criticism; 3) support tool criticism; 4) be aware of and cater to various perspectives and 5) support distant and close reading. We take the CARE principles of Hybrid Intelligence (collaborative, adaptive, responsible and explainable) as theoretical framework and map these to the DH requirements. In this mapping, we include example research projects. We finally address how insights from DH can be applied to HI and discuss open challenges for the combination of the two disciplines.
Summary: The paper identifies five requirements that Digital Humanities (DH) research places on AI systems (collaboration with the scholar, support for data criticism, support for tool criticism, awareness of multiple perspectives, and support for distant and close reading) and maps them to the CARE principles of Hybrid Intelligence (collaborative, adaptive, responsible, explainable).
Ethical Framework for Responsible Foundational Models in Medical Imaging
2406.11868v1 by Abhijit Das, Debesh Jha, Jasmer Sanjotra, Onkar Susladkar, Suramyaa Sarkar, Ashish Rauniyar, Nikhil Tomar, Vanshali Sharma, Ulas Bagci
Foundational models (FMs) have tremendous potential to revolutionize medical imaging. However, their deployment in real-world clinical settings demands extensive ethical considerations. This paper aims to highlight the ethical concerns related to FMs and propose a framework to guide their responsible development and implementation within medicine. We meticulously examine ethical issues such as privacy of patient data, bias mitigation, algorithmic transparency, explainability and accountability. The proposed framework is designed to prioritize patient welfare, mitigate potential risks, and foster trust in AI-assisted healthcare.
Summary: The paper examines ethical issues raised by foundational models in medical imaging, including patient data privacy, bias mitigation, algorithmic transparency, explainability, and accountability, and proposes a framework for their responsible development and deployment.
Advancements in Radiomics and Artificial Intelligence for Thyroid Cancer Diagnosis
2404.07239v1 by Milad Yousefi, Shadi Farabi Maleki, Ali Jafarizadeh, Mahya Ahmadpour Youshanlui, Aida Jafari, Siamak Pedrammehr, Roohallah Alizadehsani, Ryszard Tadeusiewicz, Pawel Plawiak
Thyroid cancer is an increasing global health concern that requires advanced diagnostic methods. The application of AI and radiomics to thyroid cancer diagnosis is examined in this review. A review of multiple databases was conducted in compliance with PRISMA guidelines until October 2023. A combination of keywords led to the discovery of an English academic publication on thyroid cancer and related subjects. 267 papers were returned from the original search after 109 duplicates were removed. Relevant studies were selected according to predetermined criteria after 124 articles were eliminated based on an examination of their abstract and title. After the comprehensive analysis, an additional six studies were excluded. Among the 28 included studies, radiomics analysis, which incorporates ultrasound (US) images, demonstrated its effectiveness in diagnosing thyroid cancer. Various results were noted, some of the studies presenting new strategies that outperformed the status quo. The literature has emphasized various challenges faced by AI models, including interpretability issues, dataset constraints, and operator dependence. The synthesized findings of the 28 included studies mentioned the need for standardization efforts and prospective multicenter studies to address these concerns. Furthermore, approaches to overcome these obstacles were identified, such as advances in explainable AI technology and personalized medicine techniques. The review focuses on how AI and radiomics could transform the diagnosis and treatment of thyroid cancer. Despite challenges, future research on multidisciplinary cooperation, clinical applicability validation, and algorithm improvement holds the potential to improve patient outcomes and diagnostic precision in the treatment of thyroid cancer.
Summary: A PRISMA-compliant review (searches through October 2023; 28 studies included) of AI and radiomics for thyroid cancer diagnosis. Ultrasound-based radiomics proved effective, while interpretability, dataset constraints, and operator dependence remain challenges that call for standardization and prospective multicenter studies.
Predictive Modeling for Breast Cancer Classification in the Context of Bangladeshi Patients: A Supervised Machine Learning Approach with Explainable AI
2404.04686v1 by Taminul Islam, Md. Alif Sheakh, Mst. Sazia Tahosin, Most. Hasna Hena, Shopnil Akash, Yousef A. Bin Jardan, Gezahign Fentahun Wondmie, Hiba-Allah Nafidi, Mohammed Bourhia
Breast cancer has rapidly increased in prevalence in recent years, making it one of the leading causes of mortality worldwide. Among all cancers, it is by far the most common. Diagnosing this illness manually requires significant time and expertise. Since detecting breast cancer is a time-consuming process, preventing its further spread can be aided by creating machine-based forecasts. Machine learning and Explainable AI are crucial in classification as they not only provide accurate predictions but also offer insights into how the model arrives at its decisions, aiding in the understanding and trustworthiness of the classification results. In this study, we evaluate and compare the classification accuracy, precision, recall, and F-1 scores of five different machine learning methods using a primary dataset (500 patients from Dhaka Medical College Hospital). Five different supervised machine learning techniques, including decision tree, random forest, logistic regression, naive bayes, and XGBoost, have been used to achieve optimal results on our dataset. Additionally, this study applied SHAP analysis to the XGBoost model to interpret the model's predictions and understand the impact of each feature on the model's output. We compared the accuracy with which several algorithms classified the data, as well as contrasted with other literature in this field. After final evaluation, this study found that XGBoost achieved the best model accuracy, which is 97%.
Enhancing Breast Cancer Diagnosis in Mammography: Evaluation and Integration of Convolutional Neural Networks and Explainable AI
2404.03892v3 by Maryam Ahmed, Tooba Bibi, Rizwan Ahmed Khan, Sidra Nasir
Deep learning (DL) models for diagnosing breast cancer from mammographic images often operate as "black boxes", making it difficult for healthcare professionals to trust and understand their decision-making processes. The study presents an integrated framework combining Convolutional Neural Networks (CNNs) and Explainable Artificial Intelligence (XAI) for the enhanced diagnosis of breast cancer using the CBIS-DDSM dataset. The methodology encompasses an elaborate data preprocessing pipeline and advanced data augmentation techniques to counteract dataset limitations, and employs transfer learning with pre-trained networks such as VGG-16, Inception-V3 and ResNet. A focal point of our study is the evaluation of XAI's effectiveness in interpreting model predictions, highlighted by utilizing the Hausdorff measure to quantitatively assess the alignment between AI-generated explanations and expert annotations. This approach is critical for XAI in promoting trustworthiness and ethical fairness in AI-assisted diagnostics. The findings from our research illustrate the effective collaboration between CNNs and XAI in advancing diagnostic methods for breast cancer, thereby facilitating a more seamless integration of advanced AI technologies within clinical settings. By enhancing the interpretability of AI-driven decisions, this work lays the groundwork for improved collaboration between AI systems and medical practitioners, ultimately enriching patient care. Furthermore, the implications of our research extend well beyond the current methodologies. It encourages further research into how to combine multimodal data and improve AI explanations to meet the needs of clinical practice.
Summary: The study combines CNNs (transfer learning with VGG-16, Inception-V3, and ResNet) with XAI for breast cancer diagnosis on the CBIS-DDSM dataset, and uses the Hausdorff measure to quantify how well AI-generated explanations align with expert annotations.
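As a rough illustration of comparing an explanation with an expert annotation, the sketch below computes the symmetric Hausdorff distance between the foreground pixels of two binary masks. It assumes the "Hausdorff measure" in the entry above refers to this point-set distance and that the explanation map has already been thresholded to a binary mask; the abstract states neither, and the helper names are hypothetical.

```python
import math


def mask_to_points(mask):
    """Return (row, col) coordinates of all nonzero pixels in a 2D mask."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Toy masks (hypothetical): the explanation is shifted one pixel to the left
# of the expert annotation.
explanation = [[0, 1, 1, 0],
               [0, 1, 1, 0],
               [0, 0, 0, 0]]
annotation = [[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 0]]
distance = hausdorff(mask_to_points(explanation), mask_to_points(annotation))
```

Smaller values indicate closer agreement, with zero meaning the two regions coincide exactly. This brute-force version is quadratic in the number of foreground pixels, which is acceptable for a sketch but slow on full-resolution mammograms.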
Advancing Multimodal Data Fusion in Pain Recognition: A Strategy Leveraging Statistical Correlation and Human-Centered Perspectives
2404.00320v2 by Xingrui Gu, Zhixuan Wang, Irisa Jin, Zekun Wu
This research presents a novel multimodal data fusion methodology for pain behavior recognition, integrating statistical correlation analysis with human-centered insights. Our approach introduces two key innovations: 1) integrating data-driven statistical relevance weights into the fusion strategy to effectively utilize complementary information from heterogeneous modalities, and 2) incorporating human-centric movement characteristics into multimodal representation learning for detailed modeling of pain behaviors. Validated across various deep learning architectures, our method demonstrates superior performance and broad applicability. We propose a customizable framework that aligns each modality with a suitable classifier based on statistical significance, advancing personalized and effective multimodal fusion. Furthermore, our methodology provides explainable analysis of multimodal data, contributing to interpretable and explainable AI in healthcare. By highlighting the importance of data diversity and modality-specific representations, we enhance traditional fusion techniques and set new standards for recognizing complex pain behaviors. Our findings have significant implications for promoting patient-centered healthcare interventions and supporting explainable clinical decision-making.
Summary: A multimodal fusion method for pain behavior recognition that weights modalities by data-driven statistical relevance and incorporates human-centric movement characteristics into representation learning. The framework matches each modality to a suitable classifier based on statistical significance and supports explainable analysis of multimodal data.
Addressing Social Misattributions of Large Language Models: An HCXAI-based Approach
2403.17873v1 by Andrea Ferrario, Alberto Termine, Alessandro Facchini
Human-centered explainable AI (HCXAI) advocates for the integration of social aspects into AI explanations. Central to the HCXAI discourse is the Social Transparency (ST) framework, which aims to make the socio-organizational context of AI systems accessible to their users. In this work, we suggest extending the ST framework to address the risks of social misattributions in Large Language Models (LLMs), particularly in sensitive areas like mental health. In fact, LLMs, which are remarkably capable of simulating roles and personas, may lead to mismatches between designers' intentions and users' perceptions of social attributes, at the risk of promoting emotional manipulation and dangerous behaviors, cases of epistemic injustice, and unwarranted trust. To address these issues, we propose enhancing the ST framework with a fifth 'W-question' to clarify the specific social attributions assigned to LLMs by their designers and users. This addition aims to bridge the gap between LLM capabilities and user perceptions, promoting the ethically responsible development and use of LLM-based technology.
Summary: The authors propose extending the Social Transparency (ST) framework of human-centered explainable AI with a fifth 'W-question' that clarifies which social attributions designers and users assign to LLMs, aiming to reduce the risks of social misattribution in sensitive areas such as mental health.
Clinical Domain Knowledge-Derived Template Improves Post Hoc AI Explanations in Pneumothorax Classification
2403.18871v1 by Han Yuan, Chuan Hong, Pengtao Jiang, Gangming Zhao, Nguyen Tuan Anh Tran, Xinxing Xu, Yet Yen Yan, Nan Liu
Background: Pneumothorax is an acute thoracic disease caused by abnormal air collection between the lungs and chest wall. To address the opaqueness often associated with deep learning (DL) models, explainable artificial intelligence (XAI) methods have been introduced to outline regions related to pneumothorax diagnoses made by DL models. However, these explanations sometimes diverge from actual lesion areas, highlighting the need for further improvement. Method: We propose a template-guided approach to incorporate the clinical knowledge of pneumothorax into model explanations generated by XAI methods, thereby enhancing the quality of these explanations. Utilizing one lesion delineation created by radiologists, our approach first generates a template that represents potential areas of pneumothorax occurrence. This template is then superimposed on model explanations to filter out extraneous explanations that fall outside the template's boundaries. To validate its efficacy, we carried out a comparative analysis of three XAI methods with and without our template guidance when explaining two DL models in two real-world datasets. Results: The proposed approach consistently improved baseline XAI methods across twelve benchmark scenarios built on three XAI methods, two DL models, and two datasets. The average incremental percentages, calculated by the performance improvements over the baseline performance, were 97.8% in Intersection over Union (IoU) and 94.1% in Dice Similarity Coefficient (DSC) when comparing model explanations and ground-truth lesion areas. Conclusions: In the context of pneumothorax diagnoses, we proposed a template-guided approach for improving AI explanations. We anticipate that our template guidance will forge a fresh approach to elucidating AI models by integrating clinical domain expertise.
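The template-guided filtering and the overlap metrics described in the pneumothorax entry above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes the explanation, template, and lesion are equally sized binary masks (lists of 0/1 rows), and every function name and value here is hypothetical.

```python
def _pairs(a, b):
    """Yield aligned pixel pairs from two equally sized binary masks."""
    for row_a, row_b in zip(a, b):
        yield from zip(row_a, row_b)


def apply_template(explanation, template):
    """Keep only explanation pixels that fall inside the template."""
    return [[e & t for e, t in zip(er, tr)] for er, tr in zip(explanation, template)]


def iou(a, b):
    """Intersection over Union of two binary masks."""
    inter = sum(x & y for x, y in _pairs(a, b))
    union = sum(x | y for x, y in _pairs(a, b))
    return inter / union if union else 1.0


def dice(a, b):
    """Dice Similarity Coefficient of two binary masks."""
    inter = sum(x & y for x, y in _pairs(a, b))
    total = sum(x + y for x, y in _pairs(a, b))
    return 2 * inter / total if total else 1.0


def incremental_pct(baseline, improved):
    """Improvement over the baseline, as a percentage of the baseline."""
    return 100.0 * (improved - baseline) / baseline


# Toy masks (hypothetical): the raw explanation has a spurious blob in the
# bottom-right corner that lies outside the template.
explanation = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
template = [[1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 0]]
lesion = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]

filtered = apply_template(explanation, template)
iou_gain = incremental_pct(iou(explanation, lesion), iou(filtered, lesion))
```

In this toy case the IoU rises from 0.5 to 0.75 once the out-of-template blob is removed, an incremental percentage of 50%. The 97.8% and 94.1% figures in the abstract are averages of this kind of relative gain across the paper's twelve benchmark scenarios.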