Explainability Metrics for Evaluating Human Trust in High-Stakes AI Medical Diagnosis Tools

Authors

  • Vidya Sagar S D, Department of MCA, Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India
  • Syeda Meraj, Lecturer, Department of Computer Science, College of Computer Science, King Khalid University, Saudi Arabia
  • Dileep M R, Department of MCA, Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India

Keywords:

Explainable Artificial Intelligence (XAI), Medical Image Diagnosis, Human Trust Evaluation, Deep Learning, ChestX-ray14 Dataset, Explainability Metrics

Abstract

Artificial intelligence (AI) diagnostic systems have demonstrated high accuracy in medical image analysis but often lack transparency, which hinders clinician trust and adoption in high-stakes healthcare settings. This study develops and evaluates explainability metrics that quantitatively assess the quality of AI-generated explanations and their influence on human trust in medical diagnosis tools. We employ a DenseNet-121 convolutional neural network trained on the ChestX-ray14 dataset for multi-label thoracic disease classification. Two explainability methods, Gradient-weighted Class Activation Mapping (Grad-CAM) and SHapley Additive exPlanations (SHAP), generate visual and quantitative interpretations of model predictions. We propose novel metrics for explanation fidelity, human-centered trust, and task effectiveness. A controlled user study involving 20 clinicians evaluates the impact of these explanations on trust and diagnostic decision-making, and statistical analyses examine the relationships between model accuracy, explanation quality, and human trust. The model achieved an average area under the curve (AUC) of 0.89 across key diseases, and the Grad-CAM and SHAP explanations attained an average fidelity score of 0.87. Clinician trust scores increased significantly, from 3.1 to 4.2 on a 5-point Likert scale, when explanations accompanied predictions (p < 0.001). The Trust Calibration Index improved to 0.81, indicating strong alignment between AI confidence and user trust. Regression analysis identified explanation quality as the strongest predictor of trust (β = 0.65, p < 0.001). This work advances the quantitative evaluation of explainability in medical AI and demonstrates that transparent explanations substantially enhance clinician trust. The proposed framework facilitates safer integration of AI diagnostic tools in clinical practice, promoting wider adoption and improved patient outcomes.
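
As an illustration of the headline accuracy figure, the following minimal sketch shows one common way an average AUC for a multi-label classifier can be computed: a separate AUC per disease label, followed by an unweighted mean. The abstract does not state which averaging scheme the authors used, so the unweighted mean is an assumption here. The function names (binary_auc, mean_auc) and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC for a single label, computed as the fraction of (positive, negative)
    pairs in which the positive case receives the higher score (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_auc(
    label_matrix: Sequence[Sequence[int]],
    score_matrix: Sequence[Sequence[float]],
) -> float:
    """Unweighted mean of per-label AUCs (an assumed averaging scheme).
    Rows are images; columns are disease labels."""
    num_labels = len(label_matrix[0])
    per_label = [
        binary_auc(
            [row[j] for row in label_matrix],
            [row[j] for row in score_matrix],
        )
        for j in range(num_labels)
    ]
    return sum(per_label) / num_labels


# Toy data, made up for illustration only: 4 images, 2 disease labels.
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.3, 0.8], [0.7, 0.4], [0.4, 0.6]]
toy_mean_auc = mean_auc(toy_labels, toy_scores)
```

The pairwise comparison is quadratic in the number of cases, which is acceptable for a small illustration; a rank-based implementation would be preferable at the scale of ChestX-ray14.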

Published

2024-03-31

How to Cite

Vidya Sagar S D, Syeda Meraj, & Dileep M R. (2024). Explainability Metrics for Evaluating Human Trust in High-Stakes AI Medical Diagnosis Tools. Synthesis: A Multidisciplinary Research Journal, 2(1), 10-21. https://www.macawpublications.com/Journals/index.php/SMRJ/article/view/180