Explainability Metrics for Evaluating Human Trust in High-Stakes AI Medical Diagnosis Tools

Vidya Sagar S D; Syeda Meraj; Dileep M R

Authors

Vidya Sagar S D, Department of MCA, Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India
Syeda Meraj, Lecturer, Department of Computer Science, College of Computer Science, King Khalid University, Saudi Arabia
Dileep M R, Department of MCA, Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India

Keywords:

Explainable Artificial Intelligence (XAI), Medical Image Diagnosis, Human Trust Evaluation, Deep Learning, ChestX-ray14 Dataset, Explainability Metrics

Abstract

Artificial intelligence (AI) diagnostic systems have demonstrated remarkable accuracy in medical image analysis but often lack transparency, which hinders clinician trust and adoption in high-stakes healthcare settings. This study develops and evaluates explainability metrics that quantitatively assess the quality of AI-generated explanations and their influence on human trust in medical diagnosis tools. We employ a DenseNet-121 convolutional neural network trained on the ChestX-ray14 dataset for multi-label thoracic disease classification. Two explainability methods, Gradient-weighted Class Activation Mapping (Grad-CAM) and SHapley Additive exPlanations (SHAP), generate visual and quantitative interpretations of model predictions. We propose novel metrics for explanation fidelity, human-centered trust, and task effectiveness. A controlled user study involving 20 clinicians evaluates the impact of these explanations on trust and diagnostic decision-making, and statistical analyses examine the relationships between model accuracy, explanation quality, and human trust. The model achieved an average area under the curve (AUC) of 0.89 across key diseases, and the Grad-CAM and SHAP explanations attained an average fidelity score of 0.87. Clinician trust scores increased significantly, from 3.1 to 4.2 on a 5-point Likert scale, when explanations accompanied predictions (p < 0.001). The Trust Calibration Index improved to 0.81, indicating strong alignment between AI confidence and user trust. Regression analysis identified explanation quality as the strongest predictor of trust (β = 0.65, p < 0.001). This work advances the quantitative evaluation of explainability in medical AI and demonstrates that transparent explanations substantially enhance clinician trust. The proposed framework facilitates safer integration of AI diagnostic tools into clinical practice, promoting wider adoption and improved patient outcomes.
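
The abstract describes the Trust Calibration Index only as a measure of alignment between AI confidence and user trust and does not give a formula. As a rough illustration of how such an index could be computed, the sketch below assumes one possible definition: one minus the mean absolute gap between the model's confidence (in [0, 1]) and the clinician's Likert trust rating rescaled to [0, 1]. This definition and the function names `rescale_likert` and `trust_calibration_index` are hypothetical and are not taken from the study.

```python
from typing import Sequence


def rescale_likert(rating: float, low: int = 1, high: int = 5) -> float:
    """Map a Likert rating on [low, high] linearly onto [0, 1]."""
    return (rating - low) / (high - low)


def trust_calibration_index(
    confidences: Sequence[float],
    likert_trust: Sequence[float],
) -> float:
    """Hypothetical calibration index (an assumption, not the paper's formula).

    Returns 1 minus the mean absolute gap between model confidence and
    rescaled trust, so 1.0 means trust tracks confidence exactly and 0.0
    means the two are maximally misaligned on every case.
    """
    if len(confidences) != len(likert_trust) or len(confidences) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    gaps = [
        abs(conf - rescale_likert(trust))
        for conf, trust in zip(confidences, likert_trust)
    ]
    return 1.0 - sum(gaps) / len(gaps)
```

Under this assumed definition, a clinician who gives a rating of 5 to a prediction with confidence 1.0 and a rating of 1 to a prediction with confidence 0.0 would score 1.0, while the reverse pattern would score 0.0. Other formulations, such as a correlation between confidence and trust, would be equally consistent with the abstract's description.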
