Explainable Multimodal LLMs: Integrating Multi-Shot Reasoning for Transparent and Trustworthy AI


Date Published: 11 January 2026

Contributors

Aleem Ali

Lincoln University College
Author

Shashi Kant Gupta

Author

Keywords

Explainable Multimodal Large Language Models, Multi-Shot Multimodal Reasoning, Cross-Modal Explainability, Attention-Based Interpretability

Proceeding

Track

Engineering, Sciences, Mathematics & Computations

License

Copyright (c) 2026 Sustainable Global Societies Initiative


This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Abstract

Recent advances in multimodal large language models (MLLMs) have extended artificial intelligence to processing and reasoning across diverse modalities such as text, images, and video. However, the decision-making processes of these models remain largely opaque, limiting their deployment in critical, trust-sensitive domains. This paper introduces an explainability-driven extension of the Multi-Shot Multimodal Large Language Model (MS-MLLM), integrating interpretability modules to enable transparent and trustworthy multimodal reasoning. The proposed model combines cross-attention fusion, multi-shot contextual learning, and explainable visual-textual inference through attention-based and gradient-based interpretability mechanisms. Experiments on the MIMIC-CXR, MS COCO, and YouTube-8M benchmarks show that the framework maintains strong performance (89% accuracy in medical diagnosis, a CIDEr score of 112 for image captioning, and 82% accuracy in video question answering) while providing interpretable insights through attention heatmaps and textual rationales. The study underscores the need to integrate explainability into multi-shot multimodal learning to build human-aligned, transparent, and reliable AI systems for real-world applications.
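
The abstract describes the attention-based interpretability mechanism only at a high level. A minimal sketch of the heatmap idea it mentions, assuming a PyTorch cross-attention fusion block in which text tokens attend over image patches, could look like the following; the class and function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of attention-based heatmap extraction for a multimodal model.
# All module and variable names are illustrative; they are not the paper's API.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossAttentionFusion(nn.Module):
    """Single cross-attention block: text queries attend over image patch features."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, text_tokens: torch.Tensor, image_patches: torch.Tensor):
        # fused:   (B, T, D) text features enriched with visual context
        # weights: (B, T, P) attention from each text token to each image patch
        fused, weights = self.attn(
            text_tokens, image_patches, image_patches,
            need_weights=True, average_attn_weights=True,
        )
        return fused, weights


def attention_heatmap(weights: torch.Tensor, grid: int, image_size: int) -> torch.Tensor:
    """Turn token-to-patch attention into an image-sized saliency map.

    weights: (B, T, P) with P = grid * grid patches.
    Returns: (B, image_size, image_size), normalized to [0, 1] per image.
    """
    saliency = weights.mean(dim=1)                    # average over text tokens -> (B, P)
    saliency = saliency.view(-1, 1, grid, grid)       # back onto the 2D patch grid
    saliency = F.interpolate(saliency, size=image_size,
                             mode="bilinear", align_corners=False).squeeze(1)
    smin = saliency.amin(dim=(1, 2), keepdim=True)
    smax = saliency.amax(dim=(1, 2), keepdim=True)
    return (saliency - smin) / (smax - smin + 1e-8)   # min-max normalization


if __name__ == "__main__":
    B, T, P, D, grid, img = 2, 12, 196, 256, 14, 224  # 14x14 patch grid, 224px images
    fusion = CrossAttentionFusion(dim=D)
    text = torch.randn(B, T, D)                       # placeholder text token features
    patches = torch.randn(B, P, D)                    # placeholder image patch features
    _, attn = fusion(text, patches)
    heatmap = attention_heatmap(attn, grid=grid, image_size=img)
    print(heatmap.shape)                              # torch.Size([2, 224, 224])
```

In this reading, the heatmap is simply the cross-attention mass each image patch receives, upsampled to the input resolution; a gradient-based variant would instead weight patch features by the gradient of the predicted class score, as in Grad-CAM-style methods.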


How to Cite

Ali, A., & Gupta, S. K. (2026). Explainable Multimodal LLMs: Integrating Multi-Shot Reasoning for Transparent and Trustworthy AI. Sustainable Global Societies Initiative, 1(1). https://vectmag.com/sgsi/paper/view/66