Causally-Aware Explainable Deep Learning for Reliable Decision-Making in Safety-Critical Systems: A Systematic Review Using PRISMA 2020
Contributors
Mayur Bhoyar
Proceeding
Track
Engineering and Sciences
License
Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
Deep learning (DL) models achieve unprecedented predictive accuracy in safety-critical domains such as healthcare diagnostics, autonomous vehicles (AV), and industrial control systems (ICS). However, because they rely on statistical correlations rather than real causal mechanisms, they are fragile under real-world distribution shift, and their failures in these settings can be life-threatening. Concurrently, the EU AI Act (Regulation (EU) 2024/1689, 2024) and FDA Software as a Medical Device (SaMD) guidelines mandate causally-grounded, interpretable AI for high-risk deployments.
Objective: This systematic review synthesises the evidence on causally-aware explainable AI (XAI) methods for DL models deployed in safety-critical systems, evaluating their methodological rigour, regulatory alignment, and performance under distribution shift.
Methods: A PRISMA 2020-compliant search of Scopus, Web of Science (SCIE), IEEE Xplore, ACM Digital Library, and ScienceDirect was conducted (from January 2018; 3,642 initial records; 94 after deduplication; 47 included in full). Eligible papers had to propose or evaluate XAI techniques that incorporate elements of causal inference (structural causal models, do-calculus, counterfactual reasoning, causal discovery) applied to DL pipelines in safety-critical settings.
Findings: Post-hoc XAI algorithms (SHAP, LIME, Grad-CAM) explain statistical associations, not causal relationships, and degrade by up to 43% under distribution shift. Only 11% of the reviewed medical and AV DL systems meet the EU AI Act Article 13 transparency requirements. Causal XAI approaches that apply Structural Causal Models (SCMs) to deep neural networks show much better out-of-distribution (OOD) robustness, but their scalability is limited by causal graph structure learning, which is NP-hard.
Conclusion: A unified causally-aware XAI framework, referred to as CausalXAI, built upon SCMs, differentiable causal discovery (NOTEARS), and deep neural networks, is proposed as a basis for next-generation regulatory-compliant AI in safety-critical systems. Future studies should address causal graph scalability, benchmark standardization, and multi-domain validation.
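The conclusion names NOTEARS as the differentiable causal discovery component. As a minimal illustration only (the abstract does not specify how CausalXAI implements it), NOTEARS replaces the combinatorial "graph must be a DAG" constraint with the smooth function h(W) = tr(exp(W ∘ W)) - d, which is zero exactly when the weighted adjacency matrix W is acyclic. The function names `matmul` and `acyclicity`, the example matrices, and the truncated power series used here in place of a library matrix exponential are all assumptions made for this sketch.

```python
import math


def matmul(A, B):
    """Multiply two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity(W, terms=30):
    """NOTEARS-style score h(W) = tr(exp(W * W)) - d (elementwise square).

    The matrix exponential is approximated by the first `terms` terms of
    its power series (an assumption of this sketch). h(W) is 0 for a DAG
    and strictly positive when W contains a directed cycle.
    """
    d = len(W)
    A = [[w * w for w in row] for row in W]  # elementwise square, so A >= 0
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # A^0
    trace_exp = 0.0
    for k in range(terms):
        # tr(A^k) / k! is the k-th term of tr(exp(A))
        trace_exp += sum(power[i][i] for i in range(d)) / math.factorial(k)
        power = matmul(power, A)
    return trace_exp - d


# Hypothetical 3-variable examples: W[i][j] != 0 means an edge x_i -> x_j.
dag = [[0.0, 1.5, 0.0],
       [0.0, 0.0, -2.0],
       [0.0, 0.0, 0.0]]      # chain x0 -> x1 -> x2
cyclic = [[0.0, 1.5, 0.0],
          [0.0, 0.0, -2.0],
          [0.5, 0.0, 0.0]]   # adds x2 -> x0, closing a cycle

h_dag = acyclicity(dag)
h_cyclic = acyclicity(cyclic)
print(h_dag, h_cyclic)
```

Because h is differentiable in W, it can be added as a penalty or constraint to a gradient-based training loss. The structure search problem itself remains hard, which is consistent with the scalability concern raised in the findings.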