Explainable Knowledge Distillation via Capsule Vision Transformers for Automated Kidney Disease Categorization
Contributors
Mr. Sachin Dattatraya Shingade
Mr. Midhun Chakkaravarthy
Mr. Dimitrios A. Karras
Mr. Sachin S.
Miss. Komal Mahadeo Masal
Track
Engineering, Sciences, Mathematics & Computations
License
Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
Kidney disease, characterized by a progressive decline in the renal system's ability to filter metabolic waste and excess fluids, poses a significant global health risk. When these physiological impairments persist beyond three months, the condition is classified as Chronic Kidney Disease (CKD). Current diagnostic frameworks often suffer from high computational overhead, suboptimal precision, and architectures too heavy for deployment on lightweight devices. To address these challenges, this study introduces CapViT-DKD, a kidney detection and classification framework that combines a Capsule Vision Transformer with Self-Supervised Diverse Knowledge Distillation. The methodology uses the CT-kidney dataset and begins with a preprocessing pipeline of image resizing, normalization, and data augmentation. The core architecture follows a teacher-student paradigm: a high-capacity Perceptive Capsule Transformer Network (PCapTN) serves as the teacher, transferring complex feature representations to a Lightweight Capsule Transformer Network (LCapTN) student. This diverse knowledge distillation (DKD) approach substantially boosts the student model's performance while keeping its computational footprint small. To address the "black box" nature of deep learning, we incorporate Principal Component of Gradient-Class Activation Mapping (PCG-CAM), which provides visual explanations by highlighting the anatomical regions that influence the diagnostic output. Empirical results show that the proposed system achieves an accuracy of 99.75%, a precision of 99.50%, a recall of 99.15%, and an F1-score of 99.10%, validating its efficacy for clinical decision support.
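The abstract does not give the exact form of the diverse knowledge distillation loss, but teacher-student transfer of the kind described typically builds on temperature-scaled knowledge distillation (Hinton et al.). The sketch below shows that standard formulation only, as an assumed baseline; the function names, temperature value, and example logits are illustrative, not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened teacher and student distributions.

    Scaled by T^2 so gradient magnitudes stay comparable across
    temperatures, as in the original distillation formulation.
    """
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # soft student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (temperature ** 2) * kl

# Identical logits give zero loss; divergent logits give a positive loss.
teacher = [2.0, 0.5, 0.1, -1.0]  # e.g. four kidney-CT classes
student = [1.5, 0.7, 0.0, -0.8]
loss = distillation_loss(student, teacher)
```

In practice this soft-target term is combined with the usual cross-entropy against ground-truth labels, weighted by a mixing coefficient.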
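The abstract likewise does not define PCG-CAM beyond its name. One plausible reading, sketched below under that assumption, is a Grad-CAM variant in which the per-channel weights come from projecting each gradient map onto the first principal component of the flattened gradients, rather than from global average pooling. The function name `pcg_cam` and all shapes are hypothetical.

```python
import numpy as np

def pcg_cam(activations, gradients):
    """Sketch of a principal-component Grad-CAM variant.

    activations: (C, H, W) feature maps from the final feature layer
    gradients:   (C, H, W) gradients of the class score w.r.t. those maps
    Returns a (H, W) heat map normalized to [0, 1].
    """
    C, H, W = gradients.shape
    G = gradients.reshape(C, H * W)                 # one row per channel
    G_centered = G - G.mean(axis=1, keepdims=True)
    # First right-singular vector spans the dominant gradient direction.
    _, _, vt = np.linalg.svd(G_centered, full_matrices=False)
    weights = G_centered @ vt[0]                    # project each channel onto it
    cam = np.einsum("c,chw->hw", weights, activations)
    cam = np.maximum(cam, 0.0)                      # ReLU: keep positive evidence
    if cam.max() > 0:
        cam /= cam.max()                            # normalize to [0, 1]
    return cam
```

Overlaying the resulting heat map on the input CT slice highlights which anatomical regions contributed most to the predicted class, which is the explanatory role PCG-CAM plays in the proposed system.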