ViT-ESA: Enhanced Spatial Attention in ViT for Breast Cancer Histopathological Image Classification
Contributors
Amrutanshu Panigrahi
Subrata Chowdhury
Proceeding
Track
Engineering and Sciences
License
Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
Breast cancer is one of the most common causes of cancer death among women worldwide, and it requires accurate and early diagnosis. Accurate interpretation of histopathological images is crucial for the final diagnosis. However, many computer-aided diagnosis systems fail to capture global contextual information or tissue patterns at multiple scales. Recent studies have shown that Vision Transformer (ViT) models outperform convolutional neural networks (CNNs) at modeling long-range dependencies. Although high-dimensional deep features can improve performance through richer feature representation, they also introduce redundancy and computational overhead, which can hinder clinical deployment. To address these concerns, the proposed method integrates deep feature extraction based on Vision Transformers with feature selection using the Elephant Search Algorithm (ESA). Evaluated on the BreakHis breast histopathology dataset at multiple magnification factors, the proposed ViT-ESA framework improves classification performance while using fewer feature dimensions. A comparative evaluation of multiple machine learning classifiers shows that XGBoost performs best, validating the effectiveness of ESA-selected transformer features for reliable breast cancer diagnosis.
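
The general idea of wrapper-style feature selection over deep features can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the features are synthetic stand-ins for ViT embeddings, the search is a generic population-based binary-mask search rather than the Elephant Search Algorithm itself, and a nearest-centroid classifier replaces XGBoost as the fitness evaluator. The names `fitness` and `select_features` are hypothetical and do not come from the paper.

```python
import numpy as np


def fitness(mask, X_tr, y_tr, X_va, y_va, penalty=0.01):
    """Validation accuracy of a nearest-centroid classifier on the selected
    columns, minus a small penalty proportional to the fraction kept."""
    if not mask.any():
        return -1.0  # an empty feature subset is never acceptable
    A, B = X_tr[:, mask], X_va[:, mask]
    classes = np.unique(y_tr)
    centroids = np.stack([A[y_tr == c].mean(axis=0) for c in classes])
    dists = ((B[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y_va).mean()) - penalty * mask.mean()


def select_features(X_tr, y_tr, X_va, y_va, pop=20, iters=30, flip=0.05, seed=0):
    """Generic population-based search over binary feature masks.
    This is a stand-in for a metaheuristic such as ESA, not ESA itself."""
    rng = np.random.default_rng(seed)
    n = X_tr.shape[1]
    masks = rng.random((pop, n)) < 0.5
    scores = np.array([fitness(m, X_tr, y_tr, X_va, y_va) for m in masks])
    for _ in range(iters):
        best = masks[scores.argmax()].copy()
        for i in range(pop):
            # Mix bits from the current best mask, then flip a few at random.
            cand = np.where(rng.random(n) < 0.5, best, masks[i])
            cand = cand ^ (rng.random(n) < flip)
            s = fitness(cand, X_tr, y_tr, X_va, y_va)
            if s > scores[i]:  # keep a candidate only if it improves
                masks[i], scores[i] = cand, s
    k = scores.argmax()
    return masks[k], float(scores[k])


# Synthetic stand-in for deep features: 64 dimensions, only the first 8
# carry class signal (an assumption for this demo, not BreakHis data).
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 64))
X[:, :8] += 1.5 * y[:, None]

mask, score = select_features(X[:300], y[:300], X[300:], y[300:])
print("features kept:", int(mask.sum()), "of", mask.size)
print("penalized validation score:", round(score, 3))
```

In a pipeline like the one the abstract describes, the selected columns would then be passed to the downstream classifier; the penalty term is one simple way to trade accuracy against feature count.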