Skin Disease Detection using Hybrid CNN and Vision Transformer Architecture


Date Published : 10 January 2026

Contributors

Bipin P R

Author

Upendra Kumar

Author

Sai Kiran Oruganti

Lincoln University College
Author

Keywords

Deep Learning; Vision Transformer; Convolutional Neural Network; Skin Cancer Classification; Hybrid Model; Transfer Learning; Dermoscopy; Medical Image Analysis

Proceeding

Track

Engineering, Sciences, Mathematics & Computations

License

Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Abstract

Skin cancer is one of the fastest-growing malignancies worldwide, with melanoma accounting for the majority of skin cancer deaths. Early detection greatly improves treatment outcomes and survival rates. Conventional visual diagnosis depends on the clinician's dermatological expertise, which can lead to variability in interpretation. In recent years, deep learning methods, particularly convolutional neural networks (CNNs), have shown remarkable potential for automating skin lesion classification. However, because convolutions operate on local neighborhoods, CNNs often fail to capture global dependencies across an image. Vision Transformers (ViTs), by contrast, use self-attention, which lets them model long-range interactions between image patches.
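
To make the self-attention idea concrete, the following is a minimal NumPy sketch of single-head attention over image patches. It is an illustration only, not the authors' implementation: the image size, patch size, projection width, random weights, and the helper names `patchify` and `self_attention` are all assumptions chosen for the example.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    x = image[: rows * patch, : cols * patch]
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    x = x.reshape(rows, patch, cols, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(rows * cols, patch * patch * c)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights

# Hypothetical sizes: a 32x32 RGB image, 8x8 patches, 16-dim projections.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
tokens = patchify(image, patch=8)
d = 16
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(tokens.shape[1], d)) for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)
```

Each row of `weights` spans every patch in the image, so a patch in one corner can draw directly on a patch in the opposite corner within a single layer. A convolution would need many stacked layers to connect the same two regions.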

This research proposes a hybrid model that integrates CNN and ViT architectures to leverage both local and global features for improved classification accuracy. The CNN component performs preprocessing and local feature extraction, while the ViT module captures global context. Experiments on the ISIC 2019 dataset show that the hybrid model achieves higher accuracy than standalone CNN or ViT models. The proposed architecture offers a reliable approach to automated dermatological diagnostics that is suitable for clinical and telemedicine settings.
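
One way such a hybrid could be wired together is sketched below in PyTorch. The abstract does not specify layer counts, widths, or input resolution, so everything here is an assumption made for illustration: the class name `HybridCNNViT`, the three-layer convolutional stem, the 64x64 input, and the embedding size are hypothetical. The default of eight output classes reflects the eight diagnostic categories in the ISIC 2019 training set.

```python
import torch
from torch import nn

class HybridCNNViT(nn.Module):
    """Illustrative hybrid: a CNN stem yields local features, which a
    Transformer encoder then relates globally. All sizes are assumptions."""

    def __init__(self, num_classes=8, image_size=64, dim=64, depth=2, heads=4):
        super().__init__()
        # CNN stem: three stride-2 convolutions downsample the input by 8.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.BatchNorm2d(dim), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.BatchNorm2d(dim), nn.ReLU(),
        )
        num_tokens = (image_size // 8) ** 2
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens + 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        feats = self.cnn(x)                        # (B, dim, H/8, W/8)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, N, dim): one token per cell
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        encoded = self.encoder(tokens)             # self-attention across all cells
        return self.head(encoded[:, 0])            # classify from the class token

# Forward pass on a random batch with the assumed 64x64 input size.
model = HybridCNNViT().eval()
with torch.no_grad():
    logits = model(torch.randn(2, 3, 64, 64))
```

In this sketch the convolutional feature map takes the place of raw pixel patches as the Transformer's token sequence, so the attention layers operate on features that already encode local texture and edges.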

How to Cite

Bipin P R, Upendra Kumar, & Oruganti, S. K. (2026). Skin Disease Detection using Hybrid CNN and Vision Transformer Architecture. Sustainable Global Societies Initiative, 1(1). https://vectmag.com/sgsi/paper/view/76