Review: UNet-based Medical Image Segmentation
Contributors
Rupak Chakraborty
Shashi Kant Gupta
Proceeding
Track
Engineering, Sciences, Mathematics & Computations
License
Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
Developing deep learning frameworks for accurate segmentation of MRI images with robust AI models is a prominent recent trend, and U-Net-based models have achieved high segmentation accuracy in this setting. By incorporating denser skip connections into U-Net, the improved UNet++ architecture was developed, and vision-transformer components have since been combined with UNet++ models for more effective segmentation. Hybrid CNN–Transformer architectures built on the Vision Transformer (ViT) have been surveyed for their ability to leverage both local and global image features to enhance segmentation accuracy. However, ViT-based models have drawbacks, including large data requirements, deployment challenges, fixed input-size dependency, and a weak inductive bias for locality. To address these limitations, the State Space Model (SSM)-based Vision Mamba (VM) architecture has been proposed. By integrating explainable AI (XAI) techniques such as SHAP and Grad-CAM, together with uncertainty quantification, the model aims to improve interpretability and reliability. Evaluations will be carried out on existing public datasets such as BraTS2020, ISIC2020, CVC-ClinicDB, and Synapse, using metrics including the Dice Similarity Coefficient (DSC), Intersection over Union (IoU), Hausdorff Distance, and pixel-wise accuracy. The outcomes of the proposed VM-UNet++ model will be compared with UNet, UNet++, Swin-UNet, Residual CNN, and other baselines.
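The DSC and IoU metrics named above have simple set-overlap definitions; the sketch below is an illustrative implementation on binary masks (it is not code from the proposed VM-UNet++ work, and the function names are placeholders chosen here).

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|) on binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    # eps guards against division by zero when both masks are empty
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Intersection over Union (Jaccard index): |A ∩ B| / |A ∪ B| on binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (intersection + eps) / (union + eps)

# Toy 2x3 masks: intersection = 2 pixels, union = 4 pixels
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice_coefficient(pred, target), 3))  # 2*2/(3+3) ≈ 0.667
print(round(iou(pred, target), 3))               # 2/4 = 0.5
```

Note that DSC and IoU are monotonically related (DSC = 2·IoU / (1 + IoU)), so rankings agree, but DSC weights the overlap more generously, which is why segmentation papers typically report both.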