Vision Mamba-based UNet++ Approach for Effective Medical Image Segmentation
Contributors
Rupak Chakraborty
Shashi kant Gupta
Keywords
Proceeding
Track
Engineering and Sciences
License
Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
In medical image segmentation, UNet and UNet++ have attracted considerable attention. Although modified versions of these architectures have shown promising performance, they still struggle in some areas, such as inaccurate boundary delineation and sensitivity to variations in image quality. Researchers therefore moved towards the Vision Transformer (ViT) architecture, in which self-attention mechanisms are applied to a set of image patches. However, these pre-trained transformer-based models suffer from high computational complexity and demand substantial computing power. To overcome these issues, the State Space Model (SSM)-based Vision Mamba (VM) architecture has been introduced. Incorporating the bidirectional sequence modeling of SSMs into the encoder-decoder UNet architecture is a current research interest. Inspired by the existing literature, a Vision Mamba UNet++ (VMUNet++) is proposed for effective segmentation. The nested skip connections of UNet++ are chosen to reduce the semantic gap between encoder and decoder features. The proposed model will be evaluated on public datasets such as ISIC18, Synapse, and SegPC-2021, and its results will be compared with recent architectures to demonstrate its efficacy.
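As an illustration of the nested skip connections mentioned above, the following minimal sketch enumerates which nodes feed each node in the standard UNet++ topology. It is not the VMUNet++ implementation: the function name `unetpp_node_inputs`, the `depth` parameter, and the "down"/"skip"/"up" labels are hypothetical names introduced here, and the sketch says nothing about what block (convolutional or Mamba-based) sits at each node.

```python
def unetpp_node_inputs(depth):
    """Map each UNet++ node (i, j) to the nodes whose features it receives.

    i is the resolution level (0 = full resolution) and j is the position
    along the skip pathway (j = 0 is the encoder backbone). All names here
    are illustrative assumptions, not taken from the VMUNet++ work.
    """
    inputs = {}
    for j in range(depth):
        for i in range(depth - j):
            if j == 0:
                # Encoder node: fed by the downsampled node one level above.
                inputs[(i, 0)] = [("down", (i - 1, 0))] if i > 0 else []
            else:
                # Nested node: every earlier node on the same level...
                same_level = [("skip", (i, k)) for k in range(j)]
                # ...plus the upsampled node from the level below.
                inputs[(i, j)] = same_level + [("up", (i + 1, j - 1))]
    return inputs


wiring = unetpp_node_inputs(depth=4)
print(wiring[(0, 3)])
```

With `depth=4`, the final full-resolution node (0, 3) receives skip features from (0, 0), (0, 1), and (0, 2) together with the upsampled output of (1, 2), which is how the dense pathway narrows the gap between encoder and decoder features.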