Effective predictive model for diabetes classification using optimized machine learning on an imbalanced dataset
Contributors
Dr. G. R. Ashisha
Sai Kiran
Track
Engineering and Sciences
License
Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
Diabetes mellitus is a growing health problem that demands accurate, early detection. Recent research shows that machine learning techniques can assist medical practitioners in predicting the disease. However, real-world medical datasets commonly suffer from class imbalance and missing values, both of which can degrade prediction performance. This paper proposes an optimized machine learning framework trained on a class-balanced dataset generated with the Synthetic Minority Over-sampling Technique (SMOTE). The framework is evaluated on the Diabetes 130-US Hospitals dataset, with data preprocessing applied to improve the quality of the data. Several machine learning algorithms are compared, including CatBoost, XGBoost, LightGBM, and a stacking ensemble. In the experiments, the authors observed that ensemble-based machine learning techniques resulted in better classification performance.
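The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of the interpolation idea behind the Synthetic Minority Over-sampling Technique. This is not the authors' implementation: the function name `smote`, its parameters, and the toy data are hypothetical, and only the Python standard library is used. In practice a maintained implementation, such as the one in the imbalanced-learn package, would more likely be applied, and only to the training split.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("at least two minority samples are required")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy, made-up data (assumption): 8 majority rows (label 0), 3 minority rows (label 1).
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 1.0], [11.0, 2.0], [12.0, 1.5]]

# Generate enough synthetic rows to match the majority class size.
new_rows = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_X = majority + minority + new_rows
balanced_y = [0] * len(majority) + [1] * (len(minority) + len(new_rows))
```

Because each synthetic row lies on a line segment between two existing minority samples, the new rows stay inside the region already occupied by the minority class rather than duplicating existing records.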