Machine Learning Classifiers for Thunderstorm Nowcasting: A Study of Support Vector Machines and Random Forest Approaches against Thermodynamic Indices
Contributors
Madhusudhan HS
Dr. Ajay Kumar
Keywords
Proceeding
Track
Engineering and Sciences
License
Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
This work evaluates machine learning classifiers for thunderstorm nowcasting, comparing support vector machines (SVM) and random forests (RF) against thermodynamic indices. Forecasting mesoscale convective systems (MCS), including severe thunderstorm events, remains a formidable challenge in day-to-day operational practice. The main difficulty is that these storms evolve very rapidly and their behaviour is far more complicated than a simple rule-based method can capture. In practice, forecasters rely on a small number of threshold tests, such as Convective Available Potential Energy (CAPE) or the Lifted Index, computed directly from the output of large Numerical Weather Prediction (NWP) models. These thresholds tend to predict too many storms that never occur (high False Alarm Ratios) and do not capture what actually causes thunderstorms to develop. We therefore propose a robust machine learning (ML) approach to nowcasting and compare two common models, SVM and RF, to determine which is better at distinguishing storm from non-storm environments. The models were trained on a set of 21 kinematic and thermodynamic features inspired by the SALAMA framework. We then compared their performance with the traditional stability index thresholds still used by most forecasters, particularly across the Indian subcontinent. Consistent with previous studies, the results suggest that random forests may achieve a higher Critical Success Index (CSI) and fewer false alarms than both SVMs and simple thermodynamic indices, which indicates that tree-based ensembles are useful for this task.
Finally, we examined why the models make correct or incorrect predictions and found that the most important predictors were Total Precipitable Water (TPW) and mid-level vertical velocity. These were more informative than the CAPE values forecasters usually consider, which implies that current practice likely over-relies on CAPE alone.
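
For readers unfamiliar with the verification scores named in the abstract, the sketch below shows how the Critical Success Index and False Alarm Ratio are conventionally computed from binary storm/no-storm outcomes. It is a minimal illustration, not the authors' evaluation code: the function name contingency_scores and the toy label lists are hypothetical and do not come from the study.

```python
from typing import Dict, Sequence


def contingency_scores(observed: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Compute CSI and FAR from binary labels (1 = storm, 0 = no storm).

    Standard definitions:
      CSI = hits / (hits + misses + false alarms)
      FAR = false alarms / (hits + false alarms)
    """
    pairs = list(zip(observed, predicted))
    hits = sum(1 for o, p in pairs if o == 1 and p == 1)
    misses = sum(1 for o, p in pairs if o == 1 and p == 0)
    false_alarms = sum(1 for o, p in pairs if o == 0 and p == 1)

    csi_den = hits + misses + false_alarms
    far_den = hits + false_alarms
    return {
        # Return NaN when a score is undefined (no events or no storm forecasts).
        "csi": hits / csi_den if csi_den else float("nan"),
        "far": false_alarms / far_den if far_den else float("nan"),
    }


# Toy labels for illustration only; these are NOT data from the study.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 1, 0, 0, 0, 1]
scores = contingency_scores(observed, predicted)
print(scores)
```

The same function can score any forecast source that yields a yes/no storm call, so classifier outputs and index-threshold forecasts can be compared on the same footing.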