Addressing Limitations in CNN-Based Acoustic Profiling: Enhancing Real-Time Depression Detection with RNN and LSTM Architectures
Contributors
Prof. (Dr.) Dhananjay S. Deshpande
Shashi Kant Gupta
Sai Kiran Oruganti
Keywords
Proceeding
Track
Engineering, Sciences, Mathematics & Computations
License
Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
The human voice carries subtle cues that can indicate emotional well-being, making speech analysis an increasingly valuable tool for detecting early signs of depression. While earlier studies have applied Convolutional Neural Networks (CNNs) to classify acoustic features, these models struggle to capture the changing flow and timing of speech, temporal elements that are essential to identifying mood variations. In this work, we examine the performance constraints of CNN-based systems and explore Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) architectures as enhanced solutions for real-time depression assessment. By modelling speech as a continuous sequence rather than as isolated segments, RNNs and LSTMs can more effectively capture the temporal patterns associated with depressive behaviour. Our comparative evaluation shows noticeable improvements in detection accuracy and response latency, demonstrating that temporal modelling plays a critical role in voice-driven mental health screening. These findings support the integration of sequential deep learning models into future clinical and mobile applications aimed at scalable mental health monitoring.
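To make the architectural contrast concrete, the sketch below (not the authors' implementation; model size, feature dimensions, and naming are illustrative assumptions) shows how an LSTM treats an utterance as a sequence of per-frame acoustic features such as MFCCs, summarising the whole temporal trajectory in its final hidden state before a binary screening decision:

```python
# Hypothetical sketch of sequence-based acoustic classification with an LSTM.
# Feature count (40 MFCCs), hidden size, and class setup are assumptions.
import torch
import torch.nn as nn

class SpeechLSTM(nn.Module):
    def __init__(self, n_features=40, hidden=64):
        super().__init__()
        # The LSTM reads the utterance frame by frame, carrying context forward.
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        # A linear head maps the final hidden state to one screening score.
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):               # x: (batch, frames, n_features)
        _, (h_n, _) = self.lstm(x)      # h_n: (1, batch, hidden)
        return torch.sigmoid(self.head(h_n[-1]))  # (batch, 1), in [0, 1]

model = SpeechLSTM()
frames = torch.randn(2, 300, 40)        # 2 utterances, 300 frames of 40 features
probs = model(frames)
print(probs.shape)                      # torch.Size([2, 1])
```

Unlike a CNN operating on fixed spectrogram patches, the recurrent state here is updated at every frame, which is what allows timing and flow across the whole utterance to influence the prediction.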