Attention-Based Acoustic Encoding: Transformer-Driven Longitudinal Vocal Biomarkers for Enhanced Depression Detection

Prof. (Dr.) Dhananjay S. Deshpande; Shashi Kant Gupta; Sai Kiran Oruganti

Date Published : 11 January 2026

Contributors

Prof. (Dr.) Dhananjay S. Deshpande

MBAESG - School of Management, Ajeenkya D Y Patil University, Pune

Author

Shashi Kant Gupta

Chitkara University, Punjab

Author

Sai Kiran Oruganti

Lincoln University College

Author

Keywords

Depression Detection; Voice Biomarkers; Attention Mechanism; Transformer Networks; Acoustic Profiling; Longitudinal Speech Analysis; Prosodic Features; Mental Health AI

Proceeding

Vol. 1 No. 1 (2026): LGPR Batch 1 Conference 4

Track

Engineering, Sciences, Mathematics & Computations

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Abstract

Detecting signs of depression in a person's voice is difficult: the vocal cues are often subtle and vary greatly from person to person. Existing AI models have been applied to this task, but they often miss the broader context in speech and the small yet critical shifts in tone and rhythm that can signal depression. To address this, we built a model based on the Transformer architecture, which uses a self-attention mechanism. Self-attention lets the model weight the most informative parts of a voice recording, treated much like a visual map of sound, so that it can pick up patterns such as flat intonation, unusually long pauses, or changes in energy. A key feature of our system is its ability to track these vocal patterns over time for an individual, which makes it more resilient to differences between speakers and to background noise. In tests on standard depression speech datasets, our approach was more accurate and more sensitive than current methods. We believe this is a promising step toward practical tools that could help doctors with early detection and ongoing monitoring, offering a scalable way to support people at risk of depression.
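
The abstract gives no implementation details, so the following is only a minimal sketch of generic single-head scaled dot-product self-attention over a sequence of acoustic frames. It is not the authors' model. The function names, the dimensions, and the random weights and features are all illustrative assumptions. Each row of the returned weight matrix shows how strongly one frame attends to every other frame in the recording.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(frames, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative).

    frames: (T, d_in) array, one row of acoustic features per time frame.
    w_q, w_k, w_v: (d_in, d_k) projection matrices (hypothetical, untrained).
    Returns the context vectors (T, d_k) and attention weights (T, T).
    """
    q = frames @ w_q
    k = frames @ w_k
    v = frames @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # pairwise frame similarity
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights

# Assumed toy sizes: 6 frames, 4 input features, 3 attention dimensions.
rng = np.random.default_rng(0)
T, d_in, d_k = 6, 4, 3
frames = rng.normal(size=(T, d_in))  # stand-in for real per-frame features
w_q, w_k, w_v = (rng.normal(size=(d_in, d_k)) for _ in range(3))

context, weights = self_attention(frames, w_q, w_k, w_v)
print(context.shape, weights.shape)
```

In a trained model the projection matrices would be learned, and frames carrying cues such as long pauses or flat intonation could receive larger weights. With the random values used here, the weights carry no such meaning.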

How to Cite

Deshpande, D., Gupta, S. K., & Oruganti, S. K. (2026). Attention-Based Acoustic Encoding: Transformer-Driven Longitudinal Vocal Biomarkers for Enhanced Depression Detection. Sustainable Global Societies Initiative, 1(1). https://vectmag.com/sgsi/paper/view/55