A Comprehensive Review of Multimodal Emotion Recognition Techniques Using Machine Learning
Contributors
Dr. Ujwalla Gawande
Dr. S. Hemalatha
Keywords
Proceeding
Track
Engineering, Sciences, Mathematics & Computations
License
Copyright (c) 2025 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
Emotion recognition allows intelligent systems to recognize and respond to human emotions, improving human-computer interaction. Single-modality approaches, i.e., those based on speech or facial expressions alone, often fail to capture the full spectrum of human emotions under realistic conditions. Advances in machine learning and deep learning have given rise to increasingly important multimodal and multilingual emotion recognition methods. In such systems, facial expressions, voice, and text serve as complementary sources that are integrated to improve emotion detection across languages. This study presents recent advances in multimodal and multilingual emotion recognition, with special emphasis on convolutional, recurrent, and transformer models. It covers systems such as Multilingual Speech Emotion Recognition (MSER) and shows how they apply to Indian and other low-resource languages. It identifies emotion-aware music, activity, and education recommendation systems for their potential to boost personalization and user engagement. It discusses specific challenges concerning dataset imbalance, cultural differences, subjective emotional boundaries, and privacy. Finally, this paper provides directions for creating contextually aware, adaptive, and ethically sensitive emotion recognition systems.