A Health-Aware and Noise-Robust Speaker Recognition Framework Using Adaptive Neural Architectures
Contributors
Dr. Arundhati Niwatkar
Dr. Sai Kiran Oruganti
Keywords
Proceeding
Track
Engineering and Sciences
License
Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
Speaker verification is increasingly used as a biometric technology for authenticating users in
applications such as voice-based login, digital assistants, and secure human-machine interaction.
Contemporary speaker verification systems rely heavily on deep neural network architectures that
produce speaker embeddings, including x-vectors, ECAPA-TDNN, and self-supervised models; however,
their performance often degrades, sometimes drastically, under real-world conditions such as
environmental noise, channel mismatch, and physiological changes in the voice caused by factors like
illness or fatigue. In recent years, researchers have made substantial advances in deep learning
architectures and methods, in self-supervised learning of speaker representations from speech, and in
the use of voice biomarkers for health assessment. This review surveys these advances and the
limitations of earlier speaker verification approaches (e.g., topography, vocal stability, etc.) in
separating speaker identity from voice variation caused by environmental conditions, illness, and
other physiological influences. It then discusses a framework for a health-aware and noise-robust
speaker verification system built on an adaptive learning architecture that incorporates dynamic
attention mechanisms and adversarial training. The goal of the proposed work is a speaker verification
system that recognizes a speaker's identity regardless of variations in noise and/or health, so that it
functions effectively as a biometric verification mechanism in a variety of real-world applications.
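
As a purely illustrative aside, the embedding-based verification decision that the abstract refers to can be sketched as cosine scoring against a threshold. This is a minimal sketch under stated assumptions, not the framework proposed in this work: the function names, the three-dimensional toy embeddings, and the 0.7 threshold are all hypothetical, and real systems use embeddings with hundreds of dimensions produced by a trained network.

```python
import math

def cosine_score(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(enrolled, test, threshold=0.7):
    """Accept the identity claim if the score reaches the threshold.

    The threshold value is an assumption chosen for this toy example.
    """
    return cosine_score(enrolled, test) >= threshold

# Hypothetical toy embeddings (not real data).
enrolled = [0.9, 0.1, 0.4]        # stored at enrollment
clean_test = [0.8, 0.2, 0.5]      # same speaker, clean conditions
degraded_test = [0.3, 0.9, 0.1]   # same speaker, embedding shifted by noise or illness

print(verify(enrolled, clean_test))     # accepted
print(verify(enrolled, degraded_test))  # rejected: the shift lowers the score
```

The second call illustrates the failure mode described above: when noise or a health-related voice change moves the test embedding away from the enrolled one, a genuine speaker can fall below the threshold and be rejected.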