Designing Tiny Machine Learning Model for Keyword Spotting Using Knowledge Distillation – A comprehensive review


Date Published : 29 April 2026

Contributors

Selvaperumal P

Lincoln University College
Author

Keywords

Keyword spotting; Knowledge distillation; TinyML; Edge AI

Proceeding

Track

Engineering and Sciences

License

Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Abstract

Small-footprint keyword spotting (KWS) on an IoT or edge device is the process of identifying predefined keywords in speech locally, on a resource-constrained device, using lightweight machine learning models. Because these devices have highly constrained processing, memory, and power budgets, it is difficult to design a tiny machine learning model whose keyword spotting accuracy is close to that of large models. This exhaustive review systematically examines recent advances in tiny machine learning-based keyword spotting systems, covering the models used, the knowledge distillation processes employed, and the quantization processes used for model compression. Most of the surveyed works use the Google Speech Commands dataset (versions v1 and v2) as the benchmark. They predominantly use hand-crafted acoustic features extracted from the raw audio waveform, with Mel-Frequency Cepstral Coefficients (MFCC) (typically 10–40 coefficients) and log-Mel filterbank energies (LFBE) or log-Mel spectrograms (e.g., 40–80 Mel bins) being the most common inputs to both teacher and student models. These acoustic features are fed to efficient neural networks, such as CNNs (e.g., DS-CNN, BC-ResNet), Transformers (e.g., DistilHuBERT and LightHuBERT), or hybrid designs, during both teacher training and student distillation. Knowledge distillation techniques (e.g., soft targets/logits, layer-wise hidden representations, contextualized latent transfer, or robust variants such as VIC-KD) then compress the model. The surveyed approaches report substantial reductions in model size (often 29–75%) and inference cost while preserving accuracy. The survey aims to study current keyword spotting models and to identify research gaps in them.
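
As an illustration of the soft-target (logit) distillation mentioned in the abstract, the following is a minimal NumPy sketch of the commonly used objective that mixes a temperature-softened teacher/student KL term with ordinary cross-entropy on the hard labels. It is not taken from any of the surveyed works: the function names `softmax` and `distillation_loss`, the temperature of 4.0, the mixing weight `alpha` of 0.5, and the three-class toy logits are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, with temperature scaling."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Soft-target distillation loss (illustrative values, not from the paper).

    alpha weights the soft (teacher) term; (1 - alpha) weights the hard-label term.
    """
    eps = 1e-12  # guards the logarithms against log(0)
    labels = np.asarray(labels)

    # Soft term: KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so its gradient magnitude stays comparable across temperatures.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    kl = np.sum(
        p_teacher * (np.log(p_teacher + eps) - np.log(p_student_soft + eps)),
        axis=-1,
    )
    soft_term = (temperature ** 2) * kl.mean()

    # Hard term: standard cross-entropy of the student against the true labels.
    p_student = softmax(student_logits, 1.0)
    hard_term = -np.log(p_student[np.arange(len(labels)), labels] + eps).mean()

    return alpha * soft_term + (1.0 - alpha) * hard_term


# Toy batch: two utterances, three hypothetical keyword classes.
teacher = np.array([[5.0, 1.0, 0.0], [0.5, 4.0, 0.2]])
student = np.array([[2.0, 1.5, 0.1], [0.3, 1.0, 0.9]])
labels = np.array([0, 1])
loss = distillation_loss(student, teacher, labels)
```

In a real training loop the same expression would be written in an automatic-differentiation framework so that the student can be optimized against it; the NumPy version only shows how the two terms are combined.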

How to Cite

Selvaperumal, P. (2026). Designing Tiny Machine Learning Model for Keyword Spotting Using Knowledge Distillation – A comprehensive review. Sustainable Global Societies Initiative, 1(5). https://vectmag.com/sgsi/paper/view/474