DA-ACFNet: Adaptive Cross-Modal Transformer Fusion for Emotion Recognition in Neurodiverse Children

Surendra Ramteke; Dr. Sunil Kumar

Date Published: 24 June 2026

Contributors

Surendra Ramteke

Lincoln University College, Malaysia.

Author

Dr. Sunil Kumar

Lincoln University College, Malaysia.

Author

Keywords

Facial Emotion Recognition; Neurodiverse Children; Transformer Fusion; Cross-Modal Attention; Contrastive Learning

Proceeding

Vol. 1 No. 6 (2026): LGPR Batch 2 Conference 4

Track

Engineering and Sciences

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Abstract

Facial emotion recognition (FER) is an emerging technology in assistive healthcare, special education, and affective human-computer interaction. Standard FER models, however, may fail to recognize the facial expressions of neurodiverse children, whose expressions can be subtle, inconsistent, or different from those of neurotypical children. In this paper, we propose DA-ACFNet, an adaptive transformer-based fusion model for emotion recognition in neurodiverse children. The model has two streams, one for real facial representations and one for synthetic ones. A ResNeXt backbone extracts spatial features from real facial images, and transformer encoders process synthetic facial images created by augmentation. An adaptive cross-modal attention module learns to integrate complementary emotional information from the two streams. In addition, a hybrid loss function that combines cross-entropy and contrastive learning is applied to enhance inter-class discrimination. Experimental results show that DA-ACFNet outperforms CNN, ResNet-50, EfficientNet-B0, and compact vision transformer baselines. The proposed model achieves 98.2% overall accuracy on the emotion recognition task, indicating its effectiveness for children with different levels of neurodiversity.
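
The abstract describes the hybrid loss only as a combination of cross-entropy and contrastive learning. The sketch below is one possible reading of that idea, not the authors' implementation. It assumes a supervised contrastive term over L2-normalized embeddings. The weight `lam`, the `temperature`, and all function names are hypothetical choices made for illustration.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy of one sample from raw logits (log-sum-exp for stability)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def _normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def supervised_contrastive(embeddings, labels, temperature=0.1):
    """Assumed contrastive term: pull same-label embeddings together, push others apart."""
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Temperature-scaled cosine similarity to every other sample in the batch.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

def hybrid_loss(logits_batch, embeddings, labels, lam=0.5, temperature=0.1):
    """Mean cross-entropy plus lam times the contrastive term (lam is hypothetical)."""
    ce = sum(cross_entropy(l, y) for l, y in zip(logits_batch, labels)) / len(labels)
    con = supervised_contrastive(embeddings, labels, temperature)
    return ce + lam * con

if __name__ == "__main__":
    # Toy batch: four samples, two emotion classes, 2-D fused embeddings.
    logits = [[2.0, 0.1], [1.5, 0.3], [0.2, 1.8], [0.1, 2.2]]
    fused = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    labels = [0, 0, 1, 1]
    print(hybrid_loss(logits, fused, labels))
```

In this sketch, setting `lam` to 0 reduces the objective to plain cross-entropy, while larger values put more emphasis on separating the classes in embedding space.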

How to Cite

Ramteke, S., & Kumar, S. (2026). DA-ACFNet: Adaptive Cross-Modal Transformer Fusion for Emotion Recognition in Neurodiverse Children. Sustainable Global Societies Initiative, 1(6). https://vectmag.com/sgsi/paper/view/703