Comparative Evaluation of Rule-Based, SVM, CRF and BiLSTM Models for Gujarati Part-of-Speech Tagging Using an Art and Culture Corpus


Date Published: 26 June 2026

Contributors

Dr. Pooja Bhatt

Author

Pawan Wing

Author

Keywords

Gujarati NLP, POS Tagging, Art and Culture Corpus, CRF, BiLSTM, Low-Resource Languages

Proceeding

Track

General Track

License

Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Abstract

Part-of-Speech (POS) tagging is a fundamental Natural Language Processing (NLP) task that assigns a grammatical category to each word in a sentence. Gujarati is a low-resource language with few benchmark datasets and no standardized evaluation framework. This study presents a comparative analysis of Rule-Based, Support Vector Machine (SVM), Conditional Random Field (CRF), and Bidirectional Long Short-Term Memory (BiLSTM) models for Gujarati POS tagging, using an Art and Culture corpus of approximately 20,000 tokens manually annotated according to the BIS tagset. All models are trained and evaluated within a single experimental framework to ensure a fair comparison. BiLSTM achieved the highest accuracy (96.7%), followed by CRF (94.2%), SVM (91.5%), and the Rule-Based approach (79.1%). The results indicate that the deep learning model captures sentence context most effectively, while CRF remains an effective choice for low-resource Gujarati POS tagging.
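
As an illustration of the kind of token-level accuracy comparison the abstract describes, the sketch below scores a tagger against gold annotations. It is not the authors' framework: the most-frequent-tag baseline, the function names, the placeholder tokens (`tok_a`, `tok_b`, ...) and the toy sentences are all assumptions introduced here, and the tag labels (`N_NN`, `V_VM`, `JJ`) are used only as examples of BIS-style tags.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Build a baseline tagger that assigns each word its most frequent tag.

    Hypothetical baseline for illustration; it is not one of the four
    models evaluated in the study.
    """
    per_word = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            per_word[word][tag] += 1
            all_tags[tag] += 1
    # Unknown words fall back to the most frequent tag overall.
    default_tag = all_tags.most_common(1)[0][0]
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    return lambda words: [lexicon.get(w, default_tag) for w in words]


def token_accuracy(tagger, tagged_sentences):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for sentence in tagged_sentences:
        words = [w for w, _ in sentence]
        gold = [t for _, t in sentence]
        predicted = tagger(words)
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total if total else 0.0


# Toy placeholder data (assumed, not drawn from the Art and Culture corpus).
train = [
    [("tok_a", "N_NN"), ("tok_b", "V_VM")],
    [("tok_a", "N_NN"), ("tok_c", "JJ"), ("tok_e", "N_NN"), ("tok_b", "V_VM")],
]
test = [[("tok_a", "N_NN"), ("tok_d", "JJ"), ("tok_b", "V_VM")]]

tagger = train_most_frequent_tag(train)
accuracy = token_accuracy(tagger, test)
```

Any tagger exposed as a function from a list of words to a list of tags can be passed to `token_accuracy`, so every model is scored on the same held-out sentences with the same metric.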


How to Cite

Bhatt, P., & Wing, P. (2026). Comparative Evaluation of Rule-Based, SVM, CRF and BiLSTM Models for Gujarati Part-of-Speech Tagging Using an Art and Culture Corpus. Sustainable Global Societies Initiative, 1(9). https://vectmag.com/sgsi/paper/view/838