Comparative Evaluation of Rule-Based, SVM, CRF and BiLSTM Models for Gujarati Part-of-Speech Tagging Using an Art and Culture Corpus
Contributors
Dr. Pooja Bhatt
Pawan Wing
Track
General Track
License
Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
Part-of-Speech (POS) tagging is a fundamental Natural Language Processing (NLP) task that assigns a grammatical category to each word in a sentence. Gujarati is a low-resource language with few benchmark datasets and standardized evaluation frameworks. This study compares Rule-Based, Support Vector Machine (SVM), Conditional Random Field (CRF), and Bidirectional Long Short-Term Memory (BiLSTM) models for Gujarati POS tagging on an Art and Culture corpus of approximately 20,000 tokens, manually annotated according to the Bureau of Indian Standards (BIS) tagset. All models are evaluated within a single experimental framework so that their results are directly comparable. BiLSTM achieved the highest accuracy (96.7%), followed by CRF (94.2%), SVM (91.5%), and the Rule-Based approach (79.1%). These results indicate that the deep learning model captures sentence context most effectively, while CRF, within 2.5 percentage points of BiLSTM, remains an effective choice for low-resource Gujarati POS tagging.
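As an illustration of what a unified comparison can look like in practice, the following minimal Python sketch scores several taggers on the same held-out sentences using token-level accuracy. It is not the authors' implementation: the function names (`token_accuracy`, `compare_taggers`), the placeholder tokens, and the trivial `all_noun_tagger` baseline are hypothetical, and the abstract does not state how accuracy was computed or how the data was split. The tag labels (`N_NN`, `JJ`, `V_VM`) follow the BIS tagset naming.

```python
from typing import Callable, Dict, List, Tuple

# A sentence is a list of (token, gold_tag) pairs.
Sentence = List[Tuple[str, str]]
# A tagger maps a list of tokens to a list of predicted tags.
Tagger = Callable[[List[str]], List[str]]


def token_accuracy(test_set: List[Sentence], tagger: Tagger) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = 0
    total = 0
    for sentence in test_set:
        tokens = [token for token, _ in sentence]
        predicted = tagger(tokens)
        if len(predicted) != len(tokens):
            raise ValueError("tagger must return exactly one tag per token")
        for (_, gold_tag), predicted_tag in zip(sentence, predicted):
            correct += int(gold_tag == predicted_tag)
            total += 1
    return correct / total if total else 0.0


def compare_taggers(
    test_set: List[Sentence], taggers: Dict[str, Tagger]
) -> Dict[str, float]:
    """Score every tagger on the same test set so results are comparable."""
    return {name: token_accuracy(test_set, tagger) for name, tagger in taggers.items()}


def all_noun_tagger(tokens: List[str]) -> List[str]:
    """Hypothetical baseline: tag every token as a common noun (N_NN)."""
    return ["N_NN"] * len(tokens)


# Toy data with placeholder tokens; NOT drawn from the paper's corpus.
TOY_TEST_SET: List[Sentence] = [
    [("tok1", "N_NN"), ("tok2", "JJ"), ("tok3", "N_NN"), ("tok4", "V_VM")],
]

if __name__ == "__main__":
    scores = compare_taggers(TOY_TEST_SET, {"all_noun": all_noun_tagger})
    for name, accuracy in scores.items():
        print(f"{name}: {accuracy:.3f}")
```

Any of the four model types could be plugged in as a `Tagger` callable, provided each one receives the same test sentences and returns one tag per token.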