Detection of AI-Generated Text Using Linguistic Features and Machine Learning for Preserving Academic Integrity
Contributors
Saleem
Dr. Basant Kumar
Track
Engineering and Sciences
License
Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
The advancement of Large Language Models (LLMs) has reshaped how information is produced and consumed, enabling highly coherent, human-like text across diverse applications such as academic research, tutoring, summarization, and code generation. While these advances enhance productivity and learning efficiency, they also raise pressing ethical concerns, including misinformation, plagiarism, and violations of academic integrity. Current detection methods often rely on model-specific parameters or hidden states, which limits their adaptability to new and evolving architectures. Similarly, linguistic and embedding-based approaches, though effective, are computationally intensive, which reduces scalability and hinders real-time deployment. Proprietary detection tools further complicate accessibility: they are costly and frequently lose accuracy against rapidly advancing models. To address these challenges, a lightweight hybrid framework integrating linguistic features and machine learning techniques is proposed. By combining stylometric, syntactic, and pragmatic features, the approach aims to improve generalization across domains while maintaining computational efficiency. Evaluated on multi-domain datasets with adversarial testing, the framework demonstrates resilience against paraphrasing and evolving LLMs. Such solutions are crucial for trustworthy AI integration, balancing the benefits of LLMs with the need for ethical safeguards and scalable detection mechanisms.
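To make the proposed pipeline concrete, the sketch below shows one way handcrafted stylometric and syntactic features could feed a lightweight classifier. It is a minimal illustration, not the authors' implementation: it assumes scikit-learn is available, and the specific features (type-token ratio, sentence length, word-distribution entropy, punctuation density), the `stylometric_features` helper, and the toy training samples are all hypothetical placeholders chosen to mirror the kinds of cues the abstract names.

```python
import re
import math
from collections import Counter

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def stylometric_features(text: str) -> list[float]:
    """Extract a small set of stylometric/syntactic cues from raw text.

    Feature choices here are illustrative, not those of the paper.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    n_sents = max(len(sentences), 1)

    # Lexical diversity: type-token ratio.
    ttr = len(set(words)) / n_words
    # Average sentence length in words (syntactic-complexity proxy).
    avg_sent_len = n_words / n_sents
    # Mean word length (crude vocabulary-sophistication proxy).
    avg_word_len = sum(len(w) for w in words) / n_words
    # Shannon entropy of the word distribution (repetitiveness cue).
    counts = Counter(words)
    entropy = -sum((c / n_words) * math.log2(c / n_words)
                   for c in counts.values())
    # Punctuation density as a simple stylistic/pragmatic cue.
    punct_density = sum(text.count(p) for p in ",;:") / n_words

    return [ttr, avg_sent_len, avg_word_len, entropy, punct_density]


# Lightweight classifier over the handcrafted features; any efficient
# scikit-learn estimator could stand in for logistic regression here.
clf = make_pipeline(StandardScaler(), LogisticRegression())

# Hypothetical training data: texts labeled 0 = human, 1 = AI-generated.
train_texts = ["...human-written sample...", "...LLM-generated sample..."]
labels = [0, 1]
X = [stylometric_features(t) for t in train_texts]
clf.fit(X, labels)

# Score an unseen passage: 0 predicts human, 1 predicts AI-generated.
print(clf.predict([stylometric_features("Some unseen passage to score.")]))
```

Because the features are cheap surface statistics rather than model logits or deep embeddings, inference stays model-agnostic and fast, which is the efficiency and adaptability trade-off the abstract argues for over parameter-dependent or embedding-based detectors.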