Security Vulnerabilities and Defense Framework for Large Language Models
Contributors
Dr. Madhavi Dhingra
Dr. S K Manju Bargavi
Keywords
Proceeding
Track
Engineering and Sciences
License
Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
Large Language Models (LLMs) have transformed content creation across many fields, but they also introduce major security threats, including data privacy violations, prompt injection attacks, and complex adversarial vulnerabilities. This paper presents an in-depth investigation of these risks through a three-part approach: (1) evaluating adversarial vulnerabilities through experimental red-teaming, (2) classifying exploitable behaviors using unsupervised clustering, and (3) designing a systematic multi-tiered preventive security framework. By assessing several open-source and closed-source models (such as LLaMA, Falcon, and GPT variants), we measure attack success rates, refusal rates, and the severity of harmful outputs. We also apply KMeans clustering to the response data to produce an automated risk classification. Finally, we evaluate a multi-tiered defense architecture that includes input sanitization and real-time monitoring, which shows measurable reductions in the severity of security risks. The results highlight the need for multi-metric evaluation and defense-in-depth strategies to ensure the safe and reliable use of LLMs in operational settings.
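The three red-teaming metrics named in the abstract (attack success rate, refusal rate, and severity of harmful outputs) can be illustrated with a minimal sketch. The abstract does not say how the authors compute or label these values, so everything here is an assumption made for illustration: the `RedTeamResult` record, the `summarize` function, and the 0 to 4 severity scale are hypothetical names and choices, not the paper's method.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RedTeamResult:
    """One adversarial prompt sent to one model (hypothetical record layout)."""
    model: str
    attack_succeeded: bool  # the model produced the content the attack sought
    refused: bool           # the model declined to answer
    severity: int           # assumed 0 (harmless) to 4 (most harmful) label


def summarize(results):
    """Group results by model and compute the three per-model metrics."""
    by_model = {}
    for r in results:
        by_model.setdefault(r.model, []).append(r)

    summary = {}
    for model, rows in by_model.items():
        n = len(rows)
        summary[model] = {
            "attack_success_rate": sum(r.attack_succeeded for r in rows) / n,
            "refusal_rate": sum(r.refused for r in rows) / n,
            "mean_severity": mean(r.severity for r in rows),
        }
    return summary
```

Reporting the three values side by side, rather than a single score, reflects the abstract's call for multi-metric evaluation: a model can refuse often and still produce high-severity output on the attacks that get through.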