Data Engineering and Feature Design for AI/ML-Driven Cyber Threat Intelligence
Contributors
Sesha Bhargavi
Dr. Upendra Kumar
Keywords
Proceeding
Track
Engineering and Sciences
License
Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
This paper presents Stage II of the ADCTI-AR (AI-Driven Cyber Threat Intelligence and Adaptive Response) research programme. Building on the systematic literature review and conceptual framework of Stage I, this work completes three core phases: (i) large-scale data collection and curation from heterogeneous security sources; (ii) rigorous preprocessing and multi-dimensional feature engineering, producing a 92-dimensional threat feature matrix; and (iii) development and preliminary evaluation of multiple machine learning (ML) architectures. The curated dataset integrates NSL-KDD, CICIDS-2017, UNSW-NB15, and MITRE ATT&CK evaluation data, supplemented by synthetic zero-day attack samples generated with a Conditional GAN to address severe class imbalance. A hybrid ensemble combining a Deep Neural Network, a Random Forest, and an LSTM network achieves 99.31% detection accuracy, a 0.43% False Positive Rate (FPR), and an AUC-ROC of 0.9984 on the benchmark datasets. Autoencoder-based anomaly detection records a 91.47% detection rate on zero-day attack patterns. These results confirm the validity of the ADCTI-AR architectural design and establish a strong baseline for Stage III.
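The headline figures above (detection accuracy, False Positive Rate, detection rate, AUC-ROC) follow standard binary-classification definitions. As a minimal sketch, assuming label 1 = attack and 0 = benign and a model that emits a per-sample attack score, they can be computed as below. The function names and toy data are hypothetical illustrations, not part of the ADCTI-AR pipeline. AUC is computed with the pairwise (Mann-Whitney) formulation, which equals the area under the empirical ROC curve when tied scores count as one half.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN); assumes labels are 1 = attack, 0 = benign."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1          # attack correctly flagged
        elif t == 0 and p == 1:
            fp += 1          # benign traffic wrongly flagged
        elif t == 0 and p == 0:
            tn += 1          # benign traffic correctly passed
        else:
            fn += 1          # attack missed
    return tp, fp, tn, fn


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, False Positive Rate, and detection rate (recall on attacks)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
    }


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random attack outscores a random benign sample (ties = 1/2).

    O(P * N) pairwise comparison: fine for illustration, too slow for full datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one attack and one benign sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 3 attacks, 5 benign flows, scores from some detector.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.05, 0.6]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold of 0.5
    print(detection_metrics(y_true, y_pred))
    print(auc_roc(y_true, scores))
```

On the toy data this yields accuracy 0.75, FPR 0.2, a detection rate of 2/3, and an AUC of 13/15. Note that accuracy and FPR depend on the chosen threshold, whereas AUC-ROC summarises ranking quality across all thresholds, which is why the abstract reports both kinds of metric. For real evaluations, a library routine such as scikit-learn's `roc_auc_score` would normally replace the quadratic loop.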