Implementation of Machine Learning Technique for Fast and Reliable DNA Sequence Classification
Contributors
Dr. Kshatrapal Singh
Dr. Raja Sarath Kumar Boddu
Keywords
Proceeding
Track
Engineering and Sciences
License
Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
DNA sequence data is growing at an exponential rate due to advances in sequencing methods, which has pushed DNA sequence research into the big data era. Machine learning (ML) is a family of algorithms that build predictive models by automatically analyzing large amounts of often unstructured data. It has achieved many experimental successes and is commonly used in the analysis of DNA sequence data. Because DNA carries most of an organism's genetic information, classifying DNA sequences can help identify diseases at an earlier stage, which is why computational biology places a high value on DNA sequence classification. This work proposes a method for classifying DNA sequences using data obtained from the NCBI. The method combines a new feature-extraction scheme based on a one-hot vector matrix with a machine learning-based classifier. Each word pair in the one-hot vector that represents the DNA sequence is denoted by a binary matrix that shows where each nucleotide is located in the sequence. The resulting matrix is then fed into a conventional CNN to extract features.
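The binary matrix described in the abstract can be illustrated with a minimal sketch. It assumes that the "hot vector" refers to standard one-hot encoding over the alphabet A, C, G, T, with one row per sequence position and one column per nucleotide. The function name `one_hot_encode`, the column order, and the all-zero row for ambiguous symbols such as N are illustrative assumptions, not details taken from the paper. The word-pair grouping and the CNN stage are not shown.

```python
# Hypothetical illustration: one-hot encoding of a DNA sequence.
# The alphabet and its column order are assumptions, not taken from the paper.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence, alphabet=NUCLEOTIDES):
    """Return a len(sequence) x len(alphabet) binary matrix as a list of rows."""
    index = {base: i for i, base in enumerate(alphabet)}
    matrix = []
    for base in sequence.upper():
        row = [0] * len(alphabet)
        # Symbols outside the alphabet (e.g. 'N') stay as an all-zero row;
        # this handling is an assumption made for the sketch.
        if base in index:
            row[index[base]] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    # Each printed row marks which nucleotide occupies that position.
    for row in one_hot_encode("ACGTN"):
        print(row)
```

A matrix of this shape (sequence length by four) is the kind of input a convolutional network can take, with the four nucleotide columns treated as channels.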