A Comparative Study of Deterministic and Stochastic Motif Discovery Algorithms for Coronaviral Genomic Surveillance
Contributors
Dr. Pushpa Susant Mahapatro
Keywords
Proceeding
Track
Engineering and Sciences
License
Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
Experimentally identifying genome replication sites and non-contiguous regulatory motifs in coronaviruses faces several major obstacles: high mutation rates, low sequence conservation, and difficulty scaling across rapidly emerging variants. This study compares two deterministic algorithms (Greedy Motif Search and Greedy Motif Search with Pseudocounts) with two stochastic, sampling-based algorithms (Randomized Motif Search and the Gibbs Sampler). All viral sequences were complete, high-quality genomes taken from NCBI-curated repositories. Evaluated by sensitivity, specificity, and accuracy, the deterministic algorithms are computationally efficient but become trapped in local optima, returning suboptimal motifs, when the mutation rate is high. The stochastic sampling methods, the Gibbs Sampler in particular, were more robust at isolating subtle, mutated, non-contiguous motifs from background biological “noise.” Together, these results provide an implementation of a scalable computational methodology that can expedite downstream antiviral target discovery, support automated structure-based drug design, and contribute to a global real-time genomic surveillance network.
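To make the stochastic approach concrete, the following is a minimal sketch of a Gibbs Sampler for motif discovery with Laplace pseudocounts. It is not the implementation evaluated in this study. The function names (`build_profile`, `score`, `kmer_probability`, `gibbs_sampler`), the iteration count, and the mismatch-based score are illustrative assumptions. The sketch also assumes at least two input sequences, each containing only the characters A, C, G, and T and at least `k` bases long.

```python
import random
from collections import Counter

BASES = "ACGT"  # assumption: inputs contain only these four characters


def build_profile(motifs, pseudocount=1):
    """Column-wise base probabilities, smoothed with Laplace pseudocounts."""
    k = len(motifs[0])
    total = len(motifs) + 4 * pseudocount
    profile = []
    for j in range(k):
        counts = Counter(m[j] for m in motifs)
        profile.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return profile


def score(motifs):
    """Total mismatches against the column-wise consensus (lower is better)."""
    k = len(motifs[0])
    total = 0
    for j in range(k):
        counts = Counter(m[j] for m in motifs)
        total += len(motifs) - max(counts.values())
    return total


def kmer_probability(kmer, profile):
    """Probability of a k-mer under a position-specific profile."""
    p = 1.0
    for j, base in enumerate(kmer):
        p *= profile[j][base]
    return p


def gibbs_sampler(seqs, k, iterations=500, rng=None):
    """Return (best_motifs, best_score) after a single sampling run."""
    if rng is None:
        rng = random.Random()
    # Start from one randomly chosen k-mer per sequence.
    motifs = []
    for s in seqs:
        start = rng.randrange(len(s) - k + 1)
        motifs.append(s[start:start + k])
    best, best_score = list(motifs), score(motifs)
    for _ in range(iterations):
        # Hold out one sequence and build the profile from the others.
        i = rng.randrange(len(seqs))
        profile = build_profile(motifs[:i] + motifs[i + 1:])
        seq = seqs[i]
        kmers = [seq[p:p + k] for p in range(len(seq) - k + 1)]
        weights = [kmer_probability(km, profile) for km in kmers]
        # Sample the replacement k-mer in proportion to its profile probability,
        # rather than always taking the most probable one.
        motifs[i] = rng.choices(kmers, weights=weights, k=1)[0]
        current = score(motifs)
        if current < best_score:
            best, best_score = list(motifs), current
    return best, best_score
```

Because each replacement k-mer is sampled rather than chosen greedily, a run can move away from a poor early choice, which is the property the abstract contrasts with the deterministic greedy methods. A single run can still settle on a weak solution, so in practice the sampler is usually restarted several times from different random starts and the lowest-scoring result is kept.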