Mutation prediction in the SARS-CoV-2 genome using attention-based neural machine translation

  • Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continued to evolve rapidly since causing havoc worldwide in 2020, and its frequently mutating nature has made the virus very hard to contain. Changes in its genome drive viral evolution and can render the virus more resistant to existing vaccines and drugs. Predicting viral mutations in advance would help prepare for more infectious and virulent variants and, in turn, reduce the damage they cause. In this paper, we propose several NMT (neural machine translation) architectures based on RNNs (recurrent neural networks) to predict mutations in selected SARS-CoV-2 non-structural proteins (NSPs), namely NSP1, NSP3, NSP5, NSP8, NSP9, NSP13, and NSP15. First, we used k-means clustering and nearest neighbors to create and pre-process pairs of sequences from two "languages" for training a neural machine translation model. We also provide insights into training NMT models on long biological sequences. Finally, we evaluated and benchmarked our models to demonstrate their efficiency and reliability.
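
The abstract names only the ingredients of the pairing step (k-means clustering and nearest neighbors). The following is a minimal, hypothetical sketch of how such a step could work, not the authors' implementation. It assumes that the two "languages" are two sets of sequences (sources and targets), that sequences are compared through k-mer frequency profiles over a nucleotide alphabet, and that each source sequence is matched to the closest target within the k-means cluster nearest to it. All helper names (`kmer_profile`, `kmeans`, `pair_sequences`) and parameters (`k`, `n_clusters`, `alphabet`) are illustrative assumptions.

```python
import random
from collections import Counter
from itertools import product


def kmer_profile(seq, k=3, alphabet="ACGT"):
    """Fraction of k-length windows of `seq` equal to each k-mer over `alphabet` (assumed features)."""
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts.values()), 1)  # avoid division by zero for very short sequences
    return [counts[kmer] / total for kmer in vocab]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(vec, candidates):
    """Index of the candidate vector closest to `vec`."""
    return min(range(len(candidates)), key=lambda i: sq_dist(vec, candidates[i]))


def kmeans(points, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    n_clusters = min(n_clusters, len(points))
    centroids = [list(p) for p in rng.sample(points, n_clusters)]
    labels = None
    for _ in range(n_iter):
        new_labels = [nearest_index(p, centroids) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment against the final centroids.
    labels = [nearest_index(p, centroids) for p in points]
    return centroids, labels


def pair_sequences(sources, targets, n_clusters=2, k=3, alphabet="ACGT", seed=0):
    """Pair each source sequence with its nearest target sequence (hypothetical scheme).

    Targets are grouped with k-means; each source is routed to the closest
    centroid and matched only against the targets in that cluster.
    """
    src_vecs = [kmer_profile(s, k, alphabet) for s in sources]
    tgt_vecs = [kmer_profile(t, k, alphabet) for t in targets]
    centroids, labels = kmeans(tgt_vecs, n_clusters, seed=seed)
    pairs = []
    for seq, vec in zip(sources, src_vecs):
        cluster = nearest_index(vec, centroids)
        candidates = [i for i, lab in enumerate(labels) if lab == cluster]
        if not candidates:  # empty cluster: fall back to all targets
            candidates = list(range(len(targets)))
        best = min(candidates, key=lambda i: sq_dist(vec, tgt_vecs[i]))
        pairs.append((seq, targets[best]))
    return pairs


if __name__ == "__main__":
    sources = ["AAAAAAAAAA", "CCCCCCCCCC"]
    targets = ["AAAAAAAAAT", "CCCCCCCCCG", "GGGGGGGGGG"]
    for src, tgt in pair_sequences(sources, targets):
        print(src, "->", tgt)
    # AAAAAAAAAA -> AAAAAAAAAT
    # CCCCCCCCCC -> CCCCCCCCCG
```

Clustering first keeps the nearest-neighbor search small when the target set is large, because each source is compared only with the targets in one cluster. A real pipeline would more likely use library implementations such as scikit-learn's `KMeans` and `NearestNeighbors`, together with a sequence representation suited to the data.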

Metadata
Authors: Darrak Moin Quddusi, Sandesh Athni Hiremath, Naim Bajcinca
URN: urn:nbn:de:hbz:386-kluedo-85051
Parent Title (English): Mathematical Biosciences and Engineering
Publisher: AIMS Press
Editor: Pedro Carmona Sáez
Document Type: Article
Language of publication: English
Date of Publication (online): 2024/05/20
Year of first Publication: 2024
Publishing Institution: Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Date of the Publication (Server): 2024/11/21
Issue: Volume 21, Issue 5 (2024), pp. 5996-6018
Source: DOI 10.3934/mbe.2024264
Faculties / Organisational entities: Kaiserslautern - Department of Mechanical and Process Engineering (Fachbereich Maschinenbau und Verfahrenstechnik)
DDC Classification: 0 General works, computer science, information science / 004 Computer science
Collections: Open Access Publication Fund (Open-Access-Publikationsfonds)
Licence: Secondary publication (Zweitveröffentlichung)