Hamlet-Pattern-Based Automated COVID-19 and Influenza Detection Model Using Protein Sequences
Article
Article Title | Hamlet-Pattern-Based Automated COVID-19 and Influenza Detection Model Using Protein Sequences |
---|---|
ERA Journal ID | 212275 |
Article Category | Article |
Authors | Erten, Mehmet; Acharya, Madhav R.; Kamath, Aditya P.; Sampathila, Niranjana; Bairy, G. Muralidhar; Aydemir, Emrah; Barua, Prabal Datta; Baygin, Mehmet; Tuncer, Ilknur; Dogan, Sengul and Tuncer, Turker |
Journal Title | Diagnostics |
Journal Citation | 12 (12) |
Article Number | 3181 |
Number of Pages | 15 |
Year | 2022 |
Publisher | MDPI AG |
Place of Publication | Switzerland |
ISSN | 2075-4418 |
Digital Object Identifier (DOI) | https://doi.org/10.3390/diagnostics12123181 |
Web Address (URL) | https://www.mdpi.com/2075-4418/12/12/3181 |
Abstract | SARS-CoV-2 and Influenza-A can present similar symptoms. Computer-aided diagnosis can help facilitate screening for the two conditions, and may be especially relevant and useful in the current COVID-19 pandemic because seasonal Influenza-A infection can still occur. We have developed a novel text-based classification model for discriminating between the two conditions using protein sequences of varying lengths. We downloaded viral protein sequences of SARS-CoV-2 and Influenza-A with varying lengths (all 100 or greater) from the NCBI database and randomly selected 16,901 SARS-CoV-2 and 19,523 Influenza-A sequences to form a two-class study dataset. We used a new feature extraction function based on a unique pattern, HamletPat, generated from the text of Shakespeare’s Hamlet, and a signum function to extract local binary pattern-like bits from overlapping fixed-length (27) blocks of the protein sequences. The bits were converted to decimal map signals from which histograms were extracted and concatenated to form a final feature vector of length 1280. The iterative Chi-square function selected the 340 most discriminative features to feed to an SVM with a Gaussian kernel for classification. The model attained 99.92% and 99.87% classification accuracy rates using hold-out (75:25 split ratio) and five-fold cross-validations, respectively. The excellent performance of the lightweight, handcrafted HamletPat-based classification model suggests that it can be a valuable tool for screening protein sequences to discriminate between SARS-CoV-2 and Influenza-A infections. |
Keywords | bioinformatics; Hamlet Pattern; protein sequence classification; SARS-CoV-2 |
Contains Sensitive Content | Does not contain sensitive content |
ANZSRC Field of Research 2020 | 3206. Medical biotechnology |
Byline Affiliations | Malatya Training and Research Hospital, Turkiye; Manipal Academy of Higher Education, India; Brown University, United States; Sakarya University, Turkiye; School of Business; University of Technology Sydney; Ardahan University, Turkiye; Government office in Elazig, Turkiye; Firat University, Turkiye |
https://research.usq.edu.au/item/yywq4/hamlet-pattern-based-automated-covid-19-and-influenza-detection-model-using-protein-sequences
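
The feature extraction step described in the abstract (signum comparisons inside overlapping blocks of length 27, bits converted to decimal values, then a histogram) can be illustrated with a minimal sketch. This is not the authors' implementation: the index pairs in `PAIRS` are hypothetical placeholders, because the real HamletPat pairs are derived from the text of Hamlet and are not given in this record. Comparing residues by character code and using a single 8-bit group (256 bins rather than the full 1280-length vector) are also simplifying assumptions.

```python
BLOCK_LEN = 27  # fixed block length stated in the abstract

# HYPOTHETICAL index pairs inside a block. The real HamletPat pairs come
# from the text of Hamlet and are not listed in this record.
PAIRS = [(0, 13), (1, 14), (2, 15), (3, 16), (4, 17), (5, 18), (6, 19), (7, 20)]


def signum_bit(a: int, b: int) -> int:
    """LBP-style comparison: 1 if a >= b, otherwise 0."""
    return 1 if a >= b else 0


def block_code(block: str) -> int:
    """Map one block to an 8-bit decimal code, one bit per index pair.

    Residues are compared by character code, which is an assumption.
    """
    code = 0
    for i, j in PAIRS:
        code = (code << 1) | signum_bit(ord(block[i]), ord(block[j]))
    return code


def histogram_features(sequence: str) -> list[int]:
    """Return a 256-bin histogram of codes over overlapping blocks (stride 1)."""
    hist = [0] * 256
    for start in range(len(sequence) - BLOCK_LEN + 1):
        hist[block_code(sequence[start:start + BLOCK_LEN])] += 1
    return hist


if __name__ == "__main__":
    features = histogram_features("MKT" * 40)  # toy 120-residue sequence
    print(len(features), sum(features))
```

A sequence of length n yields n - 26 overlapping blocks, so the histogram counts sum to that number. The later stages named in the abstract (iterative Chi-square selection of 340 features and the Gaussian-kernel SVM) are not sketched here.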