Prediction of Protein-protein Interactions in Arabidopsis thaliana Using Partial Training Samples in a Machine Learning Framework

Article


Ahmed, Fee Faysal, Khatun, Mst. Shamima, Mosharaf, Md. Parvez and Mollah, Md. Nurul Haque. 2021. "Prediction of Protein-protein Interactions in Arabidopsis thaliana Using Partial Training Samples in a Machine Learning Framework." Current Bioinformatics. 16 (6), pp. 865-879. https://doi.org/10.2174/1574893616666210204145254
ERA Journal ID: 36845
Article Category: Article
Authors: Ahmed, Fee Faysal, Khatun, Mst. Shamima, Mosharaf, Md. Parvez and Mollah, Md. Nurul Haque
Journal Title: Current Bioinformatics
Journal Citation: 16 (6), pp. 865-879
Number of Pages: 15
Year: 2021
Place of Publication: United Arab Emirates
ISSN: 1574-8936
Digital Object Identifier (DOI): https://doi.org/10.2174/1574893616666210204145254
Web Address (URL): https://www.eurekaselect.com/article/113978
Abstract

Background: Protein-protein interactions (PPIs) play a vital role in a wide range of biological processes, from cell-cell interactions to developmental control, in all organisms. However, experimental identification of PPIs is often laborious, time-consuming and costly compared to computational prediction. Several computational prediction models based on complete training samples exist in the literature, but none of them deals with partial training samples.

Objective: The objective of this work was to develop an effective PPI prediction model for Arabidopsis thaliana using partial training samples in a machine learning framework.

Methods: We proposed an effective computational PPI prediction model by combining a random forest (RF) classifier with autocorrelation (AC) sequence-encoding features, using a 1:2 ratio of positive-PPI to unknown-PPI samples.
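As an illustration of the kind of sequence encoding named in the Methods, the following Python sketch computes one common formulation of an autocorrelation descriptor (auto-covariance of a single physicochemical property at several lags) and concatenates the descriptors of two proteins into one pair feature vector. It is not the authors' implementation: the Kyte-Doolittle hydropathy scale, the `max_lag` value, and the function names `auto_covariance` and `encode_pair` are assumptions made for this example, since the abstract does not state which properties or lags were used.

```python
from statistics import mean

# Kyte-Doolittle hydropathy values, used here only as an illustrative property.
# The abstract does not say which physicochemical properties the authors used.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(seq, scale=HYDROPATHY, max_lag=3):
    """Return [AC(1), ..., AC(max_lag)] for one property along a sequence.

    AC(lag) is the mean product of mean-centred property values taken
    `lag` residues apart.
    """
    values = [scale[residue] for residue in seq]
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    mu = mean(values)
    centred = [v - mu for v in values]
    return [
        sum(centred[i] * centred[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def encode_pair(seq_a, seq_b, max_lag=3):
    """Concatenate the descriptors of two proteins into one feature vector."""
    return auto_covariance(seq_a, max_lag=max_lag) + auto_covariance(seq_b, max_lag=max_lag)
```

Under these assumptions each protein pair becomes a fixed-length vector (2 × `max_lag` values per property), regardless of sequence length, which is what allows a classifier such as a random forest to be trained on pairs of proteins of different sizes.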

Results: We observed that the proposed prediction model produces the highest average performance scores of sensitivity (94.62%), AUC (0.92) and pAUC (0.189) with the training datasets and sensitivity (88.14%), AUC (0.89) and pAUC (0.176) with the test datasets of 5-fold cross-validation compared to other candidate predictors based on LDA, LOGI, ADA, NB, KNN and SVM classifiers. It also achieved the highest performance scores of TPR (91.82%) and pAUC (0.174) at FPR = 20% with AUC (0.948) compared to other candidate predictors.
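The pAUC and TPR figures above are reported at a false positive rate of 20%. The reported pAUC values all lie below 0.2, which would be consistent with an unnormalized area under the ROC curve restricted to FPR ≤ 0.2, although the abstract does not define the measure. The Python sketch below shows how such quantities could be computed from labels and prediction scores under that assumption; the function names `roc_points`, `partial_auc` and `tpr_at_fpr` are hypothetical and do not come from the paper.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr) from the strictest to the loosest threshold.

    Items with tied scores are grouped so that they yield a single point.
    """
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a tied-score group.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def partial_auc(labels, scores, max_fpr=0.2):
    """Unnormalized area under the ROC curve for FPR in [0, max_fpr]."""
    pts = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR where the segment crosses the cap.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid rule
    return area


def tpr_at_fpr(labels, scores, max_fpr=0.2):
    """Highest TPR reachable at an observed threshold with FPR <= max_fpr."""
    return max(tpr for fpr, tpr in roc_points(labels, scores) if fpr <= max_fpr)
```

With this definition a perfect ranking gives a pAUC equal to `max_fpr` (0.2 here), so values such as 0.189 would sit close to the ceiling. Libraries that report a standardized partial AUC rescale this area, so numbers from different tools are not directly comparable.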

Conclusion: The overall performance of the developed model suggests that the proposed predictor might be useful for elucidating the biological function of unseen PPIs among a large number of candidate proteins in Arabidopsis thaliana.

Keywords: Arabidopsis thaliana; Interaction prediction; Machine learning approach; Protein sequence; Protein-protein interaction; Random forest; Sequence encoding
Byline Affiliations: Jashore University of Science and Technology, Bangladesh; University of Rajshahi, Bangladesh; Kyushu Institute of Technology, Japan
Permalink: https://research.usq.edu.au/item/yy7yq/prediction-of-protein-protein-interactions-in-arabidopsis-thaliana-using-partial-training-samples-in-a-machine-learning-framework

Related outputs

Exploration of key drug target proteins highlighting their related regulatory molecules, functional pathways and drug candidates associated with delirium: evidence from meta-data analyses
Mosharaf, Md Parvez, Alam, Khorshed, Gow, Jeff and Mahumud, Rashidul Alam. 2023. "Exploration of key drug target proteins highlighting their related regulatory molecules, functional pathways and drug candidates associated with delirium: evidence from meta-data analyses." BMC Geriatrics. 23. https://doi.org/10.1186/s12877-023-04457-1
Effect of workplace violence on health workers injuries and workplace absenteeism in Bangladesh
Shahjalal, Md., Mosharaf, Md. Parvez and Mahumud, Rashidul Alam. 2023. "Effect of workplace violence on health workers injuries and workplace absenteeism in Bangladesh." Global Health Research and Policy. 8 (1). https://doi.org/10.1186/s41256-023-00316-z
Bioinformatics-based investigation on the genetic influence between SARS-CoV-2 infections and idiopathic pulmonary fibrosis (IPF) diseases, and drug repurposing
Islam, Md. Ariful, Kibria, Md. Kaderi, Hossen, Md. Bayazid, Reza, Md. Selim, Tasmia, Samme Amena, Tuly, Khanis Farhana, Mosharof, Md. Parvez, Kabir, Syed Rashel, Kabir, Md. Hadiul, Mollah, Md. Nurul Haque and Mosharof, P.. 2023. "Bioinformatics-based investigation on the genetic influence between SARS-CoV-2 infections and idiopathic pulmonary fibrosis (IPF) diseases, and drug repurposing." Scientific Reports. 13 (1). https://doi.org/10.1038/s41598-023-31276-6
The burden of chronic diseases, disease-stratified exploration and gender-differentiated healthcare utilisation among patients in Bangladesh
Mahumud, Rashidul Alam, Gow, Jeff, Mosharaf, Md Parvez, Kundu, Satyajit, Rahman, Md Ashfikur, Dukhi, Natisha, Shahajalal, Md, Mistry, Sabuj Kanti and Alam, Khorshed. 2023. "The burden of chronic diseases, disease-stratified exploration and gender-differentiated healthcare utilisation among patients in Bangladesh." PLoS One. 18 (5). https://doi.org/10.1371/journal.pone.0284117
Hospital costs of post-operative delirium: A systematic review
Mosharaf, Md. Parvez, Alam, Khorshed, Ralph, Nicholas and Gow, Jeff. 2022. "Hospital costs of post-operative delirium: A systematic review." Journal of Perioperative Nursing. 35 (2), pp. 14-26. https://doi.org/10.26550/2209-1092.1165
Meta-Data Analysis to Explore the Hub of the Hub-Genes That Influence SARS-CoV-2 Infections Highlighting Their Pathogenetic Processes and Drugs Repurposing
Mosharaf, Md. Parvez, Kibria, Md. Kaderi, Hossen, Md. Bayazid, Islam, Md. Ariful, Reza, Md. Selim, Mahumud, Rashidul Alam, Alam, Khorshed, Gow, Jeffrey and Mollah, Md. Nurul Haque. 2022. "Meta-Data Analysis to Explore the Hub of the Hub-Genes That Influence SARS-CoV-2 Infections Highlighting Their Pathogenetic Processes and Drugs Repurposing." Vaccines. 10 (8), pp. 1-22. https://doi.org/10.3390/vaccines10081248
Comprehensive In Silico Analysis of RNA Silencing-Related Genes and Their Regulatory Elements in Wheat (Triticum aestivum L.)
Akond, Zobaer, Rahman, Hafizur, Ahsan, Md. Asif, Mosharaf, Md. Parvez, Alam, Munirul and Mollah, Md. Nurul Haque. 2022. "Comprehensive In Silico Analysis of RNA Silencing-Related Genes and Their Regulatory Elements in Wheat (Triticum aestivum L.)." BioMed Research International. 2022. https://doi.org/10.1155/2022/4955209
Identification of host transcriptome-guided repurposable drugs for SARS-CoV-1 infections and their validation with SARS-CoV-2 infections by using the integrated bioinformatics approaches
Ahmed, Fee Faysal, Reza, Md. Selim, Sarker, Md. Shahin, Islam, Md. Samiul, Mosharaf, Md. Parvez, Hasan, Sohel and Mollah, Md. Nurul Haque. 2022. "Identification of host transcriptome-guided repurposable drugs for SARS-CoV-1 infections and their validation with SARS-CoV-2 infections by using the integrated bioinformatics approaches." PLoS One. 17 (4 April). https://doi.org/10.1371/journal.pone.0266124
Disclosing Potential Key Genes, Therapeutic Targets and Agents for Non-Small Cell Lung Cancer: Evidence from Integrative Bioinformatics Analysis
Mosharaf, Md. Parvez, Reza, Md. Selim, Gov, Esra, Mahumud, Rashidul Alam and Mollah, Md. Nurul Haque. 2022. "Disclosing Potential Key Genes, Therapeutic Targets and Agents for Non-Small Cell Lung Cancer: Evidence from Integrative Bioinformatics Analysis." Vaccines. 10 (5). https://doi.org/10.3390/vaccines10050771
Computational identification of host genomic biomarkers highlighting their functions, pathways and regulators that influence SARS‑CoV‑2 infections and drug repurposing
Mosharaf, Md. Parvez, Reza, Md. Selim, Kibria, Md. Kaderi, Ahmed, Fee Faysal, Kabir, Md. Hadiul, Hasan, Sohel and Mollah, Md. Nurul Haque. 2022. "Computational identification of host genomic biomarkers highlighting their functions, pathways and regulators that influence SARS‑CoV‑2 infections and drug repurposing." Scientific Reports. 12 (1). https://doi.org/10.1038/s41598-022-08073-8
Computational prediction of protein ubiquitination sites mapping on Arabidopsis thaliana
Mosharaf, Md. Parvez, Hassan, Md. Mehedi, Ahmed, Fee Faysal, Khatun, Mst. Shamima, Moni, Mohammad Ali and Mollah, Md. Nurul Haque. 2020. "Computational prediction of protein ubiquitination sites mapping on Arabidopsis thaliana." Computational Biology and Chemistry. 85. https://doi.org/10.1016/j.compbiolchem.2020.107238
In silico identification and characterization of AGO, DCL and RDR gene families and their associated regulatory elements in sweet orange (Citrus sinensis L.)
Mosharaf, Md. Parvez, Rahman, Hafizur, Ahsan, Md. Asif, Akond, Zobaer, Ahmed, Fee Faysal, Islam, Md. Mazharul, Moni, Mohammad Ali and Mollah, Md. Nurul Haque. 2020. "In silico identification and characterization of AGO, DCL and RDR gene families and their associated regulatory elements in sweet orange (Citrus sinensis L.)." PLoS One. 15 (12 December). https://doi.org/10.1371/journal.pone.0228233