Prediction of Protein-protein Interactions in Arabidopsis thaliana Using Partial Training Samples in a Machine Learning Framework
Article
Article Title | Prediction of Protein-protein Interactions in Arabidopsis thaliana Using Partial Training Samples in a Machine Learning Framework |
---|---|
ERA Journal ID | 36845 |
Article Category | Article |
Authors | Ahmed, Fee Faysal, Khatun, Mst. Shamima, Mosharaf, Md. Parvez and Mollah, Md. Nurul Haque |
Journal Title | Current Bioinformatics |
Journal Citation | 16 (6), pp. 865-879 |
Number of Pages | 15 |
Year | 2021 |
Place of Publication | United Arab Emirates |
ISSN | 1574-8936 |
Digital Object Identifier (DOI) | https://doi.org/10.2174/1574893616666210204145254 |
Web Address (URL) | https://www.eurekaselect.com/article/113978 |
Abstract | Background: Protein-protein interactions (PPI) play a vital role in a wide range of biological processes, ranging from cell-cell interactions to developmental control in all organisms. However, experimental identification of PPI is often laborious, time-consuming and costly compared to computational prediction. There are several computational prediction models in the literature based on complete training samples, but none of them deals with partial training samples. Objective: The objective of this work was to develop an effective PPI prediction model for Arabidopsis thaliana using partial training samples in a machine learning framework. Methods: We proposed an effective computational PPI prediction model by combining a random forest (RF) classifier and autocorrelation (AC) sequence encoding features with a 1:2 ratio of positive-PPI and unknown-PPI samples. Results: We observed that the proposed prediction model produces the highest average performance scores of sensitivity (94.62%), AUC (0.92) and pAUC (0.189) with the training datasets and sensitivity (88.14%), AUC (0.89) and pAUC (0.176) with the test datasets of 5-fold cross-validation compared to other candidate predictors based on LDA, LOGI, ADA, NB, KNN and SVM classifiers. It also achieved the highest performance scores of TPR (91.82%) and pAUC (0.174) at FPR = 20% with AUC (0.948) compared to other candidate predictors. Conclusion: The overall performance of the developed model suggests that our proposed predictor might be useful to elucidate the biological function of unseen PPIs from a large number of candidate proteins in Arabidopsis thaliana. |
Keywords | Arabidopsis thaliana; Interaction prediction; Machine learning approach; Protein sequence; Protein-protein interaction; Random forest; Sequence encoding |
Contains Sensitive Content | Does not contain sensitive content |
Public Notes | Files associated with this item cannot be displayed due to copyright restrictions. |
Byline Affiliations | Jashore University of Science and Technology, Bangladesh |
University of Rajshahi, Bangladesh |
Kyushu Institute of Technology, Japan |
https://research.usq.edu.au/item/yy7yq/prediction-of-protein-protein-interactions-in-arabidopsis-thaliana-using-partial-training-samples-in-a-machine-learning-framework
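The abstract reports pAUC values (0.174 to 0.189) alongside TPR at FPR = 20%, but does not define how pAUC was computed. Because all reported values fall below 0.2, they are consistent with an unnormalized area under the ROC curve restricted to FPR in [0, 0.2]; that reading is an assumption, not something the record states. The sketch below illustrates that interpretation in plain Python. The function names `roc_points` and `partial_auc` and the toy labels and scores are hypothetical and are not taken from the article.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR), starting at (0, 0), for binary labels (1 = positive)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Sweep the decision threshold from the highest score downwards.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(points, max_fpr=0.2):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr] (unnormalized)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR at the FPR cut-off.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (hypothetical): two positives and two negatives.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_curve = roc_points(toy_labels, toy_scores)
toy_pauc = partial_auc(toy_curve, max_fpr=0.2)
toy_auc = partial_auc(toy_curve, max_fpr=1.0)
```

Under this definition a perfect classifier reaches a pAUC of at most 0.2 at FPR = 20%, which gives a scale for reading the reported scores. Some libraries instead return a standardized partial AUC rescaled to [0.5, 1], so values from different tools are not directly comparable.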