An improved sequence-based prediction protocol for protein-protein interactions using amino acids substitution matrix and rotation forest ensemble classifiers

Zhu Hong You, Xiao Li, Chun Chung Chan

Research output: Journal article publication › Journal article › Academic research › peer-review

38 Citations (Scopus)


Protein-protein interactions (PPIs) play important roles in a wide variety of cellular processes, including metabolic cycles, DNA transcription and replication, and signaling cascades. High-throughput biological experiments for identifying PPIs are beginning to provide valuable information about the complexity of PPI networks, but they are expensive, cumbersome, and extremely time-consuming. Hence, there is a need for accurate and robust computational methods for predicting PPIs. In this article, a sequence-based approach is proposed that combines a novel amino acid substitution matrix feature representation with a Rotation Forest (RF) classifier. Given a pair of protein sequences as input, the proposed method predicts whether or not the two proteins interact. When applied to PPI data from Saccharomyces cerevisiae, the proposed method achieved 93.74% prediction accuracy, with 90.05% sensitivity at a precision of 97.08%. Extensive experiments were performed to compare the method with an existing sequence-based method. The experimental results demonstrate that PPIs can be reliably predicted using only sequence-derived information. They also show that the proposed approach offers an inexpensive means of computationally constructing PPI networks, making it a useful supplementary tool for future proteomics studies.
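
The abstract reports three evaluation figures: accuracy, sensitivity, and precision. As a reading aid, the sketch below shows how these quantities are conventionally defined from the confusion counts of a binary classifier. The class name `ConfusionCounts`, the function names, and the example counts are hypothetical and chosen only for illustration; they are not taken from the article and do not reproduce its results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion counts for a binary interaction predictor (hypothetical container)."""
    tp: int  # interacting pairs predicted as interacting
    fp: int  # non-interacting pairs predicted as interacting
    tn: int  # non-interacting pairs predicted as non-interacting
    fn: int  # interacting pairs predicted as non-interacting


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all protein pairs that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of truly interacting pairs that were recovered (recall)."""
    return c.tp / (c.tp + c.fn)


def precision(c: ConfusionCounts) -> float:
    """Fraction of predicted interactions that are truly interacting."""
    return c.tp / (c.tp + c.fp)


# Made-up counts for illustration only; NOT the article's data.
counts = ConfusionCounts(tp=90, fp=3, tn=97, fn=10)

print(f"accuracy    = {accuracy(counts):.4f}")
print(f"sensitivity = {sensitivity(counts):.4f}")
print(f"precision   = {precision(counts):.4f}")
```

High precision with lower sensitivity, as reported in the abstract, corresponds to a predictor that rarely labels a non-interacting pair as interacting but misses some true interactions.
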
Original language: English
Pages (from-to): 277-282
Number of pages: 6
Publication status: Published - 8 Mar 2017


Keywords

  • Ensemble classifier
  • Protein sequence
  • Protein-protein interaction
  • Rotation forest
  • Substitution matrix

ASJC Scopus subject areas

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence
