A MapReduce based parallel SVM for large-scale predicting protein-protein interactions

Zhu Hong You, Jian Zhong Yu, Lin Zhu, Shuai Li, Zhen Kun Wen

Research output: Journal article publication › Journal article › Academic research › peer-review

101 Citations (Scopus)


Protein-protein interactions (PPIs) are crucial to most biochemical processes, including metabolic cycles, DNA transcription and replication, and signaling cascades. Although high-throughput experimental techniques have generated a large amount of PPI data for different species, the number of known interactions is still small compared with the total number of possible PPIs. Furthermore, the experimental methods for identifying PPIs are both time-consuming and expensive. There is therefore an urgent need for automated computational methods that predict PPIs efficiently and accurately. In this article, we propose a novel MapReduce-based parallel SVM model for large-scale prediction of protein-protein interactions using only protein sequence information. First, local sequential features, represented by an autocorrelation descriptor, are extracted from the protein sequences. Then the MapReduce framework is employed to train support vector machine (SVM) classifiers in a distributed way, which significantly reduces training time while maintaining a high level of accuracy. The experimental results demonstrate that the proposed parallel algorithms not only can handle large-scale PPI datasets but also perform well in terms of speedup and accuracy. Consequently, the proposed approach can be considered a promising and powerful new tool for large-scale PPI prediction, delivering excellent performance in less time.
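The two steps named in the abstract (sequence-derived autocorrelation features, then SVM training distributed in a map/reduce pattern) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact descriptor, the physicochemical properties, or how partial models are combined. The sketch assumes an auto-covariance form of the descriptor over a single Kyte-Doolittle hydropathy scale, a small linear SVM trained by subgradient descent in place of a full SVM library, and a cascade-style reduce that pools margin samples from each partition. All function names and the toy sequences and labels are hypothetical.

```python
from functools import reduce
import random

# Assumption: one property (Kyte-Doolittle hydropathy); the paper's property set is not stated here.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def autocovariance(seq, max_lag):
    """Auto-covariance of the mean-centred property values for lags 1..max_lag."""
    if len(seq) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    values = [HYDROPATHY[aa] for aa in seq]
    mean = sum(values) / len(values)
    centred = [v - mean for v in values]
    n = len(centred)
    return [
        sum(centred[i] * centred[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]

def pair_features(seq_a, seq_b, max_lag):
    """Describe a protein pair by concatenating the two per-protein descriptors."""
    return autocovariance(seq_a, max_lag) + autocovariance(seq_b, max_lag)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(samples, epochs=50, eta=0.01, lam=0.01, seed=0):
    """Stand-in linear SVM: subgradient descent on the L2-regularised hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x, y in order:
            violated = y * (dot(w, x) + b) < 1.0
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularisation shrink
            if violated:                              # hinge-loss subgradient step
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return w, b

def map_partition(partition):
    """Map step: train on one partition and emit the samples on or inside the margin."""
    w, b = train_linear_svm(partition)
    return [(x, y) for x, y in partition if y * (dot(w, x) + b) <= 1.0]

def merge_candidates(left, right):
    """Reduce step: pool the candidate support vectors from two partitions."""
    return left + right

def parallel_train(samples, n_partitions):
    """Run map over the partitions, reduce the candidates, then train a final model."""
    partitions = [p for p in (samples[i::n_partitions] for i in range(n_partitions)) if p]
    candidates = reduce(merge_candidates, map(map_partition, partitions), [])
    return train_linear_svm(candidates or samples)

# Made-up sequences and labels, for illustration only (not real interaction data).
toy_pairs = [
    ("MKTAYIAKQR", "GSHMLEDPVA", 1), ("ILVFAMWCLI", "DEKRNQSTHD", -1),
    ("GSHMLEDPVA", "ILVFAMWCLI", 1), ("DEKRNQSTHD", "MKTAYIAKQR", -1),
]
toy = [(pair_features(a, b, 3), label) for a, b, label in toy_pairs]
weights, bias = parallel_train(toy, n_partitions=2)
```

Here the built-in `map` runs the partitions sequentially. In an actual MapReduce deployment each `map_partition` call would run on a separate worker, which is where the training-time speedup reported in the abstract would come from.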
Original language: English
Pages (from-to): 37-43
Number of pages: 7
Publication status: Published - 5 Dec 2014


Keywords

  • Autocorrelation descriptor
  • MapReduce
  • Protein sequence
  • Protein-protein interaction
  • Support vector machine

ASJC Scopus subject areas

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence

