EL_LSTM: Prediction of DNA-Binding Residue from Protein Sequence by Combining Long Short-Term Memory and Ensemble Learning

Jiyun Zhou, Qin Lu, Ruifeng Xu, Lin Gui, Hongpeng Wang

Research output: Journal article publication; Journal article; Academic research; peer-review

5 Citations (Scopus)

Abstract

Most previous work on DNA-binding residue prediction has not considered the relationships between residues. In this paper, we propose a novel approach for DNA-binding residue prediction, referred to as EL_LSTM, which has two main components. The first component is a Long Short-Term Memory (LSTM) network, which learns pairwise relationships between residues through a bi-gram model and then learns feature vectors for all residues. The second component is an ensemble-learning-based classifier introduced to tackle the data imbalance problem in binding residue prediction. We use a variant of the bagging strategy in ensemble learning to obtain balanced training samples. Evaluations on PDNA-224 and DBP-123 show that classifiers using feature relationships outperform classifiers without them by at least 0.028 on MCC (Matthews correlation coefficient), 1.18 percent on ST, and 0.012 on AUC (area under the ROC curve). This indicates the usefulness of feature relationships for DNA-binding residue prediction. Evaluation of ensemble learning shows an improvement of at least 0.021 on MCC, 1.32 percent on ST, and 0.018 on AUC compared with a single LSTM classifier. Comparisons with state-of-the-art predictors show that EL_LSTM significantly outperforms them. Further feature analysis validates the effectiveness of LSTM for predicting DNA-binding residues.
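The abstract states that the LSTM component learns pairwise residue relationships through a bi-gram model. The sketch below illustrates one plausible way to form bi-gram inputs before they reach a recurrent layer; it is not the authors' code. It assumes a plain 20-letter one-hot encoding per residue and pairs each residue with its right-hand neighbour, padding the final position with zeros. The names `one_hot` and `make_bigram_inputs` are hypothetical, and the paper's actual residue features and pairing scheme may differ.

```python
from typing import List, Sequence

# Assumption: a plain one-hot encoding over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue: str) -> List[float]:
    """Encode one residue letter as a 20-dim one-hot vector (unknown letters -> all zeros)."""
    vec = [0.0] * len(AMINO_ACIDS)
    idx = AMINO_ACIDS.find(residue) if len(residue) == 1 else -1
    if idx >= 0:
        vec[idx] = 1.0
    return vec


def make_bigram_inputs(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical bi-gram step: concatenate residue t with residue t + 1.

    The last residue has no successor, so it is paired with a zero vector.
    """
    if not features:
        return []
    dim = len(features[0])
    padded = [list(f) for f in features] + [[0.0] * dim]
    return [padded[t] + padded[t + 1] for t in range(len(features))]


if __name__ == "__main__":
    sequence = "MKTAYIAK"  # toy sequence, not taken from PDNA-224 or DBP-123
    bigrams = make_bigram_inputs([one_hot(r) for r in sequence])
    print(len(bigrams), len(bigrams[0]))  # 8 40
```

Each bi-gram vector has twice the per-residue dimension (40 here); under this reading, the resulting sequence would be fed one position per time step to an LSTM that emits a feature vector for every residue.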
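The second component addresses class imbalance: non-binding residues far outnumber binding ones, so a classifier trained on all data tends to favour the majority class. The abstract says only that a variant of bagging is used to achieve balanced samples; the sketch below shows one common balanced-bagging scheme under that reading, not the paper's exact variant. Each ensemble member sees a bootstrap sample of the minority (binding) class plus an equally sized random draw of the majority class, and member scores are averaged at prediction time. A nearest-centroid scorer stands in for the LSTM so the example stays self-contained; `train_nearest_centroid`, `balanced_bagging`, and `ensemble_predict` are hypothetical names.

```python
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]  # score in [0, 1]; higher = more likely binding


def train_nearest_centroid(xs: List[Vector], ys: List[int]) -> Scorer:
    """Stand-in base learner (hypothetical; the paper uses an LSTM)."""
    def centroid(points: List[Vector]) -> List[float]:
        return [sum(col) / len(points) for col in zip(*points)]

    pos = centroid([x for x, y in zip(xs, ys) if y == 1])
    neg = centroid([x for x, y in zip(xs, ys) if y == 0])

    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    def score(x: Vector) -> float:
        dp, dn = sq_dist(x, pos), sq_dist(x, neg)
        # Closer to the binding centroid -> score nearer 1.
        return dn / (dp + dn) if dp + dn > 0 else 0.5

    return score


def balanced_bagging(xs: List[Vector], ys: List[int], n_members: int,
                     train_fn: Callable[[List[Vector], List[int]], Scorer],
                     seed: int = 0) -> List[Scorer]:
    """Train n_members base learners, each on a class-balanced resample.

    Assumes both classes are present and label 1 (binding) is the minority.
    """
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(ys) if y == 1]
    neg_idx = [i for i, y in enumerate(ys) if y == 0]
    k = min(len(pos_idx), len(neg_idx))
    members = []
    for _ in range(n_members):
        # Bootstrap the minority class (sampling with replacement) ...
        pos_sample = [rng.choice(pos_idx) for _ in range(k)]
        # ... and draw the same number of majority examples without replacement.
        neg_sample = rng.sample(neg_idx, k)
        idx = pos_sample + neg_sample
        members.append(train_fn([xs[i] for i in idx], [ys[i] for i in idx]))
    return members


def ensemble_predict(members: List[Scorer], x: Vector, threshold: float = 0.5) -> int:
    """Average member scores and threshold the mean (1 = binding)."""
    mean_score = sum(m(x) for m in members) / len(members)
    return int(mean_score >= threshold)
```

Drawing the majority class without replacement gives every member a 1:1 class ratio while different members see different non-binding residues; partitioning the majority class into disjoint chunks is an equally plausible reading of "a variant of bagging".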
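The results above are reported on MCC, ST, and AUC. MCC suits imbalanced data because, unlike accuracy, it does not reward a predictor that labels every residue as non-binding. The small function below computes MCC from confusion-matrix counts with the standard formula; returning 0.0 when a marginal sum is zero is a common convention, assumed here. The counts in the example are illustrative, not figures from the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero (an assumed, commonly used convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    # Illustrative imbalanced case: 10 binding vs 90 non-binding residues,
    # and a trivial predictor that labels everything non-binding.
    print(accuracy(tp=0, tn=90, fp=0, fn=10))  # 0.9
    print(mcc(tp=0, tn=90, fp=0, fn=10))       # 0.0
```

The trivial predictor reaches 90 percent accuracy but an MCC of 0, which is why gains such as the 0.021 to 0.028 MCC improvements reported above are meaningful on imbalanced residue data.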

Original language: English
Article number: 8417936
Pages (from-to): 124-135
Number of pages: 12
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 17
Issue number: 1
Publication status: Published - 1 Jan 2020

Keywords

  • bi-grams
  • DNA-binding residue
  • ensemble learning
  • LSTM
  • Protein-DNA interaction
  • relationship

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Applied Mathematics
