Numericalization of the self adaptive spectral rotation method for coding region prediction

Bo Chen, Ping Ji

Research output: Journal article publication › Journal article › Academic research › peer-review

4 Citations (Scopus)

Abstract

A Self Adaptive Spectral Rotation (SASR) method was recently developed for identifying protein coding regions in new sequences from unknown organisms when no training sets are available. It visualizes the Triplet Periodicity (TP) property, a simple and universal coding-related property, and thereby reveals the rough locations of coding regions visually, without any training. However, the method does not numerically discriminate the locations of coding regions. Based on the SASR method, we develop a new approach, named the T-Z-T analysis, to provide numerical results of coding region prediction. This approach adopts a t-test segmentation to separate coding and non-coding regions in the SASR's output and then uses a z-test filter to recognize region patterns. After that, another t-test segmentation is conducted to break down adjacent coding regions by detecting frame shifts. Since it is based on the graphic output of the SASR, this approach does not require any training. It is also more stable, because it is not sensitive to errors in the input DNA sequence. These advantages make it suitable for coding region prediction at an early stage, when training sets are insufficient and the input data may even be inaccurate.
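As a rough illustration of what a single t-test segmentation pass can look like, the sketch below recursively splits a one-dimensional signal at the point where Welch's t statistic between the two sides is largest. It is not the authors' T-Z-T implementation: the synthetic signal stands in for the SASR output, and the function names, the threshold of 6.0, and the minimum segment length of 20 are all assumptions chosen for the example. The z-test filter and the frame-shift step are not covered.

```python
import math
import random


def welch_t(left, right):
    """Welch's t statistic for the difference in means of two samples."""
    n1, n2 = len(left), len(right)
    m1 = sum(left) / n1
    m2 = sum(right) / n2
    v1 = sum((x - m1) ** 2 for x in left) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in right) / (n2 - 1)
    se = math.sqrt(v1 / n1 + v2 / n2)
    return (m1 - m2) / se if se > 0 else 0.0


def best_split(signal, min_len):
    """Return (index, |t|) of the split point with the largest |t|."""
    best_idx, best_t = None, 0.0
    for i in range(min_len, len(signal) - min_len + 1):
        t = abs(welch_t(signal[:i], signal[i:]))
        if t > best_t:
            best_idx, best_t = i, t
    return best_idx, best_t


def segment(signal, threshold=6.0, min_len=20, offset=0):
    """Recursively split while the best |t| reaches the (assumed) threshold.

    Returns boundary positions in ascending order, relative to the
    start of the original signal.
    """
    if len(signal) < 2 * min_len:
        return []
    idx, t = best_split(signal, min_len)
    if idx is None or t < threshold:
        return []
    return (segment(signal[:idx], threshold, min_len, offset)
            + [offset + idx]
            + segment(signal[idx:], threshold, min_len, offset + idx))


# Synthetic stand-in for a SASR-like trace: low, high, low plateaus plus noise.
random.seed(0)
levels = [0.0] * 100 + [3.0] * 100 + [0.0] * 100
signal = [m + random.gauss(0.0, 0.5) for m in levels]
boundaries = segment(signal)
print(boundaries)
```

On this synthetic trace the detected boundaries should fall close to positions 100 and 200, where the plateau level changes. The quadratic cost of rescanning every split point is acceptable for a sketch but would need running sums for long sequences.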
Original language: English
Pages (from-to): 95-102
Number of pages: 8
Journal: Journal of Theoretical Biology
Volume: 296
Publication status: Published - 7 Mar 2012

Keywords

  • Gene finding
  • Local stationary process
  • Triplet periodicity
  • Visualization

ASJC Scopus subject areas

  • Statistics and Probability
  • Modelling and Simulation
  • Biochemistry, Genetics and Molecular Biology (all)
  • Immunology and Microbiology (all)
  • Agricultural and Biological Sciences (all)
  • Applied Mathematics
