Spectral approaches for DNA sequence classification

K. O. Cheng, Ngai Fong Law, W. C. Siu

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

Abstract

Z-curve features are among the popular features used in DNA sequence classification. Here, we studied the Z-curve features from a signal processing point of view. In particular, the Z-curve features are re-interpreted through a spectral formulation. Our analysis showed that the spectral interpretation of the Z-curve formulation differs significantly from that of the FFT (Fast Fourier Transform) approach. From the spectral formulation of the Z-curve approach, we obtained three modified sequences that characterize different biological properties useful for coding region prediction. Spectral analysis of the modified sequences showed a much more prominent three-periodicity property in coding regions than the FFT approach did. Our experiments indicated that, for long sequences, prominent peaks at 2π/3 are observed in coding regions. For short sequences, peaks can still be observed in coding regions. We also obtained good classification performance using the spectral features derived from the three modified sequences.
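As a rough illustration of the three-periodicity measurement mentioned in the abstract, the following minimal sketch maps a DNA string to three ±1 sequences and evaluates their spectral power at 2π/3. It assumes the standard Z-curve groupings (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding); the abstract does not define the paper's three modified sequences, so they may differ from these. The names `Z_SIGNS`, `z_sequences`, `power_at`, and `period3_power` are hypothetical and chosen only for this example.

```python
import numpy as np

# Standard Z-curve sign conventions. This is an assumption: the paper's
# three modified sequences are not given in the abstract and may differ.
Z_SIGNS = {
    "x": {"A": 1, "G": 1, "C": -1, "T": -1},  # purine vs. pyrimidine
    "y": {"A": 1, "C": 1, "G": -1, "T": -1},  # amino vs. keto
    "z": {"A": 1, "T": 1, "C": -1, "G": -1},  # weak vs. strong hydrogen bonding
}


def z_sequences(dna):
    """Map a DNA string (A, C, G, T only) to three +/-1 sequences."""
    dna = dna.upper()
    return {
        axis: np.array([signs[base] for base in dna], dtype=float)
        for axis, signs in Z_SIGNS.items()
    }


def power_at(seq, omega):
    """Length-normalized power of the DFT of `seq` at angular frequency `omega`."""
    n = np.arange(len(seq))
    coeff = np.sum(seq * np.exp(-1j * omega * n))
    return float(np.abs(coeff) ** 2 / len(seq))


def period3_power(dna):
    """Spectral power at 2*pi/3 for each of the three sequences."""
    return {
        axis: power_at(seq, 2 * np.pi / 3)
        for axis, seq in z_sequences(dna).items()
    }
```

On a perfectly period-3 toy string such as `"ATG" * 30`, `period3_power` returns a large value on all three axes, whereas a period-4 string such as `"ACGT" * 30` gives values that are numerically zero. Real coding regions are far noisier than these toy inputs, so any decision threshold would have to be chosen empirically.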
Original language: English
Title of host publication: Fourth International Conference on Information Technology and Applications, ICITA 2007
Pages: 541-544
Number of pages: 4
Publication status: Published - 3 Dec 2007
Event: 4th International Conference on Information Technology and Applications, ICITA 2007 - Harbin, China
Duration: 15 Jan 2007 – 18 Jan 2007

Conference

Conference: 4th International Conference on Information Technology and Applications, ICITA 2007
Country/Territory: China
City: Harbin
Period: 15/01/07 – 18/01/07

Keywords

  • Coding region
  • DNA sequence
  • Fourier approach
  • Spectral analysis
  • Z-curve approach

ASJC Scopus subject areas

  • Computer Science(all)
