Abstract
Z-curve features are widely used in DNA sequence classification. Here, we studied Z-curve features from a signal processing point of view and re-interpreted them through a spectral formulation. Our analysis showed that the spectral interpretation of the Z-curve formulation differs significantly from that of the FFT (Fast Fourier Transform) approach. From the spectral formulation of the Z-curve approach, we obtained three modified sequences that characterize different biological properties and are useful for coding region prediction. Spectral analysis of the modified sequences showed a much more prominent three-periodicity property in coding regions than the FFT approach did. Our experiments indicated that, for long sequences, prominent peaks at 2π/3 are observed in coding regions; for short sequences, peaks can still be observed in coding regions. We also obtained good classification performance using spectral features derived from the three modified sequences.
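To make the three-periodicity idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes that the three sequences can be approximated by the per-base ±1 increments of the standard Z-curve components (purine/pyrimidine, amino/keto, weak/strong hydrogen bond); the abstract does not state how the paper's modified sequences are actually constructed. It then evaluates the spectral power of each sequence at the angular frequency 2π/3. The names `zcurve_increments`, `power_at`, and `three_periodicity` are hypothetical, and the input is assumed to be a non-empty string over A, C, G, T.

```python
import cmath
import math

# Per-base increments of the three standard Z-curve components:
#   x: purine (A, G) = +1, pyrimidine (C, T) = -1
#   y: amino (A, C) = +1, keto (G, T) = -1
#   z: weak H-bond (A, T) = +1, strong H-bond (C, G) = -1
# Assumption: these stand in for the paper's "three modified sequences".
INCREMENTS = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def zcurve_increments(seq):
    """Return three +/-1 lists (dx, dy, dz) for a non-empty DNA string."""
    triples = [INCREMENTS[base] for base in seq.upper()]
    dx, dy, dz = (list(column) for column in zip(*triples))
    return dx, dy, dz


def power_at(signal, omega):
    """Squared magnitude of the Fourier sum at angular frequency omega,
    divided by the signal length."""
    total = sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(signal))
    return abs(total) ** 2 / len(signal)


def three_periodicity(seq):
    """Spectral power at 2*pi/3 for each of the three increment sequences."""
    omega = 2 * math.pi / 3
    return tuple(power_at(s, omega) for s in zcurve_increments(seq))


if __name__ == "__main__":
    # A perfectly 3-periodic toy sequence versus a homopolymer.
    print(three_periodicity("ATG" * 30))
    print(three_periodicity("A" * 90))
```

In this sketch a sequence with a strong period-3 pattern yields large values in all three components, while a sequence without such a pattern yields values near zero. Real coding regions would fall somewhere in between, and any decision threshold would have to be chosen empirically.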
Original language | English |
---|---|
Title of host publication | Fourth International Conference on Information Technology and Applications, ICITA 2007 |
Pages | 541-544 |
Number of pages | 4 |
Publication status | Published - 3 Dec 2007 |
Event | 4th International Conference on Information Technology and Applications, ICITA 2007 - Harbin, China. Duration: 15 Jan 2007 → 18 Jan 2007 |
Conference
Conference | 4th International Conference on Information Technology and Applications, ICITA 2007 |
---|---|
Country/Territory | China |
City | Harbin |
Period | 15/01/07 → 18/01/07 |
Keywords
- Coding region
- DNA sequence
- Fourier approach
- Spectral analysis
- Z-curve approach
ASJC Scopus subject areas
- General Computer Science