Abstract
The nested arc-annotation of a sequence is an important model used to represent structural information for RNA and protein sequences. Given two sequences S1 and S2 and a nested arc-annotation P 1 for S1, this paper considers the problem of inferring the nested arc-annotation P2 for S2 such that (S 1, P1) and (S2, P2) have the largest common substructure. The problem has a direct application in predicting the secondary structure of an RNA sequence given a closely related sequence with known secondary structure. The currently most efficient algorithm for this problem requires O(nm3) time and O(nm2) space where n is the length of the sequence with known arc-annotation and m is the length of the sequence whose arc-annotation is to be inferred. By using sparsification on a new recursive dynamic programming algorithm and applying a Hirschberg-like traceback technique with compression, we obtain an improved algorithm that runs in min{O(nm2 + n2m),O(nm2 log n), O(nm 3) time and min{O(m2 + mn), O(m2 log n + n)} space.
Original language | English |
---|---|
Pages (from-to) | 223-245 |
Number of pages | 23 |
Journal | Algorithmica (New York) |
Volume | 46 |
Issue number | 2 |
DOIs | |
Publication status | Published - 1 Oct 2006 |
Externally published | Yes |
Keywords
- Arc annotations
- RNA secondary structure
- Sequence structure alignment
- Sparsification technique
ASJC Scopus subject areas
- General Computer Science
- Computer Science Applications
- Applied Mathematics