TY - GEN
T1 - Linear-time protein 3-D structure searching with insertions and deletions
AU - Shibuya, Tetsuo
AU - Jansson, Jesper Andreas
AU - Sadakane, Kunihiko
PY - 2009/11/2
Y1 - 2009/11/2
N2 - It becomes more and more important to search for similar structures from molecular 3-D structure databases in the structural biology of the post genomic era. Two molecules are said to be similar if the RMSD (root mean square deviation) of the two molecules is less than or equal to some given constant bound. In this paper, we consider an important, fundamental problem of finding all the similar substructures from 3-D structure databases of chain molecules (such as proteins), with consideration of indels (i.e., insertions and deletions). The problem has been believed to be very difficult, but its computational difficulty has not been well known. In this paper, we first show that the same problem in arbitrary dimension is NP-hard. Moreover, we also propose a new algorithm that dramatically improves the average-case time complexity for the problem, in case the number of indels k is bounded by some constant. Our algorithm solves the above problem in average O(N) time, while the time complexity of the best known algorithm was O(Nm k + 1), for a query of size m and a database of size N.
AB - It becomes more and more important to search for similar structures from molecular 3-D structure databases in the structural biology of the post genomic era. Two molecules are said to be similar if the RMSD (root mean square deviation) of the two molecules is less than or equal to some given constant bound. In this paper, we consider an important, fundamental problem of finding all the similar substructures from 3-D structure databases of chain molecules (such as proteins), with consideration of indels (i.e., insertions and deletions). The problem has been believed to be very difficult, but its computational difficulty has not been well known. In this paper, we first show that the same problem in arbitrary dimension is NP-hard. Moreover, we also propose a new algorithm that dramatically improves the average-case time complexity for the problem, in case the number of indels k is bounded by some constant. Our algorithm solves the above problem in average O(N) time, while the time complexity of the best known algorithm was O(Nm k + 1), for a query of size m and a database of size N.
UR - http://www.scopus.com/inward/record.url?scp=70350367027&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-04241-6_26
DO - 10.1007/978-3-642-04241-6_26
M3 - Conference article published in proceeding or book
SN - 3642042406
SN - 9783642042409
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 310
EP - 320
BT - Algorithms in Bioinformatics - 9th International Workshop, WABI 2009, Proceedings
T2 - 9th International Workshop on Algorithms in Bioinformatics, WABI 2009
Y2 - 12 September 2009 through 13 September 2009
ER -