TY - GEN
T1 - Topic sequence kernel
AU - Xu, Jian
AU - Lu, Qin
AU - Liu, Zhengzhong
AU - Chai, Junyi
PY - 2012/12/31
Y1 - 2012/12/31
N2 - This paper addresses the problem of classifying documents using kernel approaches based on topic sequences. Previously, the string kernel used ordered subsequences of characters as features, and the word sequence kernel was proposed to use words as the subsequence symbols instead. However, both suffer from high computational complexity because of the large number of symbols (characters or words). This paper therefore proposes using sequences of topics rather than characters or words to reduce the number of symbols, thus increasing computational efficiency. Documents that exhibit similar posterior topic proportions are expected to have similar topic sequences and should therefore be classified into the same category. Experiments conducted on the Reuters-21578 dataset support this hypothesis.
AB - This paper addresses the problem of classifying documents using kernel approaches based on topic sequences. Previously, the string kernel used ordered subsequences of characters as features, and the word sequence kernel was proposed to use words as the subsequence symbols instead. However, both suffer from high computational complexity because of the large number of symbols (characters or words). This paper therefore proposes using sequences of topics rather than characters or words to reduce the number of symbols, thus increasing computational efficiency. Documents that exhibit similar posterior topic proportions are expected to have similar topic sequences and should therefore be classified into the same category. Experiments conducted on the Reuters-21578 dataset support this hypothesis.
KW - Classification
KW - String kernel
KW - Topic sequence
UR - http://www.scopus.com/inward/record.url?scp=84871566194&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-35341-3_41
DO - 10.1007/978-3-642-35341-3_41
M3 - Conference article published in proceeding or book
SN - 9783642353406
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 457
EP - 466
BT - Information Retrieval Technology - 8th Asia Information Retrieval Societies Conference, AIRS 2012, Proceedings
T2 - 8th Asia Information Retrieval Societies Conference, AIRS 2012
Y2 - 17 December 2012 through 19 December 2012
ER -