Topic sequence kernel

Jian Xu, Qin Lu, Zhengzhong Liu, Junyi Chai

Research output: Chapter in book / Conference proceeding, Conference article published in proceeding or book, Academic research, peer-review

1 Citation (Scopus)

Abstract

This paper addresses the problem of classifying documents with kernel methods based on topic sequences. The string kernel uses ordered subsequences of characters as features, and the word sequence kernel uses words instead of characters. Both, however, suffer from high computational complexity because of the large number of symbols (characters or words). This paper therefore proposes using sequences of topics rather than characters or words, which reduces the number of symbols and thus improves computational efficiency. Documents that exhibit similar posterior topic proportions are expected to have similar topic sequences and should therefore be classified into the same category. Experiments on the Reuters-21578 dataset support this hypothesis.
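As a rough illustration of the idea in the abstract, the sketch below applies the standard gap-weighted subsequence (string) kernel to sequences of topic IDs instead of characters. It is not taken from the paper: the function names, the decay parameter `lam`, the subsequence length `n`, and the assumption that each document has already been mapped to a list of integer topic IDs (for example by a topic model) are all hypothetical choices made for this example.

```python
import math
from typing import Sequence


def subsequence_kernel(s: Sequence[int], t: Sequence[int], n: int, lam: float) -> float:
    """Gap-weighted subsequence kernel of order n over two symbol sequences.

    Here the symbols are assumed to be topic IDs (hypothetical input format).
    Each common subsequence of length n contributes lam ** (span in s + span in t).
    """
    ls, lt = len(s), len(t)
    if n < 1 or min(ls, lt) < n:
        return 0.0

    # kp[i][a][b] holds the auxiliary kernel K'_i on prefixes s[:a], t[:b].
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0

    for i in range(1, n):
        for a in range(i, ls + 1):
            kpp = 0.0  # running value of K''_i for the current prefix s[:a]
            for b in range(i, lt + 1):
                match = lam * lam * kp[i - 1][a - 1][b - 1] if s[a - 1] == t[b - 1] else 0.0
                kpp = lam * kpp + match
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp

    # Sum contributions of every pair of positions where the last symbols match.
    total = 0.0
    for a in range(n, ls + 1):
        for b in range(n, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[n - 1][a - 1][b - 1]
    return total


def normalized_kernel(s: Sequence[int], t: Sequence[int], n: int, lam: float) -> float:
    """Length-normalized kernel value in [0, 1]; returns 0.0 if either self-kernel is zero."""
    denom = math.sqrt(subsequence_kernel(s, s, n, lam) * subsequence_kernel(t, t, n, lam))
    return subsequence_kernel(s, t, n, lam) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Two toy documents expressed as topic-ID sequences (made-up data).
    doc_a = [0, 1, 2]
    doc_b = [0, 1, 3]
    print(subsequence_kernel(doc_a, doc_b, n=2, lam=0.5))
    print(normalized_kernel(doc_a, doc_b, n=2, lam=0.5))
```

Because the number of distinct topics is typically far smaller than the number of distinct words, symbol matches are cheaper to index and sequences can be compared over a much smaller alphabet; the dynamic program itself still costs on the order of n times the product of the two sequence lengths. The resulting kernel matrix could be passed to any kernel classifier that accepts precomputed kernels.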
Original language: English
Title of host publication: Information Retrieval Technology - 8th Asia Information Retrieval Societies Conference, AIRS 2012, Proceedings
Pages: 457-466
Number of pages: 10
Publication status: Published - 31 Dec 2012
Event: 8th Asia Information Retrieval Societies Conference, AIRS 2012 - Tianjin, China
Duration: 17 Dec 2012 to 19 Dec 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7675 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th Asia Information Retrieval Societies Conference, AIRS 2012
Country/Territory: China
City: Tianjin
Period: 17/12/12 to 19/12/12

Keywords

  • Classification
  • String kernel
  • Topic sequence

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
