Hybrid term indexing for different IR models

Ken C.W. Chow, Wing Pong Robert Luk, K. F. Wong, K. L. Kwok

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

3 Citations (Scopus)

Abstract

© 2000 ACM. Retrieval effectiveness depends on how terms are extracted and indexed. Chinese text (like Japanese and Korean text) has no spaces to delimit words. Indexing with hybrid terms (i.e. words and bigrams) has been shown to achieve the best precision among homogeneous term types, at a lower storage cost than indexing with bigrams. However, this was tested only with conjunctive queries. Here, we extend the weighted Boolean model, using fuzzy and p-norm measures, and the vector space model, using the cosine measure, to process hybrid terms. Our evaluation shows that all IR models achieve better average precision with hybrid terms than with words. Across different recall values, the weighted Boolean model using fuzzy measures with hybrid terms consistently achieves about 8% higher precision than the same model using words. The vector space model using the cosine measure with hybrid terms achieved the largest improvement in average recall and precision.
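The abstract names three matching functions but does not give their formulas. As a minimal sketch, assuming the textbook definitions (fuzzy-set min/max, the unweighted p-norm extended Boolean operators of Salton, Fox and Wu, and the cosine measure) and document term weights already normalized to [0, 1], the following Python shows how a document indexed with hybrid terms could be scored. The helper names, the toy weights and the use of unweighted query terms are illustrative assumptions, not the authors' implementation; the paper's term weighting and its handling of words that overlap bigrams are not reproduced.

```python
import math
from typing import Dict, List

# A document is a mapping from index term to weight in [0, 1].
# With hybrid indexing, keys may be words or character bigrams alike.
Doc = Dict[str, float]


def fuzzy_and(doc: Doc, terms: List[str]) -> float:
    """Fuzzy-set AND: the smallest weight among the query terms."""
    return min(doc.get(t, 0.0) for t in terms)


def fuzzy_or(doc: Doc, terms: List[str]) -> float:
    """Fuzzy-set OR: the largest weight among the query terms."""
    return max(doc.get(t, 0.0) for t in terms)


def pnorm_and(doc: Doc, terms: List[str], p: float) -> float:
    """p-norm AND: 1 minus the normalized p-distance from the all-ones point."""
    s = sum((1.0 - doc.get(t, 0.0)) ** p for t in terms)
    return 1.0 - (s / len(terms)) ** (1.0 / p)


def pnorm_or(doc: Doc, terms: List[str], p: float) -> float:
    """p-norm OR: the normalized p-distance from the all-zeros point."""
    s = sum(doc.get(t, 0.0) ** p for t in terms)
    return (s / len(terms)) ** (1.0 / p)


def cosine(query: Doc, doc: Doc) -> float:
    """Vector space model: cosine of the angle between query and document."""
    dot = sum(w * doc.get(t, 0.0) for t, w in query.items())
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    norm_d = math.sqrt(sum(w * w for w in doc.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_q * norm_d)


if __name__ == "__main__":
    # Hypothetical toy weights: two words plus one bigram spanning them.
    doc = {"信息": 0.8, "检索": 0.6, "息检": 0.2}
    terms = ["信息", "检索"]
    print(fuzzy_and(doc, terms), fuzzy_or(doc, terms))  # 0.6 0.8
    print(pnorm_and(doc, terms, 2), pnorm_or(doc, terms, 2))
    print(cosine({t: 1.0 for t in terms}, doc))
```

Setting p = 1 makes both p-norm operators equal to the mean query-term weight, while a large p pushes them toward the fuzzy min and max; this is the usual sense in which the p-norm model sits between vector-style and strict Boolean matching.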
Original language: English
Title of host publication: Proceedings of the 5th International Workshop on Information Retrieval with Asian Languages, IRAL 2000
Publisher: Association for Computing Machinery, Inc
Pages: 49-54
Number of pages: 6
ISBN (Electronic): 1581133006, 9781581133004
Publication status: Published - 1 Nov 2000
Event: 5th International Workshop on Information Retrieval with Asian Languages, IRAL 2000 - Hong Kong, Hong Kong
Duration: 30 Sep 2000 to 1 Oct 2000

Conference

Conference: 5th International Workshop on Information Retrieval with Asian Languages, IRAL 2000
Country/Territory: Hong Kong
City: Hong Kong
Period: 30/09/00 to 1/10/00

Keywords

  • Chinese information retrieval
  • Evaluation
  • Indexing
  • IR models

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
