Abstract
Retrieval effectiveness depends on both the retrieval model and how terms are extracted and indexed. Chinese, Japanese and Korean text has no spaces to delimit words. Indexing with hybrid terms (i.e. words and bigrams) was found to be effective and efficient using the 2-Poisson model in the NTCIR-III open evaluation workshop. Here, we explore another Okapi weight based on the 2-Poisson model, BM25, and compare retrieval performance with bigram and word indexing strategies. Results show that word indexing is the most efficient in terms of indexing time and storage, but hybrid term indexing requires the least retrieval time per query. Without pseudo-relevance feedback (PRF), our BM25 appeared to yield better retrieval effectiveness for short queries. With PRF, our implementation of the BM11 weights, which are a simplified version of BM25, with hybrid term indexing remains the most effective combination for retrieval in this study.
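The indexing strategies and weights named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the externally supplied word segmentation, the de-duplicated union used for hybrid terms, and the parameter defaults `k1=1.2` and `b=0.75` are all assumptions made for illustration. The weight follows the commonly published Okapi BM25 form, in which setting `b=1.0` gives the BM11 variant.

```python
import math


def bigrams(text):
    """Overlapping character bigrams of a string that has no word delimiters."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def hybrid_terms(text, words):
    """Words plus bigrams, de-duplicated in first-seen order.

    `words` is assumed to come from an external segmenter or dictionary
    lookup; this sketch does not perform segmentation itself.
    """
    return list(dict.fromkeys(list(words) + bigrams(text)))


def bm25_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 weight of one term in one document.

    The query term frequency component is omitted for brevity.
    Passing b=1.0 applies full length normalisation (the BM11 form).
    """
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)


# Example: index terms for a four-character string, then one term weight.
terms = hybrid_terms("信息检索", ["信息", "检索"])
weight = bm25_weight(tf=2, df=10, n_docs=100, doc_len=100, avg_doc_len=50, b=1.0)
```

With `b=1.0`, a document twice the average length receives a lower weight for the same term frequency than a document of average length, which is the length normalisation the Okapi weights are designed to provide.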
Original language | English |
---|---|
Pages (from-to) | 1745-1752 |
Number of pages | 8 |
Journal | IEICE Transactions on Information and Systems |
Volume | E86-D |
Issue number | 9 |
Publication status | Published - 1 Jan 2003 |
Keywords
- 2-Poisson model
- Chinese information retrieval
- Evaluation
- Indexing
ASJC Scopus subject areas
- Software
- Hardware and Architecture
- Computer Vision and Pattern Recognition
- Electrical and Electronic Engineering
- Artificial Intelligence