Fast forward index methods for pseudo-relevance feedback retrieval

Edward Kai Fung Dang, Wing Pong Robert Luk, James Allan

Research output: Journal article publicationJournal articleAcademic researchpeer-review

5 Citations (Scopus)

Abstract

The inverted index is the dominant indexing method in information retrieval systems. It enables fast return of the list of all documents containing a given query term. However, for retrieval schemes involving query expansion, as in pseudo-relevance feedback (PRF), the retrieval time based on an inverted index increases linearly with the number of expansion terms. In this regard, we have examined the use of a forward index, which consists of the mapping of each document to its constituent terms. We propose a novel forward index-based reranking scheme to shorten the PRF retrieval time. In our method, a first retrieval of the original query is performed using an inverted index, and then a forward index is employed for the PRF part. We have studied several new forward indexes, including using a novel spstring data structure and the weighted variable bit-block compression (wvbc) signature. With modern hardware such as solid-state drives (SSDs) and sufficiently large main memory, forward index methods are particularly promising. We find that with the whole index stored in main memory, PRF retrieval using a spstring or wvbc forward index excels in time efficiency over an inverted index, being able to obtain the same levels of performance measures at shorter times.
Original languageEnglish
JournalACM Transactions on Information Systems
Volume33
Issue number4
DOIs
Publication statusPublished - 1 Apr 2015

Keywords

  • Forward index
  • Information retrieval
  • Inverted index
  • Time efficiency

ASJC Scopus subject areas

  • Information Systems
  • General Business,Management and Accounting
  • Computer Science Applications

Fingerprint

Dive into the research topics of 'Fast forward index methods for pseudo-relevance feedback retrieval'. Together they form a unique fingerprint.

Cite this