A Comparison between Term-Independence Retrieval Models for Ad Hoc Retrieval

Edward Kai Fung Dang, Robert Wing Pong Luk, James Allan

Research output: Journal article publication › Journal article › Academic research › peer-review

Abstract

In Information Retrieval, numerous retrieval models or document ranking functions have been developed in the quest for better retrieval effectiveness. Apart from some formal retrieval models formulated on a theoretical basis, various recent works have applied heuristic constraints to guide the derivation of document ranking functions. While many recent methods are shown to improve over established and successful models, comparison among these new methods under a common environment is often missing. To address this issue, we perform an extensive and up-to-date comparison of leading term-independence retrieval models implemented in our own retrieval system. Our study focuses on the following questions: (RQ1) Is there a retrieval model that consistently outperforms all other models across multiple collections? (RQ2) What are the important features of an effective document ranking function? Our retrieval experiments, performed on several TREC test collections of a wide range of sizes (up to the terabyte-sized ClueWeb09 Category B), enable us to answer these research questions. This work also serves as a reproducibility study for leading retrieval models. While our experiments show that no single retrieval model outperforms all others across all tested collections, some recent retrieval models, such as MATF and MVD, consistently perform better than the common baselines.

Original language: English
Article number: 62
Pages (from-to): 1-37
Journal: ACM Transactions on Information Systems
Volume: 40
Issue number: 3
Publication status: Published - Jul 2022

Keywords

  • comparison
  • evaluation
  • Information retrieval
  • multiple hypotheses testing
  • retrieval model

ASJC Scopus subject areas

  • Information Systems
  • Business, Management and Accounting (all)
  • Computer Science Applications
