Interpreting TF-IDF term weights as making relevance decisions

Ho Chung Wu, Wing Pong Robert Luk, Kam Fai Wong, Kui Lam Kwok

Research output: Journal article publication; Journal article; Academic research; peer-review

464 Citations (Scopus)

Abstract

A novel probabilistic retrieval model is presented. It forms a basis to interpret the TF-IDF term weights as making relevance decisions. It simulates the local relevance decision-making for every location of a document, and combines all of these "local" relevance decisions as the "document-wide" relevance decision for the document. The significance of interpreting TF-IDF in this way is the potential to: (1) establish a unifying perspective about information retrieval as relevance decision-making; and (2) develop advanced TF-IDF-related term weights for future elaborate retrieval models. Our novel retrieval model is simplified to a basic ranking formula that directly corresponds to the TF-IDF term weights. In general, we show that the term-frequency factor of the ranking formula can be rendered into different term-frequency factors of existing retrieval systems. In the basic ranking formula, the remaining quantity −log p(r̄ | t ∈ d) is interpreted as the probability of randomly picking a nonrelevant usage (denoted by r̄) of term t. Mathematically, we show that this quantity can be approximated by the inverse document-frequency (IDF). Empirically, we show that this quantity is related to IDF, using four reference TREC ad hoc retrieval data collections.
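
For readers unfamiliar with the term weights the abstract refers to, a conventional TF-IDF ranking can be sketched as below. This is only an illustration of the standard weighting, not the authors' probabilistic model: the function names, the toy corpus, the use of raw term frequency, and the IDF form log(N/df) are all assumptions made for this example.

```python
import math
from collections import Counter

def idf(term, docs):
    """Inverse document frequency, assumed here as log(N / df).

    Returns 0.0 when the term occurs in no document.
    """
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def tfidf_score(query, doc, docs):
    """Sum of (raw term frequency * IDF) over the query terms."""
    tf = Counter(doc)
    return sum(tf[t] * idf(t, docs) for t in query)

def rank(query, docs):
    """Return (document index, score) pairs, highest score first."""
    scores = [(i, tfidf_score(query, d, docs)) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    "relevance decision term weight".split(),
    "term weight for retrieval".split(),
    "probabilistic retrieval model".split(),
]
ranking = rank(["relevance", "retrieval"], docs)
```

In this toy corpus the rarer query term ("relevance", found in one document) receives a larger IDF than the more common one ("retrieval", found in two), so the document containing it ranks first.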
Original language: English
Article number: 13
Journal: ACM Transactions on Information Systems
Volume: 26
Issue number: 3
Publication status: Published - 1 Jun 2008

Keywords

  • Information retrieval
  • Relevance decision
  • Term weight

ASJC Scopus subject areas

  • Information Systems
  • Business, Management and Accounting (all)
  • Computer Science Applications
