Abstract
Model construction is a form of knowledge engineering, and building retrieval models is critical to the success of search engines. This article proposes a new retrieval language model, called the binary independence language model (BILM). BILM combines two document-context based language models, each describing the document-contexts of query terms, into a single model through the log-odds ratio. One model is based on relevance information and the other on non-relevance information. Each model incorporates link dependencies and multiple query term dependencies, and its probabilities are interpolated between the relative frequencies and the background probabilities. In a simulated relevance feedback environment with the top 20 documents judged, and using fixed parameter values, our BILM achieved a statistically significantly higher mean average precision than the other highly effective retrieval models at the 95% confidence level across four TREC collections. For the less stable measure of precision at the top 10 documents, no statistically significant difference between the models was found on the individual test collections, although our BILM is numerically better than two other models at the 95% confidence level according to a paired sign test across the test collections of both the relevance feedback and retrospective experiments.
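The abstract describes two language models over the document-contexts of query terms (one estimated from relevant documents, one from non-relevant ones) that are combined through a log-odds ratio, with each probability interpolated between a relative frequency and a background probability. The sketch below illustrates only that general idea and is not the authors' BILM. It assumes a fixed-width window of terms around each query-term occurrence as the "document-context", treats context terms as independent unigrams, and uses a single interpolation weight `lam`; it deliberately omits the link dependencies and multiple query term dependencies that BILM incorporates. All names (`context_windows`, `background_model`, `SmoothedUnigram`, `log_odds`, `width`, `lam`, `floor`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def context_windows(doc, query_terms, width):
    """Hypothetical helper: terms within `width` positions of each query-term occurrence."""
    windows = []
    for i, tok in enumerate(doc):
        if tok in query_terms:
            lo, hi = max(0, i - width), min(len(doc), i + width + 1)
            windows.append(doc[lo:i] + doc[i + 1:hi])  # exclude the query term itself
    return windows


def background_model(docs):
    """Maximum-likelihood unigram probabilities over a whole collection."""
    counts = Counter(t for d in docs for t in d)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


class SmoothedUnigram:
    """Unigram model over contexts: lam * relative frequency + (1 - lam) * background.

    Assumption: lam lies strictly between 0 and 1, and terms missing from the
    background get a small `floor` probability so that log() is always defined.
    """

    def __init__(self, contexts, background, lam=0.5, floor=1e-9):
        self.counts = Counter(t for ctx in contexts for t in ctx)
        self.total = sum(self.counts.values())
        self.background = background
        self.lam = lam
        self.floor = floor

    def prob(self, term):
        rel_freq = self.counts[term] / self.total if self.total else 0.0
        bg = self.background.get(term, self.floor)
        return self.lam * rel_freq + (1.0 - self.lam) * bg


def log_odds(doc, query_terms, rel_model, non_model, width):
    """Sum of log P(t|relevant) - log P(t|non-relevant) over the document's context terms."""
    score = 0.0
    for ctx in context_windows(doc, query_terms, width):
        for t in ctx:
            score += math.log(rel_model.prob(t)) - math.log(non_model.prob(t))
    return score


# Toy feedback data (hypothetical): judged relevant and non-relevant documents.
query = {"solar"}
width = 2
rel_docs = [["solar", "energy", "panel", "cheap"], ["cheap", "solar", "energy", "grid"]]
non_docs = [["solar", "system", "planet", "orbit"], ["planet", "solar", "eclipse", "moon"]]

background = background_model(rel_docs + non_docs)
rel_ctx = [c for d in rel_docs for c in context_windows(d, query, width)]
non_ctx = [c for d in non_docs for c in context_windows(d, query, width)]
rel_model = SmoothedUnigram(rel_ctx, background, lam=0.5)
non_model = SmoothedUnigram(non_ctx, background, lam=0.5)

s_rel = log_odds(["cheap", "solar", "energy"], query, rel_model, non_model, width)
s_non = log_odds(["planet", "solar", "orbit"], query, rel_model, non_model, width)
print(s_rel, s_non)  # positive for the relevant-looking document, negative for the other
```

Because the score is a sum of per-term log ratios, a context term that is equally likely under both models (here `orbit`) contributes zero, and interpolating with the background keeps every probability positive, so terms unseen in one set of judged documents do not make the log-odds undefined.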
Original language | English |
---|---|
Pages (from-to) | 873-895 |
Number of pages | 23 |
Journal | International Journal of Software Engineering and Knowledge Engineering |
Volume | 29 |
Issue number | 6 |
DOIs | |
Publication status | Published - 1 Jun 2019 |
Keywords
- Information retrieval
- language model
- proximity matching
ASJC Scopus subject areas
- Software
- Computer Networks and Communications
- Computer Graphics and Computer-Aided Design
- Artificial Intelligence