TY - JOUR
T1 - A Comparison between Term-Independence Retrieval Models for Ad Hoc Retrieval
AU - Dang, Edward Kai Fung
AU - Luk, Robert Wing Pong
AU - Allan, James
N1 - Funding Information:
This work was supported in part by the HK PolyU project P0030932. Authors’ addresses: E. K. F. Dang and R. W. P. Luk, Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong; emails: {cskfdang, csrluk}@comp.polyu.edu.hk; J. Allan, College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, 01003-9264; email: [email protected]. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. © 2021 Association for Computing Machinery. 1046-8188/2021/12-ART62 $15.00 https://doi.org/10.1145/3483612
Publisher Copyright:
© 2021 Association for Computing Machinery.
PY - 2022/7
Y1 - 2022/7
AB - In Information Retrieval, numerous retrieval models or document ranking functions have been developed in the quest for better retrieval effectiveness. Apart from some formal retrieval models formulated on a theoretical basis, various recent works have applied heuristic constraints to guide the derivation of document ranking functions. While many recent methods are shown to improve over established and successful models, comparison among these new methods under a common environment is often missing. To address this issue, we perform an extensive and up-to-date comparison of leading term-independence retrieval models implemented in our own retrieval system. Our study focuses on the following questions: (RQ1) Is there a retrieval model that consistently outperforms all other models across multiple collections? (RQ2) What are the important features of an effective document ranking function? Our retrieval experiments, performed on several TREC test collections spanning a wide range of sizes (up to the terabyte-sized ClueWeb09 Category B), enable us to answer these research questions. This work also serves as a reproducibility study for leading retrieval models. While our experiments show that no single retrieval model outperforms all others across all tested collections, some recent retrieval models, such as MATF and MVD, consistently perform better than the common baselines.
KW - comparison
KW - evaluation
KW - Information retrieval
KW - multiple hypotheses testing
KW - retrieval model
UR - http://www.scopus.com/inward/record.url?scp=85127622824&partnerID=8YFLogxK
U2 - 10.1145/3483612
DO - 10.1145/3483612
M3 - Journal article
AN - SCOPUS:85127622824
SN - 1046-8188
VL - 40
SP - 1
EP - 37
JO - ACM Transactions on Information Systems
JF - ACM Transactions on Information Systems
IS - 3
M1 - 62
ER -