TY - GEN
T1 - Multiview Identifiers Enhanced Generative Retrieval
AU - Li, Yongqi
AU - Yang, Nan
AU - Wang, Liang
AU - Wei, Furu
AU - Li, Wenjie
N1 - Publisher Copyright: © 2023 Association for Computational Linguistics.
PY - 2023/7
Y1 - 2023/7
N2 - Instead of simply matching a query to preexisting passages, generative retrieval generates identifier strings of passages as the retrieval target. At a cost, the identifier must be distinctive enough to represent a passage. Current approaches use either a numeric ID or a text piece (such as a title or substrings) as the identifier. However, these identifiers cannot cover a passage's content well. As such, we are motivated to propose a new type of identifier, synthetic identifiers, that are generated based on the content of a passage and could integrate contextualized information that text pieces lack. Furthermore, we simultaneously consider multiview identifiers, including synthetic identifiers, titles, and substrings. These views of identifiers complement each other and facilitate the holistic ranking of passages from multiple perspectives. We conduct a series of experiments on three public datasets, and the results indicate that our proposed approach performs the best in generative retrieval, demonstrating its effectiveness and robustness. The code is released at https://github.com/liyongqi67/MINDER.
UR - http://www.scopus.com/inward/record.url?scp=85174421453&partnerID=8YFLogxK
M3 - Conference article published in proceeding or book
AN - SCOPUS:85174421453
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 6636
EP - 6648
BT - Long Papers
PB - Association for Computational Linguistics (ACL)
T2 - 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Y2 - 9 July 2023 through 14 July 2023
ER -