Abstract
Effective passage retrieval is crucial for conversational question answering (QA) but challenging due to the ambiguity of questions. Current methods rely on the dual-encoder architecture to embed contextualized vectors of questions in conversations. However, this architecture is limited by the embedding bottleneck and the dot-product operation. To alleviate these limitations, we propose generative retrieval for conversational QA (GCoQA). GCoQA assigns a distinctive identifier to each passage and retrieves passages by generating their identifiers token by token with an encoder–decoder architecture. In this generative way, GCoQA eliminates the need for a vector-style index and can attend to crucial tokens of the conversation context at every decoding step. We conduct experiments on three public datasets over a corpus containing about twenty million passages. The results show that GCoQA achieves relative improvements of +13.6% in passage retrieval and +42.9% in document retrieval. GCoQA is also efficient in terms of memory usage and inference speed: it consumes only 1/10 of the memory and takes less than 33% of the inference time. The code and data are released at https://github.com/liyongqi67/GCoQA.
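To make the identifier-generation step concrete, the sketch below shows one common way to realize generative retrieval: constrained beam search over a trie of tokenized passage identifiers, using a Hugging Face T5 checkpoint. The model name, the example identifiers, and the `Trie` helper are illustrative assumptions, not the authors' released GCoQA code.

```python
# Minimal sketch of generative retrieval via constrained decoding.
# Assumptions: a T5-style encoder-decoder and passage identifiers (e.g., titles)
# indexed in a prefix tree over their token ids.
from transformers import AutoTokenizer, T5ForConditionalGeneration

MODEL_NAME = "t5-base"  # placeholder checkpoint; GCoQA fine-tunes its own model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)


class Trie:
    """Prefix tree over token-id sequences of passage identifiers."""

    def __init__(self, sequences):
        self.root = {}
        for seq in sequences:
            node = self.root
            for tok in seq:
                node = node.setdefault(tok, {})

    def next_tokens(self, prefix):
        """Return the token ids that can legally follow the given prefix."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []
            node = node[tok]
        return list(node.keys())


# Hypothetical passage identifiers; in GCoQA these are distinctive per passage.
passage_ids = ["World Cup 2018", "World Cup 2022 final", "FIFA rankings"]
trie = Trie([tokenizer(p).input_ids for p in passage_ids])

# The conversation context (history + current question) is the encoder input,
# so every decoding step can attend to all of its tokens.
context = "Q: Who won the World Cup? A: France. Q: Where was the final held?"
inputs = tokenizer(context, return_tensors="pt")

# Constrained beam search: at each step, only tokens that extend a valid
# identifier prefix in the trie are allowed (drop the decoder start token
# before the lookup; pad finished beams).
outputs = model.generate(
    **inputs,
    max_length=32,
    num_beams=5,
    num_return_sequences=5,
    prefix_allowed_tokens_fn=lambda batch_id, sent: (
        trie.next_tokens(sent.tolist()[1:]) or [tokenizer.pad_token_id]
    ),
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

The returned beams are ranked passage identifiers, so no vector index or dot-product search is needed at inference time; the trie alone guarantees that every generated sequence corresponds to an existing passage.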
| Original language | English |
|---|---|
| Article number | 103475 |
| Pages (from-to) | 1-12 |
| Journal | Information Processing and Management |
| Volume | 60 |
| Issue number | 5 |
| DOIs | |
| Publication status | Published - Sept 2023 |
Keywords
- Conversational question answering
- Generative retrieval
ASJC Scopus subject areas
- Information Systems
- Media Technology
- Computer Science Applications
- Management Science and Operations Research
- Library and Information Sciences