TY - GEN
T1 - Lighter And Better
T2 - 18th ACM International Conference on Web Search and Data Mining, WSDM 2025
AU - Wu, Chenyuan
AU - Shao, Ninglu
AU - Liu, Zheng
AU - Xiao, Shitao
AU - Li, Chaozhuo
AU - Zhang, Chen
AU - Wang, Senzhang
AU - Lian, Defu
N1 - Publisher Copyright: © 2025 Copyright held by the owner/author(s).
PY - 2025/3/10
Y1 - 2025/3/10
AB - The existing Retrieval-Augmented Generation (RAG) systems face significant challenges in terms of cost and effectiveness. On one hand, they need to encode the lengthy retrieved contexts before responding to the input tasks, which imposes substantial computational overhead. On the other hand, directly using generic Large Language Models (LLMs) often leads to sub-optimal answers, while task-specific fine-tuning may compromise the LLMs' general capabilities. To address these challenges, we introduce a novel approach called FlexRAG (Flexible Context Adaptation for RAG). In this approach, the retrieved contexts are compressed into compact embeddings before being encoded by the LLMs. Simultaneously, these compressed embeddings are optimized to enhance downstream RAG performance. A key feature of FlexRAG is its flexibility, which enables effective support for diverse compression ratios and selective preservation of important contexts. With these designs, FlexRAG achieves superior generation quality while significantly reducing running costs. The experiments across multiple QA datasets validate our approach as a cost-effective and flexible solution for RAG systems (codebase: https://github.com/wcyno23/FlexRAG).
KW - Large Language Models
KW - Retrieval Augmented Generation
UR - https://www.scopus.com/pages/publications/105001669811
U2 - 10.1145/3701551.3703580
DO - 10.1145/3701551.3703580
M3 - Conference article published in proceeding or book
AN - SCOPUS:105001669811
T3 - WSDM 2025 - Proceedings of the 18th ACM International Conference on Web Search and Data Mining
SP - 271
EP - 280
BT - WSDM 2025 - Proceedings of the 18th ACM International Conference on Web Search and Data Mining
PB - Association for Computing Machinery, Inc
Y2 - 10 March 2025 through 14 March 2025
ER -