Abstract
As training large language models (LLMs) incurs high computational costs, much recent work focuses on inference instead. These methods can generally be summarised as re-sampling the target multiple times and voting over the outputs. Despite bringing significant performance improvements, this is a costly approach that requires multiple sampling with a preset size. In this paper, we propose a simple yet efficient inference strategy named __Hybrid Sampling__ that combines multiple and single sampling to greatly reduce the cost of multiple sampling without sacrificing performance. __Hybrid Sampling__ dynamically selects the essential part of the generated sequence for multiple sampling and proceeds with single sampling for the rest, achieving a performance-cost balance. Extensive experiments on several benchmarks underscore the robustness and effectiveness of our proposed Hybrid Sampling and, more importantly, show that it is much faster.
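The general idea described above can be sketched as a per-step decision between single and multiple sampling. The abstract does not specify how "essential" parts are identified, so the sketch below uses an assumed entropy threshold on the next-token distribution as the selection criterion; `hybrid_step`, the threshold value, and the toy distributions are all illustrative, not the paper's actual method.

```python
import math
import random

def entropy(probs):
    """Shannon entropy (nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)

def sample(probs, rng):
    """Draw one token from the distribution."""
    r = rng.random()
    acc = 0.0
    for tok, p in probs.items():
        acc += p
        if r <= acc:
            return tok
    return tok  # fallback for floating-point rounding

def hybrid_step(probs, rng, threshold=1.0, n_samples=5):
    """For 'essential' (here: high-entropy) steps, draw several
    candidates and majority-vote; otherwise take a single sample."""
    if entropy(probs) > threshold:
        draws = [sample(probs, rng) for _ in range(n_samples)]
        return max(set(draws), key=draws.count)  # majority vote
    return sample(probs, rng)

rng = random.Random(0)
peaked = {"the": 0.95, "a": 0.05}               # low entropy: single sample
flat = {"cat": 0.34, "dog": 0.33, "fox": 0.33}  # high entropy: vote
print(hybrid_step(peaked, rng), hybrid_step(flat, rng))
```

Only the uncertain steps pay the cost of `n_samples` draws, which is the performance-cost trade-off the abstract describes.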
Original language | English |
---|---|
Title of host publication | Proceedings of the Conference Findings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025) |
Editors | Luis Chiruzzo, Alan Ritter, Lu Wang |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 4199-4210 |
ISBN (Electronic) | 9798891761957 |
DOIs | |
Publication status | Published - Apr 2025 |
Event | 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics - Albuquerque Convention Center, Albuquerque, United States Duration: 29 Apr 2025 → 4 May 2025 https://2025.naacl.org/ |
Conference
Conference | 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics |
---|---|
Abbreviated title | NAACL 2025 |
Country/Territory | United States |
City | Albuquerque |
Period | 29/04/25 → 4/05/25 |
Internet address |