METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation

Source: arXiv
Abstract

RAG (Retrieval Augmented Generation) allows LLMs (large language models) to generate better responses with external knowledge, but using more external knowledge often improves generation quality at the expense of response delay. Prior work either reduces the response delay (through better scheduling of RAG queries) or strives to maximize quality (which involves tuning the RAG workflow), but it falls short in optimizing the tradeoff between the delay and quality of RAG responses. This paper presents METIS, the first RAG system that jointly schedules queries and adapts the key RAG configurations of each query, such as the number of retrieved text chunks and synthesis methods, in order to balance quality optimization and response delay reduction. Using 4 popular RAG-QA datasets, we show that compared with the state-of-the-art RAG optimization schemes, METIS reduces the generation latency by $1.64$-$2.54\times$ without sacrificing generation quality.

Ravi Netravali, Junchen Jiang, Siddhant Ray, Rui Pan, Zhuohan Gu, Kuntai Du, Shaoting Feng, Ganesh Ananthanarayanan

Computing technology, computer technology

Ravi Netravali, Junchen Jiang, Siddhant Ray, Rui Pan, Zhuohan Gu, Kuntai Du, Shaoting Feng, Ganesh Ananthanarayanan. METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation [EB/OL]. (2025-07-16) [2025-08-16]. https://arxiv.org/abs/2412.10543.
