Stronger Baselines for Retrieval-Augmented Generation with Long-Context Language Models

Abstract

With the rise of long-context language models (LMs) capable of processing tens of thousands of tokens in a single pass, do multi-stage retrieval-augmented generation (RAG) pipelines still offer measurable benefits over simpler, single-stage approaches? To assess this question, we conduct a controlled evaluation for QA tasks under systematically scaled token budgets, comparing two recent multi-stage pipelines, ReadAgent and RAPTOR, against three baselines, including DOS RAG (Document's Original Structure RAG), a simple retrieve-then-read method that preserves original passage order. Despite its straightforward design, DOS RAG consistently matches or outperforms more intricate methods on multiple long-context QA benchmarks. We recommend establishing DOS RAG as a simple yet strong baseline for future RAG evaluations, pairing it with emerging embedding and language models to assess trade-offs between complexity and effectiveness as model capabilities evolve.
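The abstract describes DOS RAG only as a retrieve-then-read method that preserves the original passage order under a token budget. The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes the document is already split into chunks. It uses a bag-of-words cosine similarity as a hypothetical stand-in for a dense embedding model, and counts word tokens instead of model tokens. The function name `dos_rag_context` is also hypothetical. Chunks are ranked by similarity to the query and added until the budget is reached. The selected chunks are then re-sorted into their original document positions instead of being left in rank order.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase word tokens stand in for a real model tokenizer.
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Bag-of-words cosine similarity, a hypothetical substitute for embeddings.
    dot = sum(count * b[tok] for tok, count in a.items() if tok in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dos_rag_context(chunks: list[str], query: str, token_budget: int) -> str:
    """Hypothetical DOS RAG-style context builder.

    1. Rank chunks by similarity to the query.
    2. Take top-ranked chunks until the next one would exceed the budget.
    3. Restore the chunks' original document order before joining them.
    """
    q = Counter(tokenize(query))
    ranked = sorted(
        range(len(chunks)),
        key=lambda i: cosine(q, Counter(tokenize(chunks[i]))),
        reverse=True,  # highest similarity first; ties keep document order
    )
    selected, used = [], 0
    for i in ranked:
        cost = len(tokenize(chunks[i]))
        if used + cost > token_budget:
            break  # stop at the first chunk that does not fit
        selected.append(i)
        used += cost
    selected.sort()  # the "original structure" step: document order, not rank order
    return "\n\n".join(chunks[i] for i in selected)


chunks = [
    "The castle was built in 1200 by the king.",
    "Rain fell on the village for many days.",
    "The king later expanded the castle walls.",
]
context = dos_rag_context(chunks, "Who expanded the castle walls?", token_budget=16)
print(context)
```

In this example the third chunk ranks highest, yet it appears after the first chunk in the returned context. Keeping document order this way preserves the narrative flow that a long-context reader model would see in the source text.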

Authors: Alex Laitenberger, Christopher D. Manning, Nelson F. Liu

Subject classification: Computing technology; computer technology

Suggested citation: Alex Laitenberger, Christopher D. Manning, Nelson F. Liu. Stronger Baselines for Retrieval-Augmented Generation with Long-Context Language Models[EB/OL]. (2025-06-04)[2025-07-16]. https://arxiv.org/abs/2506.03989.