
Structured Packing in LLM Training Improves Long Context Utilization

Source: arXiv

Abstract

Recent advancements in long-context large language models have attracted significant attention, yet their practical applications often suffer from suboptimal context utilization. This study investigates structuring training data to enhance semantic interdependence, demonstrating that this approach effectively improves context utilization. To this end, we introduce the Structured Packing for Long Context (SPLiCe) method, which utilizes retrieval to collate mutually relevant documents into long and coherent training examples. We validate SPLiCe empirically across models of varying sizes -- 3B, 7B, and 13B -- achieving improved performance in long-context tasks, such as Qasper and HotpotQA. Remarkably, even brief fine-tuning with SPLiCe is sufficient to realize these benefits. Additionally, SPLiCe effectively mitigates the lost-in-middle phenomenon often observed in large models. Our comprehensive analysis of SPLiCe explores its design choices and reveals intriguing transfer effects; for instance, training on programming code enhances performance on natural language tasks.
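To make the core idea concrete, below is a minimal sketch of what retrieval-based packing could look like. Everything in it (the function pack_examples, the greedy loop, cosine-similarity retrieval over precomputed embeddings) is an illustrative assumption, not the paper's actual pipeline, which the abstract does not specify in detail.

```python
# Hypothetical sketch of retrieval-based example packing in the spirit of
# SPLiCe -- NOT the authors' implementation. Assumes documents are already
# embedded; uses greedy cosine-similarity retrieval for illustration.
import numpy as np

def pack_examples(docs, embeddings, target_len, token_len):
    """Greedily collate mutually relevant docs into long training examples.

    docs:       list of document strings
    embeddings: (len(docs), d) array of document embeddings
    target_len: desired example length in tokens
    token_len:  callable returning a document's token count
    """
    # Normalize once so dot products equal cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    unused = set(range(len(docs)))
    examples = []
    while unused:
        seed = unused.pop()  # start a new example from an arbitrary document
        parts, length = [docs[seed]], token_len(docs[seed])
        while length < target_len and unused:
            # Retrieve the unused document most similar to the seed.
            best = max(unused, key=lambda j: float(unit[seed] @ unit[j]))
            unused.remove(best)
            parts.append(docs[best])
            length += token_len(docs[best])
        # Concatenate into one long, semantically coherent training example.
        examples.append("\n\n".join(parts))
    return examples
```

The contrast is with standard packing, which concatenates randomly sampled documents into each context window; here every example is assembled from documents retrieved for mutual relevance, so the long-range dependencies the model sees during training are semantically meaningful.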

Szymon Tworkowski, Piotr Miłoś, Henryk Michalewski, Konrad Staniszewski, Yu Zhao, Sebastian Jaszczur, Łukasz Kuciński

Subjects: Computing Technology; Computer Technology

Szymon Tworkowski, Piotr Miłoś, Henryk Michalewski, Konrad Staniszewski, Yu Zhao, Sebastian Jaszczur, Łukasz Kuciński. Structured Packing in LLM Training Improves Long Context Utilization [EB/OL]. (2023-12-28) [2025-05-29]. https://arxiv.org/abs/2312.17296.
