Think Before You Attribute: Improving the Performance of LLMs Attribution Systems
Large Language Models (LLMs) are increasingly applied across scientific domains, yet their broader adoption remains constrained by a critical challenge: the lack of trustworthy, verifiable outputs. Current LLMs often generate answers without reliable source attribution, or worse, with incorrect attributions, posing a barrier to their use in scientific and high-stakes settings, where traceability and accountability are non-negotiable. To be reliable, attribution systems need high accuracy and must retrieve data of short length, i.e., attribute to a sentence within a document rather than to an entire document. We propose a sentence-level pre-attribution step for Retrieval-Augmented Generation (RAG) systems that classifies sentences into three categories: not attributable, attributable to a single quote, and attributable to multiple quotes. By classifying sentences before attribution, an appropriate attribution method can be selected for each type of sentence, or attribution can be skipped altogether. Our results indicate that classifiers are well-suited for this task. In this work, we propose a pre-attribution step that reduces the computational complexity of attribution, provide a cleaned version of the HAGRID dataset, and release an end-to-end attribution system that works out of the box.
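To make the pre-attribution idea concrete, the sketch below shows a minimal three-way sentence classifier of the kind the abstract describes. It is not the authors' implementation: the label names, the toy training examples, and the TF-IDF plus logistic-regression model are illustrative assumptions only; the paper's actual classifier and training data (e.g., the cleaned HAGRID dataset) may differ.

```python
# Minimal sketch of a sentence-level pre-attribution classifier (illustrative only).
# Each generated sentence is routed to NOT_ATTRIBUTABLE, SINGLE_QUOTE, or MULTI_QUOTE
# before any attribution method is run, so attribution can be skipped or specialized.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

LABELS = ["NOT_ATTRIBUTABLE", "SINGLE_QUOTE", "MULTI_QUOTE"]

# Hypothetical labeled examples; a real system would train on a curated dataset.
train_sentences = [
    "In summary, these findings are promising.",            # needs no source
    "The melting point of gallium is 29.76 degrees C.",     # one supporting quote
    "Several studies report conflicting mortality rates.",  # multiple quotes
]
train_labels = [0, 1, 2]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train_sentences, train_labels)

def pre_attribute(sentence: str) -> str:
    """Return the predicted attribution category for one generated sentence."""
    return LABELS[int(clf.predict([sentence])[0])]

if __name__ == "__main__":
    print(pre_attribute("The dataset contains several thousand annotated questions."))
```

In a RAG pipeline, sentences predicted as NOT_ATTRIBUTABLE would bypass retrieval entirely, which is where the computational savings claimed in the abstract would come from.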
João Eduardo Batista, Emil Vatai, Mohamed Wahib
Subject area: Computing technology, computer science
João Eduardo Batista, Emil Vatai, Mohamed Wahib. Think Before You Attribute: Improving the Performance of LLMs Attribution Systems [EB/OL]. (2025-05-18) [2025-06-07]. https://arxiv.org/abs/2505.12621.