
Pushing the boundary on Natural Language Inference


Source: arXiv

Abstract

Natural Language Inference (NLI) is a central task in natural language understanding with applications in fact-checking, question answering, and information retrieval. Despite its importance, current NLI systems heavily rely on supervised learning with datasets that often contain annotation artifacts and biases, limiting generalization and real-world applicability. In this work, we apply a reinforcement learning-based approach using Group Relative Policy Optimization (GRPO) for Chain-of-Thought (CoT) learning in NLI, eliminating the need for labeled rationales and enabling this type of training on more challenging datasets such as ANLI. We fine-tune 7B, 14B, and 32B language models using parameter-efficient techniques (LoRA and QLoRA), demonstrating strong performance across standard and adversarial NLI benchmarks. Our 32B AWQ-quantized model surpasses state-of-the-art results on 7 out of 11 adversarial sets (or on all of them considering our replication) within a 22GB memory footprint, showing that robust reasoning can be retained under aggressive quantization. This work provides a scalable and practical framework for building robust NLI systems without sacrificing inference quality.
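As a rough illustration of the training setup the abstract describes, the sketch below shows how GRPO-based CoT fine-tuning with a LoRA adapter might be wired up on ANLI using the TRL and PEFT libraries. The base model name, prompt template, reward shaping, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch: GRPO fine-tuning for NLI chain-of-thought, assuming TRL's
# GRPOTrainer and a simple correctness-based reward. No labeled rationales
# are used; only the gold NLI label drives the reward signal.
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

LABELS = {0: "entailment", 1: "neutral", 2: "contradiction"}  # ANLI label ids

def to_prompt(example):
    # Ask the model to reason step by step before giving a final label.
    return {
        "prompt": (
            "Premise: " + example["premise"] + "\n"
            "Hypothesis: " + example["hypothesis"] + "\n"
            "Think step by step, then answer with exactly one of: "
            "entailment, neutral, contradiction."
        ),
        "gold": LABELS[example["label"]],
    }

train = load_dataset("facebook/anli", split="train_r3").map(to_prompt)

def label_reward(completions, gold, **kwargs):
    # +1 if the gold label appears near the end of the completion, else 0.
    # TRL passes extra dataset columns (here "gold") to reward functions.
    return [1.0 if g in c.lower()[-50:] else 0.0 for c, g in zip(completions, gold)]

peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

args = GRPOConfig(
    output_dir="grpo-nli",
    num_generations=8,          # group size for relative advantage estimation
    max_completion_length=512,  # room for the chain of thought
    per_device_train_batch_size=8,
    learning_rate=1e-5,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder base model
    reward_funcs=label_reward,
    args=args,
    train_dataset=train,
    peft_config=peft_config,
)
trainer.train()
```

QLoRA and AWQ quantization, as mentioned in the abstract, would layer on top of this setup (quantized base weights with a trainable adapter), but the details above are a sketch under the stated assumptions rather than the paper's implementation.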

Pablo Miralles-González, Javier Huertas-Tato, Alejandro Martín, David Camacho

Subjects: Computing technology; computer technology

Pablo Miralles-González, Javier Huertas-Tato, Alejandro Martín, David Camacho. Pushing the boundary on Natural Language Inference [EB/OL]. (2025-04-25) [2025-07-17]. https://arxiv.org/abs/2504.18376.
