RLSF: Fine-tuning LLMs via Symbolic Feedback
Large Language Models (LLMs) have transformed AI but often struggle with tasks that require domain-specific reasoning and logical alignment. Traditional fine-tuning methods do not leverage the vast amount of symbolic domain knowledge available via symbolic reasoning tools (e.g., provers), and are further limited by sparse rewards and unreliable reward models. We introduce Reinforcement Learning via Symbolic Feedback (RLSF), a novel fine-tuning paradigm in which symbolic reasoning tools (e.g., solvers, provers, and algebra systems) provide fine-grained feedback to LLMs. RLSF uses poly-sized certificates (e.g., proofs) generated by symbolic tools to identify and correct errors in model outputs, offering token-level guidance without requiring differentiable reasoning systems. This paradigm bridges the gap between symbolic reasoning and LLM fine-tuning, enabling precise alignment with domain-specific constraints while addressing key limitations of traditional reward signals. Via extensive evaluations, we show that RLSF-based fine-tuning of LLMs outperforms traditional approaches on five different applications (each with associated logical or domain constraints), namely, program synthesis from natural-language pseudo-code to a programming language, three chemistry tasks, and solving the Game of 24. A key takeaway is that fine-tuning via RLSF enables relatively smaller LLMs to significantly outperform closed-source models that are orders of magnitude larger.
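The sketch below illustrates, under stated assumptions, how a symbolic tool's verdict could be turned into the kind of token-level reward signal the abstract describes. It is not the authors' implementation: Python's own compiler stands in for the symbolic reasoning tool, and the function names (`symbolic_feedback`, `token_level_rewards`) and reward values are hypothetical.

```python
# Minimal sketch of the RLSF reward idea (assumptions, not the paper's code):
# a stand-in symbolic checker inspects an LLM-generated program, and its
# diagnostic (a proxy for the paper's poly-sized certificate) is mapped to a
# per-token reward vector for a subsequent policy-gradient update.

from typing import List, Optional, Tuple


def symbolic_feedback(program: str) -> Tuple[bool, Optional[int]]:
    """Use Python's compiler as a stand-in symbolic checker.

    Returns (ok, error_line): ok is True when the program parses;
    error_line points at the offending line otherwise.
    """
    try:
        compile(program, "<generated>", "exec")
        return True, None
    except SyntaxError as err:
        return False, err.lineno


def token_level_rewards(program: str, tokens_per_line: List[List[str]]) -> List[float]:
    """Turn the symbolic verdict into fine-grained, per-token rewards.

    Tokens on the line flagged by the checker are penalized; all other
    tokens receive a small positive reward. A full RLSF setup would
    derive this mapping from the certificate produced by a prover,
    solver, or compiler rather than from a single error line.
    """
    ok, error_line = symbolic_feedback(program)
    rewards: List[float] = []
    for line_no, line_tokens in enumerate(tokens_per_line, start=1):
        penalize = (not ok) and (line_no == error_line)
        rewards.extend([-1.0 if penalize else 0.1] * len(line_tokens))
    return rewards


if __name__ == "__main__":
    generated = "def add(a, b):\n    return a +\n"  # buggy model completion
    tokens = [line.split() for line in generated.splitlines()]
    print(token_level_rewards(generated, tokens))
    # The resulting reward vector would then drive a standard RL update
    # (e.g., PPO) on the generating LLM.
```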
Piyush Jha, Prithwish Jana, Pranavkrishna Suresh, Arnav Arora, Vijay Ganesh
Computing Technology, Computer Technology
Piyush Jha, Prithwish Jana, Pranavkrishna Suresh, Arnav Arora, Vijay Ganesh. RLSF: Fine-tuning LLMs via Symbolic Feedback [EB/OL]. (2025-06-27) [2025-08-02]. https://arxiv.org/abs/2405.16661.