Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models

Source: arXiv
Abstract

Reinforcement Learning from Human Feedback (RLHF) has emerged as a powerful technique for aligning large language models (LLMs) with human preferences. However, effectively aligning LLMs with diverse human preferences remains a significant challenge, particularly when those preferences conflict. To address this issue, we frame human value alignment as a multi-objective optimization problem, aiming to maximize a set of potentially conflicting objectives. We introduce Gradient-Adaptive Policy Optimization (GAPO), a novel fine-tuning paradigm that employs multiple-gradient descent to align LLMs with diverse preference distributions. GAPO adaptively rescales the gradients for each objective to determine an update direction that optimally balances the trade-offs between objectives. Additionally, we introduce P-GAPO, which incorporates user preferences across different objectives and achieves Pareto solutions that better align with the user's specific needs. Our theoretical analysis demonstrates that GAPO converges towards a Pareto optimal solution for multiple objectives. Empirical results on Mistral-7B show that GAPO outperforms current state-of-the-art methods, achieving superior performance in both helpfulness and harmlessness.
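
The abstract describes combining per-objective gradients via multiple-gradient descent to obtain a balanced update direction. As an illustration only, the sketch below shows the standard closed-form two-objective minimum-norm (MGDA-style) combination of gradients in PyTorch; the function name `min_norm_direction` and the helpfulness/harmlessness gradient variables are hypothetical, and the paper's actual adaptive rescaling and P-GAPO preference weighting may differ from this generic rule.

```python
import torch

def min_norm_direction(g1: torch.Tensor, g2: torch.Tensor) -> torch.Tensor:
    """Minimum-norm convex combination of two objective gradients.

    Closed-form two-objective case of multiple-gradient descent (MGDA),
    shown as a generic illustration; not claimed to be GAPO's exact
    gradient-rescaling rule.
    """
    diff = g1 - g2
    denom = diff.dot(diff).clamp(min=1e-12)            # ||g1 - g2||^2, guarded against zero
    alpha = ((g2 - g1).dot(g2) / denom).clamp(0.0, 1.0)  # optimal mixing weight in [0, 1]
    return alpha * g1 + (1.0 - alpha) * g2             # common update direction

# Hypothetical usage: g_help and g_harm stand in for flattened policy
# gradients of the helpfulness and harmlessness objectives.
g_help = torch.randn(1000)
g_harm = torch.randn(1000)
update_dir = min_norm_direction(g_help, g_harm)
```

Clipping alpha to [0, 1] keeps the result inside the convex hull of the two gradients; unless the objectives are already Pareto-stationary, the minimum-norm element has a non-negative inner product with both gradients, so a small step along it does not worsen either objective to first order.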

Chengao Li, Hanyu Zhang, Yunkun Xu, Hongyan Xue, Xiang Ao, Qing He

Subjects: Computing Technology, Computer Science and Technology, Scientific Research

Chengao Li, Hanyu Zhang, Yunkun Xu, Hongyan Xue, Xiang Ao, Qing He. Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models[EB/OL]. (2025-07-02)[2025-07-16]. https://arxiv.org/abs/2507.01915.