Human-in-the-loop: Real-time Preference Optimization

Source: arXiv
Abstract

Human-aware controllers play an important role in engineering systems for improving productivity, efficiency, and sustainability. It is essential to design such a controller that optimizes user utility while adhering to plant dynamics. While most online optimization algorithms rely on first-order or zeroth-order oracles, human feedback often appears as pairwise comparisons. In this work, we propose an online feedback optimization algorithm that leverages such preference feedback. We design a controller that estimates the gradient based on the binary pairwise comparison result between two consecutive points and study its coupled behavior with a nonlinear plant. Under mild assumptions on both the utility and the plant dynamics, we establish explicit stability criteria and quantify sub-optimality. The theoretical findings are further supported through numerical experiments.
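
As a rough illustration of how a binary pairwise comparison can stand in for a gradient, the sketch below runs a sign-based ascent on a static one-dimensional problem. It is not the authors' algorithm: it ignores the plant dynamics entirely, and the quadratic `utility`, the helper `prefer`, the random probing direction, and the values of `step_size` and `delta` are all assumptions made for the example.

```python
import random


def utility(u):
    # Hypothetical concave user utility with its maximum at u = 2.0.
    # In a real human-in-the-loop setting this function is unknown.
    return -(u - 2.0) ** 2


def prefer(u_new, u_old):
    # Binary pairwise comparison: +1 if the new point is preferred, else -1.
    # This is the only information the controller receives.
    return 1.0 if utility(u_new) > utility(u_old) else -1.0


def run(steps=2000, step_size=0.05, delta=0.05, seed=0):
    rng = random.Random(seed)
    u = 0.0
    for _ in range(steps):
        d = rng.choice([-1.0, 1.0])       # random probing direction
        s = prefer(u + delta * d, u)      # compare probe with current point
        u = u + step_size * s * d         # s * d acts as a crude ascent direction
    return u


if __name__ == "__main__":
    print(run())
```

With a constant step size the iterate does not converge exactly; it settles into a small neighborhood of the maximizer whose width scales with `step_size` and `delta`, which is the kind of sub-optimality gap the abstract refers to quantifying.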

Wenbin Wang, Wenjie Xu, Colin N. Jones

Fundamental Theory of Automation

Wenbin Wang, Wenjie Xu, Colin N. Jones. Human-in-the-loop: Real-time Preference Optimization [EB/OL]. (2025-06-02) [2025-06-30]. https://arxiv.org/abs/2506.02225.
