A Hybrid Meta-Learning and Multi-Armed Bandit Approach for Context-Specific Multi-Objective Recommendation Optimization
A Hybrid Meta-Learning and Multi-Armed Bandit Approach for Context-Specific Multi-Objective Recommendation Optimization
Recommender systems in online marketplaces face the challenge of balancing multiple objectives to satisfy various stakeholders, including customers, providers, and the platform itself. This paper introduces Juggler-MAB, a hybrid approach that combines meta-learning with Multi-Armed Bandits (MAB) to address the limitations of existing multi-stakeholder recommendation systems. Our method extends the Juggler framework, which uses meta-learning to predict optimal weights for utility and compensation adjustments, by incorporating a MAB component for real-time, context-specific refinements. We present a two-stage approach where Juggler provides initial weight predictions, followed by MAB-based adjustments that adapt to rapid changes in user behavior and market conditions. Our system leverages contextual features such as device type and brand to make fine-grained weight adjustments based on specific segments. To evaluate our approach, we developed a simulation framework using a dataset of 0.6 million searches from Expedia's lodging booking platform. Results show that Juggler-MAB outperforms the original Juggler model across all metrics, with NDCG improvements of 2.9%, a 13.7% reduction in regret, and a 9.8% improvement in best arm selection rate.
Tiago Cunha、Andrea Marchini
计算技术、计算机技术
Tiago Cunha,Andrea Marchini.A Hybrid Meta-Learning and Multi-Armed Bandit Approach for Context-Specific Multi-Objective Recommendation Optimization[EB/OL].(2024-09-13)[2025-08-02].https://arxiv.org/abs/2409.08752.点此复制
评论