Scalable and Interpretable Contextual Bandits: A Literature Review and Retail Offer Prototype
This paper presents a concise review of Contextual Multi-Armed Bandit (CMAB) methods and introduces an experimental framework for scalable, interpretable offer selection, addressing the challenge of fast-changing offers. The approach models context at the product category level, allowing offers to span multiple categories and enabling knowledge transfer across similar offers. This improves learning efficiency and generalization in dynamic environments. The framework extends standard CMAB methodology to support multi-category contexts, and achieves scalability through efficient feature engineering and modular design. Advanced features such as MPG (Member Purchase Gap) and MF (Matrix Factorization) capture nuanced user-offer interactions, with implementation in Python for practical deployment. A key contribution is interpretability at scale: logistic regression models yield transparent weight vectors, accessible via a large language model (LLM) interface for real-time, user-level tracking and explanation of evolving preferences. This enables the generation of detailed member profiles and identification of behavioral patterns, supporting personalized offer optimization and enhancing trust in automated decisions. By situating our prototype alongside established paradigms like Generalized Linear Models and Thompson Sampling, we demonstrate its value for both research and real-world CMAB applications.
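As an illustration of the ideas in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a single logistic model shared across offers, with one weight per product category, and Thompson Sampling over a diagonal Gaussian approximation of the weight posterior updated one observation at a time. The class name `CategoryLogisticTS`, its methods, the member affinity vector, and the category names are all hypothetical, and the MPG and MF features mentioned above are not modeled.

```python
import numpy as np


class CategoryLogisticTS:
    """Thompson Sampling with a shared logistic model over product categories.

    Assumption: the weight posterior is approximated by an independent
    Gaussian per category (mean and precision), updated online.
    """

    def __init__(self, n_categories, prior_precision=1.0, seed=0):
        self.mean = np.zeros(n_categories)
        self.precision = np.full(n_categories, float(prior_precision))
        self.rng = np.random.default_rng(seed)

    @staticmethod
    def features(member_affinity, offer_categories):
        # An offer may span several categories (multi-hot vector); the
        # context is the member's affinity restricted to those categories.
        return (np.asarray(member_affinity, dtype=float)
                * np.asarray(offer_categories, dtype=float))

    def select(self, member_affinity, offers):
        # Sample one weight vector from the approximate posterior and pick
        # the offer with the highest sampled score.
        sampled_w = self.rng.normal(self.mean, 1.0 / np.sqrt(self.precision))
        scores = [self.features(member_affinity, o) @ sampled_w for o in offers]
        return int(np.argmax(scores))

    def update(self, x, reward):
        # One diagonal Newton-style step on the logistic log-likelihood.
        p = 1.0 / (1.0 + np.exp(-(x @ self.mean)))
        self.precision += p * (1.0 - p) * x ** 2
        self.mean -= (p - reward) * x / self.precision

    def explain(self, category_names):
        # Interpretable view: categories ranked by learned weight.
        order = np.argsort(-self.mean)
        return [(category_names[i], float(self.mean[i])) for i in order]


# Hypothetical usage: one member, two offers over three categories.
agent = CategoryLogisticTS(n_categories=3)
member = [1.0, 0.5, 0.0]
offers = [[1, 1, 0], [0, 0, 1]]
chosen = agent.select(member, offers)
agent.update(agent.features(member, offers[chosen]), reward=1)
```

Because the weights are tied to categories rather than to individual offers, a newly introduced offer inherits whatever has been learned about its categories. In this sketch, the ranked list returned by `explain` is the kind of transparent weight vector that an explanation interface could read.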
Nikola Tankovic, Robert Sajina
Computing Technology and Computer Technology; Fundamental Theory of Automation
Nikola Tankovic, Robert Sajina. Scalable and Interpretable Contextual Bandits: A Literature Review and Retail Offer Prototype [EB/OL]. (2025-05-22) [2025-06-17]. https://arxiv.org/abs/2505.16918.