Off-Policy Evaluation of Ranking Policies via Embedding-Space User Behavior Modeling
Off-policy evaluation (OPE) of ranking policies is essential for assessing new recommender policies using only logged bandit data collected by previous versions. In ranking settings, however, the action space grows with both the number of unique actions and the length of the ranking, causing existing estimators to suffer from high variance. To address this, we introduce two new assumptions: no direct effect of rankings on rewards, and a user behavior model defined on ranking embedding spaces. Under these assumptions, we propose the generalized marginalized inverse propensity score (GMIPS) estimator, which has statistically desirable properties compared to existing estimators. We demonstrate that GMIPS achieves the lowest MSE. Notably, among the GMIPS variants, the marginalized reward interaction IPS (MRIPS) incorporates a doubly marginalized importance weight based on a cascade behavior assumption on ranking embeddings. Our experiments show that MRIPS effectively balances the trade-off between bias and variance, even as the ranking action space grows and the above assumptions may not hold.
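The core idea behind marginalized IPS-style estimators can be illustrated with a minimal sketch. The example below is not the paper's GMIPS/MRIPS implementation; it is a hypothetical toy in which importance weights are computed as the ratio of the two policies' marginal distributions over a small discrete embedding space (rather than over the full ranking action space), which is what reduces variance when the action space is large. The distributions `pi_0`, `pi_e`, and the per-embedding reward vector `q` are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 3 discrete ranking-embedding values (hypothetical).
# pi_0: logging policy's marginal over embeddings; pi_e: evaluation policy's.
pi_0 = np.array([0.5, 0.3, 0.2])
pi_e = np.array([0.2, 0.3, 0.5])
q = np.array([1.0, 2.0, 3.0])  # expected reward per embedding (unknown in practice)

n = 100_000
emb = rng.choice(3, size=n, p=pi_0)        # embeddings observed under logging
rew = q[emb] + rng.normal(0, 0.1, size=n)  # noisy logged rewards

# Marginalized importance weight: ratio of embedding marginals instead of
# per-ranking action propensities, so the weights stay low-variance even
# when the underlying ranking action space is huge.
w = pi_e[emb] / pi_0[emb]
v_hat = float(np.mean(w * rew))

true_v = float(pi_e @ q)  # ground-truth policy value in this toy
print(v_hat, true_v)
```

With enough logged samples, `v_hat` concentrates around the evaluation policy's true value; the same weighting scheme would blow up in variance if the ratio were taken over individual rankings rather than their embeddings.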
Tatsuki Takahashi, Chihiro Maru, Hiroko Shoji
Computing Technology, Computer Science
Tatsuki Takahashi, Chihiro Maru, Hiroko Shoji. Off-Policy Evaluation of Ranking Policies via Embedding-Space User Behavior Modeling [EB/OL]. (2025-05-31) [2025-06-30]. https://arxiv.org/abs/2506.00446.