Asymmetric Graph Error Control with Low Complexity in Causal Bandits
Asymmetric Graph Error Control with Low Complexity in Causal Bandits
In this paper, the causal bandit problem is investigated, with the objective of maximizing the long-term reward by selecting an optimal sequence of interventions on nodes in an unknown causal graph. It is assumed that both the causal topology and the distribution of interventions are unknown. First, based on the difference between the two types of graph identification errors (false positives and negatives), a causal graph learning method is proposed. Numerical results suggest that this method has a much lower sample complexity relative to the prior art by learning sub-graphs. However, we note that a sample complexity analysis for the new algorithm has not been undertaken, as of yet. Under the assumption of minimum-mean squared error weight estimation, a new uncertainty bound tailored to the causal bandit problem is derived. This uncertainty bound drives an upper confidence bound-based intervention selection to optimize the reward. Further, we consider a particular instance of non-stationary bandits wherein both the causal topology and interventional distributions can change. Our solution is the design of a sub-graph change detection mechanism that requires a modest number of samples. Numerical results compare the new methodology to existing schemes and show a substantial performance improvement in stationary and non-stationary settings. Averaged over 100 randomly generated causal bandits, the proposed scheme takes significantly fewer samples to learn the causal structure and achieves a reward gain of 85% compared to existing approaches.
Chen Peng、Di Zhang、Urbashi Mitra
计算技术、计算机技术自动化基础理论
Chen Peng,Di Zhang,Urbashi Mitra.Asymmetric Graph Error Control with Low Complexity in Causal Bandits[EB/OL].(2025-06-26)[2025-07-16].https://arxiv.org/abs/2408.11240.点此复制
评论