
Epsilon-Greedy Thompson Sampling to Bayesian Optimization

Source: arXiv
Abstract

Bayesian optimization (BO) has become a powerful tool for solving simulation-based engineering optimization problems thanks to its ability to integrate physical and mathematical understandings, consider uncertainty, and address the exploitation-exploration dilemma. Thompson sampling (TS) is a preferred solution for BO to handle the exploitation-exploration trade-off. While it prioritizes exploration by generating and minimizing random sample paths from probabilistic models -- a fundamental ingredient of BO -- TS weakly manages exploitation by gathering information about the true objective function after it obtains new observations. In this work, we improve the exploitation of TS by incorporating the $\varepsilon$-greedy policy, a well-established selection strategy in reinforcement learning. We first delineate two extremes of TS, namely the generic TS and the sample-average TS. The former promotes exploration, while the latter favors exploitation. We then adopt the $\varepsilon$-greedy policy to randomly switch between these two extremes. Small and large values of $\varepsilon$ govern exploitation and exploration, respectively. By minimizing two benchmark functions and solving an inverse problem of a steel cantilever beam, we empirically show that $\varepsilon$-greedy TS equipped with an appropriate $\varepsilon$ is more robust than its two extremes, matching or outperforming the better of the generic TS and the sample-average TS.
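To make the switching rule concrete, below is a minimal sketch (not the authors' implementation) of ε-greedy TS for a 1-D minimization problem. It assumes a scikit-learn Gaussian process surrogate, a fixed candidate grid, and illustrative hyperparameters; generic TS minimizes a single posterior sample path, while sample-average TS minimizes the average of many paths. The Forrester test function used here is a hypothetical stand-in, since the abstract does not name the paper's benchmarks.

```python
# Minimal, illustrative sketch of epsilon-greedy Thompson sampling for
# minimization, following the abstract's description. With probability
# epsilon we run generic TS (explore: minimize one random sample path);
# otherwise we run sample-average TS (exploit: minimize the average of
# many sample paths, which concentrates on the posterior mean).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def epsilon_greedy_ts(objective, bounds, epsilon=0.3, n_iter=30,
                      n_paths=50, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(5, 1))            # initial design
    y = np.array([objective(x[0]) for x in X])
    grid = np.linspace(lo, hi, 500).reshape(-1, 1)  # candidate points
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, y)
        if rng.random() < epsilon:
            # Generic TS: one random sample path (exploration).
            path = gp.sample_y(grid, n_samples=1,
                               random_state=int(rng.integers(1 << 31)))
            x_next = grid[np.argmin(path)]
        else:
            # Sample-average TS: mean of many paths (exploitation).
            paths = gp.sample_y(grid, n_samples=n_paths,
                                random_state=int(rng.integers(1 << 31)))
            x_next = grid[np.argmin(paths.mean(axis=1))]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next[0]))
    return X[np.argmin(y)], y.min()

# Toy usage on the Forrester function over [0, 1] (illustrative only).
forrester = lambda x: (6 * x - 2) ** 2 * np.sin(12 * x - 4)
x_best, f_best = epsilon_greedy_ts(forrester, bounds=(0.0, 1.0))
print(x_best, f_best)
```

Note how the ε semantics match the abstract: a small ε makes the sample-average (exploitative) branch dominate, while a large ε favors the generic (explorative) branch.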

Bach Do, Taiwo Adebiyi, Ruda Zhang

DOI: 10.1115/1.4066858

Subjects: Computing and Computer Technology; Basic Engineering Science

Bach Do, Taiwo Adebiyi, Ruda Zhang. Epsilon-Greedy Thompson Sampling to Bayesian Optimization [EB/OL]. (2024-03-01) [2025-07-16]. https://arxiv.org/abs/2403.00540.
