Economic Battery Storage Dispatch with Deep Reinforcement Learning from Rule-Based Demonstrations

Abstract

The application of deep reinforcement learning algorithms to economic battery dispatch problems has increased significantly in recent years. However, optimizing battery dispatch over long horizons can be challenging due to delayed rewards. In our experiments, we observe poor performance of popular actor-critic algorithms when trained on yearly episodes with hourly resolution. To address this, we propose an approach that extends soft actor-critic (SAC) with learning from demonstrations. The special feature of our approach is that, because no expert demonstrations are available, the demonstration data is generated by simple, rule-based policies. We conduct a case study on a grid-connected microgrid and use if-then-else statements based on the wholesale price of electricity to collect demonstrations. These demonstrations are stored in a separate replay buffer and sampled, with linearly decaying probability, alongside the agent's own experiences. Despite these minimal modifications and the imperfections in the demonstration data, the results show a drastic improvement in both sample efficiency and final rewards. We further show that the proposed method reliably outperforms the demonstrator and is robust to the choice of rule, as long as the rule is sufficient to guide early training in the right direction.
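The mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The price thresholds, the action encoding (+1 charge, -1 discharge, 0 idle), the decay schedule parameters, and the per-transition mixing of the two buffers are all assumptions made for illustration; the abstract only states that demonstrations come from if-then-else rules on the wholesale price, are kept in a separate replay buffer, and are sampled with linearly decaying probability.

```python
import random


def rule_based_action(price, low=30.0, high=60.0):
    """Hypothetical price rule: charge when cheap, discharge when expensive.

    The thresholds `low` and `high` are illustrative values, not taken from the paper.
    """
    if price < low:
        return 1.0    # charge at full rate
    elif price > high:
        return -1.0   # discharge at full rate
    else:
        return 0.0    # stay idle


def demo_probability(step, decay_steps, p_start=1.0, p_end=0.0):
    """Linearly interpolate from p_start to p_end over decay_steps, then hold p_end."""
    frac = min(max(step / decay_steps, 0.0), 1.0)
    return p_start + frac * (p_end - p_start)


def sample_batch(agent_buffer, demo_buffer, batch_size, p_demo, rng=random):
    """Draw each transition from the demonstration buffer with probability p_demo.

    Falls back to whichever buffer is non-empty. Mixing per transition
    (rather than per batch) is an assumption of this sketch.
    """
    if not agent_buffer and not demo_buffer:
        raise ValueError("both replay buffers are empty")
    batch = []
    for _ in range(batch_size):
        use_demo = len(demo_buffer) > 0 and (
            len(agent_buffer) == 0 or rng.random() < p_demo
        )
        source = demo_buffer if use_demo else agent_buffer
        batch.append(rng.choice(source))
    return batch
```

In a training loop, `demo_probability(step, decay_steps)` would be evaluated at each update and passed to `sample_batch`, so early SAC updates rely mostly on rule-generated transitions and later updates rely mostly on the agent's own experience.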

Authors: Manuel Sage, Martin Staniszewski, Yaoyao Fiona Zhao

DOI: 10.1109/ICCAD57653.2023.10152299

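Subject classification: Economics of automation technology; Automation technology and equipment; Power generation and power plants; Power transmission and distribution engineering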

Recommended citation: Manuel Sage, Martin Staniszewski, Yaoyao Fiona Zhao. Economic Battery Storage Dispatch with Deep Reinforcement Learning from Rule-Based Demonstrations [EB/OL]. (2025-04-05) [2025-05-14]. https://arxiv.org/abs/2504.04326.
