Deep Episodic Value Iteration for Model-based Meta-Reinforcement Learning

Abstract

We present a new deep meta reinforcement learner, which we call Deep Episodic Value Iteration (DEVI). DEVI uses a deep neural network to learn a similarity metric for a non-parametric model-based reinforcement learning algorithm. Our model is trained end-to-end via back-propagation. Despite being trained using the model-free Q-learning objective, we show that DEVI's model-based internal structure provides 'one-shot' transfer to changes in reward and transition structure, even for tasks with very high-dimensional state spaces.
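To make the idea of planning over a similarity-weighted, non-parametric model more concrete, here is a minimal sketch of kernel-based value iteration over a memory of stored transitions. It is not the paper's architecture: the function names, the softmax-over-distance similarity, the `temperature` parameter, and the identity `embed` function (standing in for the learned deep network) are all assumptions made for illustration.

```python
import numpy as np


def similarity_weights(queries, keys, temperature=1.0):
    """Softmax over negative squared distances; each row sums to 1."""
    d2 = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def episodic_value_iteration(states, actions, rewards, next_states, n_actions,
                             gamma=0.9, n_iters=200, temperature=1.0,
                             embed=lambda x: x):
    """Return Q-values evaluated at every stored next_state.

    `embed` is a hypothetical stand-in for a learned embedding network.
    Assumes every action appears at least once in the memory.
    """
    n = len(states)
    q = np.zeros((n, n_actions))
    # For each action, compare every stored next_state with the stored
    # states from which that action was taken.
    per_action = []
    for a in range(n_actions):
        idx = np.flatnonzero(actions == a)
        w = similarity_weights(embed(next_states), embed(states[idx]), temperature)
        per_action.append((idx, w))
    for _ in range(n_iters):
        # One-step backup target for each stored transition.
        targets = rewards + gamma * q.max(axis=1)
        new_q = np.empty_like(q)
        for a, (idx, w) in enumerate(per_action):
            new_q[:, a] = w @ targets[idx]
        q = new_q
    return q


# Toy memory: state 0 is unrewarded, state 1 is absorbing with reward 1.
# Action 0 stays in place, action 1 moves from state 0 to state 1.
states = np.array([[0.0], [0.0], [1.0], [1.0]])
actions = np.array([0, 1, 0, 1])
rewards = np.array([0.0, 0.0, 1.0, 1.0])
next_states = np.array([[0.0], [1.0], [1.0], [1.0]])

q = episodic_value_iteration(states, actions, rewards, next_states,
                             n_actions=2, temperature=0.01)
print(q)
```

In this sketch the rewards and transitions live in the memory arrays rather than in any trained weights, so editing `rewards` or `next_states` and re-running the planner changes the values without retraining the similarity function. That is one way to read the abstract's claim of transfer to changes in reward and transition structure, under the assumptions above.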

Author: Steven Stenberg Hansen

Subject classification: Computing technology, computer technology

Recommended citation: Steven Stenberg Hansen. Deep Episodic Value Iteration for Model-based Meta-Reinforcement Learning[EB/OL]. (2017-05-09)[2025-08-02]. https://arxiv.org/abs/1705.03562.
