
Safe Multi-Agent Reinforcement Learning via Shielding

Source: arXiv

Abstract

Multi-agent reinforcement learning (MARL) has been increasingly used in a wide range of safety-critical applications, which require guaranteed safety (e.g., no unsafe states are ever visited) during the learning process. Unfortunately, current MARL methods do not have safety guarantees. Therefore, we present two shielding approaches for safe MARL. In centralized shielding, we synthesize a single shield to monitor all agents' joint actions and correct any unsafe action if necessary. In factored shielding, we synthesize multiple shields based on a factorization of the joint state space observed by all agents; the set of shields monitors agents concurrently, and each shield is only responsible for a subset of agents at each step. Experimental results show that both approaches can guarantee the safety of agents during learning without compromising the quality of learned policies; moreover, factored shielding is more scalable in the number of agents than centralized shielding.
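The centralized variant described above can be pictured as a monitor that sits between the agents and the environment: it checks each proposed joint action against a safety predicate and substitutes a safe correction before the action is executed. Below is a minimal, purely illustrative Python sketch of that idea; the safety predicate, the state encoding, and the fallback-action strategy are all assumptions for the example, not the paper's actual shield synthesis (which is based on formal methods).

```python
# Illustrative sketch of centralized shielding (the safety predicate and
# fallback strategy are hypothetical; the paper synthesizes shields formally).

def is_safe(state, joint_action):
    """Toy safety predicate: no two agents may move onto the same cell."""
    targets = [pos + move for pos, move in zip(state, joint_action)]
    return len(set(targets)) == len(targets)

def shield(state, joint_action, fallback_move=0):
    """Monitor a proposed joint action; correct it if it is unsafe.

    If the proposed joint action passes the safety check, it is executed
    as-is; otherwise every agent falls back to a known-safe no-op move.
    """
    if is_safe(state, joint_action):
        return joint_action
    return tuple(fallback_move for _ in joint_action)

# Two agents on a 1-D grid at cells 0 and 2, both proposing to move to cell 1.
state = (0, 2)
proposed = (1, -1)            # both would land on cell 1 -> collision
corrected = shield(state, proposed)
print(corrected)              # (0, 0): the shield substitutes the safe no-op
```

A factored shield would apply the same check, but with several such monitors running concurrently, each over only the subset of agents relevant to its factor of the joint state space.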

Lu Feng, Ingy Elsayed-Aly, Christopher Amato, Suda Bharadwaj, Ufuk Topcu, Rüdiger Ehlers

Subject: Computing Technology; Computer Technology

Lu Feng, Ingy Elsayed-Aly, Christopher Amato, Suda Bharadwaj, Ufuk Topcu, Rüdiger Ehlers. Safe Multi-Agent Reinforcement Learning via Shielding [EB/OL]. (2021-01-26) [2025-08-02]. https://arxiv.org/abs/2101.11196.
