
Domain-Conditioned Scene Graphs for State-Grounded Task Planning

Source: arXiv
Abstract

Recent robotic task planning frameworks have integrated large multimodal models (LMMs) such as GPT-4V. To address the grounding issues of such models, it has been suggested to split the pipeline into perceptual state grounding and subsequent state-based planning. As we show in this work, the state grounding ability of LMM-based approaches is still limited by weaknesses in granular, structured, domain-specific scene understanding. To address this shortcoming, we develop a more structured state grounding framework that features a domain-conditioned scene graph as its scene representation. We show that such a representation is actionable in nature, as it maps directly to a symbolic state in classical planning languages such as PDDL. We provide an instantiation of our state grounding framework in which the domain-conditioned scene graph generation is implemented with a lightweight vision-language approach that classifies domain-specific predicates on top of domain-relevant object detections. Evaluated across three domains, our approach achieves significantly higher state estimation accuracy and task planning success rates than previous LMM-based approaches.
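
The abstract states that the domain-conditioned scene graph maps directly to a symbolic state in PDDL. As a rough, hypothetical sketch of that mapping only (not the paper's actual implementation), the snippet below serializes a toy scene graph of detected objects and classified predicates into a PDDL problem string; the class names, predicates, and example domain are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical scene-graph structures: nodes are domain-relevant object
# detections, edges are domain-specific predicates classified between them.
@dataclass
class SceneObject:
    name: str       # e.g. "cup_1"
    category: str   # e.g. "cup"

@dataclass
class PredicateEdge:
    predicate: str              # e.g. "on"
    arguments: Tuple[str, ...]  # object names, e.g. ("cup_1", "table_1")

@dataclass
class DomainConditionedSceneGraph:
    objects: List[SceneObject] = field(default_factory=list)
    edges: List[PredicateEdge] = field(default_factory=list)

def to_pddl_problem(graph: DomainConditionedSceneGraph,
                    problem_name: str,
                    domain_name: str,
                    goal_literals: List[str]) -> str:
    """Map a scene graph to a PDDL problem: detected objects become the
    :objects section, classified predicate edges become :init literals."""
    objects = " ".join(f"{o.name} - {o.category}" for o in graph.objects)
    init = "\n    ".join(
        f"({e.predicate} {' '.join(e.arguments)})" for e in graph.edges
    )
    goal = "\n      ".join(goal_literals)
    return (
        f"(define (problem {problem_name})\n"
        f"  (:domain {domain_name})\n"
        f"  (:objects {objects})\n"
        f"  (:init\n    {init})\n"
        f"  (:goal (and\n      {goal})))\n"
    )

if __name__ == "__main__":
    # Toy example: one classified "on" relation in an illustrative tabletop domain.
    graph = DomainConditionedSceneGraph(
        objects=[SceneObject("cup_1", "cup"), SceneObject("table_1", "table")],
        edges=[PredicateEdge("on", ("cup_1", "table_1"))],
    )
    print(to_pddl_problem(graph, "tidy_up", "tabletop", ["(holding cup_1)"]))
```

The resulting PDDL problem could then be handed to an off-the-shelf symbolic planner, which reflects the split the abstract describes between perceptual state grounding and subsequent state-based planning.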

Jonas Herzog, Jiangpin Liu, Yue Wang

Subjects: Automation Technology, Automation Equipment; Computing Technology, Computer Technology

Jonas Herzog, Jiangpin Liu, Yue Wang. Domain-Conditioned Scene Graphs for State-Grounded Task Planning [EB/OL]. (2025-04-09) [2025-06-27]. https://arxiv.org/abs/2504.06661.