CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting
Open-vocabulary 3D scene understanding is crucial for applications that require natural-language-driven spatial interpretation, such as robotics and augmented reality. While 3D Gaussian Splatting (3DGS) offers a powerful representation for scene reconstruction, integrating it with open-vocabulary frameworks reveals a key challenge: cross-view granularity inconsistency. This issue stems from 2D segmentation methods such as SAM, which segment the same object at different granularities across views (e.g., a "coffee set" segmented as a single entity in one view but as "cup + coffee + spoon" in another). Existing 3DGS-based methods often learn features for each Gaussian in isolation, neglecting the spatial context needed for cohesive object reasoning, which leads to fragmented representations. We propose Context-Aware Gaussian Splatting (CAGS), a novel framework that addresses this challenge by incorporating spatial context into 3DGS. CAGS has three components: (1) it constructs local graphs to propagate contextual features across Gaussians, reducing noise from inconsistent granularity; (2) it employs mask-centric contrastive learning to smooth SAM-derived features across views; and (3) it precomputes neighborhood relationships to reduce computational cost, enabling efficient training in large-scale scenes. By integrating spatial context, CAGS significantly improves 3D instance segmentation and reduces fragmentation errors on datasets such as LERF-OVS and ScanNet, enabling robust language-guided 3D scene understanding.
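The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind two of the components: precomputing neighborhoods once, then propagating features over the resulting local graph. Everything here is an assumption made for illustration, not the authors' method: the brute-force k-nearest-neighbor search, the mean aggregation, the blending weight `alpha`, and the function names `precompute_neighbors` and `propagate` are all hypothetical.

```python
import math


def precompute_neighbors(positions, k):
    """Return, for each point, the indices of its k nearest other points.

    Brute-force search, computed once so later passes can reuse it.
    (Assumption: the abstract does not say how neighborhoods are built.)
    """
    neighbors = []
    for i, p in enumerate(positions):
        dists = [(math.dist(p, q), j) for j, q in enumerate(positions) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def propagate(features, neighbors, alpha=0.5):
    """Blend each feature with the mean feature of its precomputed neighbors.

    alpha is a hypothetical mixing weight: 0 keeps the original feature,
    1 replaces it with the neighborhood mean.
    """
    out = []
    for i, f in enumerate(features):
        nbrs = neighbors[i]
        if not nbrs:
            out.append(list(f))
            continue
        mean = [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(len(f))]
        out.append([(1 - alpha) * f[d] + alpha * mean[d] for d in range(len(f))])
    return out


# Toy scene: two well-separated clusters of Gaussian centers.
positions = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
    (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.1, 5.0),
]
# Index 2 carries a feature that disagrees with its own cluster,
# standing in for noise from inconsistent segmentation granularity.
features = [
    [1.0, 0.0], [1.0, 0.0], [0.0, 1.0],
    [0.0, 1.0], [0.0, 1.0], [0.0, 1.0],
]

neighbors = precompute_neighbors(positions, k=2)
smoothed = propagate(features, neighbors, alpha=0.5)
```

In this toy setup the outlier at index 2 is pulled toward its spatial neighbors, while the consistent second cluster is left unchanged. A real system would likely use a spatial index rather than brute-force search and learned rather than fixed aggregation weights.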
Wei Sun, Yanzhao Zhou, Jianbin Jiao, Yuan Li
Computing technology, computer technology
Wei Sun, Yanzhao Zhou, Jianbin Jiao, Yuan Li. CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting [EB/OL]. (2025-04-16) [2025-04-26]. https://arxiv.org/abs/2504.11893.