Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

Source: arXiv
Abstract

We present LOCATE 3D, a model for localizing objects in 3D scenes from referring expressions like "the small coffee table between the sofa and the lamp." LOCATE 3D sets a new state-of-the-art on standard referential grounding benchmarks and showcases robust generalization capabilities. Notably, LOCATE 3D operates directly on sensor observation streams (posed RGB-D frames), enabling real-world deployment on robots and AR devices. Key to our approach is 3D-JEPA, a novel self-supervised learning (SSL) algorithm applicable to sensor point clouds. It takes as input a 3D pointcloud featurized using 2D foundation models (CLIP, DINO). Subsequently, masked prediction in latent space is employed as a pretext task to aid the self-supervised learning of contextualized pointcloud features. Once trained, the 3D-JEPA encoder is finetuned alongside a language-conditioned decoder to jointly predict 3D masks and bounding boxes. Additionally, we introduce LOCATE 3D DATASET, a new dataset for 3D referential grounding, spanning multiple capture setups with over 130K annotations. This enables a systematic study of generalization capabilities as well as a stronger model.
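The masked-prediction-in-latent-space pretext task mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear "encoders", the zeroing of masked points, the momentum-averaged target weights (a common choice in JEPA-style methods, assumed here), and every name below (`masked_latent_loss`, `ema_update`, `enc_w`, `target_w`, `pred_w`) are hypothetical stand-ins. Random vectors take the place of the per-point CLIP/DINO features.

```python
import numpy as np

def ema_update(target_w, online_w, momentum=0.99):
    """Move the target weights slowly toward the online encoder weights."""
    return momentum * target_w + (1.0 - momentum) * online_w

def masked_latent_loss(feats, mask, enc_w, target_w, pred_w):
    """Mean squared error between predicted and target latents on masked points.

    feats: (N, C) per-point features; mask: (N,) bool, True = hidden from context.
    """
    if not mask.any():
        raise ValueError("mask must select at least one point")
    # Targets are latents of the full, unmasked input (no gradient in practice).
    targets = feats @ target_w
    # The context branch only sees visible points; masked ones are zeroed out.
    visible = np.where(mask[:, None], 0.0, feats)
    predicted = (visible @ enc_w) @ pred_w
    # The loss is measured in latent space, only where points were masked.
    return float(np.mean((predicted[mask] - targets[mask]) ** 2))

rng = np.random.default_rng(0)
feats = rng.normal(size=(128, 16))       # stand-in for lifted 2D features
mask = rng.random(128) < 0.5             # hide roughly half of the points
enc_w = rng.normal(size=(16, 8)) * 0.1   # toy context encoder
target_w = enc_w.copy()                  # toy target encoder
pred_w = np.eye(8)                       # toy predictor

loss = masked_latent_loss(feats, mask, enc_w, target_w, pred_w)
target_w = ema_update(target_w, enc_w)   # would follow each optimizer step
```

Because the toy encoders act on each point independently, a zeroed point carries no information about its neighbors. A real model would need an encoder that mixes information across points (for example, attention over the point cloud) so that masked latents can be inferred from their context.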

Ang Cao, Mikael Henaff, Ayush Jain, Ishita Prasad, Mrinal Kalakrishnan, Michael Rabbat, Nicolas Ballas, Mido Assran, Oleksandr Maksymets, Sergio Arnaud, Paul McVay, Ada Martin, Arjun Majumdar, Krishna Murthy Jatavallabhula, Phillip Thomas, Ruslan Partsey, Daniel Dugas, Abha Gejji, Alexander Sax, Vincent-Pierre Berges, Aravind Rajeswaran, Franziska Meier

Computing technology, computer technology

Ang Cao, Mikael Henaff, Ayush Jain, Ishita Prasad, Mrinal Kalakrishnan, Michael Rabbat, Nicolas Ballas, Mido Assran, Oleksandr Maksymets, Sergio Arnaud, Paul McVay, Ada Martin, Arjun Majumdar, Krishna Murthy Jatavallabhula, Phillip Thomas, Ruslan Partsey, Daniel Dugas, Abha Gejji, Alexander Sax, Vincent-Pierre Berges, Aravind Rajeswaran, Franziska Meier. Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D [EB/OL]. (2025-04-18) [2025-05-23]. https://arxiv.org/abs/2504.14151
