OCTScenes: A Versatile Real-World Dataset of Tabletop Scenes for Object-Centric Learning

Source: arXiv
Abstract

Humans possess the cognitive ability to comprehend scenes in a compositional manner. To give AI systems similar capabilities, object-centric learning aims to acquire representations of individual objects from visual scenes without any supervision. Although object-centric learning has recently made remarkable progress on complex synthetic datasets, applying it to complex real-world scenes remains a major challenge. One essential reason is the scarcity of real-world datasets specifically tailored to object-centric learning. To address this problem, we propose OCTScenes, a versatile real-world dataset of tabletop scenes for object-centric learning, designed to serve as a benchmark for comparing, evaluating, and analyzing object-centric learning methods. OCTScenes contains 5000 tabletop scenes with a total of 15 objects. Each scene is captured in 60 frames covering a 360-degree perspective. Consequently, OCTScenes can be used to evaluate single-image, video, and multi-view object-centric learning methods alike. Extensive experiments with representative object-centric learning methods are conducted on OCTScenes. The results demonstrate the shortcomings of state-of-the-art methods in learning meaningful representations from real-world data, despite their impressive performance on complex synthetic datasets. Furthermore, OCTScenes can serve as a catalyst for advancing existing methods, inspiring them to adapt to real-world scenes. The dataset and code are available at https://huggingface.co/datasets/Yinxuan/OCTScenes.

Bin Li, Tonglin Chen, Yinxuan Huang, Zhimeng Shen, Jinghao Huang, Xiangyang Xue

Computing technology, computer technology

Bin Li, Tonglin Chen, Yinxuan Huang, Zhimeng Shen, Jinghao Huang, Xiangyang Xue. OCTScenes: A Versatile Real-World Dataset of Tabletop Scenes for Object-Centric Learning [EB/OL]. (2023-06-16) [2025-07-21]. https://arxiv.org/abs/2306.09682.
