Generative Perception of Shape and Material from Differential Motion
Perceiving the shape and material of an object from a single image is inherently ambiguous, especially when lighting is unknown and unconstrained. Despite this, humans can often disentangle shape and material, and when they are uncertain, they often move their head slightly or rotate the object to help resolve the ambiguities. Inspired by this behavior, we introduce a novel conditional denoising-diffusion model that generates samples of shape-and-material maps from a short video of an object undergoing differential motions. Our parameter-efficient architecture allows training directly in pixel-space, and it generates many disentangled attributes of an object simultaneously. Trained on a modest number of synthetic object-motion videos with supervision on shape and material, the model exhibits compelling emergent behavior: For static observations, it produces diverse, multimodal predictions of plausible shape-and-material maps that capture the inherent ambiguities; and when objects move, the distributions quickly converge to more accurate explanations. The model also produces high-quality shape-and-material estimates for less ambiguous, real-world objects. By moving beyond single-view to continuous motion observations, our work suggests a generative perception approach for improving visual reasoning in physically-embodied systems.
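To make the described setup concrete, below is a minimal sketch (not the authors' code) of a pixel-space conditional denoising-diffusion sampler that maps a short video of an object to a stack of shape-and-material maps. The tiny ConvNet denoiser, the channel counts, and all names are illustrative assumptions; the paper's actual architecture and training details are not reproduced here.

```python
# Hedged sketch of generative perception via conditional diffusion:
# sample shape-and-material maps conditioned on a short object-motion video.
# The denoiser, channel counts, and schedule are assumptions, not the paper's.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Predicts the noise added to the shape-and-material maps, conditioned on
    the video frames (stacked along channels) and the diffusion timestep."""
    def __init__(self, n_frames=4, out_channels=7, hidden=64):
        super().__init__()
        in_channels = 3 * n_frames + out_channels + 1  # video + noisy maps + timestep
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, out_channels, 3, padding=1),
        )

    def forward(self, noisy_maps, video, t_frac):
        b, _, h, w = noisy_maps.shape
        frames = video.reshape(b, -1, h, w)                   # (B, 3*T, H, W)
        t_map = t_frac.view(b, 1, 1, 1).expand(b, 1, h, w)    # broadcast timestep
        return self.net(torch.cat([noisy_maps, frames, t_map], dim=1))

@torch.no_grad()
def sample_maps(model, video, steps=50, out_channels=7):
    """DDPM-style ancestral sampling: start from Gaussian noise and iteratively
    denoise into shape-and-material maps (e.g., normals + albedo + roughness)."""
    b, _, _, h, w = video.shape
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(b, out_channels, h, w)  # one sample from the output distribution
    for t in reversed(range(steps)):
        t_frac = torch.full((b,), t / steps)
        eps = model(x, video, t_frac)
        # Reverse-step posterior mean using the predicted noise.
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x

# Usage: a 4-frame video of an object under slight motion -> one sampled
# explanation; repeated calls yield diverse samples when the input is ambiguous.
video = torch.randn(1, 4, 3, 64, 64)        # (B, T, 3, H, W) placeholder frames
maps = sample_maps(TinyDenoiser(), video)   # (1, 7, 64, 64)
```

Repeated sampling from the same static input would expose the multimodal predictions the abstract describes, while conditioning on more frames of motion should concentrate the sampled distribution.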
Xinran Nicole Han, Ko Nishino, Todd Zickler
Subject: Computing Technology; Computer Technology
Xinran Nicole Han, Ko Nishino, Todd Zickler. Generative Perception of Shape and Material from Differential Motion [EB/OL]. (2025-06-03) [2025-06-17]. https://arxiv.org/abs/2506.02473