Audio-Visual Contact Classification for Tree Structures in Agriculture

Source: arXiv
Abstract

Contact-rich manipulation tasks in agriculture, such as pruning and harvesting, require robots to physically interact with tree structures to maneuver through cluttered foliage. Identifying whether the robot is contacting rigid or soft materials is critical for the downstream manipulation policy to be safe, yet vision alone is often insufficient due to occlusion and limited viewpoints in this unstructured environment. To address this, we propose a multi-modal classification framework that fuses vibrotactile (audio) and visual inputs to identify the contact class: leaf, twig, trunk, or ambient. Our key insight is that contact-induced vibrations carry material-specific signals, making audio effective for detecting contact events and distinguishing material types, while visual features add complementary semantic cues that support more fine-grained classification. We collect training data using a hand-held sensor probe and demonstrate zero-shot generalization to a robot-mounted probe embodiment, achieving an F1 score of 0.82. These results underscore the potential of audio-visual learning for manipulation in unstructured, contact-rich environments.

Ryan Spears, Moonyoung Lee, George Kantor, Oliver Kroemer

Agricultural science and technology development; Automation technology and automation equipment

Ryan Spears, Moonyoung Lee, George Kantor, Oliver Kroemer. Audio-Visual Contact Classification for Tree Structures in Agriculture [EB/OL]. (2025-05-18) [2025-06-05]. https://arxiv.org/abs/2505.12665.
