Teaching Physical Awareness to LLMs through Sounds
Large Language Models (LLMs) have shown remarkable capabilities in text and multimodal processing, yet they fundamentally lack physical awareness, that is, an understanding of real-world physical phenomena. In this work, we present ACORN, a framework that teaches LLMs physical awareness through sound, focusing on fundamental physical phenomena such as the Doppler effect, the multipath effect, and spatial relationships. To overcome data scarcity, ACORN introduces a physics-based simulator that combines real-world sound sources with controlled physical channels to generate diverse training data. Using this simulator, we build AQA-PHY, a comprehensive Audio Question-Answer dataset, and propose an audio encoder that processes both magnitude and phase information. By connecting our audio encoder to state-of-the-art LLMs, we demonstrate reasonable results in both simulated and real-world tasks, such as line-of-sight detection, Doppler effect estimation, and Direction-of-Arrival estimation, paving the way for LLMs to understand the physical world.
Weiguo Wang, Andy Nie, Wenrui Zhou, Yi Kai, Chengchen Hu
Electronic Technology Applications
Weiguo Wang, Andy Nie, Wenrui Zhou, Yi Kai, Chengchen Hu. Teaching Physical Awareness to LLMs through Sounds [EB/OL]. (2025-06-10) [2025-06-23]. https://arxiv.org/abs/2506.08524.