
Teaching Physical Awareness to LLMs through Sounds

Source: arXiv
Abstract

Large Language Models (LLMs) have shown remarkable capabilities in text and multimodal processing, yet they fundamentally lack physical awareness, that is, an understanding of real-world physical phenomena. In this work, we present ACORN, a framework that teaches LLMs physical awareness through sound, focusing on fundamental physical phenomena such as the Doppler effect, the multipath effect, and spatial relationships. To overcome data scarcity, ACORN introduces a physics-based simulator that combines real-world sound sources with controlled physical channels to generate diverse training data. Using this simulator, we build AQA-PHY, a comprehensive Audio Question-Answer dataset, and propose an audio encoder that processes both magnitude and phase information. By connecting our audio encoder to state-of-the-art LLMs, we demonstrate reasonable results in both simulated and real-world tasks, such as line-of-sight detection, Doppler effect estimation, and Direction-of-Arrival estimation, paving the way for LLMs to understand the physical world.

Weiguo Wang, Andy Nie, Wenrui Zhou, Yi Kai, Chengchen Hu

Electronic Technology Applications

Weiguo Wang, Andy Nie, Wenrui Zhou, Yi Kai, Chengchen Hu. Teaching Physical Awareness to LLMs through Sounds[EB/OL]. (2025-06-10)[2025-06-23]. https://arxiv.org/abs/2506.08524.
