Research on the application of UAV autonomous navigation based on DRL
Traditional UAV autonomous navigation algorithms have two main problems: their implementation pipelines have low accuracy, and building high-precision maps consumes extensive computational resources. To address these problems, this paper proposes a new navigation algorithm based on deep learning and reinforcement learning. The algorithm is implemented end to end and does not need a prior map. First, a high-fidelity, high-performance simulation environment is designed and built on the AirSim and Unreal Engine 4 (UE4) platforms to train the UAV on autonomous navigation tasks. The environment supports domain randomization and the Gym interface, and it provides photo-realistic sensory information. Second, the autonomous navigation task is modeled as a reinforcement learning problem. A network model for the task is designed and trained by defining the state space and action space and formulating the reward function. In the PPO algorithm, which is based on the actor-critic framework, a single actor is suited only to simple scenarios. To overcome this limitation, a multi-actor single-critic PPO network structure incorporating an attention mechanism is designed. Finally, simulation experiments verify the feasibility and effectiveness of the method.
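The abstract names the parts of the proposed network: several actor heads, one shared critic, an attention mechanism, and PPO. It does not say how these parts are wired together. The sketch below is a minimal pure-Python illustration built on explicit assumptions:

- The action space is discrete.
- Scaled dot-product attention weights each actor head's action distribution. The weight depends on how well a state feature vector matches a per-actor key.
- The single critic supplies one-step TD advantages.
- The objective is the standard PPO clipped surrogate.

The function names (`attention_fuse`, `td_advantages`, `ppo_clip_objective`), the three-action example, and all numbers are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    query: Sequence[float],
    actor_keys: Sequence[Sequence[float]],
    actor_probs: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Fuse K actor heads with scaled dot-product attention (hypothetical scheme).

    query:       state feature vector (assumed output of a shared encoder)
    actor_keys:  one key vector per actor head
    actor_probs: one discrete action distribution per actor head
    Returns the attention weights and the fused action distribution.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in actor_keys]
    weights = softmax(scores)
    n_actions = len(actor_probs[0])
    # A convex combination of valid distributions is itself a valid distribution.
    fused = [sum(w * p[a] for w, p in zip(weights, actor_probs)) for a in range(n_actions)]
    return weights, fused


def td_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    next_values: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """One-step advantages from the single shared critic: r + gamma * V(s') - V(s).

    Pass next_value = 0.0 for a terminal transition.
    """
    return [r + gamma * nv - v for r, v, nv in zip(rewards, values, next_values)]


def ppo_clip_objective(
    ratios: Sequence[float], advantages: Sequence[float], eps: float = 0.2
) -> float:
    """Mean PPO clipped surrogate: E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)]."""
    terms = [
        min(r * a, min(max(r, 1.0 - eps), 1.0 + eps) * a)
        for r, a in zip(ratios, advantages)
    ]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Two hypothetical actor heads over three discrete actions
    # (e.g. forward, turn left, turn right); all numbers are illustrative.
    weights, fused = attention_fuse(
        query=[1.0, 0.0],
        actor_keys=[[1.0, 0.0], [0.0, 1.0]],
        actor_probs=[[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],
    )
    print(weights, fused)
    adv = td_advantages([1.0, 0.0], [0.5, 0.5], [0.5, 0.0], gamma=0.9)
    print(adv)  # ≈ [0.95, -0.5]
    print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0], eps=0.2))  # ≈ 0.2
```

The fused policy is a convex combination of the actor heads' distributions, so it is itself a valid distribution. This means the PPO probability ratio can be computed on it directly. The abstract does not state whether the paper fuses distributions, logits, or features, or which advantage estimator it uses.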
Wei Shimin, Ma Yunpeng
Radio navigation; Aerospace technology; Automation technology and automation equipment
Computer Application Technology; Unmanned Aerial Vehicle; Deep Reinforcement Learning; Attention Mechanism
Wei Shimin, Ma Yunpeng. Research on the application of UAV autonomous navigation based on DRL[EB/OL]. (2023-03-09)[2025-08-02]. http://www.paper.edu.cn/releasepaper/content/202303-102.