Literature Notes - Reinforcement Learning for UAV Attitude Control


This post contains notes I took while reading the paper. It is only a rough translation and summary, intended for personal reference, study, and sharing.

If the paper's authors find anything here inappropriate, please contact me and I will handle it promptly.

I am not the author of the paper. The citation is given below; the original text is more valuable.

Koch W, Mancuso R, West R, et al. Reinforcement learning for UAV attitude control[J]. ACM Transactions on Cyber-Physical Systems, 2019, 3(2): 1-21.

Abstract

Autopilot systems typically consist of an "inner loop" that provides stability and control, and an "outer loop" that handles mission-level tasks such as waypoint navigation. UAV autopilots are mostly built on PID control, which performs reasonably well in stable environments. However, less predictable and more complex environments call for more sophisticated controllers. Intelligent flight control is an active research area that uses reinforcement learning (RL) to address problems PID cannot solve, and RL has already made good progress in other fields such as robotics. Previous work, though, has focused on applying RL to mission-level control. In this paper, we explore using current RL training methods for inner-loop control, namely DDPG, TRPO, and PPO. To study this, we first develop an open-source, high-fidelity simulation environment for training a quadrotor attitude controller with RL. We then use this environment to compare RL-trained controllers against a PID controller in terms of speed and accuracy.

Conclusions

i) RL can be used to train accurate attitude controllers.

ii) The controller trained with PPO outperforms a well-tuned PID controller on nearly every metric.

Although trained on episodic tasks, the controllers also perform well on tasks they were never trained on, indicating that episodic training is sufficient for developing intelligent attitude control.

I. INTRODUCTION

Using RL it is possible to develop optimal control policies for a UAV without making any assumptions about the aircraft dynamics. Recent work has shown RL to be effective for UAV autopilots, providing adequate path tracking [8].

II. BACKGROUND

A. Quadcopter Flight Dynamics
B. Reinforcement Learning

III. RELATED WORK

However, these solutions still inherit disadvantages associated with PID control, such as integral windup and the need for mixing; most significantly, they are feedback controllers and therefore inherently reactive. Feedforward (or predictive) control, on the other hand, is proactive and allows the controller to output control signals before an error occurs. Feedforward control requires a model of the system. Learning-based intelligent control has been proposed to develop models of the aircraft for predictive control using artificial neural networks.
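Since the discussion above hinges on how a feedback PID loop reacts to error and why integral windup matters, here is a minimal sketch of a single PID update with a clamped integral term. The gains, time step, and clamp limit are illustrative placeholders, not values from the paper.

```python
def pid_step(error, state, kp, ki, kd, dt, i_limit):
    """One PID update; `state` carries the running integral and previous error."""
    integral, prev_error = state
    # Without the clamp, the integral keeps growing while the actuator is
    # saturated ("integral windup"), which later causes large overshoot.
    integral = max(-i_limit, min(i_limit, integral + error * dt))
    derivative = (error - prev_error) / dt
    u = kp * error + ki * integral + kd * derivative  # reacts only after an error exists
    return u, (integral, error)
```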

Online learning is an essential component to constructing a complete intelligent flight control system. It is fundamental however to develop accurate offline models to account for uncertainties encountered during online learning [2].

Known as the reality gap, the transfer from simulation to the real world has been studied extensively and is known to be problematic unless additional steps are taken to increase realism in the simulator [26], [3].

IV. ENVIRONMENT

In this section we describe our learning environment GymFC for developing intelligent flight control systems using RL. The goal of the proposed environment is to allow the agent to learn attitude control of an aircraft with only the knowledge of the number of actuators.

GymFC has a multi-layer hierarchical architecture composed of three layers: (i) a digital twin layer, (ii) a communication layer, and (iii) an agent-environment interface layer.

A. Digital Twin Layer

At the heart of the learning environment is a high fidelity physics simulator which provides functionality and realism that is hard to achieve with an abstract mathematical model of the aircraft and environment.

For this reason, the simulated environment exposes identical interfaces to actuators and sensors as they would exist in the physical world.

B. Communication Layer

The communication layer is positioned in between the digital twin and the agent-environment interface.

C. Environment Interface Layer

The topmost layer interfacing with the agent is the environment interface layer, which implements the OpenAI Gym API [10].

Each OpenAI Gym environment defines an observation space and an action space.
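As an illustration of these two spaces, below is a minimal sketch of how an attitude-control environment could declare them with the Gym API. The shapes and bounds (angular-velocity error on three axes as the observation, four normalized motor commands as the action, and the `omega_max` default) are assumptions for illustration, not the exact GymFC definitions.

```python
import numpy as np
import gym
from gym import spaces

class AttitudeEnvSketch(gym.Env):
    """Sketch only: declares the spaces, omits the simulator step/reset logic."""

    def __init__(self, omega_max=5.0):  # assumed maximum angular velocity (rad/s)
        # Observation: angular velocity error e = Omega* - Omega per axis.
        self.observation_space = spaces.Box(
            low=-2 * omega_max, high=2 * omega_max, shape=(3,), dtype=np.float32)
        # Action: one normalized command per actuator (a quadcopter has 4 motors).
        self.action_space = spaces.Box(
            low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
```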

Reward engineering can be challenging. For this work, with the goal of establishing a baseline of accuracy, we develop a reward to reflect the current angular velocity error (i.e., e = Ω* − Ω).

We translate the current error e_t at time t into a derived reward r_t normalized to [−1, 0] (Eq. 7 in the paper).

Rewards are normalized to provide standardization and stabilization during training [30].

In addition, we experimented with various other rewards. We found that a sparse binary reward of 1 performed poorly, which we attribute to the complexity of quadcopter control: in the early stages of learning the agent explores its environment, but randomly reaching the target angular velocity within some threshold is rare, so the agent does not receive enough information to converge.
Instead, we found that a signal at every timestep works best. We also tried the Euclidean norm of the error, the squared error, and other scalar values, none of which came close to the performance of the sum of absolute errors (Eq. 7).
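A sketch of the per-timestep reward as described: the negative sum of absolute angular-velocity errors, scaled into [−1, 0]. The normalization constant used here (three axes, each error bounded by 2·Ω_max) is an assumption; the exact form is Eq. 7 in the paper.

```python
import numpy as np

def reward(omega_target, omega, omega_max):
    """Negative, normalized sum of absolute angular-velocity errors."""
    e = omega_target - omega                           # e = Omega* - Omega, per axis
    penalty = np.sum(np.abs(e)) / (3 * 2 * omega_max)  # assumed scaling into [0, 1]
    return -float(np.clip(penalty, 0.0, 1.0))          # reward in [-1, 0]
```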

V. EVALUATION

In this section we present our evaluation of the accuracy of the studied neural-network-based attitude flight controllers trained with RL.

To our knowledge, this is the first RL baseline conducted for quadcopter attitude control.

A. Setup

We evaluate the RL algorithms DDPG, TRPO, and PPO using the implementations in the OpenAI Baselines project [3]. The goal of the OpenAI Baselines project is to establish reference implementations of RL algorithms, providing baselines for researchers to compare approaches and build upon.
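As a rough illustration of how a trained policy can be scored in such an environment, the sketch below runs a few episodes through the standard Gym reset/step interface and averages the absolute error. Here `policy` stands in for whatever the trained DDPG/TRPO/PPO agent returns, and the metric is illustrative, not the paper's exact set of evaluation metrics.

```python
import numpy as np

def evaluate(env, policy, episodes=10):
    """Average absolute angular-velocity error of `policy` over several episodes."""
    errors = []
    for _ in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            action = policy(obs)                        # trained controller: error -> motor commands
            obs, reward, done, info = env.step(action)
            errors.append(float(np.abs(obs).mean()))    # obs assumed to be the error e
    return float(np.mean(errors))
```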

Training and evaluations were run on Ubuntu 16.04 with an eight-core i7-7700 CPU and an NVIDIA GeForce GT 730 graphics card.

B. Results

Limitations: the accuracy of the model (including aerodynamic effects) is not discussed; the controllers were not used in real flight; only the angular-velocity (rate) loop is controlled.

