Literature notes - Reinforcement Learning for UAV Attitude Control

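This post is a set of notes I took while reading the paper. It is only a simple translation and summary, meant for personal reference, study, and sharing.

If the authors see this and find any of the content inappropriate, please contact me and I will deal with it promptly.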


Koch W, Mancuso R, West R, et al. Reinforcement learning for UAV attitude control[J]. ACM Transactions on Cyber-Physical Systems, 2019, 3(2): 1-21.






Although the agent is trained on episodic tasks, it also performs well on tasks it was not trained on.



Using RL it is possible to develop optimal control policies for a UAV without making any assumptions about the aircraft dynamics. Recent work has shown RL to be effective for UAV autopilots, providing adequate path tracking [8].


A. Quadcopter Flight Dynamics
B. Reinforcement Learning


However, these solutions still inherit disadvantages associated with PID control, such as integral windup, the need for mixing, and, most significantly, the fact that they are feedback controllers and therefore inherently reactive. On the other hand, feedforward control (or predictive control) is proactive, and allows the controller to output control signals before an error occurs. For feedforward control, a model of the system must exist. Learning-based intelligent control has been proposed to develop models of the aircraft for predictive control using artificial neural networks.
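The integral windup and the reactive nature of feedback control mentioned above can be illustrated with a minimal textbook PID sketch. This is not code from the paper; the class name `PID`, the gains, and the `integral_limit` clamp are hypothetical choices made only for illustration.

```python
class PID:
    """Textbook PID controller (illustrative sketch, not from the paper)."""

    def __init__(self, kp, ki, kd, integral_limit=None):
        self.kp, self.ki, self.kd = kp, ki, kd
        # Hypothetical anti-windup clamp on the accumulated integral term.
        self.integral_limit = integral_limit
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement, dt):
        # Feedback control: the output depends only on an error that
        # has already occurred, which is why it is reactive.
        error = setpoint - measurement
        self.integral += error * dt
        if self.integral_limit is not None:
            limit = self.integral_limit
            self.integral = max(-limit, min(limit, self.integral))
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# A sustained error (for example, a saturated actuator that cannot reach
# the setpoint) makes the unclamped integral grow without bound.
free = PID(kp=1.0, ki=1.0, kd=0.0)
clamped = PID(kp=1.0, ki=1.0, kd=0.0, integral_limit=2.0)
for _ in range(100):
    free.update(setpoint=1.0, measurement=0.0, dt=0.1)
    clamped.update(setpoint=1.0, measurement=0.0, dt=0.1)
```

After the loop, `free.integral` has accumulated roughly 10, while `clamped.integral` stays at the limit of 2. With zero error the controller outputs zero, so it cannot act before an error appears.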

Online learning is an essential component of constructing a complete intelligent flight control system. It is fundamental, however, to develop accurate offline models to account for uncertainties encountered during online learning [2].

Known as the reality gap, transferring from simulation to the real world has been researched extensively as being problematic without taking additional steps to increase realism in the simulator [26], [3].


In this section we describe our learning environment GymFC for developing intelligent flight control systems using RL. The goal of the proposed environment is to allow the agent to learn attitude control of an aircraft with only the knowledge of the number of actuators.

GymFC has a multi-layer hierarchical architecture composed of three layers: (i) a digital twin layer, (ii) a communication layer, and (iii) an agent-environment interface layer.

A. Digital Twin Layer

At the heart of the learning environment is a high fidelity physics simulator which provides functionality and realism that is hard to achieve with an abstract mathematical model of the aircraft and environment.

For this reason, the simulated environment exposes identical interfaces to actuators and sensors as they would exist in the physical world.

B. Communication Layer

The communication layer is positioned in between the digital twin and the agent-environment interface.

C. Environment Interface Layer

The topmost layer interfacing with the agent is the environment interface layer, which implements the OpenAI Gym [10] environment interface.

Each OpenAI Gym environment defines an observation space and an action space.

Reward engineering can be challenging. For this work, with the goal of establishing a baseline of accuracy, we develop a reward to reflect the current angular velocity error (i.e. e = Ω* − Ω).

We translate the current error e_t at time t into a derived reward r_t normalized between [−1, 0] as follows,

Rewards are normalized to provide standardization and stabilization during training [30].
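The reward equation itself is not reproduced in these notes. A minimal sketch of one normalization consistent with the description above (a reward in [−1, 0] derived from the angular velocity error) is given below. The function name `normalized_reward` and the parameter `omega_max` (an assumed maximum angular velocity per axis used for scaling) are hypothetical, and this should not be read as the paper's exact formula.

```python
def normalized_reward(target_rates, measured_rates, omega_max):
    """Map the angular velocity error to a reward in [-1, 0].

    Illustrative sketch: omega_max is an assumed per-axis scale,
    not a value taken from the paper.
    """
    if omega_max <= 0:
        raise ValueError("omega_max must be positive")
    # Sum of the absolute per-axis errors |Omega* - Omega|.
    abs_error = sum(abs(t - m) for t, m in zip(target_rates, measured_rates))
    # Scale by the assumed largest total error, then clip to [0, 1].
    scaled = abs_error / (len(target_rates) * omega_max)
    clipped = min(max(scaled, 0.0), 1.0)
    # Negate so that zero error gives the best reward, 0.
    return 0.0 - clipped


perfect = normalized_reward([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], omega_max=10.0)
worst = normalized_reward([100.0, 0.0, 0.0], [0.0, 0.0, 0.0], omega_max=10.0)
small = normalized_reward([3.0, 0.0, 0.0], [0.0, 0.0, 0.0], omega_max=10.0)
```

Here a perfect match yields 0, a very large error saturates at −1, and intermediate errors fall in between, so the reward scale stays bounded regardless of how large the raw error becomes.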



In this section we present our evaluation of the accuracy of the studied neural-network-based attitude flight controllers trained with RL.

To our knowledge, this is the first RL baseline conducted for quadcopter attitude control.

A. Setup

We evaluate the RL algorithms DDPG, TRPO, and PPO using the implementations in the OpenAI Baselines project [3]. The goal of the OpenAI Baselines project is to establish a reference implementation of RL algorithms, providing baselines for researchers to compare approaches and build upon.

Training and evaluations were run on Ubuntu 16.04 with an eight-core i7-7700 CPU and an NVIDIA GeForce GT 730 graphics card.

B. Results

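Limitations: the accuracy of the model (including aerodynamic effects) is not discussed; the controller was not tested in real flight; only the angular velocity loop is controlled.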



