This is a fork of OpenAI Gym, so its functionality and environments are identical to the original, except that we add two quadrotor models: a rate control model and a ball bouncing quadrotor model. This repository therefore focuses only on these models; please refer to OpenAI Gym for further details (e.g., installation and other environments).
We use MuJoCo 2.0 and mujoco-py (its Python wrapper). For simplicity, we recommend creating a Python 3.x conda environment. Once you have installed this repository, check the installed environments by executing the following:
$python
from gym import envs
print(envs.registry.all())
You should be able to find two custom environments: “QuadRate-v0” and “BallBouncingQuad-v0”. You also need the following repositories for training and testing these environments:
- https://github.com/inkyusa/rl-baselines-zoo
- https://github.com/inkyusa/stable-baselines (installation required)
- https://github.com/inkyusa/gym_rotor (installation required)
- https://github.com/inkyusa/openai_train_scripts
The animation below shows what you can expect after training your rate control model. Rate control means that we command body rates for 3 axes, namely roll (rotation about the x-axis, which points in the forward direction of the vehicle), pitch (rotation about the y-axis, which points left), and yaw (rotation about the z-axis, which points up), together with a thrust command. The units are rad/s for the rates and [0,1] for the thrust.
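To make the command units concrete, here is a minimal sketch of how a normalized policy output in [-1, 1]^4 could be mapped to three body rates in rad/s and a thrust in [0, 1]. The helper `action_to_command` and the rate limit `MAX_RATE` are hypothetical; the actual action scaling is defined in quad_rate.py and may differ.

```python
# Hypothetical sketch: map a normalized policy action to a rate/thrust command.
# MAX_RATE and action_to_command are illustrative assumptions, not code from quad_rate.py.
from typing import Sequence, Tuple

MAX_RATE = 3.0  # assumed body-rate limit in rad/s (hypothetical value)


def clip(value: float, low: float, high: float) -> float:
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def action_to_command(action: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert [a_roll, a_pitch, a_yaw, a_thrust] in [-1, 1] into
    (roll_rate, pitch_rate, yaw_rate) in rad/s and a thrust in [0, 1]."""
    if len(action) != 4:
        raise ValueError("expected 4 action components")
    a = [clip(float(x), -1.0, 1.0) for x in action]
    roll_rate = MAX_RATE * a[0]   # about x (forward)
    pitch_rate = MAX_RATE * a[1]  # about y (left)
    yaw_rate = MAX_RATE * a[2]    # about z (up)
    thrust = 0.5 * (a[3] + 1.0)   # affine map from [-1, 1] to [0, 1]
    return roll_rate, pitch_rate, yaw_rate, thrust


if __name__ == "__main__":
    print(action_to_command([0.5, -1.0, 0.0, 0.0]))  # (1.5, -3.0, 0.0, 0.5)
```

An affine map for the thrust keeps a zero policy output at mid-range thrust, while the rates stay symmetric around zero.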
The task of this environment is simple: given a goal position, the policy is trained to minimize the distance between the vehicle and the goal (i.e., to maximize the cumulative reward). For details of the reward shaping, please have a look here. For training, we use PPO2 from stable-baselines. In summary, the environment has 4 input commands (the roll, pitch, and yaw rates and the thrust) and 13 observations: the vehicle attitude as a unit quaternion (4x1), plus the position, linear velocity, and angular velocity (3x1 each).
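The sketch below illustrates how a flat 13-D observation can be split into named parts, together with a simple distance-to-goal reward. Both the slice order in `OBS_LAYOUT` and the `goal_reward` function are assumptions for illustration only; the environment's actual observation ordering and reward shaping live in quad_rate.py.

```python
# Hypothetical sketch of the 13-D observation layout and a distance-based reward.
# OBS_LAYOUT order and goal_reward are illustrative assumptions, not the env's code.
import math
from typing import Dict, List, Sequence

# Assumed layout: quaternion (4), position (3), linear velocity (3), angular velocity (3).
OBS_LAYOUT = [("quat", 4), ("pos", 3), ("lin_vel", 3), ("ang_vel", 3)]


def split_obs(obs: Sequence[float]) -> Dict[str, List[float]]:
    """Split a flat observation vector into named components."""
    expected = sum(size for _, size in OBS_LAYOUT)
    if len(obs) != expected:
        raise ValueError(f"expected {expected} values, got {len(obs)}")
    parts: Dict[str, List[float]] = {}
    i = 0
    for name, size in OBS_LAYOUT:
        parts[name] = list(obs[i:i + size])
        i += size
    return parts


def goal_reward(pos: Sequence[float], goal: Sequence[float]) -> float:
    """Hypothetical reward: negative Euclidean distance to the goal."""
    return -math.dist(pos, goal)


if __name__ == "__main__":
    obs = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    parts = split_obs(obs)
    print(goal_reward(parts["pos"], [3, 4, 1]))  # -5.0
```

Maximizing the sum of such a reward over an episode pushes the vehicle toward the goal and keeps it there, which matches the hover behavior described in the text.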
quad_rate.py is the OpenAI Gym environment file, and quadrotor_quat.xml is the MuJoCo model file that describes the physical vehicle model and other simulation properties such as air density, gravity, and viscosity. quadrotor_quat_fancy.xml is only for fancier rendering (more lighting and shadow effects, etc.), which usually takes longer to render. We therefore recommend using quadrotor_quat.xml to reduce training time.
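Simulation properties such as gravity, air density, and viscosity are set on the `<option>` element of a MuJoCo (MJCF) model file. The following sketch reads them with Python's standard library. The embedded XML is a made-up example using typical air values, not the contents of quadrotor_quat.xml, and `read_options` is a hypothetical helper that ignores `<include>` and defaults.

```python
# Sketch: read simulation options from an MJCF string with the standard library.
# EXAMPLE_MJCF is a made-up example, not the contents of quadrotor_quat.xml.
import xml.etree.ElementTree as ET

EXAMPLE_MJCF = """
<mujoco model="example_quad">
  <option timestep="0.01" gravity="0 0 -9.81" density="1.2" viscosity="0.00002"/>
  <worldbody/>
</mujoco>
"""


def read_options(mjcf_text: str) -> dict:
    """Return the attributes of the <option> element as floats or float tuples."""
    root = ET.fromstring(mjcf_text.strip())
    option = root.find("option")
    if option is None:
        return {}
    parsed = {}
    for key, raw in option.attrib.items():
        values = tuple(float(v) for v in raw.split())
        # Scalars stay scalars; vectors such as gravity become tuples.
        parsed[key] = values[0] if len(values) == 1 else values
    return parsed


if __name__ == "__main__":
    print(read_options(EXAMPLE_MJCF))
```

Passing the text of one of the repository's XML files instead of `EXAMPLE_MJCF` would show its configured values, provided the options are declared directly on `<option>`.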
We provide pre-trained weights, which you can obtain from another repository.
- Clone it, go to the openai_train_scripts folder, and execute the following command:
source ./rateQuad_test_script.sh ./model
You should be able to see the same animation shown earlier. Please note that you need to change system-dependent variables (e.g., RL_BASELINES_ZOO_PATH) to match your setup.
As the animation above shows, our agent is able to fly to the goal and hover there. To perform this task, the policy has to learn underlying attitude and position controllers. The former controls the attitude of the vehicle (roll, pitch, and yaw angles), and the latter regulates its position (i.e., tracks the position error and minimizes it).
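For intuition, the sketch below shows the kind of cascaded structure the policy implicitly has to learn, written as a classical hand-tuned controller that produces the same command type (body rates plus thrust). This is not how this repository controls the vehicle; all gains, `HOVER_THRUST`, and the small-angle model (yaw held at zero, Euler-angle rates treated as body rates) are assumptions for illustration.

```python
# Hypothetical cascaded position/attitude P-controller producing body rates + thrust.
# Gains, HOVER_THRUST and the small-angle model are illustrative assumptions; the
# trained policy learns its own implicit controllers instead.
from typing import Sequence, Tuple

G = 9.81            # gravitational acceleration, m/s^2
HOVER_THRUST = 0.5  # assumed normalized thrust that balances gravity
KP_POS = 1.0        # position gain, 1/s^2
KD_POS = 1.5        # velocity damping gain, 1/s
KP_ATT = 4.0        # attitude gain, 1/s
KP_Z = 0.1          # assumed thrust change per m/s^2 of vertical demand
MAX_TILT = 0.5      # tilt limit in rad


def clip(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def cascaded_control(pos: Sequence[float], vel: Sequence[float], goal: Sequence[float],
                     roll: float, pitch: float, yaw: float
                     ) -> Tuple[float, float, float, float]:
    """Frame: x forward, y left, z up. Returns (roll_rate, pitch_rate, yaw_rate, thrust)."""
    # Outer loop (position): PD law gives a desired acceleration per axis.
    acc = [KP_POS * (g - p) - KD_POS * v for p, v, g in zip(pos, vel, goal)]
    # Small angles, yaw = 0: a_x ~ g * pitch and a_y ~ -g * roll.
    pitch_des = clip(acc[0] / G, -MAX_TILT, MAX_TILT)
    roll_des = clip(-acc[1] / G, -MAX_TILT, MAX_TILT)
    # Inner loop (attitude): P law on the angle error gives a rate command.
    roll_rate = KP_ATT * (roll_des - roll)
    pitch_rate = KP_ATT * (pitch_des - pitch)
    yaw_rate = KP_ATT * (0.0 - yaw)
    thrust = clip(HOVER_THRUST + KP_Z * acc[2], 0.0, 1.0)
    return roll_rate, pitch_rate, yaw_rate, thrust


if __name__ == "__main__":
    # Goal 1 m ahead: expect a positive pitch rate (nose down) and hover thrust.
    print(cascaded_control([0, 0, 1], [0, 0, 0], [1, 0, 1], 0.0, 0.0, 0.0))
```

The two nested loops correspond to the position and attitude controllers named in the text; a learned policy collapses both into a single mapping from observations to commands.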
Properly tuning hyperparameters for a particular environment is often important, although PPO2 is relatively robust to them. We use the following setup, which is probably suboptimal but seems to work well; you are more than welcome to tune your own parameters and test them. The hyperparameters can be found here.
The table below summarizes hyperparameters used for training both rate control and ball bouncing quadrotor.
Hyperparameter | Value |
---|---|
normalize | true |
n_envs | 32 |
n_timesteps | 50e7 |
policy | 'MlpPolicy' |
policy_act_fun | 'tanh' |
n_steps | 2048 |
nminibatches | 50e7 |
lam | 0.95 |
noptepochs | 10 |
ent_coef | 0.001 |
learning_rate | 2.5e-4 |
cliprange | 0.2 |
max_episode_steps | 8000 |
reward_threshold | 9600 |
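The rollout settings in the table determine how much data each PPO2 update sees. The sketch below does that bookkeeping, assuming stable-baselines PPO2 semantics (each update collects `n_steps` transitions from each of the `n_envs` environments, and training performs `n_timesteps // (n_envs * n_steps)` updates). The helper name `ppo2_rollout_stats` is hypothetical.

```python
# Sketch: translate the table's rollout settings into PPO2 batch bookkeeping.
# Assumes stable-baselines PPO2 semantics; ppo2_rollout_stats is a hypothetical helper.


def ppo2_rollout_stats(n_envs: int, n_steps: int, n_timesteps: float,
                       noptepochs: int) -> dict:
    batch = n_envs * n_steps                 # transitions collected per update
    n_updates = int(n_timesteps) // batch    # complete updates within the budget
    epochs = n_updates * noptepochs          # total optimization passes over batches
    return {"batch": batch, "n_updates": n_updates, "epochs": epochs}


if __name__ == "__main__":
    stats = ppo2_rollout_stats(n_envs=32, n_steps=2048, n_timesteps=50e7, noptepochs=10)
    print(stats)  # {'batch': 65536, 'n_updates': 7629, 'epochs': 76290}
```

With these values, each update uses 65,536 transitions, and the timestep budget allows 7,629 full updates.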
As with testing, training is straightforward once the dependencies are installed. Go to the openai_train_scripts folder and execute the following command:
source ./train_rateQuad_script_module.sh
Please note that you need to change system-dependent variables (e.g., RL_BASELINES_ZOO_PATH, --tensorboard-log, --log-folder, and lockFile) to match your setup.
This environment is a minor extension of the previous (rate control) environment. We introduce a ball above the vehicle and shape the reward to encourage hitting the ball at the center of the vehicle. The animation below demonstrates this.
Similar to the previous example, we have 4 input commands (the roll, pitch, and yaw rates and the thrust), but now 19 observations: the vehicle attitude as a unit quaternion (4x1), plus the vehicle and ball positions, the vehicle and ball linear velocities, and the vehicle angular velocity (3x1 each).
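The 19 observations add up as 4 + 3 + 3 + 3 + 3 + 3. The sketch below splits such a vector and computes the ball's position relative to the vehicle. The ordering in `BBQ_LAYOUT` follows the description above but is an assumption; ball_bouncing_quad.py defines the actual layout, and the helper names are hypothetical.

```python
# Hypothetical sketch: split the 19-D BallBouncingQuad observation into components.
# BBQ_LAYOUT ordering is assumed from the text; check ball_bouncing_quad.py.
from typing import Dict, List, Sequence

BBQ_LAYOUT = [
    ("quat", 4),          # vehicle attitude (unit quaternion)
    ("quad_pos", 3),      # vehicle position
    ("ball_pos", 3),      # ball position
    ("quad_lin_vel", 3),  # vehicle linear velocity
    ("ball_lin_vel", 3),  # ball linear velocity
    ("quad_ang_vel", 3),  # vehicle angular velocity
]


def split_bbq_obs(obs: Sequence[float]) -> Dict[str, List[float]]:
    """Split a flat 19-D observation into named components."""
    expected = sum(size for _, size in BBQ_LAYOUT)
    if len(obs) != expected:
        raise ValueError(f"expected {expected} values, got {len(obs)}")
    parts: Dict[str, List[float]] = {}
    i = 0
    for name, size in BBQ_LAYOUT:
        parts[name] = list(obs[i:i + size])
        i += size
    return parts


def ball_offset(parts: Dict[str, List[float]]) -> List[float]:
    """Ball position minus vehicle position (world frame)."""
    return [b - q for b, q in zip(parts["ball_pos"], parts["quad_pos"])]


if __name__ == "__main__":
    parts = split_bbq_obs(list(range(19)))
    print(ball_offset(parts))  # [3, 3, 3]
```

A horizontal offset near zero corresponds to the ball being above the center of the vehicle, which is what the reward shaping encourages.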
One tricky aspect of this model was simulating elastic collisions (MuJoCo 1.5 did not fully support this). According to the MuJoCo 2.0 documentation, fully elastic collisions are supported and can be enabled by specifying negative numbers in solref (see here). For an in-depth explanation, please refer to link1 and link2.
The trained policy appears to perform well, but it sometimes cannot handle the ball sliding off the vehicle when the bounce is very small.
ball_bouncing_quad.py is the OpenAI Gym environment file, and ball_bouncing_quad.xml is the MuJoCo model file that describes the physical vehicle and ball models and other simulation properties such as air density, gravity, and viscosity. ball_bouncing_quad_fancy.xml is only for fancier rendering (more lighting and shadow effects, etc.), which usually takes longer to render. We therefore recommend using ball_bouncing_quad.xml to reduce training time. Note that in this MuJoCo model, we set contype and conaffinity to 0 for the vehicle arms and propellers to avoid possible collisions with the ball. Only the top plate has contype and conaffinity of 1, which enables collision with the ball. This may differ from a real quadrotor scenario.
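MuJoCo decides whether two geoms may collide with a bitmask test: the contype of one geom and the conaffinity of the other must share a set bit (checked in both directions). The sketch below applies that rule to the settings described above. The geom names are hypothetical, the ball's values of 1 are an assumption (MuJoCo's default), and MuJoCo's other exclusions (e.g., geoms in the same body or parent-child bodies) are not modeled.

```python
# Sketch of MuJoCo's contype/conaffinity collision filter.
# Geom names are hypothetical; the ball's (1, 1) is assumed (MuJoCo's default).
from typing import Dict, Tuple


def can_collide(contype1: int, conaffinity1: int, contype2: int, conaffinity2: int) -> bool:
    """True if either geom's contype shares a bit with the other's conaffinity."""
    return bool((contype1 & conaffinity2) or (contype2 & conaffinity1))


# (contype, conaffinity) per geom, mirroring the description in the text.
GEOMS: Dict[str, Tuple[int, int]] = {
    "arm": (0, 0),
    "propeller": (0, 0),
    "top_plate": (1, 1),
    "ball": (1, 1),  # assumed default values
}

if __name__ == "__main__":
    for name in ("arm", "propeller", "top_plate"):
        print(name, can_collide(*GEOMS[name], *GEOMS["ball"]))
    # arm False, propeller False, top_plate True
```

Setting both values to 0 removes a geom from collision checks entirely, which is why only the top plate interacts with the ball.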
We provide pre-trained weights, which you can obtain from another repository.
- Clone it, go to the openai_train_scripts folder, and execute the following command:
source ./bbq_test_script.sh ./model
You should be able to see the same animation shown earlier.
We use the same hyperparameters as for the rate control model.
As with testing, training is straightforward once the dependencies are installed. Go to the openai_train_scripts folder and execute the following command:
source ./train_bbq_script_module.sh
This is a work in progress, but in the meantime you can have a look at our previous work, Control of a Quadrotor with Reinforcement Learning (which outputs direct rotor speed commands instead of rate commands). Please stay tuned; we will update this page once we have some interesting results.
If our work helps you in an academic or research context, please cite the following publication(s):
- Jemin Hwangbo, Inkyu Sa, Roland Siegwart, Marco Hutter, "Control of a Quadrotor with Reinforcement Learning", 2017, IEEE Robotics and Automation Letters or (arxiv pdf)
@ARTICLE{7961277,
author={J. {Hwangbo} and I. {Sa} and R. {Siegwart} and M. {Hutter}},
journal={IEEE Robotics and Automation Letters},
title={Control of a Quadrotor With Reinforcement Learning},
year={2017},
volume={2},
number={4},
pages={2096-2103},
keywords={aircraft control;helicopters;learning systems;neurocontrollers;stability;step response;quadrotor control;reinforcement learning;neural network;step response;stabilization;Trajectory;Junctions;Learning (artificial intelligence);Computational modeling;Neural networks;Robots;Optimization;Aerial systems: mechanics and control;learning and adaptive systems},
doi={10.1109/LRA.2017.2720851},
ISSN={2377-3766},
month={Oct},}