
This repository contains PyTorch implementations of deep reinforcement learning algorithms and environments for Robotics and Controls. The goal of this project is to include engineering applications for industrial optimization. I reproduce the results of several model-free and model-based RL algorithms in continuous and discrete action domains.


SHlHAB/Deep-Reinforcement-Learning-Optimal-Control

 
 


Deep-Reinforcement-Learning-Optimal-Control



Below is a curated list of the implemented algorithms and the papers that introduced them.

Algorithms Implemented

  1. Deep Q Learning (DQN) (Mnih et al. 2013)
  2. DQN with Fixed Q Targets (Mnih et al. 2013)
  3. Double DQN (DDQN) (van Hasselt et al. 2015)
  4. DDQN with Prioritised Experience Replay (Schaul et al. 2016)
  5. Dueling DDQN (Wang et al. 2016)
  6. REINFORCE (Williams 1992)
  7. Deep Deterministic Policy Gradients (DDPG) (Lillicrap et al. 2016)
  8. Twin Delayed Deep Deterministic Policy Gradients (TD3) (Fujimoto et al. 2018)
  9. Soft Actor-Critic (SAC) (Haarnoja et al. 2018)
  10. Soft Actor-Critic for Discrete Actions (SAC-Discrete) (Christodoulou 2019)
  11. Asynchronous Advantage Actor Critic (A3C) (Mnih et al. 2016)
  12. Synchronous Advantage Actor Critic (A2C)
  13. Proximal Policy Optimisation (PPO) (Schulman et al. 2017)
  14. DQN with Hindsight Experience Replay (DQN-HER) (Andrychowicz et al. 2018)
  15. DDPG with Hindsight Experience Replay (DDPG-HER) (Andrychowicz et al. 2018)
  16. Hierarchical-DQN (h-DQN) (Kulkarni et al. 2016)
  17. Stochastic NNs for Hierarchical Reinforcement Learning (SNN-HRL) (Florensa et al. 2017)
  18. Diversity Is All You Need (DIAYN) (Eysenbach et al. 2018)

All implementations are able to quickly solve Cart Pole (discrete actions), Mountain Car Continuous (continuous actions), Bit Flipping (discrete actions with dynamic goals) or Fetch Reach (continuous actions with dynamic goals).

I plan to add more RL algorithms related to engineering processes.

To do: Environments to be implemented in the next stage

  1. Bit Flipping Game (as described in Andrychowicz et al. 2018)
  2. Four Rooms Game (as described in Sutton et al. 1998)
  3. Long Corridor Game (as described in Kulkarni et al. 2016)
  4. Ant-{Maze, Push, Fall} (as described in Nachum et al. 2018 and their accompanying code)

Results

1. Environments: CartPole and Pendulum for Classic Control and Robotics

The results below show various RL algorithms learning the discrete-action game Cart Pole or the continuous-action game Pendulum. Each curve is the average over runs with 3 random seeds, and the shaded area represents plus and minus 1 standard deviation. The hyperparameters used can be found in the files results/Cart_Pole.py and results/Pendulum.py.
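As a rough illustration of how such a curve and its shaded band can be computed, here is a minimal sketch using only the standard library. The function name `mean_and_band`, the toy scores and the choice of population standard deviation are assumptions for the example, not taken from the repository's plotting code.

```python
from statistics import mean, pstdev


def mean_and_band(runs):
    """Aggregate per-seed learning curves.

    runs: list of score lists, one per random seed, all of equal length.
    Returns (means, lower, upper), where the band is mean -/+ 1 std.
    """
    means, lower, upper = [], [], []
    # zip(*runs) groups the scores of all seeds at the same episode index.
    for scores_at_episode in zip(*runs):
        m = mean(scores_at_episode)
        s = pstdev(scores_at_episode)  # assumption: population std
        means.append(m)
        lower.append(m - s)
        upper.append(m + s)
    return means, lower, upper


# Toy scores for 3 seeds over 3 episodes (made-up numbers).
runs = [
    [10.0, 20.0, 30.0],
    [12.0, 22.0, 36.0],
    [14.0, 24.0, 42.0],
]
means, lower, upper = mean_and_band(runs)
print(means)
```

The `lower` and `upper` lists are what a plotting call would fill between to draw the shaded area.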

2. Policy Gradient Algorithm Experiments

The results below show the performance of Actor Critic models such as DDPG, PPO, SAC and TD3, including methods that accelerate learning with demonstrations, which are aimed at real applications with sparse rewards.

The results replicate those reported in the papers. In the next stage, I plan to show how adding HER can allow an agent to solve problems that it otherwise could not solve at all. Note that the same hyperparameters are used within each pair of agents, so the only difference between them is whether hindsight is used.
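To make the idea of hindsight concrete, here is a minimal sketch of relabeling one episode, assuming the "final" strategy (the goal is replaced by the state actually reached at the end of the episode) and a sparse reward of 0 on success and -1 otherwise. The tuple layout and the names `sparse_reward` and `relabel_with_final_goal` are hypothetical and not the repository's API.

```python
def sparse_reward(achieved, goal):
    """Sparse reward convention assumed here: 0 on success, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0


def relabel_with_final_goal(episode):
    """Return hindsight transitions for one episode.

    episode: list of (state, action, next_state, goal) tuples.
    The original goal is swapped for the last state the agent reached,
    and the reward is recomputed against that substituted goal.
    """
    hindsight_goal = episode[-1][2]
    relabeled = []
    for state, action, next_state, _ in episode:
        reward = sparse_reward(next_state, hindsight_goal)
        relabeled.append((state, action, reward, next_state, hindsight_goal))
    return relabeled


# A 3-bit toy episode that never reaches its original goal (1, 1, 1).
episode = [
    ((0, 0, 0), 0, (1, 0, 0), (1, 1, 1)),
    ((1, 0, 0), 1, (1, 1, 0), (1, 1, 1)),
]
original = [(s, a, sparse_reward(s2, g), s2, g) for s, a, s2, g in episode]
hindsight = relabel_with_final_goal(episode)

print([t[2] for t in original])
print([t[2] for t in hindsight])
```

Storing both `original` and `hindsight` transitions in the replay buffer gives the agent at least one successful example per episode, which is what makes sparse-reward tasks learnable.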

Figures: General Cart Pole Results for Actor Critic models; Individual Cart Pole Results for DDPG; General Cart Pole Results for SAC.

3. DQN Learning Algorithm Experiments

The DQN results show how to avoid the instability, or even divergence, that can arise when a nonlinear function approximator represents the action-value (Q) function. This instability is most often caused by correlations in the sequence of observations. DQN addresses it with two key ideas in a variant of Q-learning: a replay buffer and a fixed Q-target.
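The two ideas can be sketched in a few lines. This is a simplified illustration, not the repository's code: a small lookup table stands in for the Q-network, and the names `ReplayBuffer`, `td_target`, `sync_every` and the toy transitions are assumptions made for the example.

```python
import copy
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory; uniform sampling breaks temporal correlation."""

    def __init__(self, capacity, seed=0):
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def td_target(q_target, reward, next_state, done, gamma=0.99):
    """Bootstrap from the frozen target table, not the online one."""
    if done:
        return reward
    return reward + gamma * max(q_target[next_state])


# Tables mapping state -> list of action values (stand-in for networks).
q_online = {0: [0.0, 0.0], 1: [0.0, 0.0]}
q_target = copy.deepcopy(q_online)

buffer = ReplayBuffer(capacity=100)
buffer.add(0, 1, 1.0, 1, False)
buffer.add(1, 0, 2.0, 0, True)

learning_rate, sync_every = 0.5, 2
for step in range(1, 5):
    for s, a, r, s2, done in buffer.sample(batch_size=2):
        target = td_target(q_target, r, s2, done)
        q_online[s][a] += learning_rate * (target - q_online[s][a])
    # Fixed Q-target: copy the online values only every few steps.
    if step % sync_every == 0:
        q_target = copy.deepcopy(q_online)
```

Because `q_target` changes only at the sync steps, the regression target stays still between syncs instead of moving with every update.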

The results replicate those reported in the papers for DQN, Double DQN, Prioritized Experience Replay and N-step learning.
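For reference, the variants differ mainly in how the learning target is built. The sketch below compares the targets on plain Python lists; the function names and the numbers are illustrative assumptions, and the repository's agents may organise this differently.

```python
def dqn_target(reward, next_q_target, gamma, done):
    """Standard DQN: the target values both select and evaluate the action."""
    return reward if done else reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma, done):
    """Double DQN: online values select the action, target values score it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


def n_step_return(rewards, bootstrap_value, gamma):
    """N-step target: discounted sum of n rewards plus a bootstrapped tail."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g


next_q_online = [1.0, 3.0, 2.0]   # online estimates for the next state
next_q_target = [5.0, 0.5, 4.0]   # target estimates for the next state

print(dqn_target(1.0, next_q_target, 0.5, False))                        # 3.5
print(double_dqn_target(1.0, next_q_online, next_q_target, 0.5, False))  # 1.25
print(n_step_return([1.0, 1.0, 1.0], 8.0, 0.5))                          # 2.75
```

In this toy case the standard target trusts the largest (possibly overestimated) target value, while the Double DQN target uses the value of the action the online estimates prefer.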

Figures: General Pendulum Results for DQN and its variants; Individual Pendulum Results for DoubleDQN; Individual Pendulum Results for DQNNStep.

Usage

The repository's high-level structure is:

├── agents
│   ├── Actor_critics
│   │   ├── Base_agent
│   │   ├── A2C_Agent
│   │   ├── DDPG_Agent
│   │   ├── PPO_Agent
│   │   └── SAC_Agent
│   └── DQN
│       ├── Base_agent_DQN
│       ├── DQN_Agent
│       ├── DQN_PER_Agent
│       ├── DoubleDQN_Agent
│       └── DQNNStep_Agent
├── environments
├── tests
│   ├── Cart_pole
│   ├── Pendulum
│   └── data_and_graphs
└── utilities
    ├── Replay_Buffer
    ├── Actor_Critics_utilities
    │   ├── A2C
    │   ├── DDPG
    │   ├── PPO
    │   ├── SAC
    │   └── TD3
    └── Actor_Critics_utilities
        ├── DQN_Network
        ├── Prioritized_Replay_Buffer
        └── Replay_Buffer_NStep

i) To watch the agents learn the above games

To watch all the different agents learn Cart Pole, follow these steps:

git clone https://github.com/vincehass/Arctic-Deep-Reinforcement-Learning-Benchmark.git
cd Arctic-Deep-Reinforcement-Learning-Benchmark

conda create --name myenvname --yes
conda activate myenvname

pip3 install -r requirements.txt

python test/Cart_Pole.py

For other games change the last line to one of the other files in the test folder.

ii) To train the agents on another game

Most OpenAI Gym environments should work. All you need to do is change the config.environment field (see Results/Cart_Pole.py for an example).

You can also use your own custom game by creating a separate class that inherits from gym.Env. See Environments for an example of a custom environment, and see the test script for how to have agents play it.
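A custom environment of this kind might look like the sketch below, a small bit-flipping game. It assumes the classic Gym interface in which `reset()` returns an observation and `step()` returns `(observation, reward, done, info)`; newer Gym and Gymnasium releases use a different signature. The class name `BitFlipEnv` and its details are hypothetical, it omits the observation and action space declarations a full Gym environment would normally define, and the fallback base class exists only so the sketch runs without Gym installed.

```python
import random

try:
    import gym
    BaseEnv = gym.Env
except ImportError:
    # Assumption: fall back to a plain class so the sketch still runs.
    BaseEnv = object


class BitFlipEnv(BaseEnv):
    """Flip one bit per step until the state matches a random goal."""

    def __init__(self, n_bits=4, seed=0):
        self.n_bits = n_bits
        self.rng = random.Random(seed)
        self.state = None
        self.goal = None
        self.steps = 0

    def _observation(self):
        return tuple(self.state), tuple(self.goal)

    def reset(self):
        self.state = [self.rng.randint(0, 1) for _ in range(self.n_bits)]
        self.goal = [self.rng.randint(0, 1) for _ in range(self.n_bits)]
        self.steps = 0
        return self._observation()

    def step(self, action):
        self.state[action] ^= 1  # the action is the index of the bit to flip
        self.steps += 1
        solved = self.state == self.goal
        reward = 0.0 if solved else -1.0
        done = solved or self.steps >= self.n_bits
        return self._observation(), reward, done, {}


# Play one episode with a random policy.
env = BitFlipEnv(n_bits=4, seed=0)
observation = env.reset()
done = False
while not done:
    action = random.randrange(env.n_bits)
    observation, reward, done, info = env.step(action)
```

An instance of such a class is what would be assigned to config.environment in place of a standard Gym environment.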

4. Environments: WindFarm for Active Wake Control

The results below show various RL algorithms learning discrete actions. Each curve is the average over runs with 4 random seeds, and the shaded area represents plus and minus 1 standard deviation. The set of hyperparameters is large and can be found in the files config/action_representations_*.yml.

General Reward Performance Results for SAC, Noisy and Floris Algorithms

5. Hyperparameters: Could WindFarm Hyperparameters be optimized with a Surrogate Model?
