Minimal implementation of Proximal Policy Optimization (PPO) in PyTorch
- Supports both discrete and continuous action spaces
- In continuous action spaces, actions are sampled from a Gaussian with a constant standard deviation (see the sketch below)
- Utilities to plot learning curves in TensorBoard
- 2023-09-09: Added "Generative Adversarial Imitation Learning (GAIL)"
Find or make a config file, then run the following commands.
Train:
```bash
python main.py --config=configs/Ant-v4.yaml \
               --exp_name=test \
               --train
```
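The training run above optimizes PPO's clipped surrogate objective; as a reference, here is a minimal sketch of that loss (tensor names are assumptions, not this repo's API):

```python
import torch

def ppo_loss(log_prob_new, log_prob_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective from the PPO paper (Schulman et al., 2017)."""
    ratio = torch.exp(log_prob_new - log_prob_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Maximize the surrogate => minimize its negation.
    return -torch.min(unclipped, clipped).mean()
```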
Make an expert dataset (for GAIL):
```bash
python make_expert_dataset.py --experiment_path=checkpoints/Ant/test \
                              --load_postfix=last \
                              --minimum_score=5000 \
                              --n_episode=30
```
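Conceptually, this step rolls out the trained policy and keeps only episodes scoring above `--minimum_score`. A hedged sketch of that idea, assuming a Gymnasium-style env (the rollout loop, `policy` callable, and output file name are assumptions, not the script's actual code):

```python
import pickle

def collect_expert_data(env, policy, n_episode=30, minimum_score=5000):
    """Roll out a trained policy; keep episodes whose return clears the threshold."""
    episodes = []
    while len(episodes) < n_episode:
        obs, _ = env.reset()
        traj, score, done = [], 0.0, False
        while not done:
            action = policy(obs)  # placeholder: however the repo queries its actor
            next_obs, reward, terminated, truncated, _ = env.step(action)
            traj.append((obs, action))
            score += reward
            done = terminated or truncated
            obs = next_obs
        if score >= minimum_score:  # discard mediocre rollouts
            episodes.append(traj)
    with open("expert_dataset.pkl", "wb") as f:
        pickle.dump(episodes, f)
```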
Evaluate:
```bash
python main.py --experiment_path=checkpoints/Ant/test \
               --eval \
               --eval_n_episode=50 \
               --load_postfix=last \
               --video_path=videos/Ant
```
- load_postfix: postfix of the pretrained checkpoint to load (e.g., an episode number, 'best', or 'last')
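With `--video_path` set, evaluation episodes are saved as videos; one common way to do this with Gymnasium is the `RecordVideo` wrapper. A sketch assuming a Gymnasium env (the random action stands in for the trained policy):

```python
import gymnasium as gym
from gymnasium.wrappers import RecordVideo

env = gym.make("Ant-v4", render_mode="rgb_array")
env = RecordVideo(env, video_folder="videos/Ant")  # writes .mp4 files per episode

obs, _ = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # placeholder: use the trained policy here
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
env.close()
```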
Environment | Performance Chart | Evaluation Video |
---|---|---|
Ant-v4 | (learning curve) | ant.mp4 |
Ant-v4 (GAIL) | (learning curve) | ant_gail.mp4 |
Reacher-v4 | (learning curve) | reacher.mp4 |
HalfCheetah-v4 | (learning curve) | cheetah.mp4 |
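The Ant-v4 (GAIL) run above replaces the environment reward with a learned one: a discriminator is trained to tell expert (state, action) pairs from the policy's, and its output becomes the reward PPO maximizes. A minimal sketch of that idea (the architecture and reward form are common GAIL choices, not necessarily this repo's exact ones):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """Scores (state, action) pairs: high logits for expert-like, low for policy-like."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 64), nn.Tanh(),
            nn.Linear(64, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))  # raw logits

def gail_reward(disc, obs, act):
    # Common surrogate reward: -log(1 - D(s, a)); grows as D judges the pair expert-like.
    return -F.logsigmoid(-disc(obs, act)).squeeze(-1)
```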
- Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO (https://arxiv.org/abs/2005.12729)
- https://github.com/junkwhinger/PPO_PyTorch
- https://github.com/nikhilbarhate99/PPO-PyTorch