- Add entropy term to encourage exploration
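For reference, the entropy bonus usually looks like this (a plain NumPy sketch, not the repo's Gluon code; the coefficient value is an assumption, just a common PPO default):

```python
import numpy as np

ENT_COEF = 0.01  # assumed entropy coefficient, a typical PPO default


def categorical_entropy(logits):
    """Entropy of a categorical policy, computed per row of raw logits."""
    z = logits - logits.max(axis=-1, keepdims=True)      # stabilise exp
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-8)).sum(axis=-1)


# The bonus is subtracted from the loss, so maximising entropy
# (i.e. more exploration) lowers the loss:
#   loss = policy_loss + vf_coef * value_loss - ENT_COEF * entropy.mean()
```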
- GAE (generalized advantage estimation)
- Distributional RL
- Other environments
- Bigger -> slower nets
- The exploration noise causes NaN gradients, and thus NaN outputs
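A common fix is clamping the log-std of the Gaussian noise before exponentiating, so the log-density stays finite (illustrative NumPy sketch; the bounds are assumed, not taken from this repo):

```python
import numpy as np

LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0  # assumed clamp bounds


def safe_gaussian_log_prob(x, mu, log_std):
    """Log-density of x under N(mu, std^2), with log_std clamped so that
    exp() never underflows to 0 and the division never produces NaN/inf."""
    log_std = np.clip(log_std, LOG_STD_MIN, LOG_STD_MAX)
    std = np.exp(log_std)
    return -0.5 * ((x - mu) / std) ** 2 - log_std - 0.5 * np.log(2 * np.pi)
```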
- Need experience replay, because the agent is clearly forgetting past experience
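A minimal buffer of the usual off-policy kind could look like this (a sketch only; note that vanilla PPO is on-policy, so mixing in replay is a deviation from the standard algorithm):

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # oldest transitions drop off first

    def add(self, state, action, reward, next_state, done):
        self.buf.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)

    def __len__(self):
        return len(self.buf)
```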
- Use OpenAI examples
- Combined the two nets into one -> works -> seems to learn a bit slower
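The combined architecture is typically a shared trunk with separate policy and value heads, roughly like this (NumPy sketch of the forward pass only, not the repo's Gluon code; layer sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)


class SharedActorCritic:
    """One network: a shared hidden trunk feeding two heads,
    policy logits and a scalar state value."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        self.w1 = rng.normal(0, 0.1, (obs_dim, hidden))       # shared trunk
        self.w_pi = rng.normal(0, 0.1, (hidden, n_actions))   # policy head
        self.w_v = rng.normal(0, 0.1, (hidden, 1))            # value head

    def forward(self, obs):
        h = np.tanh(obs @ self.w1)  # features shared by both heads
        return h @ self.w_pi, (h @ self.w_v).squeeze(-1)
```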
- Tuned hyper-parameters, specifically roll-out size, number of updates, and batch size
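The knobs in question, with illustrative values only (the actual tuned numbers are not recorded here):

```python
# Illustrative values -- the repo's tuned numbers are not given in these notes.
hparams = {
    "rollout_steps": 2048,     # size of each roll-out
    "epochs_per_update": 10,   # number of update passes over a roll-out
    "batch_size": 64,          # minibatch size within each pass
}
```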
- Next step -> Try generalized advantage estimation (GAE)
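GAE over a single roll-out can be sketched as follows (standard formulation; `gamma` and `lam` are typical defaults, not this repo's settings):

```python
import numpy as np


def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one roll-out.

    `values` has one extra entry at the end: the bootstrap value
    of the state after the final step."""
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]  # zero out bootstrap past episode ends
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        last = delta + gamma * lam * nonterminal * last
        adv[t] = last
    return adv  # critic targets are then adv + values[:-1]
```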
- After that -> Train in a distributed setting with harder environments
- Compare to OpenAI baseline
- Incorporate into StarCraft
-
dai-dao/PPO-Gluon
Implementation of PPO in Gluon / MXnet