- Add entropy term to encourage exploration
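For reference, the entropy bonus usually looks like this (a plain NumPy sketch, not the repo's Gluon code; the coefficient value is an assumption, just a common PPO default):

```python
import numpy as np

ENT_COEF = 0.01  # assumed entropy coefficient, a typical PPO default


def categorical_entropy(logits):
    """Entropy of a categorical policy, computed per row of raw logits."""
    z = logits - logits.max(axis=-1, keepdims=True)      # stabilise exp
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-8)).sum(axis=-1)


# The bonus is subtracted from the loss, so maximising entropy
# (i.e. more exploration) lowers the loss:
#   loss = policy_loss + vf_coef * value_loss - ENT_COEF * entropy.mean()
```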
- GAE (generalized advantage estimation)
- Distributional RL
- Other environments
- Bigger -> slower nets
- The exploration noise causes NaN gradients, and thus NaN outputs
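A common fix is clamping the log-std of the Gaussian noise before exponentiating, so the log-density stays finite (illustrative NumPy sketch; the bounds are assumed, not taken from this repo):

```python
import numpy as np

LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0  # assumed clamp bounds


def safe_gaussian_log_prob(x, mu, log_std):
    """Log-density of x under N(mu, std^2), with log_std clamped so that
    exp() never underflows to 0 and the division never produces NaN/inf."""
    log_std = np.clip(log_std, LOG_STD_MIN, LOG_STD_MAX)
    std = np.exp(log_std)
    return -0.5 * ((x - mu) / std) ** 2 - log_std - 0.5 * np.log(2 * np.pi)
```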
- Need experience replay, because the agent is clearly forgetting past experience
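A minimal buffer of the usual off-policy kind could look like this (a sketch only; note that vanilla PPO is on-policy, so mixing in replay is a deviation from the standard algorithm):

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # oldest transitions drop off first

    def add(self, state, action, reward, next_state, done):
        self.buf.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)

    def __len__(self):
        return len(self.buf)
```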
- Use OpenAI examples
- Combined the two nets into one -> works -> seems to learn a bit slower
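The combined architecture is typically a shared trunk with separate policy and value heads, roughly like this (NumPy sketch of the forward pass only, not the repo's Gluon code; layer sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)


class SharedActorCritic:
    """One network: a shared hidden trunk feeding two heads,
    policy logits and a scalar state value."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        self.w1 = rng.normal(0, 0.1, (obs_dim, hidden))       # shared trunk
        self.w_pi = rng.normal(0, 0.1, (hidden, n_actions))   # policy head
        self.w_v = rng.normal(0, 0.1, (hidden, 1))            # value head

    def forward(self, obs):
        h = np.tanh(obs @ self.w1)  # features shared by both heads
        return h @ self.w_pi, (h @ self.w_v).squeeze(-1)
```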
- Tuned hyper-parameters, specifically roll-out size, number of updates, and batch size
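The knobs in question, with illustrative values only (the actual tuned numbers are not recorded here):

```python
# Illustrative values -- the repo's tuned numbers are not given in these notes.
hparams = {
    "rollout_steps": 2048,     # size of each roll-out
    "epochs_per_update": 10,   # number of update passes over a roll-out
    "batch_size": 64,          # minibatch size within each pass
}
```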
- Next step -> Try generalized advantage estimation (GAE)
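GAE over a single roll-out can be sketched as follows (standard formulation; `gamma` and `lam` are typical defaults, not this repo's settings):

```python
import numpy as np


def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one roll-out.

    `values` has one extra entry at the end: the bootstrap value
    of the state after the final step."""
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]  # zero out bootstrap past episode ends
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        last = delta + gamma * lam * nonterminal * last
        adv[t] = last
    return adv  # critic targets are then adv + values[:-1]
```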
- After that -> Train in a distributed setting with harder environments
- Compare to OpenAI baseline
- Incorporate into StarCraft
-
dai-dao/PPO-Gluon
Implementation of PPO in Gluon / MXnet