Match PPG implementation #186
Conversation
This is massive! Thanks for making this PR.
I think before any additional changes, our next step is to establish a strong baseline. We should compare our results against the original results; otherwise, it's hard to attest to the quality of our PPG implementation, given that we included divergent changes such as using a shared optimizer.
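(For context, here is a minimal sketch of what "shared optimizer" means here; the network shapes and learning rate are illustrative, not taken from either codebase:)

```python
import torch.nn as nn
import torch.optim as optim

# Illustrative stand-ins for the actual policy and value networks.
policy, value = nn.Linear(64, 15), nn.Linear(64, 1)

# Shared optimizer: a single Adam instance updates both parameter sets,
# coupling their optimizer state and any learning-rate schedule.
shared_opt = optim.Adam(
    list(policy.parameters()) + list(value.parameters()), lr=5e-4
)

# Separate optimizers: each network gets its own Adam instance and state.
policy_opt = optim.Adam(policy.parameters(), lr=5e-4)
value_opt = optim.Adam(value.parameters(), lr=5e-4)
```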
The original paper ran its experiments using the hard distribution mode, so we might have to re-run them. I made a fork available here https://github.com/openrlbenchmark/phasic-policy-gradient to run tracked experiments. Unfortunately, I was not able to run them due to insufficient GPU memory... Would you mind giving it a try? The benchmark commands are at https://github.com/openrlbenchmark/phasic-policy-gradient/blob/add-wandb/benchmark.sh
Hey @Dipamc77, I made some modifications to the documentation. Would you mind adding a section called "Explanation of the logged metrics"? (See here as an example.) Everything else looks good on my end. I did notice that the wall-time performance of openai/phasic-policy-gradient is better, but the reason could be that we record videos (which can be costly) or hardware differences. I am not too worried about it, though.
docs/rl-algorithms/ppg.md
Outdated
* The original PPO used orthogonal initialization of only the policy and value heads, with scales of 0.01 and 1.0, respectively.
* For PPG:
  * All weights are initialized with the default PyTorch initialization (Kaiming uniform).
  * Each layer's weights are divided by the L2 norm of the weights along the (which axis?), and multiplied by a scale factor.
Please clarify "which axis" here.
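For reference, a minimal sketch of this initialization scheme, assuming (as in openai/phasic-policy-gradient's normed layers) that the norm is taken over the fan-in axes, i.e. `dim=1` for `nn.Linear` and `dim=(1, 2, 3)` for `nn.Conv2d`; the `normed_init_` helper and the scale values are illustrative, not part of the PR:

```python
import torch.nn as nn

def normed_init_(layer, scale=1.0):
    # Rescale the layer's default (Kaiming-uniform) weights so that each
    # output unit's weight vector has L2 norm equal to `scale`.
    # Assumption: the norm is taken over all fan-in axes (every dim except 0),
    # i.e. (1,) for nn.Linear and (1, 2, 3) for nn.Conv2d.
    fan_in_dims = tuple(range(1, layer.weight.dim()))
    norm = layer.weight.data.norm(p=2, dim=fan_in_dims, keepdim=True)
    layer.weight.data *= scale / norm
    if layer.bias is not None:
        layer.bias.data.zero_()
    return layer

# Illustrative usage: a larger scale for a hidden layer, a small scale for
# the policy head (analogous to PPO's 0.01 orthogonal init of that head).
hidden = normed_init_(nn.Linear(256, 256), scale=1.4)
policy_head = normed_init_(nn.Linear(256, 15), scale=0.1)
```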
Description
Types of changes
Checklist:
- [ ] I have ensured `pre-commit run --all-files` passes (required).
- [ ] I have updated the documentation and previewed the changes via `mkdocs serve`.

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

- [ ] I have tracked applicable experiments with the `--capture-video` flag toggled on (required).
- [ ] I have added documentation and previewed the changes via `mkdocs serve`.
- [ ] I have added the learning curves (in PNG format with `width=500` and `height=300`).