
Match PPG implementation #186

Merged: 32 commits into vwxyzjn:master on May 28, 2022

Conversation

@dipamc (Collaborator) commented May 18, 2022

Description

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the documentation and previewed the changes via mkdocs serve.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have added additional documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm.
    • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.
  • I have updated the tests accordingly (if applicable).


@vwxyzjn (Owner) left a comment

This is massive! Thanks for making this PR.

I think that before any additional changes, our next step is to establish a great baseline. We should try to compare our results against the original results; otherwise, it's hard to attest to the quality of our PPG implementation, given that we included divergent changes such as using shared optimizers.

The original paper ran its experiments using the hard distribution mode, so we might have to re-run them. I made a fork available at https://github.com/openrlbenchmark/phasic-policy-gradient to run tracked experiments. Unfortunately, I was not able to run them myself due to insufficient GPU memory. Would you mind giving it a try? The benchmark commands are at https://github.com/openrlbenchmark/phasic-policy-gradient/blob/add-wandb/benchmark.sh
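As a hypothetical illustration of the "shared optimizers" divergence mentioned above (not code from this PR): the sketch below contrasts a single Adam instance updating both parameter sets with separate per-network optimizers. The `nn.Linear` modules are stand-ins for the actual networks in cleanrl/ppg_procgen.py, and which arrangement the reference implementation uses is not settled in this thread.

```python
import itertools

import torch
import torch.nn as nn

# Stand-in modules; the real networks live in cleanrl/ppg_procgen.py.
policy_net = nn.Linear(64, 15)
value_net = nn.Linear(64, 1)

# Option A: one shared Adam over both parameter sets
# (the "shared optimizer" divergence noted above).
shared_optimizer = torch.optim.Adam(
    itertools.chain(policy_net.parameters(), value_net.parameters()), lr=5e-4
)

# Option B: separate optimizers, one per network/objective.
policy_optimizer = torch.optim.Adam(policy_net.parameters(), lr=5e-4)
value_optimizer = torch.optim.Adam(value_net.parameters(), lr=5e-4)
```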

Resolved (outdated) review threads: cleanrl/ppg_procgen.py (3), docs/rl-algorithms/ppg.md (1)
@vwxyzjn (Owner) commented May 27, 2022

Hey @Dipamc77, I made some modifications to the documentation. Would you mind adding a section called "Explanation of the logged metrics"? (See here for an example.) Everything else looks good on my end.


I did notice that the wall-time performance of openai/phasic-policy-gradient is better, but the reason could be that we record videos (which can be costly) or that the hardware differs. I am not too worried about it, though.


* The original PPO used orthogonal initialization of only the policy head and value head, with scales of 0.01 and 1.0, respectively.
* For PPG:
  * All weights are initialized with the default torch initialization (Kaiming uniform).
  * Each layer's weights are divided by the L2 norm of the weights along the (which axis?), and multiplied by a scale factor.
@vwxyzjn (Owner) commented on this snippet:
Please clarify "which axis" here.
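For concreteness, here is a minimal PyTorch sketch of the two initialization schemes described in the snippet (not the PR's actual code). `layer_init_ppo` mirrors the PPO-style orthogonal head initialization with scales 0.01 and 1.0; `layer_init_normed` is a hypothetical rendering of the PPG-style rescaling, and its `norm_dim=1` (one L2 norm per output unit) is an assumption, since the normalization axis is exactly what the comment above asks to clarify.

```python
import torch
import torch.nn as nn


def layer_init_ppo(layer, std, bias_const=0.0):
    # PPO-style orthogonal initialization of a head:
    # std=0.01 for the policy head, std=1.0 for the value head.
    nn.init.orthogonal_(layer.weight, std)
    nn.init.constant_(layer.bias, bias_const)
    return layer


def layer_init_normed(layer, scale=1.0, norm_dim=1):
    # PPG-style "normed" init: keep torch's default (Kaiming-uniform)
    # weights, then divide by their L2 norm and multiply by a scale factor.
    # ASSUMPTION: norm_dim=1 reduces over the input dimension, giving one
    # norm per output unit; the correct axis is the open question above.
    with torch.no_grad():
        norm = layer.weight.norm(p=2, dim=norm_dim, keepdim=True)
        layer.weight.mul_(scale / norm)
    return layer


policy_head = layer_init_ppo(nn.Linear(256, 15), std=0.01)
value_head = layer_init_ppo(nn.Linear(256, 1), std=1.0)
hidden_layer = layer_init_normed(nn.Linear(256, 256), scale=1.0)
```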

@vwxyzjn merged commit eba6452 into vwxyzjn:master on May 28, 2022