PyTorch Categorical GRU Policy #2196
base: master
Conversation
This looks pretty good to me.
One thing missing, compared to the TensorFlow tests, is the tests that build an input concatenating the action and the observation.
To be fair, I'm not sure why those exist or what use case they're testing. I think @krzentner can probably shed some light on this.
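For context, here is a hedged sketch of what such a concatenated input looks like; the shapes and names are illustrative assumptions, not the actual TF test code:

```python
import torch

obs = torch.randn(1, 3, 4)       # (batch, time, obs_dim)
prev_act = torch.zeros(1, 3, 2)  # e.g. one-hot previous actions (assumption)
# The TF tests feed the model an input that concatenates action and observation:
gru_input = torch.cat([obs, prev_act], dim=-1)  # (1, 3, obs_dim + act_dim)
```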
Otherwise this is pretty awesome; we'll wait on @krzentner's comments before addressing that. I think she and another student have been working on adding PyTorch RNN support for the past few weeks, and this may be helpful to them.
Oh wait, one more thing, @ManavR123: you'll want to include a test similar to tests/garage/tf/models/test_categorical_gru_model.py. The test titled "test normalized outputs" checks that your module outputs the correct values in a golden-values test, which ensures correctness.

You can also obtain these golden values by using your module/policy in some benchmark to confirm that it works, running the policy and module on a simplified environment like the grid world env, capturing the weights of that policy/module, and then writing a test that compares the values of the trained policy to those golden values after a few optimization iterations.

Does that make sense? Please ask me anything if that wasn't clear! Thanks
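A minimal sketch of a golden-values test under those lines; `TinyGRUCategorical` here is a hypothetical stand-in for the PR's module, and the golden tensor is a placeholder you'd hard-code from a trusted run:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class TinyGRUCategorical(nn.Module):
    """Hypothetical stand-in for the PR's module, for illustration only."""

    def __init__(self, obs_dim, action_dim, hidden_dim):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, action_dim)

    def forward(self, obs):
        out, _ = self.gru(obs)  # out: (batch, time, hidden_dim)
        return Categorical(logits=self.head(out))


def test_golden_values():
    torch.manual_seed(0)  # fixed seed makes the initial weights reproducible
    module = TinyGRUCategorical(obs_dim=4, action_dim=2, hidden_dim=8)
    probs = module(torch.ones(1, 3, 4)).probs.detach()
    # Capture `probs` once from a trusted run, then hard-code it here as the
    # golden tensor; `probs.clone()` is only a placeholder for that step.
    golden = probs.clone()
    assert torch.allclose(probs, golden, atol=1e-6)
    # Outputs should also be normalized probabilities over actions.
    assert torch.allclose(probs.sum(-1), torch.ones(1, 3))
```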
Just reviewed the test you wrote. I think there's a small bug; see my comments and let me know what you think.
Wow, this PR looks good. The main feedback I have is that we don't actually have any algorithms implemented in PyTorch yet that can train an RNN. Unfortunately, our VPG (and PPO and TRPO) shuffle all timesteps individually in their batching step, which prevents training the RNN correctly. I'm currently working on fixing that, but I probably won't have anything usable before sometime next week.

The more minor feedback I have is that I think we should actually streamline the …
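To make the batching issue concrete, a hedged sketch of the difference; the trajectory layout and names are illustrative, not garage's actual batching code:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Two trajectories, each a whole ordered sequence of observations.
trajs = [torch.randn(5, 4), torch.randn(7, 4)]  # (T_i, obs_dim)

# Per-timestep shuffling (roughly what VPG/PPO/TRPO do today): once all steps
# are concatenated and permuted, temporal order is gone, so a GRU hidden
# state carried across the batch is meaningless.
flat = torch.cat(trajs)                        # (12, 4)
shuffled = flat[torch.randperm(flat.size(0))]  # order destroyed

# Trajectory-oriented batching (what an RNN needs): shuffle whole
# trajectories, keeping each one's internal timestep order intact.
order = torch.randperm(len(trajs))
batch = [trajs[int(i)] for i in order]
padded = pad_sequence(batch, batch_first=True)  # (2, 7, 4), zero-padded
```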
@krzentner I'm curious: what is your implementation plan for RNNs in torch/VPG? Is it to make the optimization trajectory-oriented (as in tf/VPG), or something else?
@krzentner Did this end up getting done? I'm also trying to implement an RNN policy in PyTorch but can't seem to find any working examples.
In this PR, I add a PyTorch version of the Categorical GRU Policy. I believe there is some in-progress work on adding RNN support in #2172; however, that PR has been in draft for a while, and I personally needed this policy, so I figured I would try to contribute it myself. One thing missing is an example using this policy, though it isn't too different from other similar policies (see the sketch below).
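Since an example is missing, here is a hedged usage sketch. The import path, constructor arguments, and env helpers are assumptions modeled on garage's TF CategoricalGRUPolicy, not confirmed against this PR's API:

```python
# Hedged sketch: import path and constructor are assumptions based on the
# TF CategoricalGRUPolicy; adjust to whatever this PR actually exposes.
import numpy as np
from garage.envs import GymEnv
from garage.torch.policies import CategoricalGRUPolicy

env = GymEnv('CartPole-v1')
policy = CategoricalGRUPolicy(env_spec=env.spec, hidden_dim=32)

policy.reset()  # recurrent policies must clear the GRU state at episode start
obs = np.zeros(env.spec.observation_space.flat_dim)  # dummy observation
action, agent_info = policy.get_action(obs)
```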
I would appreciate any feedback on this!
@krzentner @avnishn