Implementation of on-policy algorithms #20

Open
nsidn98 opened this issue Jun 16, 2020 · 2 comments

Comments

nsidn98 commented Jun 16, 2020

Hey, great repository! How do you intend to implement the on-policy algorithms? Specifically, how would you implement the train_dataloader() method in the Lightning module?

djbyrne (Owner) commented Jun 17, 2020

Hey Siddarth, I'm glad you like it :). That's a great question, and something I have been experimenting with a lot.

My current plan is to handle it much like the off-policy algorithms. An experience source will output experiences, which will then be fed into a buffer. Unlike the replay buffer, this buffer will just store the sequences in order, and it will probably have some functionality for calculating statistics of the rollout, for things like determining an advantage baseline. The experiences held in the buffer will then be passed to the IterableDataset and used as batches for Lightning. After each batch, the buffer is cleared and the process restarts.
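To make that flow concrete, here is a rough, framework-free sketch of the collect, yield, clear loop. It is not the repository's code: `Experience`, `RolloutBuffer`, `OnPolicyBatches`, and `toy_source` are hypothetical names invented for illustration, and the mean-return baseline is just one simple choice. In a Lightning module, the `__iter__` body would presumably live in a `torch.utils.data.IterableDataset` subclass that `train_dataloader()` wraps in a `DataLoader`.

```python
from typing import Iterator, List, NamedTuple, Tuple


class Experience(NamedTuple):
    state: float
    action: int
    reward: float
    done: bool


class RolloutBuffer:
    """Hypothetical buffer: keeps experiences in collection order (no sampling)."""

    def __init__(self, gamma: float = 0.99) -> None:
        self.gamma = gamma
        self.experiences: List[Experience] = []

    def append(self, exp: Experience) -> None:
        self.experiences.append(exp)

    def discounted_returns(self) -> List[float]:
        """Reward-to-go per step, restarting at every episode boundary."""
        returns: List[float] = []
        running = 0.0
        for exp in reversed(self.experiences):
            if exp.done:
                running = 0.0  # nothing after a terminal step counts
            running = exp.reward + self.gamma * running
            returns.append(running)
        returns.reverse()
        return returns

    def advantages(self) -> List[float]:
        """Returns minus their mean: an assumed, very simple baseline."""
        returns = self.discounted_returns()
        if not returns:
            return []
        baseline = sum(returns) / len(returns)
        return [r - baseline for r in returns]

    def clear(self) -> None:
        self.experiences.clear()


class OnPolicyBatches:
    """Iterable shaped like an IterableDataset: collect, yield, clear, repeat."""

    def __init__(self, source: Iterator[Experience], buffer: RolloutBuffer,
                 rollout_len: int, num_rollouts: int) -> None:
        self.source = source
        self.buffer = buffer
        self.rollout_len = rollout_len
        self.num_rollouts = num_rollouts

    def __iter__(self) -> Iterator[Tuple[List[Experience], List[float]]]:
        for _ in range(self.num_rollouts):
            for _ in range(self.rollout_len):
                self.buffer.append(next(self.source))
            # Copy the rollout before clearing so the consumer keeps its batch.
            yield list(self.buffer.experiences), self.buffer.advantages()
            self.buffer.clear()


def toy_source(episode_len: int = 4) -> Iterator[Experience]:
    """Hypothetical stand-in for an environment plus the current policy."""
    step = 0
    while True:
        step += 1
        yield Experience(float(step), 0, 1.0, step % episode_len == 0)


if __name__ == "__main__":
    dataset = OnPolicyBatches(toy_source(), RolloutBuffer(), rollout_len=8, num_rollouts=3)
    for rollout, advantages in dataset:
        print(len(rollout), len(advantages))
```

Because the buffer is cleared after every yielded rollout, each batch only ever contains data gathered by the policy as it stood when the rollout was collected, which is the property on-policy methods rely on.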

Currently I have a base implementation of Vanilla Policy Gradient here: https://github.com/djbyrne/core_rl/blob/master/algos/vanilla_policy_gradient/model.py . This uses an EpisodicExperienceSource and returns N batches of episode rollouts directly to the Dataset. I am not currently using a buffer in between the source and the dataset, but this will probably change.

nsidn98 (Author) commented Jun 17, 2020

Thanks a lot for the reply. I'll go through the Vanilla Policy Gradient example in the repo.
