Implementation of on-policy algorithms #20

nsidn98 · 2020-06-16T11:53:11Z

Hey, great repository. How do you intend to implement the on-policy algorithms? Specifically, how would you implement the train_dataloader() method in the Lightning module?

djbyrne · 2020-06-17T06:25:19Z

Hey Siddarth, I'm glad you like it :). That's a great question and something I have been playing around with a lot.

My current plan is to do it very similarly to the off-policy algorithms. There will be an experience source that outputs experiences, and these will then be fed into a buffer. Unlike the replay buffer, this buffer will just store the sequences in order, and it will probably have some functionality for calculating stats of the rollout for things like determining an advantage baseline. The experiences stored in the buffer will then be passed to the IterableDataset and used as batches for Lightning. After the batch, the buffer is cleared and the process restarts.
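To make the flow above concrete, here is a minimal sketch in plain Python, not the repository's actual code. `Experience`, `RolloutBuffer`, `toy_experience_source`, and `rollout_batches` are all hypothetical names, and the mean-reward baseline is just one assumed example of a rollout statistic. In a Lightning setup the generator would presumably be called from the `__iter__` of a `torch.utils.data.IterableDataset` subclass.

```python
import random
from typing import Iterator, List, NamedTuple, Tuple


class Experience(NamedTuple):
    state: float
    action: int
    reward: float
    done: bool


class RolloutBuffer:
    """Hypothetical on-policy buffer: keeps experiences in arrival order."""

    def __init__(self) -> None:
        self.experiences: List[Experience] = []

    def __len__(self) -> int:
        return len(self.experiences)

    def append(self, exp: Experience) -> None:
        self.experiences.append(exp)

    def mean_reward(self) -> float:
        # One simple rollout statistic that could serve as an advantage baseline.
        if not self.experiences:
            return 0.0
        return sum(e.reward for e in self.experiences) / len(self.experiences)

    def clear(self) -> None:
        self.experiences = []


def toy_experience_source(seed: int = 0) -> Iterator[Experience]:
    """Stand-in for an environment plus policy; yields experiences forever."""
    rng = random.Random(seed)
    step = 0
    while True:
        step += 1
        yield Experience(
            state=rng.random(),
            action=rng.randint(0, 1),
            reward=rng.random(),
            done=(step % 5 == 0),
        )


def rollout_batches(
    source: Iterator[Experience],
    buffer: RolloutBuffer,
    rollout_len: int,
    num_batches: int,
) -> Iterator[List[Tuple[float, int, float]]]:
    """Fill the buffer, emit one ordered batch, clear the buffer, repeat."""
    for _ in range(num_batches):
        for _ in range(rollout_len):
            buffer.append(next(source))
        baseline = buffer.mean_reward()
        # Each item is (state, action, reward minus baseline), in rollout order.
        batch = [(e.state, e.action, e.reward - baseline) for e in buffer.experiences]
        buffer.clear()  # on-policy: old experiences are never reused
        yield batch
```

The key difference from a replay buffer in this sketch is that nothing is sampled at random and nothing survives past the batch it was collected for.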

Currently I have a base implementation of Vanilla Policy Gradient here: https://github.com/djbyrne/core_rl/blob/master/algos/vanilla_policy_gradient/model.py. This uses an EpisodicExperienceSource and returns N batches of episode rollouts directly to the Dataset. I am currently not using a buffer in between the source and the dataset, but this will probably change.
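As a rough illustration of the buffer-free variant, the sketch below feeds whole episodes straight into batches. It does not reproduce the real `EpisodicExperienceSource` API: `episode_batches`, `discounted_returns`, and `toy_episode` are hypothetical helpers, and using discounted reward-to-go as the per-step target is an assumption about what a vanilla policy gradient setup would want.

```python
from typing import Callable, Iterator, List, Tuple

Step = Tuple[int, int, float]  # (state, action, reward)


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Reward-to-go: G_t = r_t + gamma * G_{t+1}, computed back to front."""
    returns: List[float] = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def episode_batches(
    run_episode: Callable[[], List[Step]],
    episodes_per_batch: int,
    num_batches: int,
    gamma: float = 0.99,
) -> Iterator[List[Step]]:
    """Yield batches made of complete episodes, with no buffer in between."""
    for _ in range(num_batches):
        batch: List[Step] = []
        for _ in range(episodes_per_batch):
            episode = run_episode()
            returns = discounted_returns([r for _, _, r in episode], gamma)
            # Replace each raw reward with its discounted return.
            batch.extend((s, a, g) for (s, a, _), g in zip(episode, returns))
        yield batch


def toy_episode() -> List[Step]:
    """Stand-in for one rollout of the current policy in an environment."""
    return [(0, 1, 1.0), (1, 0, 1.0), (2, 1, 1.0)]
```

Because each batch only contains episodes collected since the last update, the data stays on-policy without any explicit clearing step.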

nsidn98 · 2020-06-17T14:19:35Z

Thanks a lot for the reply. I'll go through the Vanilla Policy Gradient example in the repo.
