Hey, great repository. How do you intend to implement the on-policy algorithms? As in, how would you implement the train_dataloader() method in the Lightning module?
Hey Siddarth, I'm glad you like it :). That's a great question and something I have been playing around with a lot.
My current plan is to do it very similarly to the off-policy algorithms. There will be an experience source that outputs experiences, which are then fed into a buffer. Unlike the replay buffer, this buffer will just store the sequences in order, and it will probably have some functionality for calculating statistics of the rollout, e.g. for determining an advantage baseline. The experiences in the buffer will then be passed to the IterableDataset and used as batches for Lightning. After the batch, the buffer is cleared and the process restarts.
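To make that flow concrete, here is a minimal sketch of the plan above (experience source, in-order buffer, rollout statistics, iterable dataset, clear and repeat). It is not the repository's implementation: `ExperienceSource`, `RolloutBuffer`, `OnPolicyDataset`, the toy environment and the policy are all hypothetical, and using the mean discounted return of the rollout as the advantage baseline is just one assumed choice. To stay runnable without PyTorch installed, the dataset only implements the `__iter__` protocol that `torch.utils.data.IterableDataset` relies on.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Tuple


@dataclass
class Experience:
    state: int
    action: int
    reward: float
    done: bool


class ExperienceSource:
    """Hypothetical source: steps a toy environment with the current policy."""

    def __init__(self, policy: Callable[[int], int], episode_length: int = 4):
        self.policy = policy
        self.episode_length = episode_length
        self.t = 0  # step counter within the current episode

    def step(self) -> Experience:
        state = self.t
        action = self.policy(state)
        reward = 1.0 if action == 1 else 0.0  # toy reward, purely illustrative
        self.t += 1
        done = self.t == self.episode_length
        if done:
            self.t = 0  # reset the toy environment
        return Experience(state, action, reward, done)


class RolloutBuffer:
    """Keeps experiences in collection order (no random sampling, unlike a replay buffer)."""

    def __init__(self, gamma: float = 0.99):
        self.gamma = gamma
        self.experiences: List[Experience] = []

    def append(self, exp: Experience) -> None:
        self.experiences.append(exp)

    def discounted_returns(self) -> List[float]:
        # Walk backwards, resetting the running return at episode boundaries.
        returns, running = [], 0.0
        for exp in reversed(self.experiences):
            if exp.done:
                running = 0.0
            running = exp.reward + self.gamma * running
            returns.append(running)
        return list(reversed(returns))

    def advantages(self) -> List[float]:
        # Assumed baseline: the mean discounted return over the whole rollout.
        returns = self.discounted_returns()
        baseline = sum(returns) / len(returns)
        return [r - baseline for r in returns]

    def clear(self) -> None:
        self.experiences.clear()


class OnPolicyDataset:
    """Follows the IterableDataset protocol; real code would subclass
    torch.utils.data.IterableDataset."""

    def __init__(self, source: ExperienceSource, buffer: RolloutBuffer,
                 rollout_length: int = 8, num_rollouts: int = 1):
        self.source = source
        self.buffer = buffer
        self.rollout_length = rollout_length
        self.num_rollouts = num_rollouts

    def __iter__(self) -> Iterator[Tuple[int, int, float]]:
        for _ in range(self.num_rollouts):
            # 1. Collect a fresh rollout with the current policy.
            for _ in range(self.rollout_length):
                self.buffer.append(self.source.step())
            # 2. Compute rollout statistics once per rollout.
            advantages = self.buffer.advantages()
            # 3. Yield samples in order; a DataLoader would group them into batches.
            for exp, adv in zip(self.buffer.experiences, advantages):
                yield exp.state, exp.action, adv
            # 4. Clear so the next rollout contains only on-policy data.
            self.buffer.clear()


source = ExperienceSource(policy=lambda s: s % 2, episode_length=4)
dataset = OnPolicyDataset(source, RolloutBuffer(gamma=1.0), rollout_length=8, num_rollouts=2)
samples = list(dataset)
print(len(samples))  # 16
```

In a LightningModule, `train_dataloader()` would then presumably just return `torch.utils.data.DataLoader(dataset, batch_size=...)`. Because the buffer is cleared after each rollout, every batch comes from data collected with the most recent policy, which is the property on-policy methods need.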