-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issues in ReplayBuffer
#201
Comments
Thanks @hyeok9855 this is great. My thoughts - I think that there are many use-cases where a buffer would be better off having the full trajectory (for algorithmic reasons) - but I also agree that often you only want to store the terminal state. While making So based on this I think making a bunch of buffer types makes sense using subclassing. I am thinking the hierarchy could be that you have Agree RE: merging prioritized and normal buffers - but I could imagine more kinds of buffers in the future that should be their own class. Our "prioritized" buffer is really a specific kind of prioritized buffer that won't work for everyone unfortunately. Agree 100% for the device handling. |
I'm also happy to help with this :) |
Thanks -- re this - I'm wondering whether we can simply use the https://github.com/pytorch/rl/tree/main/torchrl/data/replay_buffers |
Reusing TorchRL ReplayBuffer is a very good idea. They decouple sampling from the replay buffer, which is a great design choice. I will look at the challenges to migrating to it more in depth |
Here I note several issues in the current
ReplayBuffer
, including the points raised by @younik in #193.ReplayBuffer
can takeTrajectories
,Transitions
(which are inherited fromContainer
), andtuple[States, States]
(which is not). This makes the overall code a bit more messy. Possible solutions are as follows:a. Make
State
inherit fromContainer
, and let theReplayBuffer
takeContainer
objects.b. Make subclasses of
ReplayBuffer
depending on the objectsReplayBuffer
is just storing a terminal state x (with reward) as in Max's code, and sample a training trajectory with backward policy. This is preferred since 1) it significantly reduces memory usage (vs. Trajectory-based), and 2) the backward sampling gives more diverse learning signals, which I believe helps training. We need to support this, but it needs to be considered along with the issue 1 above.PrioritizedReplayBuffer
by default (instead ofReplayBuffer
), with a proper optional parameter that enables the prioritization will make the code more concise.ReplayBuffer
when cuda is enabled. We need to check.I hope to discuss enough before tackling those!
The text was updated successfully, but these errors were encountered: