feat: Mixed Experience Replay 🤝 #30
Conversation
Hello, I wanted to say that this is great functionality. In my case I am interested in distributed RL with experience sharing among agents, so having a buffer where you can sample/add differently for personal use and for sharing would be great. Is this a feature you are planning to merge soon? And if so, could you extend the notebook to show how you can add to the two buffers?
@eleninisioti Hey, so yeah, ideally we can merge this ASAP. I'm just waiting on one last thing from @callumtilbury, but he's quite busy at the moment. If he's unable to complete it in the coming week, I'll take over and finish it. So hopefully it will be merged sometime this week :)
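In the meantime, here is a minimal sketch of the adding side, assuming the mixer only mixes *sampling* and each underlying buffer keeps its own state, so you add to each buffer independently with the ordinary flat-buffer API. The buffer names and sizes are illustrative, not from this PR:

```python
# Minimal sketch: adding to two buffers independently. The mixer described
# in this PR mixes sampling; each buffer still has its own state and add fn.
# Buffer names and sizes here are illustrative, not from the PR.
import jax.numpy as jnp
import flashbax as fbx

fake_timestep = {"obs": jnp.zeros(8), "reward": jnp.float32(0.0)}

# e.g. a "personal" buffer and a "shared" buffer.
personal = fbx.make_flat_buffer(max_length=1_000, min_length=4, sample_batch_size=4)
shared = fbx.make_flat_buffer(max_length=10_000, min_length=16, sample_batch_size=16)

personal_state = personal.init(fake_timestep)
shared_state = shared.init(fake_timestep)

# Add to whichever buffer the experience belongs in; the states never mix.
personal_state = personal.add(personal_state, fake_timestep)
shared_state = shared.add(shared_state, fake_timestep)
```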
Looks good to me.
A simple utility to mix sampling of multiple buffers. Useful for offline-online stuff, and some off-policy variants that include portions of on-policy data (e.g. "combined experience replay," see here).
Important (& intentional) restrictions: the mixing proportions are fixed as `[x,y,z,...]`, with a joint `sample_batch_size`, when creating the mixer. We are still constrained by the underlying buffer sample functions, though. Suppose we have `buffer_a`, which returns `(4, ...)`, and `buffer_b`, which returns `(16, ...)`. We could create a mixer `[1,1]` with `sample_batch_size=6`. In that case, we get `3` "batches" from `buffer_a` and `3` batches from `buffer_b`. But if we ask for `sample_batch_size=10` with ratio `[1,1]` (i.e. a batch size of `10/2 = 5` from each buffer), we'll only get the `4` batches from `buffer_a`, along with the `5` batches from `buffer_b`: a total `sample_batch_size = 9`. This is the idea of "best effort": we'll try to grab enough batches of data, but only if possible. If not, we return a smaller batch than desired, taking as much as we can from each buffer.

It'd be great to test this out in a real system. Perhaps in Stoix, @EdanToledo? I can also look at stitching vaults together, etc.
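To make the arithmetic above concrete, here is a self-contained sketch of the "best effort" slicing. `mixed_sample` is a hypothetical name written for illustration, not the mixer's actual implementation:

```python
# Illustrative "best effort" mixing logic, not the mixer's actual code.
import jax.numpy as jnp

def mixed_sample(batches, proportions, sample_batch_size):
    """Slice each pre-sampled batch proportionally, taking what's available.

    batches: arrays shaped (batch_i, ...), as returned by each buffer's
        sample function. proportions: relative weights, e.g. [1, 1].
    """
    total = sum(proportions)
    parts = []
    for batch, p in zip(batches, proportions):
        requested = (sample_batch_size * p) // total
        # Best effort: never take more than the buffer actually returned.
        parts.append(batch[: min(requested, batch.shape[0])])
    return jnp.concatenate(parts, axis=0)

# Mirrors the example above: buffer_a returns (4, ...), buffer_b (16, ...).
batch_a, batch_b = jnp.zeros((4, 8)), jnp.ones((16, 8))
print(mixed_sample([batch_a, batch_b], [1, 1], 6).shape)   # (6, 8): 3 + 3
print(mixed_sample([batch_a, batch_b], [1, 1], 10).shape)  # (9, 8): 4 + 5
```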
See the example notebook: https://colab.research.google.com/github/instadeepai/flashbax/blob/feat/mixed_experience_replay/examples/mixer_demonstration.ipynb (obviously it won't run unless you pip install the branch version, or run it locally on the branch).