feat: Mixed Experience Replay 🤝 #30

Merged
merged 11 commits into from
Sep 20, 2024

Contributor

@callumtilbury callumtilbury commented Jul 17, 2024

A simple utility for mixing samples drawn from multiple buffers. Useful for offline-to-online training, and for off-policy variants that include a portion of on-policy data (e.g. "combined experience replay," see here).

Important (& intentional) restrictions:

  • The buffers must be of the same type. Why? We need to concatenate the samples, so they must share the same pytree structure. FlatBuffer returns an ExperiencePair, which cannot be combined with a PrioritisedTrajectoryBufferSample, etc.
  • We can ask for any ratio, [x,y,z,...], with a joint sample_batch_size when creating the mixer. We are still constrained by the underlying buffers' sample functions, though. Suppose we have buffer_a, which returns (4, ...), and buffer_b, which returns (16, ...). We could create a mixer with ratio [1,1] and sample_batch_size=6. In that case, we get 3 items from buffer_a and 3 from buffer_b. But if we ask for sample_batch_size=10 with ratio [1,1] (i.e. 10/2 = 5 from each buffer), we'll only get the 4 items that buffer_a can provide, along with 5 from buffer_b, so a total sample_batch_size of 9. This is the idea of "best effort": we'll try to grab enough data, but only if possible. If not, we return a smaller batch than desired, with as much as we can get from each buffer.
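To make the "best effort" arithmetic concrete, here is a minimal sketch in plain Python. The helper name best_effort_sizes and its arguments are hypothetical, chosen only for illustration; this is not the mixer's actual implementation, and it assumes each buffer's share is rounded down before being capped at what that buffer's sample function returns.

```python
from typing import List, Sequence


def best_effort_sizes(
    ratios: Sequence[int],
    sample_batch_size: int,
    buffer_batch_sizes: Sequence[int],
) -> List[int]:
    """Hypothetical helper: number of items to take from each buffer."""
    total = sum(ratios)
    sizes = []
    for ratio, available in zip(ratios, buffer_batch_sizes):
        # Share of the joint batch requested from this buffer (rounded down).
        requested = (sample_batch_size * ratio) // total
        # Best effort: never take more than the buffer's own sample returns.
        sizes.append(min(requested, available))
    return sizes


# buffer_a returns 4 items per sample, buffer_b returns 16.
print(best_effort_sizes([1, 1], 6, [4, 16]))   # [3, 3]
print(best_effort_sizes([1, 1], 10, [4, 16]))  # [4, 5], i.e. 9 items in total
```

The second call reproduces the shortfall described above: the request of 5 from buffer_a is capped at 4, so the mixed batch has 9 items rather than 10.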

It'd be great to test this out in a real system. Perhaps in Stoix, @EdanToledo? I can also look at stitching vaults together, etc.

See example notebook: https://colab.research.google.com/github/instadeepai/flashbax/blob/feat/mixed_experience_replay/examples/mixer_demonstration.ipynb (obviously won't run, unless you pip install the branch version, or run locally on a branch)

@callumtilbury callumtilbury requested a review from EdanToledo July 17, 2024 13:55
@callumtilbury callumtilbury added the enhancement New feature or request label Jul 17, 2024
@eleninisioti

Hello, I wanted to say that this is a great functionality. In my case I am interested in distributed RL with experience sharing among agents, so having a buffer where you can sample/add differently for personal use and sharing would be great. Is this a feature you are planning to merge soon? And if so, could you extend the notebook to show how you can add to the two buffers?

@EdanToledo
Contributor

@eleninisioti Hey, so yeah, ideally we can merge this ASAP. I'm just waiting on one last thing from @callumtilbury, but he's quite busy at the moment. If he's unable to complete it in the coming week, I'll take over and finish it. So hopefully it will be merged sometime this week :)

Contributor

@EdanToledo EdanToledo left a comment

looks good to me

@EdanToledo EdanToledo merged commit e0199d7 into main Sep 20, 2024
3 checks passed
@EdanToledo EdanToledo deleted the feat/mixed_experience_replay branch September 20, 2024 13:20
3 participants