Configurable ReplayBuffer #410

Merged

pbontrager merged 9 commits into main from update_replaybuffer

Oct 15, 2025

Contributor

pbontrager commented Oct 14, 2025

Added custom sample and eviction policies to the replay buffer. The existing replay buffer did not allow custom handling of these two properties, which are important for tuning async training runs. To keep these functions generic and user-defined, I added two callable inputs, sample_policy and eviction_policy. Each takes the current buffer and some buffer parameters (max_policy_age, max_resample_count) and returns a list of indices to either evict or sample. A list of indices is returned, rather than having the policy modify the buffer directly, so that the buffer itself controls how it is updated instead of the user.
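As a rough illustration of the index-returning contract described above, here is a minimal sketch. The eviction signature follows the `default_evict(buffer, policy_version, max_samples=None, max_age=None)` snippet quoted in the review below; the `BufferEntry` fields other than `sample_count`, the `random_sample` signature, and the exact age and count comparisons are assumptions made for illustration, not the merged implementation.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class BufferEntry:
    # Only sample_count is confirmed by the PR; the other fields are hypothetical.
    data: Any
    policy_version: int
    sample_count: int = 0


def evict_if_too_old(
    buffer: deque,
    policy_version: int,
    max_samples: Optional[int] = None,
    max_age: Optional[int] = None,
) -> list[int]:
    """Return the indices of entries to evict; the buffer is not modified here."""
    indices = []
    for i, entry in enumerate(buffer):
        # Assumed semantics: age is the gap between current and entry versions.
        too_old = max_age is not None and policy_version - entry.policy_version > max_age
        overused = max_samples is not None and entry.sample_count >= max_samples
        if too_old or overused:
            indices.append(i)
    return indices


def random_sample(buffer: deque, sample_size: int) -> Optional[list[int]]:
    """Return indices to sample, or None if the buffer holds too few entries."""
    if sample_size > len(buffer):
        return None
    return random.sample(range(len(buffer)), k=sample_size)
```

Because both policies only return indices, the buffer stays the single place where entries are removed or have their sample counts updated.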

Changelog:

BufferEntry type created to track episode sample_counts. This gives the user control over how often data can be resampled.
default_evict and default_sample are default policies that match the existing behavior of the ReplayBuffer.
max_buffer_size added with a first-in, first-out policy for removing old data (the assumption is that stale data will already have been removed by evict).
max_policy_age, max_resample_count, and max_buffer_size all now accept None to remove the corresponding restriction.
buffer changed from list to deque. This allows efficient enforcement of max_buffer_size without constant reallocation of the entire buffer.
_collect method added to get the buffer entries for a given list of indices. It needs special handling to do this efficiently with a deque object.
Calling sample no longer removes sampled data from the buffer; it just increments the sample count for that data. It's up to the eviction policy to remove data based on sample count.
Removed the buffer_size parameter from sample to simplify the API.
Updated unit tests for the API changes and for the slight change in when sampled data is removed.
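The deque-related items in the changelog can be sketched roughly as follows. This is an assumption-laden illustration rather than the code in `replay_buffer.py`: the `Entry` type, the free functions `collect` and `evict`, and the use of `deque(maxlen=...)` for the FIFO size limit are all hypothetical choices made here.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any


@dataclass
class Entry:
    # Hypothetical stand-in for BufferEntry.
    data: Any
    sample_count: int = 0


def collect(buffer: deque, indices: list[int]) -> list[Entry]:
    """Gather entries at `indices` with a single pass over the deque.

    Indexing into the middle of a deque is O(n), so one iteration is used
    instead of repeated buffer[i] lookups.
    """
    wanted = set(indices)
    found = {i: entry for i, entry in enumerate(buffer) if i in wanted}
    return [found[i] for i in indices]


def evict(buffer: deque, indices: list[int]) -> deque:
    """Return a new deque without the entries at `indices`, keeping maxlen."""
    drop = set(indices)
    kept = (entry for i, entry in enumerate(buffer) if i not in drop)
    return deque(kept, maxlen=buffer.maxlen)


# A deque with maxlen drops the oldest entry when a new one is appended;
# maxlen=None would correspond to having no size limit.
buffer = deque(maxlen=3)
for value in range(4):
    buffer.append(Entry(data=value))

# Sampling only bumps the counter; the entries stay in the buffer.
sampled = collect(buffer, [2, 0])
for entry in sampled:
    entry.sample_count += 1

# A later eviction pass removes entries based on their sample count.
used_up = [i for i, entry in enumerate(buffer) if entry.sample_count >= 1]
buffer = evict(buffer, used_up)
```

Rebuilding the deque on eviction is a simplification; it keeps the example short while still avoiding per-index deletions from the middle of the deque.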


          updated replay buffer

9dba3e6

meta-cla bot added the CLA Signed label


          lint

37a348c

pbontrager requested a review from joecummings

October 14, 2025 20:24


          fixed flakey test

4d2a1ce

joecummings reviewed

src/forge/actors/replay_buffer.py Outdated

    
                  sample_count: int = 0

              def default_evict(buffer, policy_version, max_samples=None, max_age=None):

Member

joecummings Oct 15, 2025

Can we put some type hints?

pbontrager and others added 4 commits

October 15, 2025 16:09


          added type hints

09e19f2


          ran linting

165f57b


          bug fix

8f7a77d


          another bug fix

03fe93c

joecummings reviewed

Member

joecummings left a comment

Can you also add a test specifically for _collect?

src/forge/actors/replay_buffer.py Outdated

    
                  return indices

              def default_sample(

Member

joecummings Oct 15, 2025

Rename this to the actual things they do lol.

default_sample -> random_sample

src/forge/actors/replay_buffer.py Outdated

    
                  sample_count: int = 0

              def default_evict(

Member

joecummings Oct 15, 2025

default_evict -> evict_if_too_old

src/forge/actors/replay_buffer.py Outdated

    
                      if self.seed is None:

                          self.seed = random.randint(0, 2**32)

                      random.seed(self.seed)

                      self.sampler = random.sample

Member

joecummings Oct 15, 2025

Please contain all sampling logic in one piece.
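One possible reading of this request, sketched under assumptions: keep the seed handling and the sampling call together by giving the sampler its own `random.Random` instance rather than seeding the module-level generator. The `make_rng` helper and the `random_sample` signature below are hypothetical and are not taken from the PR.

```python
import random
from typing import Optional, Sequence


def make_rng(seed: Optional[int] = None) -> random.Random:
    """Create a private generator so global random state is left untouched."""
    if seed is None:
        seed = random.randint(0, 2**32 - 1)
    return random.Random(seed)


def random_sample(
    buffer: Sequence, sample_size: int, rng: random.Random
) -> Optional[list[int]]:
    """Return sampled indices, or None if the buffer holds too few entries."""
    if sample_size > len(buffer):
        return None
    return rng.sample(range(len(buffer)), k=sample_size)
```

With this shape, two buffers built from the same seed draw the same indices, and nothing else in the process that uses the `random` module is affected.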


          responsed to review

98d7ecc

joecummings reviewed

src/forge/actors/replay_buffer.py Outdated

    
              def age_evict(

                  buffer: deque, policy_version: int, max_samples: int = None, max_age: int = None

              ):

Member

joecummings Oct 15, 2025

typehints?


          added collect test

6aa8b5a

joecummings approved these changes

pbontrager merged commit 839664c into main

9 checks passed

allenwang28 pushed a commit to allenwang28/forge that referenced this pull request


          Configurable ReplayBuffer (meta-pytorch#410)

7f35b26
