pbontrager
Contributor

Added custom sample and eviction policies to the replay buffer. The existing replay buffer did not allow custom handling of these two behaviors, which are important for tuning async training runs. To keep them generic and user-defined, I added two callable inputs, sample_policy and eviction_policy. They take in the current buffer and some buffer parameters (max_policy_age, max_resample_count) and return a list of indices to either evict or sample. A list of indices is returned, rather than having the policy modify the buffer directly, so that the buffer itself controls how it is updated instead of the user.
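
To make the index-returning contract concrete, here is a minimal sketch of what such policies could look like. The evict signature mirrors the `default_evict(buffer, policy_version, max_samples=None, max_age=None)` snippet quoted in the review below; everything else (the `policy_version` field on `BufferEntry`, the names `evict_stale` and `sample_uniform`, and the `rng` argument) is a hypothetical stand-in, not the code in this PR.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class BufferEntry:
    data: Any
    policy_version: int = 0  # hypothetical field, used here to compute age
    sample_count: int = 0


def evict_stale(
    buffer: deque,
    policy_version: int,
    max_samples: Optional[int] = None,
    max_age: Optional[int] = None,
) -> list[int]:
    """Return indices of entries to evict; never mutates the buffer."""
    indices = []
    for i, entry in enumerate(buffer):
        too_old = max_age is not None and policy_version - entry.policy_version > max_age
        overused = max_samples is not None and entry.sample_count >= max_samples
        if too_old or overused:
            indices.append(i)
    return indices


def sample_uniform(
    buffer: deque, num_samples: int, rng: random.Random
) -> Optional[list[int]]:
    """Return indices of entries to sample, or None if the buffer is too small."""
    if len(buffer) < num_samples:
        return None
    return rng.sample(range(len(buffer)), k=num_samples)
```

Because both functions only return indices, the caller stays in charge of removing entries and bumping sample counts.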

Changelog:

  • BufferEntry type created to track episode sample_counts. This allows for user control over how often data can be resampled
  • default_evict and default_sample are default policies that match the existing behavior of the ReplayBuffer
  • max_buffer_size added with a first-in, first-out policy for removing old data (the assumption is that stale data will already have been removed by evict)
  • max_policy_age, max_resample_count, and max_buffer_size all now support a None option, which removes the corresponding restriction
  • buffer changed from list to deque. This allows for efficient enforcement of max_buffer_size without constant reallocation of the entire buffer.
  • _collect method added to get the buffer entries given a list of indices. Needs special handling to do this efficiently with a deque object.
  • calling sample no longer removes sampled data from the buffer; it just increments the sample count for that data. It's up to the eviction policy to remove data based on sample count.
  • removed buffer_size parameter from sample to simplify the API
  • updated unit_test for the API changes and a slight behavior change in when sampled data is removed.
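
The buffer mechanics in the changelog can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the ReplayBuffer from this PR: the class name `TinyReplayBuffer`, the `Entry` type, running eviction at the start of `sample`, and the single-pass `_collect` are all hypothetical choices made for the example.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Entry:
    data: Any
    sample_count: int = 0


class TinyReplayBuffer:
    def __init__(
        self,
        max_buffer_size: Optional[int] = None,
        max_resample_count: Optional[int] = None,
        seed: int = 0,
    ):
        # deque(maxlen=...) drops the oldest entry on append once full (FIFO);
        # maxlen=None means unbounded.
        self.buffer: deque = deque(maxlen=max_buffer_size)
        self.max_resample_count = max_resample_count
        self.rng = random.Random(seed)

    def add(self, data: Any) -> None:
        self.buffer.append(Entry(data))

    def _collect(self, indices: list[int]) -> list[Entry]:
        # Indexing into the middle of a deque is O(n), so walk it once
        # and pick out the wanted positions instead.
        wanted = set(indices)
        found = {i: e for i, e in enumerate(self.buffer) if i in wanted}
        return [found[i] for i in indices]

    def evict(self) -> None:
        if self.max_resample_count is None:
            return
        kept = (e for e in self.buffer if e.sample_count < self.max_resample_count)
        self.buffer = deque(kept, maxlen=self.buffer.maxlen)

    def sample(self, batch_size: int) -> Optional[list[Any]]:
        self.evict()  # assumption: eviction runs before each sample
        if len(self.buffer) < batch_size:
            return None
        indices = self.rng.sample(range(len(self.buffer)), k=batch_size)
        entries = self._collect(indices)
        for entry in entries:
            entry.sample_count += 1  # sampled data stays in the buffer
        return [entry.data for entry in entries]
```

With `max_buffer_size=3`, adding five items leaves only the last three; sampling marks entries as used, and they are only dropped once eviction sees that their count has reached `max_resample_count`.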

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 14, 2025
sample_count: int = 0


def default_evict(buffer, policy_version, max_samples=None, max_age=None):
Member

Can we put some type hints?

Member

@joecummings joecummings left a comment

Can you also add a test specifically for _collect?

return indices


def default_sample(
Member

Rename this to the actual things they do lol.

default_sample -> random_sample

sample_count: int = 0


def default_evict(
Member

default_evict -> evict_if_too_old

if self.seed is None:
    self.seed = random.randint(0, 2**32)
random.seed(self.seed)
self.sampler = random.sample
Member

Please contain all sampling logic in one piece.
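
One possible reading of this request, sketched under assumptions: keep the seed, the RNG, and the index selection in a single object rather than seeding the global `random` module in one place and calling `random.sample` in another. The `SeededSampler` name and its call signature are hypothetical and are not taken from the PR.

```python
import random
from typing import Optional


class SeededSampler:
    """Owns the seed, the RNG, and the index selection in one place."""

    def __init__(self, seed: Optional[int] = None):
        # Pick a seed if none was given so the run can still be reproduced later.
        self.seed = random.randint(0, 2**32) if seed is None else seed
        # A private Random instance avoids touching global random state.
        self._rng = random.Random(self.seed)

    def __call__(self, population_size: int, k: int) -> list[int]:
        """Return k distinct indices drawn from range(population_size)."""
        return self._rng.sample(range(population_size), k)
```

Two samplers built with the same seed return the same index sequence, which keeps tests deterministic without relying on module-level state.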


def age_evict(
    buffer: deque, policy_version: int, max_samples: int = None, max_age: int = None
):
Member

typehints?

@pbontrager pbontrager merged commit 839664c into main Oct 15, 2025
9 checks passed
allenwang28 pushed a commit to allenwang28/forge that referenced this pull request Oct 15, 2025
2 participants