-
Notifications
You must be signed in to change notification settings - Fork 15
Configurable ReplayBuffer #410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/forge/actors/replay_buffer.py
Outdated
sample_count: int = 0 | ||
|
||
|
||
def default_evict(buffer, policy_version, max_samples=None, max_age=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put some type hints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a test specifically for _collect?
src/forge/actors/replay_buffer.py
Outdated
return indices | ||
|
||
|
||
def default_sample( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename this to the actual things they do lol.
default_sample
-> random_sample
src/forge/actors/replay_buffer.py
Outdated
sample_count: int = 0 | ||
|
||
|
||
def default_evict( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default_evict
-> evict_if_too_old
src/forge/actors/replay_buffer.py
Outdated
if self.seed is None: | ||
self.seed = random.randint(0, 2**32) | ||
random.seed(self.seed) | ||
self.sampler = random.sample |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please contain all sampling logic in one piece.
src/forge/actors/replay_buffer.py
Outdated
|
||
def age_evict( | ||
buffer: deque, policy_version: int, max_samples: int = None, max_age: int = None | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typehints?
Added custom Sample and Eviction policies to Replay buffer. Our existing replay buffer did not allow for custom handling of these two properties which are important for tuning async training runs. To allow these functions to be generic and user defined, I added two callable inputs, sample_policy and eviction_policy, they take in the current buffer, some buffer parameters (max_policy_age, max_resample_count) and return a list of indices to either evict or sample. An list of indices is returned instead of directly modifying the buffer so we can control how the buffer is updated ourself instead of the user.
Changelog: