feat: vault #5
…lsewhere, plus some precommit.
Minor nitpicks and a suggestion for the list issue. Review still in progress, though.
Looks great to me! Thank you so much, Callum, for your hard work and the time you have dedicated to supporting the offline RL community. They will be very appreciative and thankful.
Stale & now Sasha is on leave :')
What?
Vault is an efficient mechanism for saving flashbax buffers to persistent data storage.
How?
Vault uses tensorstore, which is also used by Google's orbax checkpointing library. Tensorstore can read and write slices of a stored array without loading the whole thing: for example, I could have a 100TB file saved on disk and still access its first element easily and efficiently. Building on this feature, with flashbax buffers saved in the form (Batch, Time, Experience), we slice data along the time axis.
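To make the time-axis slicing concrete, here is a minimal sketch that uses only the Python standard library. It is not tensorstore's API and not how Vault is implemented; it only illustrates the idea of touching a single time slice of a row-major (Batch, Time, Experience) array on disk without reading the rest of the file. The shape constants and the helper names (`offset`, `write_time_slice`, `read_time_slice`) are hypothetical.

```python
import os
import struct
import tempfile

# Hypothetical shape of the stored array: (Batch, Time, Experience).
BATCH, TIME, EXP = 2, 5, 3
ITEM = struct.calcsize("<f")  # bytes per float32 element


def offset(b, t):
    """Byte offset of element (b, t, 0) in a row-major (B, T, E) file."""
    return (b * TIME + t) * EXP * ITEM


def write_time_slice(path, t0, data):
    """Write data[b] (a list of EXP-length rows) starting at time index t0."""
    with open(path, "r+b") as f:
        for b, rows in enumerate(data):
            f.seek(offset(b, t0))  # jump straight to the slice for batch b
            for row in rows:
                f.write(struct.pack(f"<{EXP}f", *row))


def read_time_slice(path, t0, t1):
    """Read time indices [t0, t1) for every batch entry, and nothing else."""
    out = []
    with open(path, "rb") as f:
        for b in range(BATCH):
            f.seek(offset(b, t0))
            n = (t1 - t0) * EXP
            flat = struct.unpack(f"<{n}f", f.read(n * ITEM))
            out.append([list(flat[i * EXP:(i + 1) * EXP]) for i in range(t1 - t0)])
    return out


# Preallocate a zero-filled file, then write and read back only time index 2.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(bytes(BATCH * TIME * EXP * ITEM))

write_time_slice(path, 2, [[[1.0, 2.0, 3.0]], [[4.0, 5.0, 6.0]]])
time_slice = read_time_slice(path, 2, 3)
```

Each call seeks directly to the bytes it needs, so the cost depends on the size of the slice rather than the size of the file. Tensorstore offers this kind of partial access over chunked storage formats, which is what makes slicing along the time axis cheap.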
Vault consumes flashbax buffer states and writes to disk everything from the last received time index up to the buffer's current time index. Each write must happen before the ring buffer wraps around and overwrites data that has not yet been written to the vault. All other bookkeeping is done by Vault itself, and using it is as simple as:
v.write(buffer_state)
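The bookkeeping described above can be sketched with a toy model. This is an illustration under stated assumptions, not the Vault implementation: `RingBuffer` and `ToyVault` are hypothetical stand-ins, a Python list plays the role of on-disk storage, and the buffer is assumed to expose a monotonically increasing count of added timesteps.

```python
class RingBuffer:
    """Toy stand-in for a buffer state with fixed capacity along the time axis."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = [None] * capacity
        self.total_added = 0  # total timesteps ever added (never wraps)

    def add(self, item):
        # Oldest entries are overwritten once the buffer is full.
        self.data[self.total_added % self.capacity] = item
        self.total_added += 1


class ToyVault:
    """Hypothetical vault: persists only the timesteps it has not seen yet."""

    def __init__(self):
        self.storage = []       # stands in for persistent storage
        self.last_received = 0  # time index up to which data is persisted

    def write(self, buf):
        pending = buf.total_added - self.last_received
        if pending > buf.capacity:
            # Some unwritten timesteps were already overwritten in the ring.
            raise RuntimeError("ring buffer overwrote data never written to the vault")
        for t in range(self.last_received, buf.total_added):
            self.storage.append(buf.data[t % buf.capacity])
        self.last_received = buf.total_added
        return pending  # number of timesteps written by this call


buf = RingBuffer(capacity=4)
vault = ToyVault()
for t in range(10):
    buf.add(t)
    if t % 3 == 2:  # write every third step, well within the capacity of 4
        vault.write(buf)
```

Because the vault remembers the last index it received, repeated calls never duplicate data, and the persisted history can grow far beyond the ring buffer's capacity. If more timesteps are added between writes than the buffer can hold, the toy vault raises an error instead of silently losing data; how the real Vault handles that case is not stated in this description.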
The demonstration notebook is a helpful reference: it adds timesteps to the buffer, writing to the vault after each one. The following output is yielded:
Why?
For a variety of reasons, but mainly for offline RL methods and for saving and loading buffers from checkpoints.