-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Tune] Save and restore stateful callbacks as part of experiment checkpoint #31957
[Tune] Save and restore stateful callbacks as part of experiment checkpoint #31957
Conversation
Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
…/stateful_callbacks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main question I have is why we need a new Restorable
interface, when most of the functionality is provided by python already.
I.e. there is obj.__getstate__()
and obj.__setstate__()
, and it's automatically used for pickling and unpickling. Shouldn't users just implement those?
For the callback list or other locations where we pickle callbacks, we can catch serialization errors and provide an actionable error message.
It feels like we're trying to solve a problem that's already solved. If we went with Restorable
, would the trial runner, searcher, etc also eventually inherit this interface?
Just using the python
Update: We will go with the current
|
Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
…/stateful_callbacks
python/ray/tune/callback.py
Outdated
try: | ||
callback_state = callback.get_state() | ||
any_stateful_callbacks = True | ||
except NotImplementedError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also check for AttributeError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have replaced the get_state
with a default of returning None (stateless). No need to check for this error anymore. I think an attribute error should get raised immediately instead of caught.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
…kpoint (ray-project#31957) Signed-off-by: Justin Yu <justinvyu@berkeley.edu> Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Why are these changes needed?
This PR allows
Callback
s to implement theget_state
andset_state
methods for save/restore. During experiment checkpoints, Tune will save callback states, if any callbacks implementget_state
. During experiment restore, Tune will load the latest callback states.Context
Callback state is not currently saved as part of experiment state. For user defined stateful callbacks, this means that their callbacks will not continue from where they left off on experiment restoration.
What's the benefit of doing this instead of just pickling the callbacks along with the
TrialRunner
state? We shift the responsibility of save/load to the user instead of doing everything in Tune. For example, this allows users to re-create non-serializable objects (ex: threading objects), if used in the callback.TODO
In a future PR, change Searcher/SearchAlgorithm to use this commonRestorable
interface.Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.