[train][V2] Implement Result::from_path in v2 #58216
Signed-off-by: xgui <xgui@anyscale.com>
This reverts commit f309643. Signed-off-by: xgui <xgui@anyscale.com>
Code Review
This pull request introduces the Result::from_path method in Ray Train v2, which is a valuable feature for reconstructing results from stored checkpoints. The implementation is well-structured, leveraging the existing CheckpointManager to restore state, and is accompanied by a comprehensive set of tests covering local and remote storage, different path types, and error conditions.
I have one suggestion to improve the user-friendliness of an error message. Additionally, I noticed that the new _from_checkpoint_manager method duplicates logic from TrainController._build_result. A follow-up refactoring to consolidate this logic would be beneficial for maintainability.
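The consolidation suggested above could look roughly like the following. This is only a sketch under stated assumptions: `ToyResult`, `ToyCheckpointManager`, `ToyController`, `build_result`, and `result_from_restored_manager` are hypothetical stand-ins written for illustration, not Ray Train's actual classes or signatures. The idea is that the live controller path and the restore-from-storage path both delegate to one helper, so the result-building logic lives in a single place.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

# (checkpoint directory, metrics reported with that checkpoint)
Tracked = Tuple[str, Dict[str, Any]]


@dataclass
class ToyResult:
    """Hypothetical stand-in for a training result (not Ray's `Result`)."""

    checkpoint: Optional[str]
    metrics: Optional[Dict[str, Any]]
    best_checkpoints: List[Tracked]


class ToyCheckpointManager:
    """Hypothetical stand-in that only tracks checkpoints in report order."""

    def __init__(self, tracked: List[Tracked]) -> None:
        self.tracked = list(tracked)

    @property
    def latest(self) -> Optional[Tracked]:
        return self.tracked[-1] if self.tracked else None


def build_result(manager: ToyCheckpointManager) -> ToyResult:
    """The single place that turns checkpoint-manager state into a result."""
    latest = manager.latest
    return ToyResult(
        checkpoint=latest[0] if latest else None,
        metrics=latest[1] if latest else None,
        best_checkpoints=list(manager.tracked),
    )


class ToyController:
    """Live-run path: the controller delegates to the shared helper."""

    def __init__(self, manager: ToyCheckpointManager) -> None:
        self.manager = manager

    def build_result(self) -> ToyResult:
        return build_result(self.manager)


def result_from_restored_manager(manager: ToyCheckpointManager) -> ToyResult:
    """Restore path: a manager rebuilt from storage uses the same helper."""
    return build_result(manager)


manager = ToyCheckpointManager(
    [("checkpoint_0", {"loss": 0.9}), ("checkpoint_1", {"loss": 0.4})]
)
live = ToyController(manager).build_result()
restored = result_from_restored_manager(manager)
```

With one shared helper, a change to how the latest checkpoint or metrics are selected applies to both paths at once.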
justinvyu
left a comment
Thanks!
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
## Description
In this function, `Result::from_path` is implemented in ray train v2, which reconstructs a `Result` object from the checkpoints. This implementation leverages `CheckpointManager` and refers to https://github.com/ray-project/ray/blob/master/python/ray/train/v2/_internal/execution/controller/controller.py#L512-L540

---------

Signed-off-by: xgui <xgui@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Co-authored-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Description
This PR implements `Result::from_path` in Ray Train v2, which reconstructs a `Result` object from the checkpoints. The implementation leverages `CheckpointManager` and refers to the existing logic in https://github.com/ray-project/ray/blob/master/python/ray/train/v2/_internal/execution/controller/controller.py#L512-L540
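To illustrate the general idea of rebuilding a result from checkpoint state stored under a path, here is a minimal, self-contained sketch. It does not use Ray and does not reflect the PR's actual code: `ToyResult`, the snapshot file name, and the JSON layout are all assumptions made up for this example. It only shows the shape of the technique: validate the path, load the persisted checkpoint records, and derive the latest checkpoint and metrics from them.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical file name for the persisted checkpoint records.
SNAPSHOT_NAME = "toy_checkpoint_snapshot.json"


@dataclass
class ToyResult:
    """Hypothetical stand-in for a training result (not Ray's `Result`)."""

    path: str
    checkpoint: Optional[str]
    metrics: Optional[Dict[str, Any]]
    best_checkpoints: List[Tuple[str, Dict[str, Any]]] = field(default_factory=list)

    @classmethod
    def from_path(cls, path: str) -> "ToyResult":
        run_dir = Path(path)
        if not run_dir.is_dir():
            raise RuntimeError(f"Run directory {run_dir} does not exist.")
        snapshot_file = run_dir / SNAPSHOT_NAME
        if not snapshot_file.is_file():
            raise RuntimeError(f"No checkpoint snapshot found in {run_dir}.")

        # Each record holds a checkpoint sub-directory and its metrics,
        # stored in the order the checkpoints were reported.
        records = json.loads(snapshot_file.read_text())
        tracked = [(str(run_dir / r["dir"]), r["metrics"]) for r in records]
        latest_dir, latest_metrics = tracked[-1] if tracked else (None, None)
        return cls(str(run_dir), latest_dir, latest_metrics, tracked)


# Usage: write a fake snapshot into a temporary run directory, then restore.
with tempfile.TemporaryDirectory() as tmp:
    records = [
        {"dir": "checkpoint_0", "metrics": {"loss": 0.9}},
        {"dir": "checkpoint_1", "metrics": {"loss": 0.4}},
    ]
    (Path(tmp) / SNAPSHOT_NAME).write_text(json.dumps(records))
    restored = ToyResult.from_path(tmp)
    missing_dir = str(Path(tmp) / "does_not_exist")
```

Raising a clear error for a missing directory or snapshot mirrors the error-condition coverage mentioned in the review, although the exact exception types and messages here are illustrative only.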