[Ready for review] Retrieve original run tasks and clone successful ones directly #1728
I think this looks fine for the most part. My main concern is that it could be expanded to cover the regular resume as well, to speed that one up too. Right now, if I am reading this right, it only speeds up the reentrant resume. With OB's approval of course, but I feel we would also benefit for the regular resume (which is also a legitimate use case, if not the most pressing one).
@@ -802,7 +802,10 @@ def resume(
write_run_id(run_id_file, runtime.run_id)
runtime.print_workflow_info()
runtime.persist_constants()
runtime.execute()
if clone_only:
This seems to change the behavior only for our internal use case of cloning first and then executing, but it feels like this mechanism could also work for a regular resume by first cloning and then continuing execution?
Talked offline; let's focus this PR on the clone-only behavior first.
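For readers following the thread, here is a rough sketch of the two control flows being discussed. It is not the code in this PR: `SketchRuntime` and `clone_successful_tasks` are hypothetical names made up for illustration. Only `runtime.execute()` and the `clone_only` flag appear in the diff above.

```python
class SketchRuntime(object):
    """Hypothetical stand-in for the runtime object; not Metaflow's real class."""

    def __init__(self):
        self.calls = []

    def clone_successful_tasks(self):
        # Hypothetical method name: copy successful tasks from the original run.
        self.calls.append("clone")

    def execute(self):
        # Stand-in for the normal scheduling and execution of remaining tasks.
        self.calls.append("execute")


def finish_resume_clone_only(runtime, clone_only):
    # Scope agreed on in this thread: clone and stop, or execute as before.
    if clone_only:
        runtime.clone_successful_tasks()
    else:
        runtime.execute()
    return runtime.calls


def finish_resume_clone_first(runtime):
    # The suggested extension for a regular resume: clone first, then continue.
    runtime.clone_successful_tasks()
    runtime.execute()
    return runtime.calls
```

The second function only records the reviewer's suggestion; per the comment above, it is out of scope for this PR.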
I'm missing context on the --clone-wait-only use case, but apart from the minor changes this should be good to go.
metaflow/clone_util.py
Outdated
print(
    f"Cloning task from {flow_name}/{clone_run_id}/{step_name}/{task_id} to {flow_name}/{run_id}/{step_name}/{task_id}"
Is this an intentional print, or is it for debugging purposes? The stdout for this seems to differ from regular MFLOG lines.
Also, as this is called as part of core, the f-string will break on old Python versions, so at least change it to the older formatting style if the output is intentional.
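One possible shape for the older-style version, as a minimal sketch: `clone_message` is a hypothetical helper introduced only for illustration, and the argument names are taken from the f-string in the snippet above. Printf-style `%` formatting works on Python versions that predate f-strings (which were added in Python 3.6).

```python
def clone_message(flow_name, clone_run_id, run_id, step_name, task_id):
    # Hypothetical helper: builds the same text as the f-string in the diff,
    # using %-formatting so it also parses on pre-3.6 interpreters.
    return "Cloning task from %s/%s/%s/%s to %s/%s/%s/%s" % (
        flow_name,
        clone_run_id,
        step_name,
        task_id,
        flow_name,
        run_id,
        step_name,
        task_id,
    )
```

Whether the message should go through plain stdout or the regular logging path is a separate question that this sketch does not address.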
A few other occurrences of f-strings need changes as well.
Updated print and f-string
For reference and posterity: we use resume in one of two ways:
This PR is the second in a series to improve that second behavior:
Needs a rebase now that the target branch has been merged, but otherwise looks good to go.
add more print profile time and threads workers with clone only command
include origin_ds_set and s3 batch write
clean up pr for review
remove wait-only flag because it is no longer used
address comments
fix OSS resume