[Spot] Switch to download and streaming for the failed user program logs #2330
Thanks @Michaelvll! Some questions.
sky/spot/controller.py
Outdated
@@ -74,6 +75,36 @@ def __init__(self, job_id: int, dag_yaml: str,
             job_id_env_vars)
         task.update_envs(task_envs)

     def _download_log_and_stream(self, handle):
         """ Download the logs from spot cluster, and stream them
Suggested change:
-        """ Download the logs from spot cluster, and stream them
+        """Downloads the logs of the latest job of a spot cluster, and streams them.
Have to reduce the number of words a bit to fit it in one line. Modified. Thanks!
sky/spot/controller.py
Outdated
log_dir = list(log_dirs.values())[0]

# Print the logs to the console.
for log_file in os.listdir(log_dir):
Q: what is the layout of ~/sky_logs/spot_jobs/? Does it have a subdir or a file per spot job id (that has failed)? Wondering why we're printing every *.log file here.
Each spot job has a subdir under ~/sky_logs/spot_jobs, and the log_dir here is the folder for that particular job.
We could print only the run.log under that folder, but I was trying to make sure we don't miss any possible logs in the future. Just changed it back to run.log for clarity.
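For reference, here is a minimal sketch of what printing only run.log from a job's folder could look like. It assumes log_dir is the already-downloaded local folder for one spot job, as described above. The helper name stream_run_log and its parameters are hypothetical and are not the code in this PR.

```python
import os
import sys
from typing import TextIO


def stream_run_log(log_dir: str, out: TextIO = sys.stdout) -> bool:
    """Write <log_dir>/run.log to `out`; return False if the file is missing.

    Hypothetical helper for illustration only; not the code in this PR.
    """
    log_file = os.path.join(os.path.expanduser(log_dir), 'run.log')
    if not os.path.isfile(log_file):
        return False
    with open(log_file, 'r', encoding='utf-8', errors='replace') as f:
        # Copy line by line so a large log is never held in memory at once.
        for line in f:
            out.write(line)
    out.flush()
    return True
```

Called as stream_run_log(log_dir) after the download step, this ignores any other files in the folder, which matches the "only run.log" behavior discussed here. Looping over os.listdir(log_dir) instead would pick up future log files, but it would also print files unrelated to the user program.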
Thanks @Michaelvll! Some nits.
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
Fixes #2329
This should make the debugging of the user program much easier.
Tested (run the relevant ones):
bash format.sh
sky spot logs --controller (for #2329)
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh