Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core/UX] Add Job API and support managed job (both on-demand and spot) #3419

Merged
merged 174 commits into from
May 5, 2024

Conversation

Michaelvll
Copy link
Collaborator

@Michaelvll Michaelvll commented Apr 4, 2024

Note: this PR require a sky start -f sky-spot-controller-xx to make the commands like sky spot logs, sky spot queue work. (We found a way to handle this)

Refer to the latest doc: https://skypilot.readthedocs.io/en/job-api/examples/managed-jobs.html#using-both-spot-and-on-demand-instances

TODO:

  • Update the managed spot doc
  • Add doc for pipeline
  • Better hints when sky job queue runs on an old controller
  • Test backward compatibility

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
    • pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh
    • master: sky spot launch -n pipeline tests/test_yamls/pipeline.yaml; this PR: sky job queue; sky job logs; sky job dashboard; sky job cancel; sky job queue
    • bash -i tests/backward_comaptibility_tests.sh 0 7

Michaelvll and others added 3 commits May 3, 2024 23:32
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
Copy link
Member

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this great work @Michaelvll !! It looks awesome to me. Left some nits and it should be ready to go! 🚀

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the job 13, 14 not contains any [spot]/[On-demand] suffixes? Does that means it not require any special resources?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not add the suffix for On-Demand only request for simplicity, and add the suffix for all other cases. Wdyt?

docs/source/reference/config.rst Outdated Show resolved Hide resolved
docs/source/reference/config.rst Outdated Show resolved Hide resolved
docs/source/reference/config.rst Outdated Show resolved Hide resolved
docs/source/reference/config.rst Outdated Show resolved Hide resolved
Comment on lines 381 to 382
# TODO(zhwu): Backward compatibility for the old config for managed
# job controller.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is done iiuc?

tests/test_smoke.py Outdated Show resolved Hide resolved
tests/test_smoke.py Outdated Show resolved Hide resolved
tests/test_smoke.py Outdated Show resolved Hide resolved
tests/test_smoke.py Show resolved Hide resolved
@Michaelvll Michaelvll merged commit 015061e into master May 5, 2024
20 checks passed
@Michaelvll Michaelvll deleted the job-api branch May 5, 2024 01:58
@Michaelvll Michaelvll mentioned this pull request May 5, 2024
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants