[UX - SkyServe] user now is able to select a LB policy from a range of options #4061
This is awesome @AlexCuadron! It will greatly expand our customizability. Left some discussion ;) We might also think about how to expose this feature to our end users in the Service YAML; maybe add a `load_balancing_policy` section under `service`?
@cblmemo Done! PTAL again :D
Thanks @AlexCuadron for the awesome work! It mostly looks good to me. Left some discussions ;)
Thanks for the comments @cblmemo, fixed and ready for next round 💪
Thanks for the quick fix @AlexCuadron ! Mostly looks good to me. Left some discussions.
btw, I created a branch here and let's merge our PR into that branch first. Merging into master might need more time and we want to move fast ;)
https://github.com/skypilot-org/skypilot/tree/heterogeneous-lb
I changed the base branch and updated based on comments :)
Thanks for the prompt fix @AlexCuadron ! Mostly looks good to me. Left some nits :))
Done! PTAL @cblmemo :D
Thanks for the prompt fix @AlexCuadron ! Left some final nits 🚀
Co-authored-by: Tian Xia <cblmemo@gmail.com>
* [Catalog] Silently ignore TPU price not found. * assert for non tpu v6e * format
…ing (skypilot-org#4264) * fix race condition for setting job status to FAILED during INIT * Fix * fix * format * Add smoke tests * revert pending submit * remove update entirely for the job schedule step * wait for job 32 to finish * fix smoke * move and rename * Add comment * minor
…-org#4278) Set worker minimum port number
* [docs] use k8s instead of kubernetes in the CLI * fix docs build script for linux * Update docs/source/reference/kubernetes/kubernetes-getting-started.rst Co-authored-by: Romil Bhardwaj <romil.bhardwaj@gmail.com> --------- Co-authored-by: Romil Bhardwaj <romil.bhardwaj@gmail.com>
* [jobs] autodown managed job clusters If all goes correctly, the managed job controller should tear down a managed job cluster once the managed job completes. However, if the controller fails somehow (e.g. crashes, is terminated, etc), we don't want to leak resources. As a failsafe, set autodown on the job cluster. This is not foolproof, since the skylet on the cluster can also crash, but it's likely to catch many cases. * add comment about autodown duration * add leading _
* Update cloud_vm_ray_backend.py * Update cloud_vm_ray_backend.py * format
…4274) fix: multiple `job_id`
PTAL @Michaelvll
Hi @AlexCuadron ! Thanks for adding this. I tried this PR and got the following:
$ sky serve up examples/serve/load_balancing_policies_example.yaml
Service from YAML spec: examples/serve/load_balancing_policies_example.yaml
ValueError: Invalid service YAML: Found unsupported field 'load_balancing_policy'.
Should we also update `sky/utils/schemas.py`?
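The error above is what a strict schema check produces when a new field has not been added to the allowed set. A minimal sketch of that behaviour, assuming a much simplified validator: `ALLOWED_SERVICE_FIELDS` and `validate_service_config` are hypothetical names for illustration and do not mirror the actual contents of `sky/utils/schemas.py`.

```python
# Hypothetical, simplified stand-in for a strict service schema check.
ALLOWED_SERVICE_FIELDS = {'readiness_probe', 'replicas'}


def validate_service_config(config, allowed_fields):
    """Raise ValueError on the first field that is not in allowed_fields."""
    for field in config:
        if field not in allowed_fields:
            raise ValueError(
                f'Invalid service YAML: Found unsupported field {field!r}.')


config = {'replicas': 2, 'load_balancing_policy': 'round_robin'}

# With the original allowed set, the new field is rejected.
try:
    validate_service_config(config, ALLOWED_SERVICE_FIELDS)
    rejected = None
except ValueError as e:
    rejected = str(e)

# Once the field is added to the allowed set, the same config passes.
validate_service_config(
    config, ALLOWED_SERVICE_FIELDS | {'load_balancing_policy'})
```

The point of the sketch is only that the field has to be known to whatever validates the YAML before the spec parser ever sees it.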
Co-authored-by: Tian Xia <cblmemo@gmail.com>
1. Add available policies to schema validation
2. Show available policies in error message when invalid policy is specified
3. Display load balancing policy in service spec repr when explicitly set
…policies Only 'round_robin' is currently implemented in LoadBalancingPolicy class
Move policy validation to code to avoid duplication and make it easier to maintain when adding new policies
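One way to keep validation in code rather than in the schema is to let policy classes register themselves and validate names against that registry, so a new policy only has to be added in one place. The following is a sketch under that assumption, not the PR's actual implementation: the registry attribute, the `name=` class keyword, and `validate_policy_name` are all hypothetical.

```python
class LoadBalancingPolicy:
    """Base class; subclasses register themselves under a short name."""

    # Hypothetical registry mapping policy name -> policy class.
    registry = {}

    def __init_subclass__(cls, name=None, **kwargs):
        super().__init_subclass__(**kwargs)
        if name is not None:
            LoadBalancingPolicy.registry[name] = cls


class RoundRobinPolicy(LoadBalancingPolicy, name='round_robin'):
    """Placeholder for the only policy mentioned in this PR."""


def validate_policy_name(name):
    """Return name if it is registered, else raise with the available names."""
    if name not in LoadBalancingPolicy.registry:
        available = ', '.join(sorted(LoadBalancingPolicy.registry))
        raise ValueError(
            f'Unknown load balancing policy {name!r}. '
            f'Available policies: {available}')
    return name
```

Because the error message is built from the registry, it stays accurate as policies are added, which is the maintenance benefit the commit message describes.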
09f3c41 to 8db4dd9
PTAL @cblmemo
Thanks for the quick fix @AlexCuadron ! Left 2 final nits ;)
Co-authored-by: Tian Xia <cblmemo@gmail.com>
…mport inside function
Oops, sorry for the circular import 😅
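For readers unfamiliar with the fix, deferring an import into the function body means it is resolved at call time, after both modules have finished loading. A self-contained sketch of that pattern follows; the module names `demo_spec` and `demo_policies` are made up for this example and are written to a temporary directory, so nothing here reflects the actual SkyPilot module layout.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Two throwaway modules with hypothetical names that depend on each other.
SPEC_SRC = textwrap.dedent('''
    def make_policy(name):
        # Deferred import: resolved when the function is called,
        # not while this module is still being loaded.
        import demo_policies
        return demo_policies.POLICIES[name]()
''')
POLICIES_SRC = textwrap.dedent('''
    import demo_spec  # module-level import of the other module

    class RoundRobin:
        pass

    POLICIES = {'round_robin': RoundRobin}
''')

tmp_dir = tempfile.mkdtemp()
for filename, src in (('demo_spec.py', SPEC_SRC),
                      ('demo_policies.py', POLICIES_SRC)):
    with open(os.path.join(tmp_dir, filename), 'w') as f:
        f.write(src)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import demo_spec

policy = demo_spec.make_policy('round_robin')
```

The trade-off is a slightly less visible dependency in exchange for not having to restructure the modules.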
Thanks for the prompt fix @AlexCuadron ! LGTM.
I added the option to specify different LB policies; by default, RoundRobin is used. This PR is intended to enable easy switching between LB policies to facilitate their development.
The default behaviour (without user interaction) doesn't modify the execution flow, and users cannot select any LB policy other than round-robin unless it has been explicitly added.
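To illustrate the behaviour described above, here is a minimal sketch of a round-robin policy with a default fallback when no policy is named. It is an assumption-laden illustration rather than the code in this PR: `set_ready_replicas`, `select_replica`, `POLICIES`, `DEFAULT_POLICY`, and `make_policy` are hypothetical names.

```python
class RoundRobinPolicy:
    """Cycle through the ready replicas in order."""

    def __init__(self):
        self.replicas = []
        self.index = 0

    def set_ready_replicas(self, replicas):
        # Restart the cycle only when the set of replicas actually changes.
        if set(replicas) != set(self.replicas):
            self.replicas = list(replicas)
            self.index = 0

    def select_replica(self):
        # Return None when there is nothing to route to.
        if not self.replicas:
            return None
        replica = self.replicas[self.index]
        self.index = (self.index + 1) % len(self.replicas)
        return replica


# Hypothetical lookup table; only explicitly added policies are selectable.
POLICIES = {'round_robin': RoundRobinPolicy}
DEFAULT_POLICY = 'round_robin'


def make_policy(name=None):
    """Build the named policy, falling back to the default when unset."""
    return POLICIES[name or DEFAULT_POLICY]()


policy = make_policy()
policy.set_ready_replicas(['replica-a', 'replica-b', 'replica-c'])
picks = [policy.select_replica() for _ in range(4)]
```

Here `picks` wraps around to the first replica on the fourth request, and asking for a name that is not in `POLICIES` fails with a `KeyError` instead of silently choosing something else.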
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh