-
Notifications
You must be signed in to change notification settings - Fork 511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GCP][Disk] Add supported instance type check for disk tier ultra #3935
Conversation
# Ultra disk tier (pd-extreme) only support m2, m3 and part of n2 | ||
# instance types, so we failover to lower tiers for other instance | ||
# types. Reference: | ||
# https://cloud.google.com/compute/docs/disks/extreme-persistent-disk#machine_shape_support # pylint: disable=line-too-long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean we cannot use ultra disk tier with high end GPUs, such as A100, L4 or H100? In that case, this ultra disk sounds a bit useless here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.. This is a quick fix to unblock the blog post first. We should think of better solution here. cc @Conless here
return r.disk_tier | ||
# Failover disk tier from high to low. | ||
all_tiers = list(reversed(resources_utils.DiskTier)) | ||
start_index = all_tiers.index(GCP._translate_disk_tier(r.disk_tier)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this _translate_disk_tier
do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convert best
and None
to corresponding tier
@@ -916,6 +931,36 @@ def _check_instance_type_accelerators_combination( | |||
resources.instance_type, resources.accelerators, resources.zone, | |||
'gcp') | |||
|
|||
@classmethod | |||
def check_disk_tier( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function seems not publicly exposed? Should we keep it private, or just merge into check_disk_tier_enabled
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Converted it into private. merge into check_disk_tier_enabled
is a little bit tricky as we want a function to return a boolean value, instead of raising an error, which is the expected behaviour of check_disk_tier_enabled
, across all clouds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I found a bug that requires to expose this function. Just changed back to public.
Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick fix @cblmemo! I found the logging for sky launch --disk-tier ultra --cpus 2 --cloud gcp
super confusing, as it does not actually show the reason why we cannot launch the instance on GCP.
sky launch --cpus 2 --cloud gcp --disk-tier ultra
I 09-13 01:05:04 optimizer.py:1301] No resource satisfying GCP(cpus=2, disk_tier=ultra) on GCP.
sky.exceptions.ResourcesUnavailableError: Catalog does not contain any instances satisfying the request:
Task<name=sky-cmd>(run=<empty>)
resources: GCP(cpus=2, disk_tier=ultra).
To fix: relax or change the resource requirements.
Hint: sky show-gpus to list available accelerators.
sky check to check the enabled clouds.
Can we file an issue for that?
Done! #3943 |
https://cloud.google.com/compute/docs/disks/extreme-persistent-disk#machine_shape_support
On GCP, disk tier ultra only supports handful of instance type. This PR adds the check and fixes #3933.
From the documentation, it seems like we could still attach the pd-extreme disk to some instance type but will get a degraded performance. We should benchmark and see how severe the degradation is, as the instance type that officially supports the ultra tier is too small (no GPU instance types is officially supported).
Tested (run the relevant ones):
bash format.sh
sky launch --cloud gcp --disk-tier best --gpus L4
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh