New provisioner GCP #2681
91c00ad to 71ce79c
Thanks for the update @suquark! It looks great! Left several comments.
I just tried the following and it seems to work great:
sky launch -c test-gcp --cloud gcp
(2m5s) time sky launch -y -c test-gcp-32 --cloud gcp --num-nodes 32 --cpus 2
(3m24.091s with the two threadpool comments applied)
I found some issues:
- When I ctrl-c a sky down for the 32-node cluster above and then try to sky down again, the following error sometimes occurs:
  File "/home/gcpuser/skypilot/sky/provision/gcp/instance.py", line 430, in terminate_instances
    _wait_for_operations(operations, project_id, zone)
  File "/home/gcpuser/skypilot/sky/provision/gcp/instance.py", line 87, in _wait_for_operations
    if handler.wait_for_operation(operation, project_id, zone):
  File "/home/gcpuser/skypilot/sky/provision/gcp/instance_utils.py", line 414, in wait_for_operation
    raise Exception(result['error'])
Exception: {'errors': [{'code': 'RESOURCE_NOT_FOUND', 'message': "The resource 'projects/skypilot-375900/zones/us-central1-a/instances/test-gcp-new-32-2-2514-worker-6a81764d-compute' was not found"}]}
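One possible way to make a repeated sky down tolerant of this error is to treat RESOURCE_NOT_FOUND on a delete operation as success, since the instance is already gone. The sketch below is only an illustration, not the code in this PR: `is_already_deleted` and `check_delete_result` are hypothetical helpers, and the shape of the error dictionary is assumed from the traceback above.

```python
from typing import Any, Dict


def is_already_deleted(error: Dict[str, Any]) -> bool:
    """Return True if every reported error says the resource no longer exists."""
    errors = error.get('errors', [])
    return bool(errors) and all(
        e.get('code') == 'RESOURCE_NOT_FOUND' for e in errors)


def check_delete_result(result: Dict[str, Any]) -> bool:
    """Hypothetical check for a finished delete operation.

    Returns True when the instance is gone (deleted now or by an earlier,
    interrupted run) and raises for any other error.
    """
    error = result.get('error')
    if error is None:
        return True
    if is_already_deleted(error):
        # An earlier, interrupted teardown already removed the instance.
        return True
    raise RuntimeError(error)
```

With a check like this, a second teardown after an interrupted one would skip instances that are already gone, while still surfacing unrelated failures such as quota or permission errors.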
Thanks for the comment @concretevitamin! I just updated the logging to the cleaner version as suggested. It is a bit tricky to keep the same partially dimmed version, so I suppose it should be fine to keep the whole line dimmed.
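For context, dimming a whole log line only requires wrapping the message in the ANSI "faint" escape sequence, whereas a partially dimmed line needs the styling inserted at specific positions inside the message. A minimal sketch, assuming an ANSI-capable terminal; `dim` is a hypothetical helper and the message is a placeholder, neither is taken from this PR:

```python
DIM = '\x1b[2m'        # ANSI SGR code 2: faint/dim
RESET_ALL = '\x1b[0m'  # ANSI SGR code 0: reset all attributes


def dim(message: str) -> str:
    """Wrap an entire line in dim styling and reset afterwards."""
    return f'{DIM}{message}{RESET_ALL}'


# Placeholder message, for illustration only.
print(dim('example log line'))
```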
Tested (b2bcbe8):
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
…t into new_provisioner_gcp
Tested:
Draft implementation of the new provisioner for GCP.
For this PR, we only support non-TPU-VM instances. We will support TPU VMs in a later PR.
Tested (run the relevant ones):
- bash format.sh
- pytest tests/test_smoke.py
- pytest tests/test_smoke.py::test_fill_in_the_name
- bash tests/backward_comaptibility_tests.sh