[UX] Rephrase service initialization timeout #3176

Merged: cblmemo merged 5 commits into master from refactor-service-initialization on Feb 18, 2024

cblmemo (Collaborator) commented on Feb 17, 2024

The "service initialization timeout" error message is easily confused with initial_delay_seconds. This PR rephrases it to "Failed to register service" to avoid that confusion.

$ sky serve up @temp/serve.yaml 
Service from YAML spec: @temp/serve.yaml
Service Spec:
Readiness probe method:           GET /
Readiness initial delay seconds:  1200
Replica autoscaling policy:       Fixed 1 replica        
Each replica will use the following resources (estimated):
I 02-17 07:55:20 optimizer.py:691] == Optimizer ==
I 02-17 07:55:20 optimizer.py:702] Target: minimizing cost
I 02-17 07:55:20 optimizer.py:714] Estimated cost: $0.4 / hour
I 02-17 07:55:20 optimizer.py:714] 
I 02-17 07:55:20 optimizer.py:837] Considered resources (1 node):
I 02-17 07:55:20 optimizer.py:907] ------------------------------------------------------------------------------------------------
I 02-17 07:55:20 optimizer.py:907]  CLOUD   INSTANCE          vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE     COST ($)   CHOSEN   
I 02-17 07:55:20 optimizer.py:907] ------------------------------------------------------------------------------------------------
I 02-17 07:55:20 optimizer.py:907]  AWS     m6i.2xlarge       8       32        -              us-east-1       0.38          ✔     
I 02-17 07:55:20 optimizer.py:907]  Azure   Standard_D8s_v5   8       32        -              eastus          0.38                
I 02-17 07:55:20 optimizer.py:907]  GCP     n2-standard-8     8       32        -              us-central1-a   0.39                
I 02-17 07:55:20 optimizer.py:907] ------------------------------------------------------------------------------------------------
I 02-17 07:55:20 optimizer.py:907] 
Launching a new service 'sky-service-8a75'. Proceed? [Y/n]: 
Launching controller for 'sky-service-8a75'...
I 02-17 07:55:30 cloud_vm_ray_backend.py:1421] To view detailed progress: tail -n100 -f /home/txia/sky_logs/sky-2024-02-17-07-55-21-715729/provision.log
I 02-17 07:55:30 cloud_vm_ray_backend.py:1332] Cluster 'sky-serve-controller-4a0782e9' (status: INIT) was previously launched in AWS us-east-1. Relaunching in that region.
I 02-17 07:55:31 provisioner.py:79] Launching on AWS us-east-1 (us-east-1a)
I 02-17 07:55:49 provisioner.py:454] Successfully provisioned or found existing instance.
I 02-17 07:56:15 provisioner.py:556] Successfully provisioned cluster: sky-serve-controller-4a0782e9
I 02-17 07:56:17 cloud_vm_ray_backend.py:4469] Processing file mounts.
I 02-17 07:56:17 cloud_vm_ray_backend.py:4501] To view detailed progress: tail -n100 -f ~/sky_logs/sky-2024-02-17-07-55-21-715729/file_mounts.log
I 02-17 07:56:17 backend_utils.py:1288] Syncing (to 1 node): /tmp/service-task-sky-service-8a75-jj4ckwaz -> ~/.sky/serve/sky_service_8a75/task.yaml.tmp
I 02-17 07:56:20 backend_utils.py:1288] Syncing (to 1 node): /tmp/tmp4hzxbl51 -> ~/.sky/serve/sky_service_8a75/config.yaml
I 02-17 07:56:22 cloud_vm_ray_backend.py:3209] Running setup on 1 node.
I 02-17 07:56:33 cloud_vm_ray_backend.py:3222] Setup completed.
I 02-17 07:56:42 cloud_vm_ray_backend.py:3319] Job submitted with Job ID: 1

E 02-17 07:57:45 subprocess_utils.py:84] ValueError: Failed to register service 'sky-service-8a75' on the SkyServe controller. Reason:
E 02-17 07:57:45 subprocess_utils.py:84] ValueError: No enabled clouds support opening ports. To fix: do not specify resources.ports, or enable a cloud that does support this feature.
E 02-17 07:57:45 subprocess_utils.py:84] Please try again later.
E 02-17 07:57:45 subprocess_utils.py:84] 
RuntimeError: Failed to spin up the service. Please check the logs above for more details.
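
For illustration only, the wording in the log above could be produced by a helper along these lines. This is a minimal sketch, not the code changed in this PR: the function name format_registration_failure and its parameters are hypothetical, and only the message text is taken from the output above. The point is that the message names the step that failed (registration on the controller) and carries the underlying reason, without mentioning a timeout.

```python
def format_registration_failure(service_name: str, reason: str) -> str:
    """Build a registration-failure message (hypothetical helper).

    Assumption: this is NOT SkyPilot's actual implementation. It only
    mirrors the wording shown in the log, which names the failed step
    instead of a timeout that could be mistaken for
    initial_delay_seconds.
    """
    # !r wraps the service name in single quotes, as in the log output.
    return (f'Failed to register service {service_name!r} on the '
            f'SkyServe controller. Reason:\n{reason}')


if __name__ == '__main__':
    print(format_registration_failure(
        'sky-service-8a75',
        'ValueError: No enabled clouds support opening ports.'))
```

Keeping the reason on its own line lets the controller-side error be surfaced verbatim under the one-line summary.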

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

cblmemo and others added 4 commits February 18, 2024 07:30
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
…-org/skypilot into refactor-service-initialization
cblmemo merged commit bd64e18 into master on Feb 18, 2024
19 checks passed
cblmemo deleted the refactor-service-initialization branch on February 18, 2024 00:05