Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Serve] Allow purging controller for identity mismatch #3094

Merged
merged 1 commit into from
Feb 15, 2024

Conversation

romilbhardwaj
Copy link
Collaborator

Closes #3061. Allows purging of serve controller if cluster identity mismatch happens (common on k8s).

Still shows the prompt and requires user to enter 'delete'.

Blocked on #3093 and #3043 for full functionality on k8s.

$ sky down sky-serve-controller-2ea485ea --purge
sky.exceptions.ClusterOwnerIdentityMismatchError: 'sky-serve-controller-2ea485ea' (Kubernetes) is owned by account ['gke_skypilot-375900_us-central1-c_gkeusc2_gke_skypilot-375900_us-central1-c_gkeusc2-skypilot-sa_default'], but the activated account is ['gke_skypilot-375900_us-central1-c_gkeusc2_gke_skypilot-375900_us-central1-c_gkeusc2_default'].
Since --purge is set, errors will be ignored and controller will be removed from local state.
To proceed, please type 'delete':
  • Code formatting: bash format.sh
  • Any manual or new tests for this PR - tested manually with repro steps from the corresponding issues.

@romilbhardwaj romilbhardwaj marked this pull request as ready for review February 5, 2024 20:44
Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this @romilbhardwaj! LGTM.

@romilbhardwaj
Copy link
Collaborator Author

Thanks @Michaelvll! Will merge after doing an e2e test when #3043 is merged to make sure it works for k8s.

romilbhardwaj added a commit that referenced this pull request Feb 15, 2024
@romilbhardwaj romilbhardwaj merged commit d8e7d1d into master Feb 15, 2024
19 checks passed
@romilbhardwaj romilbhardwaj deleted the purge_controller branch February 15, 2024 23:15
romilbhardwaj added a commit that referenced this pull request May 8, 2024
* playing around

* wip with hacks

* wip refactor get_endpoints

* working get_endpoints

* wip

* fixed circular import

* Working for ingress and loadbalancer svc

* lint

* add purging from #3094

* Use local catalog on the controller too

* use externalip if available

* add dshm_size_limit

* optimize dependency installation

* Add todo

* optimize ingress

* fix

* fix

* remove autostop timing

* Fix URLs for raw IP:ports

* fixes

* wip

* SA wip

* Allow use of service accounts through remote_identity field

* Make purge work for no clusters in kubeconfig

* Handle ingress namespace not present

* setup optimizations and critical SA key fix

* fix docs

* fix docs

* Add support for skypilot.co/external-ip annotation for ingress

* Remove dshm_size_limit

* Undo kind changes

* Update service account docs

* minor docs

* update comment

* is_same_cloud to cloud_in_list

* refactor query_ports to use head_ip

* autodown + http prefixing in callers

* fix ssh key issues when user hash is reused

* linting

* lint

* lint, HOST_CONTROLLERS

* add serve smoke tests for k8s

* disallow file_mounts and workdir if no storage cloud is enabled

* minor

* lint

* update fastchat to use --host 127.0.0.1

* extend timeout

* docs comments

* rename to port

* add to core.py

* docstrs

* add docs on exec based auth

* expand elif

* add lb comment

* refactor

* refactor

* fix docs build

* add PODIP mode support

* make ssh services optional

* nits

* Revert "make ssh services optional"

This reverts commit 87d4d25.

* Revert "add PODIP mode support"

This reverts commit 750d4d4.

* nits

* use 0.0.0.0 when on k8s; use common impl for other clouds

* return dict instead of raising errors in core.endpoints()

* lint

* merge fixes

* merge fixes

* merge fixes

* lint

* fix smoke tests

* fix smoke tests

* comment

* add enum for remote identity

* lint

* add skip_status_check

* remove zone requirement

* fix timings for test

* silence curl download

* move jq from yaml to test_minimal

* move jq from yaml to test_minimal
romilbhardwaj added a commit that referenced this pull request May 8, 2024
* playing around

* wip with hacks

* wip refactor get_endpoints

* working get_endpoints

* wip

* fixed circular import

* Working for ingress and loadbalancer svc

* lint

* add purging from #3094

* Use local catalog on the controller too

* use externalip if available

* add dshm_size_limit

* optimize dependency installation

* Add todo

* optimize ingress

* fix

* fix

* remove autostop timing

* Fix URLs for raw IP:ports

* fixes

* wip

* SA wip

* Allow use of service accounts through remote_identity field

* Make purge work for no clusters in kubeconfig

* Handle ingress namespace not present

* setup optimizations and critical SA key fix

* fix docs

* fix docs

* Add support for skypilot.co/external-ip annotation for ingress

* Remove dshm_size_limit

* Undo kind changes

* Update service account docs

* minor docs

* update comment

* is_same_cloud to cloud_in_list

* refactor query_ports to use head_ip

* autodown + http prefixing in callers

* fix ssh key issues when user hash is reused

* linting

* lint

* lint, HOST_CONTROLLERS

* add serve smoke tests for k8s

* disallow file_mounts and workdir if no storage cloud is enabled

* minor

* lint

* update fastchat to use --host 127.0.0.1

* extend timeout

* docs comments

* rename to port

* add to core.py

* docstrs

* add docs on exec based auth

* expand elif

* add lb comment

* refactor

* refactor

* fix docs build

* add PODIP mode support

* make ssh services optional

* nits

* Revert "make ssh services optional"

This reverts commit 87d4d25.

* Revert "add PODIP mode support"

This reverts commit 750d4d4.

* nits

* use 0.0.0.0 when on k8s; use common impl for other clouds

* return dict instead of raising errors in core.endpoints()

* lint

* merge fixes

* merge fixes

* merge fixes

* lint

* fix smoke tests

* fix smoke tests

* comment

* add enum for remote identity

* lint

* disable autostop for kubernetes

* add skip_status_check

* remove zone requirement

* fix timings for test

* silence curl download

* move jq from yaml to test_minimal

* move jq from yaml to test_minimal

* add assert

* lint

* lint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Serve] --purge is not respected for downing Serve controller
2 participants