fix: New LKE clusters get stuck provisioning when HA price is undefined #9558

Merged

Conversation

jdamore-linode
Contributor

Description 📝

This fixes an issue where creating an LKE cluster in an environment where REACT_APP_LKE_HIGH_AVAILABILITY_PRICE is undefined results in a cluster whose nodes are stuck in the "provisioning" state. The only way to get them unstuck is to enable high availability, which is irreversible. The fix is to pass false for control_plane.high_availability whenever the environment variable is undefined.

I don't think this is strictly a Cloud Manager issue because the API does not respond with a 400, and the API docs state that control_plane.high_availability is treated as false by default. Regardless, explicitly passing false results in clusters that provision successfully, and leaving it absent seems to result in clusters that get stuck.

Major Changes 🔄

  • Pass false to control_plane.high_availability explicitly when the HA control plane prompt is not present
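A minimal sketch of the change described above, not the actual Cloud Manager code. The `buildControlPlane` helper, its parameter names, and the `ControlPlaneOptions` type are hypothetical; only the `control_plane.high_availability` field and the environment variable name come from this PR. The sketch assumes the HA prompt is rendered only when the price variable is defined.

```typescript
// Hypothetical shape of the control plane portion of the create-cluster payload.
interface ControlPlaneOptions {
  high_availability: boolean;
}

// haPrice stands in for the value of REACT_APP_LKE_HIGH_AVAILABILITY_PRICE.
// userSelectedHA is the user's choice in the HA prompt; it is undefined when
// the prompt was never shown (assumption: no price means no prompt).
function buildControlPlane(
  haPrice: string | undefined,
  userSelectedHA: boolean | undefined
): ControlPlaneOptions {
  const promptShown = haPrice !== undefined;

  // Always send an explicit boolean instead of omitting the field,
  // so the API never has to fall back to its documented default.
  return {
    high_availability: promptShown ? userSelectedHA === true : false,
  };
}

// Hypothetical usage: spread the result into the rest of the request body.
const payload = {
  label: "example-cluster",
  control_plane: buildControlPlane(undefined, undefined),
};
```

The point of the design is that the field is present in every request, so cluster creation no longer depends on how the API handles a missing `control_plane.high_availability`.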

How to test 🧪

To Reproduce the Issue

  1. Check out develop, remove the REACT_APP_LKE_HIGH_AVAILABILITY_PRICE environment variable from your .env file if necessary, and build Cloud Manager
  2. Create an LKE cluster and observe that you're redirected to the cluster's details page
  3. Wait ~20 minutes and confirm that the cluster's nodes are still in the "provisioning" state

To Verify These Changes

  1. Check out this branch, remove the REACT_APP_LKE_HIGH_AVAILABILITY_PRICE environment variable from your .env file if necessary, and build Cloud Manager
  2. Create an LKE cluster and observe that you're redirected to the cluster's details page
  3. Confirm that the cluster finishes provisioning after a few minutes
  4. Re-add the REACT_APP_LKE_HIGH_AVAILABILITY_PRICE environment variable, create an LKE cluster with HA enabled and an LKE cluster with HA disabled, and confirm that both clusters finish provisioning after a few minutes

@jdamore-linode
Contributor Author

I don't think we need to worry about fitting this into the 1.100 release since this issue does not impact cloud.linode.com.

@bnussman-akamai bnussman-akamai added Add'tl Approval Needed Waiting on another approval! and removed Ready for Review labels Aug 17, 2023
@TylerWJ
Contributor

TylerWJ commented Aug 22, 2023

It took about 12 minutes to create an HA cluster with 3 shared 2 GB instances.

@bnussman-akamai bnussman-akamai added Approved Multiple approvals and ready to merge! and removed Add'tl Approval Needed Waiting on another approval! labels Aug 22, 2023
@jdamore-linode jdamore-linode merged commit 76fc65e into linode:develop Sep 5, 2023
corya-akamai pushed a commit to corya-akamai/manager that referenced this pull request Sep 6, 2023
…ed (linode#9558)

* Pass `false` for `control_plane.high_availability` when creating LKE cluster when no HA price is defined

* Added changeset: Fix stuck LKE node pools when HA Control Plane is unavailable

---------

Co-authored-by: mjac0bs <mjacobs@akamai.com>
abailly-akamai pushed a commit that referenced this pull request Sep 7, 2023
…ed (#9558)

* Pass `false` for `control_plane.high_availability` when creating LKE cluster when no HA price is defined

* Added changeset: Fix stuck LKE node pools when HA Control Plane is unavailable

---------

Co-authored-by: mjac0bs <mjacobs@akamai.com>