Add upgrade steps, instructions for 2023.9.1 #2029

Merged
20 commits merged on Oct 2, 2023

Conversation

iameskild
Member

Reference Issues or PRs

What does this implement/fix?

Supersedes #2021 (@kenafoster I couldn't push to your remote branch or open PR against that branch)

Put a x in the boxes that apply

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds a feature)
  • Breaking change (fix or feature that would cause existing features not to work as expected)
  • Documentation Update
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no API changes)
  • Build related changes
  • Other (please describe):

Testing

  • Did you test the pull request locally?
  • Did you add new tests?

I tested these upgrade commands (and redeployment) on GCP and Azure. @kenafoster did you happen to test the upgrade command / redeployment on AWS?

Any other comments?

@iameskild iameskild added area: user experience 👩🏻‍💻 impact: high 🟥 This issue affects most of the nebari users or is a critical issue project: JATIC Work item needed for the JATIC project labels Sep 25, 2023
@iameskild iameskild changed the title Gcp k8s upgrade Add upgrade steps, instructions for 2023.9.1 Sep 25, 2023
@iameskild iameskild mentioned this pull request Sep 25, 2023
6 tasks
@iameskild iameskild added this to the Release 2023.9.1 milestone Sep 25, 2023
@kenafoster
Contributor

Tested upgrade on AWS - looks good

Going from 2023.5.1 to 2023.9.1, the cluster was destroyed and rebuilt (@iameskild addressed this with more warnings in the upgrade step to 2023.7.1, where the cluster destroy occurs).
Going from 2023.7.1 to 2023.9.1 worked. I upgraded K8s from 1.24 to 1.26 (note that in AWS you must manually upgrade node pools after upgrading the control plane). After setting 1.26 in the Nebari config, the 2023.9.1 deploy ran successfully. CDS dashboards were present in my original deployment; the upgrade command removed their config, and the deploy then removed them from the app.

@@ -8,7 +8,7 @@
# 04-kubernetes-ingress
DEFAULT_TRAEFIK_IMAGE_TAG = "2.9.1"

-HIGHEST_SUPPORTED_K8S_VERSION = ("1", "26", "7")
+HIGHEST_SUPPORTED_K8S_VERSION = ("1", "26", "9")
Member

@fangchenli fangchenli Sep 27, 2023

Just a question. How is this determined? Have we tested newer versions?

Member Author

I have tested this new version on all of the cloud providers, and I bumped it so we could support DOKS version 1.26.9-do.0, since this appears to be the only version of Kubernetes 1.26 available on DO.

Comment on lines +320 to +338
    """Return the major.minor version of the k8s version string."""

    k8s_version = str(k8s_version)
    # Split the input string by the first decimal point
    parts = k8s_version.split(".", 1)

    if len(parts) == 2:
        # Extract the part before the second decimal point
        before_second_decimal = parts[0] + "." + parts[1].split(".")[0]
        try:
            # Convert the extracted part to a float
            result = float(before_second_decimal)
            return result
        except ValueError:
            # Handle the case where the conversion to float fails
            return None
    else:
        # Handle the case where there is no second decimal point
        return None
Member

you can use packaging for this:

>>> from packaging import version
>>> version.parse('2.3.4')
<Version('2.3.4')>


>>> version.parse('2.3.4') > version.parse('2.3.1')
True

Contributor

The DO versions aren't just numeric (example - 1.18.19-do.0) and I get back packaging.version.InvalidVersion when trying to use this library for them

Member

That's a slug, not the actual Kubernetes version. You can get the explicit Kubernetes versions too, e.g.:

$ doctl kubernetes options versions

Slug            Kubernetes Version    Supported Features
1.28.2-do.0     1.28.2                cluster-autoscaler, docr-integration, ha-control-plane, token-authentication
1.27.6-do.0     1.27.6                cluster-autoscaler, docr-integration, ha-control-plane, token-authentication
1.26.9-do.0     1.26.9                cluster-autoscaler, docr-integration, ha-control-plane, token-authentication
1.25.14-do.0    1.25.14               cluster-autoscaler, docr-integration, ha-control-plane, token-authentication

Alternatively, you can split the slug on "-" and use the first part.
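
The split-on-"-" suggestion can be sketched as follows. This is only an illustration: `parse_major_minor` is a hypothetical helper name, not the function that was merged in this PR. It also returns a tuple of ints where the reviewed snippet returns a float, because a float would rank 1.10 below 1.9.

```python
def parse_major_minor(version_or_slug):
    """Return (major, minor) as ints, or None if the input cannot be parsed.

    Hypothetical helper for illustration; not part of Nebari.
    """
    # Drop a provider suffix such as "-do.0" by keeping the part before the first "-".
    core = str(version_or_slug).split("-", 1)[0]
    parts = core.split(".")
    if len(parts) < 2:
        # No "." at all, so there is no minor component to report.
        return None
    try:
        return int(parts[0]), int(parts[1])
    except ValueError:
        # A component was not a plain integer (for example "v1" or "x").
        return None


print(parse_major_minor("1.26.9-do.0"))  # (1, 26)
# A float ranks 1.10 below 1.9; integer tuples keep the expected order.
print(float("1.10") < float("1.9"))  # True
print(parse_major_minor("1.10.0") > parse_major_minor("1.9.0"))  # True
```

Callers still need to handle the `None` case before comparing, since comparing `None` with a tuple raises a `TypeError`.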

@@ -2,7 +2,7 @@ terraform {
   required_providers {
     google = {
       source  = "hashicorp/google"
-      version = "4.83.0"
+      version = "4.8.0"
Member Author

I needed to revert to what we had in our previous release because when I tried to redeploy with a newer Kubernetes version, it complained about the following although many of those fields are indeed set:

Error: googleapi: Error 400: At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings'] must be specified., badRequest

Otherwise, there doesn't seem to be a way around this unless you delete the node groups and then remove them from the Terraform state, which is an accident-prone task...

@@ -57,10 +57,6 @@ resource "google_container_cluster" "main" {
}
}

-  cost_management_config {
Member Author

Removed because not supported with GCP Terraform provider version 4.8.0 (see other comment for reason why).

@@ -22,6 +27,8 @@
)
ARGO_JUPYTER_SCHEDULER_REPO = "https://github.com/nebari-dev/argo-jupyter-scheduler"

+UPGRADE_KUBERNETES_MESSAGE = "Please see the [green][link=https://www.nebari.dev/docs/how-tos/kubernetes-version-upgrade]Kubernetes upgrade docs[/link][/green] for more information."
Member Author

I created these docs to walk folks through the Kubernetes upgrade process: nebari-dev/nebari-docs#367

I would like to test this for Digital Ocean, I just haven't had the time yet...

Member Author

I've tested this on Digital Ocean.

Comment on lines +428 to +431
@pytest.mark.skipif(
    _nebari.upgrade.__version__ < "2023.9.1",
    reason="This test is only valid for versions >= 2023.9.1",
)
Member Author

To get these tests to run, we need to create a new tag 2023.9.1. You can do this locally to confirm they pass.
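
One caveat about the `skipif` condition above: comparing version strings with `<` is lexicographic, so a later release with a two-digit month would sort before "2023.9.1". The sketch below shows the difference. `version_tuple` is a hypothetical helper and assumes a purely numeric dotted version; a development version string with a suffix would need something like `packaging.version.parse`, which was mentioned earlier in this review.

```python
def version_tuple(version):
    """Turn a purely numeric dotted version such as "2023.9.1" into a tuple of ints.

    Hypothetical helper for illustration; it raises ValueError on suffixes like ".dev3".
    """
    return tuple(int(part) for part in version.split("."))


# Plain string comparison goes character by character, so "1" sorts before "9".
print("2023.10.1" < "2023.9.1")  # True
# Integer tuples compare component by component, which matches release order.
print(version_tuple("2023.10.1") < version_tuple("2023.9.1"))  # False
```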

"-> The Kubernetes version is multiple minor versions behind the minimum required version. You will need to perform the upgrade one minor version at a time. For example, if your current version is 1.24, you will need to upgrade to 1.25, and then 1.26."
)
rich.print(
f"-> Update the value of [green]{provider_config_block}.kubernetes_version[/green] in your config file to a newer version of Kubernetes and redeploy."
Member Author

I can confirm that redeploying with a Kubernetes version one minor higher than the one the user is running will work for GCP and Azure. @kenafoster can you confirm that this will also work as advertised on AWS?

Contributor

Yes, it works on AWS. However, if they upgrade across multiple versions, they'll need to upgrade the node pools manually, and that part must be done outside Nebari.
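
The "one minor version at a time" rule from the upgrade message can be made concrete with a small sketch. `minor_upgrade_path` is a hypothetical name and is not part of Nebari. It only lists the intermediate versions to set in the config, one redeploy per entry, and does not cover the manual node pool upgrade on AWS.

```python
def minor_upgrade_path(current, target):
    """List the major.minor versions to step through, one minor version at a time.

    Hypothetical helper for illustration; not part of Nebari.
    """
    cur_major, cur_minor = (int(p) for p in current.split(".")[:2])
    tgt_major, tgt_minor = (int(p) for p in target.split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles upgrades within one major version")
    # Empty when the current version is already at or past the target.
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


print(minor_upgrade_path("1.24", "1.26"))  # ['1.25', '1.26']
```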

@iameskild iameskild requested a review from aktech September 29, 2023 03:33
Member

@aktech aktech left a comment

I don't have any strong objections that should stop this PR from getting merged, apart from some minor nitpicks. I am happy to get this in as long as the tests pass.

thanks for all the work @kenafoster @iameskild

@iameskild iameskild merged commit 056b420 into develop Oct 2, 2023
25 checks passed
@iameskild iameskild deleted the gcp_k8s_upgrade branch October 2, 2023 16:49
Labels
area: user experience 👩🏻‍💻 impact: high 🟥 This issue affects most of the nebari users or is a critical issue project: JATIC Work item needed for the JATIC project
4 participants