Add upgrade steps, instructions for 2023.9.1 #2029

Merged · 20 commits · Oct 2, 2023
1 change: 0 additions & 1 deletion src/_nebari/config.py
@@ -115,4 +115,3 @@ def backup_configuration(filename: pathlib.Path, extrasuffix: str = ""):
        i = i + 1

    filename.rename(backup_filename)
-    print(f"Backing up {filename} as {backup_filename}")
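For context, the deleted print was the last line of a helper that renames the config file to a non-clobbering backup name by appending an incrementing counter. A minimal sketch of that pattern (not Nebari's exact implementation — the `.backup~N` naming here is illustrative):

```python
import pathlib


def backup_configuration(filename: pathlib.Path, extrasuffix: str = "") -> pathlib.Path:
    """Rename `filename` to a backup name, appending an incrementing
    counter (.backup, .backup~1, ...) until the name is unused."""
    backup_filename = pathlib.Path(f"{filename}{extrasuffix}.backup")
    i = 0
    while backup_filename.exists():
        i = i + 1
        backup_filename = pathlib.Path(f"{filename}{extrasuffix}.backup~{i}")
    filename.rename(backup_filename)
    return backup_filename
```

Returning the backup path lets callers report where the file went, which is what the removed print statement used to do.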
2 changes: 1 addition & 1 deletion src/_nebari/constants.py
@@ -8,7 +8,7 @@
# 04-kubernetes-ingress
DEFAULT_TRAEFIK_IMAGE_TAG = "2.9.1"

-HIGHEST_SUPPORTED_K8S_VERSION = ("1", "26", "7")
+HIGHEST_SUPPORTED_K8S_VERSION = ("1", "26", "9")
fangchenli (Member) commented on Sep 27, 2023:
Just a question. How is this determined? Have we tested newer versions?

Member Author replied:

I have tested this new version on all of the cloud providers, and I bumped it so we could support DOKS version 1.26.9-do.0, since this appears to be the only version of Kubernetes 1.26 available on DO.
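Worth noting when bumping this constant: HIGHEST_SUPPORTED_K8S_VERSION is a tuple of strings, and comparing provider-reported versions against it lexicographically would misorder double-digit components ("10" sorts before "9" as a string). A sketch of a safe numeric comparison (the helper names are illustrative, not part of Nebari):

```python
HIGHEST_SUPPORTED_K8S_VERSION = ("1", "26", "9")


def as_int_tuple(version: str) -> tuple:
    """Parse '1.26.9' (or a DO slug like '1.26.9-do.0') into (1, 26, 9)."""
    core = version.split("-")[0]  # drop provider suffixes such as '-do.0'
    return tuple(int(part) for part in core.split("."))


def is_supported(version: str) -> bool:
    """True if `version` does not exceed the highest supported release."""
    return as_int_tuple(version) <= as_int_tuple(".".join(HIGHEST_SUPPORTED_K8S_VERSION))
```

Casting to integer tuples makes Python's built-in tuple ordering do the right thing for versions like 1.26.10.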

DEFAULT_GKE_RELEASE_CHANNEL = "UNSPECIFIED"

DEFAULT_NEBARI_DASK_VERSION = CURRENT_RELEASE
@@ -57,10 +57,6 @@ resource "google_container_cluster" "main" {
}
}

-  cost_management_config {
-    enabled = true
-  }

Member Author commented:
Removed because not supported with GCP Terraform provider version 4.8.0 (see other comment for the reason why).

  lifecycle {
    ignore_changes = [
      node_locations
2 changes: 1 addition & 1 deletion src/_nebari/stages/infrastructure/template/gcp/versions.tf
@@ -2,7 +2,7 @@ terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
-      version = "4.83.0"
+      version = "4.8.0"
Member Author commented:
I needed to revert to what we had in our previous release because when I tried to redeploy with a newer Kubernetes version, it complained about the following although many of those fields are indeed set:

Error: googleapi: Error 400: At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings'] must be specified., badRequest

Otherwise, there doesn't seem to be a way around this unless you delete the node groups and then remove them from the Terraform state, which is an accident-prone task...

    }
  }
  required_version = ">= 1.0"
137 changes: 130 additions & 7 deletions src/_nebari/upgrade.py
@@ -11,9 +11,14 @@
from rich.prompt import Prompt

from _nebari.config import backup_configuration
-from _nebari.utils import load_yaml, yaml
+from _nebari.utils import (
+    get_k8s_version_prefix,
+    get_provider_config_block_name,
+    load_yaml,
+    yaml,
+)
from _nebari.version import __version__, rounded_ver_parse
from nebari import schema
from nebari.schema import ProviderEnum, is_version_accepted

logger = logging.getLogger(__name__)

@@ -22,6 +27,9 @@
)
ARGO_JUPYTER_SCHEDULER_REPO = "https://github.com/nebari-dev/argo-jupyter-scheduler"

UPGRADE_KUBERNETES_MESSAGE = "Please see the [green][link=https://www.nebari.dev/docs/how-tos/kubernetes-version-upgrade]Kubernetes upgrade docs[/link][/green] for more information."
Member Author commented:
I created these docs to walk folks through the Kubernetes upgrade process: nebari-dev/nebari-docs#367
I would like to test this for Digital Ocean, I just haven't had the time yet...

Member Author followed up:
I've tested this on Digital Ocean.

DESTRUCTIVE_UPGRADE_WARNING = "-> This version upgrade will result in your cluster being completely torn down and redeployed. Please ensure you have backed up any data you wish to keep before proceeding!!!"


def do_upgrade(config_filename, attempt_fixes=False):
    config = load_yaml(config_filename)

@@ -40,15 +48,14 @@ def do_upgrade(config_filename, attempt_fixes=False):
        )
        return
    except (ValidationError, ValueError) as e:
-        if schema.is_version_accepted(config.get("nebari_version", "")):
+        if is_version_accepted(config.get("nebari_version", "")):
            # There is an unrelated validation problem
            rich.print(
                f"Your config file [purple]{config_filename}[/purple] appears to be already up-to-date for Nebari version [green]{__version__}[/green] but there is another validation error.\n"
            )
            raise e

    start_version = config.get("nebari_version", "")
-    print("start_version: ", start_version)

    UpgradeStep.upgrade(
        config, start_version, __version__, config_filename, attempt_fixes
@@ -98,7 +105,6 @@ def upgrade(
        """
        starting_ver = rounded_ver_parse(start_version or "0.0.0")
        finish_ver = rounded_ver_parse(finish_version)
-        print("finish_ver: ", finish_ver)

        if finish_ver < starting_ver:
            raise ValueError(
@@ -116,8 +122,6 @@
            key=rounded_ver_parse,
        )

-        print("step_versions: ", step_versions)

        current_start_version = start_version
        for stepcls in [cls._steps[str(v)] for v in step_versions]:
            step = stepcls()
@@ -484,6 +488,24 @@ def _version_specific_upgrade(
        return config


class Upgrade_2023_7_1(UpgradeStep):
    version = "2023.7.1"

    def _version_specific_upgrade(
        self, config, start_version, config_filename: Path, *args, **kwargs
    ):
        provider = config["provider"]
        if provider == ProviderEnum.aws.value:
            rich.print("\n ⚠️ DANGER ⚠️")
            rich.print(
                DESTRUCTIVE_UPGRADE_WARNING,
                "The 'prevent_deploy' flag has been set in your config file and must be manually removed to deploy.",
            )
            config["prevent_deploy"] = True

        return config
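These Upgrade_* classes rely on UpgradeStep's registry: each subclass is recorded under its version string, and UpgradeStep.upgrade applies every step between the config's current version and the target, in order. A simplified sketch of that mechanism, inferred from how the steps are used here (not the actual implementation — the real one uses rounded_ver_parse and passes more arguments):

```python
class UpgradeStep:
    _steps: dict = {}  # version string -> step class, shared registry
    version: str = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        UpgradeStep._steps[cls.version] = cls  # register on class definition

    def _version_specific_upgrade(self, config, start_version, **kwargs):
        return config  # overridden by concrete steps

    @classmethod
    def upgrade(cls, config, start_version, finish_version):
        """Apply every registered step in (start_version, finish_version], in order."""
        parse = lambda v: tuple(int(p) for p in v.split("."))
        step_versions = sorted(
            (v for v in cls._steps
             if parse(start_version) < parse(v) <= parse(finish_version)),
            key=parse,
        )
        for version in step_versions:
            config = cls._steps[version]()._version_specific_upgrade(config, start_version)
            start_version = version
        return config


class Upgrade_2023_7_1(UpgradeStep):
    version = "2023.7.1"

    def _version_specific_upgrade(self, config, start_version, **kwargs):
        config.setdefault("applied", []).append(self.version)
        return config


class Upgrade_2023_7_2(UpgradeStep):
    version = "2023.7.2"

    def _version_specific_upgrade(self, config, start_version, **kwargs):
        config.setdefault("applied", []).append(self.version)
        return config
```

Because registration happens at class-definition time, adding a new release only requires defining another subclass above the "Manually-added upgrade steps must go above this line" marker.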


class Upgrade_2023_7_2(UpgradeStep):
    version = "2023.7.2"

@@ -507,6 +529,107 @@ def _version_specific_upgrade(
        return config


class Upgrade_2023_9_1(UpgradeStep):
    version = "2023.9.1"
    # JupyterHub Helm chart 2.0.0 (app version 3.0.0) requires K8S Version >=1.23.
    # (reference: https://z2jh.jupyter.org/en/stable/)
    # This release has been tested against 1.26.
    min_k8s_version = 1.26

    def _version_specific_upgrade(
        self, config, start_version, config_filename: Path, *args, **kwargs
    ):
        # Upgrading to 2023.9.1 is considered high-risk because it includes a major refactor
        # to introduce the extension mechanism.
        rich.print("\n ⚠️ Warning ⚠️")
        rich.print(
            f"-> Nebari version [green]{self.version}[/green] includes a major refactor to introduce an extension mechanism that supports the development of third-party plugins."
        )
        rich.print(
            "-> Data should be backed up before performing this upgrade ([green][link=https://www.nebari.dev/docs/how-tos/manual-backup]see docs[/link][/green]). The 'prevent_deploy' flag has been set in your config file and must be manually removed to deploy."
        )
        rich.print(
            "-> Please also run [green]rm -rf stages[/green] so that we can regenerate an updated set of Terraform scripts for your deployment."
        )

        # Setting the following flag will prevent deployment and display guidance to the user,
        # which they can override if they are happy they understand the situation.
        config["prevent_deploy"] = True

        # Nebari version 2023.9.1 upgrades JupyterHub to 3.1. CDS Dashboards are only compatible with
        # JupyterHub versions 1.X and so will be removed during upgrade.
        rich.print("\n ⚠️ Deprecation Warning ⚠️")
        rich.print(
            f"-> CDS dashboards are no longer supported in Nebari version [green]{self.version}[/green] and will be uninstalled."
        )
        if config.get("cdsdashboards"):
            rich.print("-> Removing cdsdashboards from config file.")
            del config["cdsdashboards"]

        # Kubernetes version check
        # JupyterHub Helm chart 2.0.0 (app version 3.0.0) requires K8S Version >=1.23.
        # (reference: https://z2jh.jupyter.org/en/stable/)

        provider = config["provider"]
        provider_config_block = get_provider_config_block_name(provider)

        # Get current Kubernetes version if available in config.
        current_version = config.get(provider_config_block, {}).get(
            "kubernetes_version", None
        )

        # Convert to decimal prefix
        if provider in ["aws", "azure", "gcp", "do"]:
            current_version = get_k8s_version_prefix(current_version)

        # Try to convert known Kubernetes versions to float.
        if current_version is not None:
            try:
                current_version = float(current_version)
            except ValueError:
                current_version = None

        # Handle checks for when the Kubernetes version should be detectable
        if provider in ["aws", "azure", "gcp", "do"]:
            # Kubernetes version not found in provider block
            if current_version is None:
                rich.print("\n ⚠️ Warning ⚠️")
                rich.print(
                    f"-> Unable to detect Kubernetes version for provider {provider}. Nebari version [green]{self.version}[/green] requires Kubernetes version {str(self.min_k8s_version)}. Please confirm your Kubernetes version is configured before upgrading."
                )

            # Kubernetes version less than required minimum
            if (
                isinstance(current_version, float)
                and current_version < self.min_k8s_version
            ):
                rich.print("\n ⚠️ Warning ⚠️")
                rich.print(
                    f"-> Nebari version [green]{self.version}[/green] requires Kubernetes version {str(self.min_k8s_version)}. Your configured Kubernetes version is [red]{current_version}[/red]. {UPGRADE_KUBERNETES_MESSAGE}"
                )
                version_diff = round(self.min_k8s_version - current_version, 2)
                if version_diff > 0.01:
                    rich.print(
                        "-> The Kubernetes version is multiple minor versions behind the minimum required version. You will need to perform the upgrade one minor version at a time. For example, if your current version is 1.24, you will need to upgrade to 1.25, and then 1.26."
                    )
                rich.print(
                    f"-> Update the value of [green]{provider_config_block}.kubernetes_version[/green] in your config file to a newer version of Kubernetes and redeploy."
                )

        else:
            rich.print("\n ⚠️ Warning ⚠️")
            rich.print(
                f"-> Unable to detect Kubernetes version for provider {provider}. Nebari version [green]{self.version}[/green] requires Kubernetes version {str(self.min_k8s_version)} or greater."
            )
            rich.print(
                "-> Please ensure your Kubernetes version is up-to-date before proceeding."
            )

        if provider == "aws":
            rich.print("\n ⚠️ DANGER ⚠️")
            rich.print(DESTRUCTIVE_UPGRADE_WARNING)

        return config

Member Author commented (on the kubernetes_version guidance above):
I can confirm that redeploying with a Kubernetes version one minor higher than the one the user is running will work for GCP and Azure. @kenafoster can you confirm that this will also work as advertised on AWS?

Contributor replied:
Yes, it works on AWS. However, if they do it across multiple versions, they'll need to upgrade the node pools manually, and that part must be done outside Nebari.
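The "one minor version at a time" guidance in the warning above can be expressed as a small helper. A sketch assuming the same major.minor-as-float representation the upgrade step uses (`k8s_upgrade_path` is illustrative, not a Nebari function):

```python
def k8s_upgrade_path(current: float, target: float) -> list:
    """Minor versions to deploy one at a time, e.g. 1.24 -> [1.25, 1.26]."""
    # Work in integer "hundredths" to dodge float artifacts like 1.25 + 0.01 != 1.26.
    cur, tgt = round(current * 100), round(target * 100)
    return [v / 100 for v in range(cur + 1, tgt + 1)]
```

This only applies within a single major series (1.x), which is the case the upgrade check covers; the kenafoster comment above notes that on AWS each intermediate step may also require manual node-pool upgrades.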


__rounded_version__ = ".".join([str(c) for c in rounded_ver_parse(__version__)])

# Manually-added upgrade steps must go above this line
36 changes: 36 additions & 0 deletions src/_nebari/utils.py
@@ -314,3 +314,39 @@ def construct_azure_resource_group_name(
    if base_resource_group_name:
        return f"{base_resource_group_name}{suffix}"
    return f"{project_name}-{namespace}{suffix}"


def get_k8s_version_prefix(k8s_version: str):
    """Return the major.minor prefix of the k8s version as a float, or None."""

    k8s_version = str(k8s_version)
    # Split the input string at the first decimal point
    parts = k8s_version.split(".", 1)

    if len(parts) == 2:
        # Extract the part before the second decimal point
        before_second_decimal = parts[0] + "." + parts[1].split(".")[0]
        try:
            # Convert the extracted part to a float
            return float(before_second_decimal)
        except ValueError:
            # Handle the case where the conversion to float fails
            return None
    else:
        # Handle the case where there is no second decimal point
        return None
fangchenli (Member) commented on lines +320 to +338:
you can use packaging for this:

>>> from packaging import version
>>> version.parse('2.3.4')
<Version('2.3.4')>


>>> version.parse('2.3.4') > version.parse('2.3.1')
True

Contributor replied:
The DO versions aren't just numeric (example - 1.18.19-do.0) and I get back packaging.version.InvalidVersion when trying to use this library for them

fangchenli (Member) replied:
That's a slug, not the actual k8s version; you can get explicit Kubernetes versions too, e.g.:

$ doctl kubernetes options versions

Slug            Kubernetes Version    Supported Features
1.28.2-do.0     1.28.2                cluster-autoscaler, docr-integration, ha-control-plane, token-authentication
1.27.6-do.0     1.27.6                cluster-autoscaler, docr-integration, ha-control-plane, token-authentication
1.26.9-do.0     1.26.9                cluster-autoscaler, docr-integration, ha-control-plane, token-authentication
1.25.14-do.0    1.25.14               cluster-autoscaler, docr-integration, ha-control-plane, token-authentication

Alternatively you can split by - and use the first part.
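The split-on-dash suggestion above is enough to make DO slugs palatable to standard version tooling. A sketch in pure stdlib Python (with the `packaging` library installed, `version.parse(slug.split("-")[0])` would serve the same purpose):

```python
def k8s_version_from_do_slug(slug: str) -> str:
    """'1.26.9-do.0' -> '1.26.9' (strip the DigitalOcean-specific suffix)."""
    return slug.split("-")[0]


def version_tuple(version: str) -> tuple:
    """'1.26.9' -> (1, 26, 9), suitable for ordered comparison."""
    return tuple(int(part) for part in version.split("."))
```

Stripping the suffix first avoids the packaging.version.InvalidVersion failure the contributor hit when parsing raw slugs.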



def get_provider_config_block_name(provider):
    PROVIDER_CONFIG_NAMES = {
        "aws": "amazon_web_services",
        "azure": "azure",
        "do": "digital_ocean",
        "gcp": "google_cloud_platform",
    }

    if provider in PROVIDER_CONFIG_NAMES:
        return PROVIDER_CONFIG_NAMES[provider]
    else:
        return provider