helm: document migration to v12
hugoShaka committed Jan 4, 2023
1 parent e67df9c commit 63c2172
docs/pages/deploy-a-cluster/helm-deployments/migration-v12.mdx (248 additions, 0 deletions)

---
title: Migrating to teleport-cluster v12
description: How to upgrade to teleport-cluster version 12.
---

This guide covers the major changes of the `teleport-cluster` v12 chart
and how to upgrade existing releases from version 11 to version 12.

## Changes summary

The main changes brought by version 12 of the `teleport-cluster` chart are:

- Teleport is split into two components: auth and proxy.
Auth pods are deployed through a StatefulSet, while proxy pods are deployed via a Deployment.
Running Teleport with this new topology makes it more resilient to disruptions and lets it scale better.
- Proxies are now deployed as a stateless workload. The `proxy` session recording mode uploads recordings asynchronously,
so recordings that have not been uploaded yet can be lost during rollouts (configuration changes or version upgrades, for example).
The `proxy-sync` mode ensures consistency and does not have this limitation (see the sketch after this list).
- The `custom` mode is removed, as it was broken by the topology change.
It is replaced by a new configuration override mechanism that lets you pass arbitrary Teleport configuration.
- The value `standalone.storage`, previously deprecated in favor of `persistence`, has been removed.
- The chart can now be scaled up in `standalone` mode.
- The chart has always been versioned with Teleport, but was often compatible with the previous Teleport major version.
This is not the case for v12: chart v12 requires at least Teleport v12.
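
For example, if you want to keep synchronous upload guarantees for session recordings after the split, one way to do it is to switch the recording mode to `proxy-sync` through the new override mechanism. A minimal sketch (the mode and cluster name are assumptions):

```yaml
chartMode: standalone
clusterName: teleport.example.com # assumption: your cluster name

auth:
  teleportConfig:
    auth_service:
      # proxy-sync uploads recordings synchronously, so recordings are not
      # lost when proxy pods are rolled out
      session_recording: proxy-sync
```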

## How to upgrade

The upgrade path mainly depends on the `chartMode` used. If you used a "managed" mode like `aws`, `gcp` or `standalone`,
the upgrade should be relatively straightforward. If you relied on the `custom` chart mode, you will have to make
configuration changes.

In any case:
- back up the cluster content prior to upgrading
- test the upgrade in a test environment


<Admonition type="warning">
During the upgrade, Kubernetes will delete existing deployments and create new ones.
**This is not seamless and will cause downtime** until the new pods are up and all health checks are passing.
This usually takes around 5 minutes.
</Admonition>

### If you use `gcp`, `aws` or `standalone` mode

The upgrade should not require configuration changes. Make sure you don't rely on `standalone.storage`.
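
If your values file still sets `standalone.storage`, move that configuration to the `persistence` value before upgrading. A minimal sketch of the replacement (the volume size is an assumption):

```yaml
# `standalone.storage` was removed; storage is configured through `persistence` instead.
persistence:
  enabled: true
  volumeSize: 10Gi # assumption: match the size you previously configured
```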

Upgrading to v12 will increase the number of pods deployed, as auth and proxy pods are now deployed separately.
The chart will try to deploy multiple proxy replicas when possible (proxies can be replicated if certificates
are provided through a secret or cert-manager). Make sure you have enough room in your Kubernetes cluster
to run the additional Teleport pods:

- `aws` and `gcp` modes will deploy twice the number of pods
- `standalone` mode will deploy 2 to 3 pods (depending on whether the proxy can be replicated)
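
For example, if your certificates come from cert-manager, a values sketch like the following lets the chart run several proxy replicas (the issuer name and replica count here are assumptions):

```yaml
highAvailability:
  replicaCount: 2 # assumption: the number of replicas you want to run
  certManager:
    enabled: true
    issuerName: letsencrypt-production # assumption: the name of your cert-manager issuer
```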

### If you use `custom` mode

The `custom` mode worked by passing the Teleport configuration through a ConfigMap.
Due to the version 12 topology change, an existing `custom` configuration won't work as-is and will
need to be split into two separate configurations: one for the proxies and one for the auth pods.

To avoid a surprise breaking upgrade, the `teleport-cluster` v12 chart will refuse
to deploy in `custom` mode and point you to this migration guide.

Version 12 introduces a new way to pass arbitrary configuration to Teleport without having to
write a full configuration file. If you were using `custom` mode because of a missing chart feature
(etcd backend support, for example), this might be a better fit for you than managing a fully-custom configuration.
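
For instance, enabling a backend the chart does not manage, like etcd, can now be done with a short override instead of a fully custom file. A minimal, illustrative sketch (the peer addresses and certificate paths are assumptions; check how the override merges with the storage settings generated by your chosen mode):

```yaml
auth:
  teleportConfig:
    teleport:
      storage:
        type: etcd
        peers: ["https://etcd-0.example.com:2379"] # assumption: your etcd endpoints
        tls_cert_file: /etc/teleport-etcd-certs/client.pem # assumption: mounted client certificate
        tls_key_file: /etc/teleport-etcd-certs/client-key.pem # assumption
        tls_ca_file: /etc/teleport-etcd-certs/ca.pem # assumption
        prefix: teleport
```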

#### If you only needed a couple of custom configuration bits

You can now use the existing modes `aws`, `gcp` and `standalone`, and pass your custom
configuration bits through the `auth.teleportConfig` and `proxy.teleportConfig` values.
For most use cases this is the recommended setup, as you will automatically benefit
from future configuration upgrades.

For example, a v11 custom configuration that looked like this:

```yaml
teleport:
  log:
    output: stderr
    severity: INFO
auth_service:
  enabled: true
  cluster_name: custom.example.com
  tokens: # This is custom configuration
    - "proxy,node:my-secret-token"
    - "trusted_cluster:my-other-secret-token"
  listen_addr: 0.0.0.0:3025
  public_addr: custom.example.com:3025
proxy_service:
  enabled: true
  listen_addr: 0.0.0.0:3080
  public_addr: custom.example.com:443
  ssh_public_addr: ssh-custom.example.com:3023 # This is custom configuration
```
Can be converted into those values:
```yaml
chartMode: standalone
clusterName: custom.example.com

auth:
  teleportConfig:
    auth_service:
      tokens:
        - "proxy,node:my-secret-token"
        - "trusted_cluster:my-other-secret-token"

proxy:
  teleportConfig:
    proxy_service:
      ssh_public_addr: ssh-custom.example.com:3023
```
<Admonition type="warning">
`teleport.cluster_name` and `teleport.auth_service.authentication.webauthn.rp_id` MUST NOT change.
</Admonition>

#### If you need to manage the full configuration

If you need to manage the full configuration, you must use the `scratch` mode.
This mode generates an empty configuration file, and you pass all of your custom configuration
through the `auth.teleportConfig` and `proxy.teleportConfig` values.

You must split the configuration into two parts, one for each node type:
- The `proxy` configuration should contain at least the `proxy_service` section and the `teleport` section without the `storage` part.
- The `auth` configuration should contain at least the `auth_service` section and the `teleport` section.

For the proxy pods to join the cluster, you must provide a token in their configuration.
The chart creates a dynamic Kubernetes join token, named after the release, that you can use for a seamless join.

For example, a v11 custom configuration that looked like this:

```yaml
version: v1
teleport:
  log:
    output: stderr
    severity: INFO
auth_service:
  enabled: true
  cluster_name: custom.example.com
  tokens:
    - "proxy,node:my-secret-token"
    - "trusted_cluster:my-other-secret-token"
  listen_addr: 0.0.0.0:3025
  public_addr: custom.example.com:3025
proxy_service:
  enabled: true
  listen_addr: 0.0.0.0:3080
  public_addr: custom.example.com:443
  ssh_public_addr: ssh-custom.example.com:3023
```

Can be split into two configurations and deployed using these values:

```yaml
chartMode: scratch
proxy:
  teleportConfig:
    version: v1
    teleport:
      log:
        output: stderr
        severity: INFO
      # You MUST insert the following block: it tells the proxies
      # how to connect to the auth pods. The Helm chart automatically creates a
      # Kubernetes join token named after the Helm release name so the proxies
      # can join the cluster.
      join_params:
        method: kubernetes
        token_name: "RELEASE-NAME-proxy" # replace RELEASE-NAME with the Helm release name
      auth_server: "RELEASE-NAME-auth.RELEASE-NAMESPACE.svc.cluster.local:3025" # replace RELEASE-NAME and RELEASE-NAMESPACE
    proxy_service:
      enabled: true
      listen_addr: 0.0.0.0:3080
      public_addr: custom.example.com:443
      ssh_public_addr: ssh-custom.example.com:3023
auth:
  teleportConfig:
    version: v1
    teleport:
      log:
        output: stderr
        severity: INFO
    auth_service:
      enabled: true
      cluster_name: custom.example.com
      tokens:
        - "proxy,node:my-secret-token"
        - "trusted_cluster:my-other-secret-token"
      listen_addr: 0.0.0.0:3025
      public_addr: custom.example.com:3025
```

## Going further

The new topology allows you to replicate the proxies to increase availability.
You might also want to tune settings like Kubernetes resources or affinities.

By default, each value applies to both `proxy` and `auth` pods, e.g.:

```yaml
resources:
  requests:
    cpu: "1"
    memory: "2Gi"
  limits:
    cpu: "1"
    memory: "2Gi"
highAvailability:
  requireAntiAffinity: true
```

But you can scope the value to a specific pod set by nesting it under the `proxy` or `auth` values.
If both the value at the root and a set-specific value are set, the specific value takes precedence:

```yaml
# By default, all pods use these resources
resources:
  requests:
    cpu: "1"
    memory: "2Gi"
  limits:
    cpu: "1"
    memory: "2Gi"
proxy:
  # But the proxy pods have different resource requests and no CPU limit
  resources:
    requests:
      cpu: "0.5"
      memory: "1Gi"
    limits:
      cpu: ~ # Generic and specific configs are merged: if you want to unset a value, you must do it explicitly
      memory: "1Gi"
auth:
  # Only the auth pods require an anti-affinity
  highAvailability:
    requireAntiAffinity: true
```
