helm: document migration to v12
hugoShaka committed Jan 4, 2023
1 parent e67df9c commit 63c2172
docs/pages/deploy-a-cluster/helm-deployments/migration-v12.mdx (248 additions, 0 deletions)

---
title: Migrating to teleport-cluster v12
description: How to upgrade to teleport-cluster version 12.
---

This guide covers the major changes of the `teleport-cluster` v12 chart
and how to upgrade existing releases from version 11 to version 12.

## Changes summary

The main changes brought by version 12 of the `teleport-cluster` chart are:

- Teleport is split into two components: auth and proxy.
Auth pods are deployed through a StatefulSet, while proxy pods are deployed via a Deployment.
Running Teleport with this new topology makes it more resilient to disruptions and lets it scale better.
- Proxies are now deployed as a stateless workload. The `proxy` session recording mode uploads recordings asynchronously,
so recordings that have not been uploaded yet can be lost during rollouts (configuration changes or version upgrades, for example).
The `proxy-sync` mode ensures consistency and does not have this limitation (see the sketch after this list).
- The `custom` mode is removed, as it was broken by the topology change.
It is replaced by a new configuration override mechanism that lets you pass arbitrary Teleport configuration.
- The value `standalone.storage`, previously deprecated in favor of `persistence`, has been removed.
- The chart can now be scaled up in `standalone` mode.
- The chart has always been versioned with Teleport, but was often compatible with the previous Teleport major version.
This is not the case for v12: chart v12 requires at least Teleport v12.
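
For example, if you want to keep synchronous upload guarantees for session recordings after the split, one way to do it is to switch the recording mode to `proxy-sync` through the new override mechanism. A minimal sketch (the mode and cluster name are assumptions):

```yaml
chartMode: standalone
clusterName: teleport.example.com # assumption: your cluster name

auth:
  teleportConfig:
    auth_service:
      # proxy-sync uploads recordings synchronously, so recordings are not
      # lost when proxy pods are rolled out
      session_recording: proxy-sync
```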

## How to upgrade

The upgrade path mainly depends on the `chartMode` used. If you used a "managed" mode like `aws`, `gcp` or `standalone`,
the upgrade should be relatively straightforward. If you relied on the `custom` chart mode, you will have to make
configuration changes.

In any case:
- back up the cluster content prior to upgrading
- test the upgrade in a test environment


<Admonition type="warning">
During the upgrade, Kubernetes will delete existing deployments and create new ones.
**This is not seamless and will cause downtime** until the new pods are up and all health checks are passing.
This usually takes around 5 minutes.
</Admonition>

### If you use `gcp`, `aws` or `standalone` mode

The upgrade should not require configuration changes. Make sure you don't rely on `standalone.storage`.
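
If your values file still sets `standalone.storage`, move that configuration to the `persistence` value before upgrading. A minimal sketch of the replacement (the volume size is an assumption):

```yaml
# `standalone.storage` was removed; storage is configured through `persistence` instead.
persistence:
  enabled: true
  volumeSize: 10Gi # assumption: match the size you previously configured
```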

Upgrading to v12 will increase the number of pods deployed, as auth and proxy pods are now deployed separately.
The chart will try to deploy multiple proxy replicas when possible (proxies can be replicated if certificates
are provided through a secret or cert-manager). Make sure you have enough room in your Kubernetes cluster
to run the additional Teleport pods:

- `aws` and `gcp` modes will deploy twice the number of pods
- `standalone` mode will deploy 2 to 3 pods (depending on whether the proxy can be replicated)
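
For example, if your certificates come from cert-manager, a values sketch like the following lets the chart run several proxy replicas (the issuer name and replica count here are assumptions):

```yaml
highAvailability:
  replicaCount: 2 # assumption: the number of replicas you want to run
  certManager:
    enabled: true
    issuerName: letsencrypt-production # assumption: the name of your cert-manager issuer
```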

### If you use `custom` mode

The `custom` mode worked by passing the Teleport configuration through a ConfigMap.
Due to the version 12 topology change, an existing `custom` configuration won't work as-is and will
need to be split into two separate configurations: one for the proxies and one for the auth pods.

To avoid a surprise breaking upgrade, the `teleport-cluster` v12 chart will refuse
to deploy in `custom` mode and point you to this migration guide.

Version 12 introduces a new way to pass arbitrary configuration to Teleport without having to
write a full configuration file. If you were using `custom` mode because of a missing chart feature
(etcd backend support, for example), this might be a better fit for you than managing a fully-custom configuration.
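
For instance, enabling a backend the chart does not manage, like etcd, can now be done with a short override instead of a fully custom file. A minimal, illustrative sketch (the peer addresses and certificate paths are assumptions; check how the override merges with the storage settings generated by your chosen mode):

```yaml
auth:
  teleportConfig:
    teleport:
      storage:
        type: etcd
        peers: ["https://etcd-0.example.com:2379"] # assumption: your etcd endpoints
        tls_cert_file: /etc/teleport-etcd-certs/client.pem # assumption: mounted client certificate
        tls_key_file: /etc/teleport-etcd-certs/client-key.pem # assumption
        tls_ca_file: /etc/teleport-etcd-certs/ca.pem # assumption
        prefix: teleport
```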

#### If you only needed a couple of custom configuration bits

You can now use the existing modes `aws`, `gcp` and `standalone`, and pass your custom
configuration bits through the `auth.teleportConfig` and `proxy.teleportConfig` values.
For most use cases this is the recommended setup, as you will automatically benefit
from future configuration upgrades.

For example, a v11 custom configuration that looked like this:

```yaml
teleport:
  log:
    output: stderr
    severity: INFO
auth_service:
  enabled: true
  cluster_name: custom.example.com
  tokens: # This is custom configuration
    - "proxy,node:my-secret-token"
    - "trusted_cluster:my-other-secret-token"
  listen_addr: 0.0.0.0:3025
  public_addr: custom.example.com:3025
proxy_service:
  enabled: true
  listen_addr: 0.0.0.0:3080
  public_addr: custom.example.com:443
  ssh_public_addr: ssh-custom.example.com:3023 # This is custom configuration
```
Can be converted into those values:
```yaml
chartMode: standalone
clusterName: custom.example.com

auth:
  teleportConfig:
    auth_service:
      tokens:
        - "proxy,node:my-secret-token"
        - "trusted_cluster:my-other-secret-token"

proxy:
  teleportConfig:
    proxy_service:
      ssh_public_addr: ssh-custom.example.com:3023
```
<Admonition type="warning">
`teleport.cluster_name` and `teleport.auth_service.authentication.webauthn.rp_id` MUST NOT change.
</Admonition>

#### If you need to manage the full configuration

If you need to manage the full configuration, you must use the `scratch` mode.
This mode generates an empty configuration file, and you pass all of your custom configuration
through the `auth.teleportConfig` and `proxy.teleportConfig` values.

You must split the configuration into two parts, one for each node type:
- The `proxy` configuration should contain at least the `proxy_service` section and the `teleport` section without the `storage` part.
- The `auth` configuration should contain at least the `auth_service` section and the `teleport` section.

For the proxy pods to join the cluster, you must provide a token in their configuration.
The chart creates a dynamic Kubernetes join token, named after the release, that you can use for a seamless join.

For example, a v11 custom configuration that looked like this:

```yaml
version: v1
teleport:
  log:
    output: stderr
    severity: INFO
auth_service:
  enabled: true
  cluster_name: custom.example.com
  tokens:
    - "proxy,node:my-secret-token"
    - "trusted_cluster:my-other-secret-token"
  listen_addr: 0.0.0.0:3025
  public_addr: custom.example.com:3025
proxy_service:
  enabled: true
  listen_addr: 0.0.0.0:3080
  public_addr: custom.example.com:443
  ssh_public_addr: ssh-custom.example.com:3023
```

Can be split into two configurations and deployed using these values:

```yaml
chartMode: scratch
proxy:
  teleportConfig:
    version: v1
    teleport:
      log:
        output: stderr
        severity: INFO
      # You MUST insert the following block: it tells the proxies
      # how to connect to the auth pods. The Helm chart automatically creates a
      # Kubernetes join token named after the Helm release name so the proxies
      # can join the cluster.
      join_params:
        method: kubernetes
        token_name: "RELEASE-NAME-proxy" # replace RELEASE-NAME with the Helm release name
      auth_server: "RELEASE-NAME-auth.RELEASE-NAMESPACE.svc.cluster.local:3025" # replace RELEASE-NAME and RELEASE-NAMESPACE
    proxy_service:
      enabled: true
      listen_addr: 0.0.0.0:3080
      public_addr: custom.example.com:443
      ssh_public_addr: ssh-custom.example.com:3023
auth:
  teleportConfig:
    version: v1
    teleport:
      log:
        output: stderr
        severity: INFO
    auth_service:
      enabled: true
      cluster_name: custom.example.com
      tokens:
        - "proxy,node:my-secret-token"
        - "trusted_cluster:my-other-secret-token"
      listen_addr: 0.0.0.0:3025
      public_addr: custom.example.com:3025
```

## Going further

The new topology allows you to replicate the proxies to increase availability.
You might also want to tune settings like Kubernetes resources or affinities.

By default, each value applies to both `proxy` and `auth` pods, e.g.:

```yaml
resources:
  requests:
    cpu: "1"
    memory: "2Gi"
  limits:
    cpu: "1"
    memory: "2Gi"
highAvailability:
  requireAntiAffinity: true
```

But you can scope the value to a specific pod set by nesting it under the `proxy` or `auth` values.
If both the value at the root and a set-specific value are set, the specific value takes precedence:

```yaml
# By default, all pods use these resources
resources:
  requests:
    cpu: "1"
    memory: "2Gi"
  limits:
    cpu: "1"
    memory: "2Gi"
proxy:
  # But the proxy pods have different resource requests and no CPU limit
  resources:
    requests:
      cpu: "0.5"
      memory: "1Gi"
    limits:
      cpu: ~ # Generic and specific configs are merged: if you want to unset a value, you must do it explicitly
      memory: "1Gi"
auth:
  # Only the auth pods require an anti-affinity
  highAvailability:
    requireAntiAffinity: true
```
