Rework snapshot docs #349

Merged 1 commit on Oct 31, 2024

297 changes: 217 additions & 80 deletions docs/cli/etcd-snapshot.md
@@ -4,97 +4,49 @@ title: etcd-snapshot

# k3s etcd-snapshot

:::info Version Gate

Available as of [v1.19.1+k3s1](https://github.com/k3s-io/k3s/releases/tag/v1.19.1%2Bk3s1)

:::

In this section, you'll learn how to create backups of the K3s embedded etcd datastore, and to restore the cluster from backup.

#### Creating Snapshots

Snapshots are enabled by default, at 00:00 and 12:00 system time, with 5 snapshots retained. To configure the snapshot interval or the number of retained snapshots, refer to the [options](#options).

The snapshot directory defaults to `${data-dir}/server/db/snapshots`. The data-dir value defaults to `/var/lib/rancher/k3s` and can be changed by setting the `--data-dir` flag.

#### Restoring a Cluster from a Snapshot

When K3s is restored from backup, the old data directory is moved to `${data-dir}/server/db/etcd-old/`. K3s then attempts to restore the snapshot by creating a new data directory and starting etcd as a new single-member K3s cluster.

To restore the cluster from backup:

<Tabs queryString="etcdsnap">
<TabItem value="Single Server">

Run K3s with the `--cluster-reset` option, with the `--cluster-reset-restore-path` also given:

```bash
k3s server \
--cluster-reset \
--cluster-reset-restore-path=<PATH-TO-SNAPSHOT>
```
This page documents the management of etcd snapshots using the `k3s etcd-snapshot` CLI, as well as configuration of etcd scheduled snapshots for the `k3s server` process, and use of the `k3s server --cluster-reset` command to reset etcd cluster membership and optionally restore etcd snapshots.

**Result:** A message in the logs says that K3s can be restarted without the flags. Start K3s again; it should run successfully and be restored from the specified snapshot.
## Creating Snapshots

</TabItem>
Snapshots are saved to the path set by the server's `--etcd-snapshot-dir` value, which defaults to `${data-dir}/server/db/snapshots`. The data-dir value defaults to `/var/lib/rancher/k3s` and can be changed independently by setting the `--data-dir` flag.
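For example, a minimal sketch of overriding both locations (the paths here are illustrative):

```bash
# Snapshots default to /var/lib/rancher/k3s/server/db/snapshots.
# The data directory and the snapshot directory can be moved independently:
k3s server \
  --data-dir /opt/k3s-data \
  --etcd-snapshot-dir /backup/k3s-snapshots
```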

<TabItem value="High Availability">
### Scheduled Snapshots

In this example there are 3 servers, `S1`, `S2`, and `S3`. The snapshot is located on `S1`.
Scheduled snapshots are enabled by default, at 00:00 and 12:00 system time, with 5 snapshots retained. To configure the snapshot interval or the number of retained snapshots, refer to the [snapshot configuration options](#snapshot-configuration-options).

1. On S1, start K3s with the `--cluster-reset` option, with the `--cluster-reset-restore-path` also given:
Scheduled snapshots have a name that starts with `etcd-snapshot`, followed by the node name and timestamp. The base name can be changed with the `--etcd-snapshot-name` flag in the server configuration.
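As a sketch of the resulting names, assuming a node named `k3s-server-1`:

```bash
# Default scheduled snapshots are named like:
#   etcd-snapshot-k3s-server-1-1730308800
# With a custom base name:
k3s server --etcd-snapshot-name nightly
# snapshots are instead named like:
#   nightly-k3s-server-1-1730308800
```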

```bash
k3s server \
--cluster-reset \
--cluster-reset-restore-path=<PATH-TO-SNAPSHOT>
```
### On-demand Snapshots

**Result:** A message in the logs says that K3s can be restarted without the flags.
Snapshots can be saved manually by running the `k3s etcd-snapshot save` command.

2. On S2 and S3, stop K3s. Then delete the data directory, `/var/lib/rancher/k3s/server/db/`:
On-demand snapshots have a name that starts with `on-demand`, followed by the node name and timestamp. The base name can be changed with the `--name` flag when saving the snapshot.
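For example (node name and timestamp are illustrative):

```bash
# Saves a snapshot named on-demand-k3s-server-1-1730308816
k3s etcd-snapshot save

# Saves a snapshot named pre-upgrade-k3s-server-1-1730308816
k3s etcd-snapshot save --name pre-upgrade
```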

```bash
systemctl stop k3s
rm -rf /var/lib/rancher/k3s/server/db/
```
### Snapshot Configuration Options

3. On S1, start K3s again:
These flags can be passed to the `k3s server` command to reset the etcd cluster, and optionally restore from a snapshot.

```bash
systemctl start k3s
```

4. On S2 and S3, start K3s again to join the restored cluster:

```bash
systemctl start k3s
```

</TabItem>
</Tabs>

#### Options
| Flag | Description |
| ----------- | --------------- |
| `--cluster-reset`| Forget all peers and become sole member of a new cluster. This can also be set with the environment variable `[$K3S_CLUSTER_RESET]` |
| `--cluster-reset-restore-path` | Path to snapshot file to be restored |

These options can be passed in with the command line, or in the [configuration file,](../installation/configuration.md#configuration-file ) which may be easier to use.
These flags are valid for both `k3s server` and `k3s etcd-snapshot`; however, when passed to `k3s etcd-snapshot`, the `--etcd-` prefix can be omitted to avoid redundancy.
Flags can be passed on the command line, or set in the [configuration file](../installation/configuration.md#configuration-file), which may be easier to use.

| Options | Description |
| Flag | Description |
| ----------- | --------------- |
| `--etcd-disable-snapshots` | Disable automatic etcd snapshots |
| `--etcd-snapshot-schedule-cron` value | Snapshot interval time in cron spec. eg. every 5 hours `0 */5 * * *`(default: `0 */12 * * *`) |
| `--etcd-snapshot-retention` value | Number of snapshots to retain (default: 5) |
| `--etcd-snapshot-dir` value | Directory to save db snapshots. (Default location: `${data-dir}/db/snapshots`) |
| `--cluster-reset` | Forget all peers and become sole member of a new cluster. This can also be set with the environment variable `[$K3S_CLUSTER_RESET]`. |
| `--cluster-reset-restore-path` | Path to snapshot file to be restored |

#### S3 Compatible API Support
| `--etcd-disable-snapshots` | Disable scheduled snapshots |
| `--etcd-snapshot-compress` | Compress etcd snapshots |
| `--etcd-snapshot-dir` | Directory to save db snapshots. (Default location: `${data-dir}/server/db/snapshots`) |
| `--etcd-snapshot-retention` | Number of snapshots to retain (default: 5) |
| `--etcd-snapshot-schedule-cron` | Snapshot interval time in cron spec, e.g. every 5 hours: `0 */5 * * *` (default: `0 */12 * * *`) |
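As an example, the same settings could be carried in a configuration file instead of on the command line; a minimal sketch with illustrative values:

```yaml
# /etc/rancher/k3s/config.yaml
etcd-snapshot-schedule-cron: "0 */6 * * *"
etcd-snapshot-retention: 10
etcd-snapshot-compress: true
```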

K3s supports writing etcd snapshots to and restoring etcd snapshots from systems with S3-compatible APIs. S3 support is available for both on-demand and scheduled snapshots.
### S3 Compatible Object Store Support

The arguments below have been added to the `server` subcommand. These flags exist for the `etcd-snapshot` subcommand as well however the `--etcd-s3` portion is removed to avoid redundancy.
K3s supports writing etcd snapshots to and restoring etcd snapshots from S3-compatible object stores. S3 support is available for both on-demand and scheduled snapshots.

| Options | Description |
| Flag | Description |
| ----------- | --------------- |
| `--etcd-s3` | Enable backup to S3 |
| `--etcd-s3-endpoint` | S3 endpoint url |
@@ -105,6 +57,10 @@
| `--etcd-s3-bucket` | S3 bucket name |
| `--etcd-s3-region` | S3 region / bucket location (optional). Defaults to us-east-1 |
| `--etcd-s3-folder` | S3 folder |
| `--etcd-s3-proxy` | Proxy server to use when connecting to S3, overriding any proxy-related environment variables |
| `--etcd-s3-insecure` | Disables S3 over HTTPS |
| `--etcd-s3-timeout` | S3 timeout (default: `5m0s`) |
| `--etcd-s3-config-secret` | Name of secret in the kube-system namespace used to configure S3, if etcd-s3 is enabled and no other etcd-s3 options are set |

To perform an on-demand etcd snapshot and save it to S3:

@@ -129,7 +85,51 @@ k3s server \
--etcd-s3-secret-key=<S3-SECRET-KEY>
```
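For reference, a complete on-demand save to S3 might look like the following sketch (bucket and credential values are placeholders):

```bash
k3s etcd-snapshot save \
  --etcd-s3 \
  --etcd-s3-bucket=<S3-BUCKET-NAME> \
  --etcd-s3-access-key=<S3-ACCESS-KEY> \
  --etcd-s3-secret-key=<S3-SECRET-KEY>
```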

#### Etcd Snapshot and Restore Subcommands
### S3 Configuration Secret Support

:::info Version Gate
S3 Configuration Secret support is available as of the August 2024 releases: v1.30.4+k3s1, v1.29.8+k3s1, v1.28.13+k3s1
:::

K3s supports reading etcd S3 snapshot configuration from a Kubernetes Secret.
This may be preferred to hardcoding credentials in K3s CLI flags or config files for security reasons, or if credentials need to be rotated without restarting K3s.
To pass S3 snapshot configuration via a Secret, start K3s with `--etcd-s3` and `--etcd-s3-config-secret=<SECRET-NAME>`.
The Secret does not need to exist when K3s is started, but it will be checked for every time a snapshot save/list/delete/prune operation is performed.
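A minimal sketch of the corresponding server invocation (the Secret name matches the example manifest below):

```bash
k3s server \
  --etcd-s3 \
  --etcd-s3-config-secret=k3s-etcd-snapshot-s3-config
```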

The S3 config Secret cannot be used when restoring a snapshot, as the apiserver is not available to provide the secret during a restore.
S3 configuration must be passed via the CLI when restoring a snapshot stored on S3.

:::note
Pass only the `--etcd-s3` and `--etcd-s3-config-secret` flags to enable the Secret.
If any other S3 configuration flags are set, the Secret will be ignored.
:::

Keys in the Secret correspond to the `--etcd-s3-*` CLI flags listed above.
The `etcd-s3-endpoint-ca` key accepts a PEM-encoded CA bundle, or the `etcd-s3-endpoint-ca-name` key may be used to specify the name of a ConfigMap in the `kube-system` namespace containing one or more PEM-encoded CA bundles.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: k3s-etcd-snapshot-s3-config
  namespace: kube-system
type: etcd.k3s.cattle.io/s3-config-secret
stringData:
  etcd-s3-endpoint: ""
  etcd-s3-endpoint-ca: ""
  etcd-s3-endpoint-ca-name: ""
  etcd-s3-skip-ssl-verify: "false"
  etcd-s3-access-key: "AWS_ACCESS_KEY_ID"
  etcd-s3-secret-key: "AWS_SECRET_ACCESS_KEY"
  etcd-s3-bucket: "bucket"
  etcd-s3-folder: "folder"
  etcd-s3-region: "us-east-1"
  etcd-s3-insecure: "false"
  etcd-s3-timeout: "5m"
  etcd-s3-proxy: ""
```
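Because the Secret is re-read on each snapshot operation, credentials can be rotated without restarting K3s, for example by reapplying the manifest (the filename is assumed):

```bash
kubectl apply -f k3s-etcd-snapshot-s3-config.yaml
```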

## Managing Snapshots

K3s supports a set of subcommands for working with your etcd snapshots.

@@ -140,13 +140,9 @@
| prune | Remove snapshots that exceed the configured retention count |
| save | Trigger an immediate etcd snapshot |

:::note
The `save` subcommand is the same as `k3s etcd-snapshot`. The latter will eventually be deprecated in favor of the former.
:::

These commands will perform as expected whether the etcd snapshots are stored locally or in an S3-compatible object store.

For additional information on the etcd snapshot subcommands, run `k3s etcd-snapshot`.
For additional information on the etcd snapshot subcommands, run `k3s etcd-snapshot --help`.
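For example, to list the snapshots known to this node, both local and on S3:

```bash
k3s etcd-snapshot list
```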

Delete a snapshot from S3.
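A sketch of the command, assuming the same S3 flags used when saving (the snapshot name is a placeholder):

```bash
k3s etcd-snapshot delete \
  --etcd-s3 \
  --etcd-s3-bucket=<S3-BUCKET-NAME> \
  --etcd-s3-access-key=<S3-ACCESS-KEY> \
  --etcd-s3-secret-key=<S3-SECRET-KEY> \
  <SNAPSHOT-NAME>
```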

@@ -168,3 +164,144 @@ k3s etcd-snapshot prune
```bash
k3s etcd-snapshot prune --snapshot-retention 10
```

### ETCDSnapshotFile Custom Resources

:::info Version Gate
ETCDSnapshotFiles are available as of the November 2023 releases: v1.28.4+k3s2, v1.27.8+k3s2, v1.26.11+k3s2, v1.25.16+k3s4
:::

Snapshots can be viewed remotely using any Kubernetes client by listing or describing cluster-scoped `ETCDSnapshotFile` resources.
Unlike the `k3s etcd-snapshot list` command, which only shows snapshots visible to that node, `ETCDSnapshotFile` resources track all snapshots present on cluster members.

```console
root@k3s-server-1:~# kubectl get etcdsnapshotfile
NAME                                             SNAPSHOTNAME                        NODE           LOCATION                                                                             SIZE      CREATIONTIME
local-on-demand-k3s-server-1-1730308816-3e9290   on-demand-k3s-server-1-1730308816   k3s-server-1   file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-1-1730308816   2891808   2024-10-30T17:20:16Z
s3-on-demand-k3s-server-1-1730308816-79b15c      on-demand-k3s-server-1-1730308816   s3             s3://etcd/k3s-test/on-demand-k3s-server-1-1730308816                                 2891808   2024-10-30T17:20:16Z
```

```console
root@k3s-server-1:~# kubectl describe etcdsnapshotfile s3-on-demand-k3s-server-1-1730308816-79b15c
Name:         s3-on-demand-k3s-server-1-1730308816-79b15c
Namespace:
Labels:       etcd.k3s.cattle.io/snapshot-storage-node=s3
Annotations:  etcd.k3s.cattle.io/snapshot-token-hash: b4b83cda3099
API Version:  k3s.cattle.io/v1
Kind:         ETCDSnapshotFile
Metadata:
  Creation Timestamp:  2024-10-30T17:20:16Z
  Finalizers:
    wrangler.cattle.io/managed-etcd-snapshots-controller
  Generation:        1
  Resource Version:  790
  UID:               bec9a51c-dbbe-4746-922e-a5136bef53fc
Spec:
  Location:   s3://etcd/k3s-test/on-demand-k3s-server-1-1730308816
  Node Name:  s3
  s3:
    Bucket:           etcd
    Endpoint:         s3.example.com
    Prefix:           k3s-test
    Region:           us-east-1
    Skip SSL Verify:  true
  Snapshot Name:      on-demand-k3s-server-1-1730308816
Status:
  Creation Time:  2024-10-30T17:20:16Z
  Ready To Use:   true
  Size:           2891808
Events:
  Type    Reason               Age   From            Message
  ----    ------               ----  ----            -------
  Normal  ETCDSnapshotCreated  113s  k3s-supervisor  Snapshot on-demand-k3s-server-1-1730308816 saved on S3
```


## Restoring Snapshots

K3s runs through several steps when restoring a snapshot:
1. If the snapshot is stored on S3, the file is downloaded into the snapshot directory.
2. If the snapshot is compressed, it is decompressed.
3. If present, the current etcd database files are moved to `${data-dir}/server/db/etcd-old-$TIMESTAMP/`.
4. The snapshot's contents are extracted out to disk, and the checksum is verified.
5. Etcd is started, and all etcd cluster members except the current node are removed from the cluster.
6. CA certificates and other confidential data are extracted from the datastore and written to disk, for later use.
7. The restore is complete, and K3s can be restarted and used normally on the server where the restore was performed.
8. (optional) Agents and control-plane servers can be started normally.
9. (optional) Etcd servers can be restarted to rejoin the cluster after removing old database files.

### Snapshot Restore Steps

Select the tab below that matches your cluster configuration.

<Tabs queryString="etcdsnap">
<TabItem value="Single Server">

1. Stop the K3s service:
```bash
systemctl stop k3s
```

2. Run `k3s server` with the `--cluster-reset` flag, and `--cluster-reset-restore-path` indicating the path to the snapshot to restore.
If the snapshot is stored on S3, provide S3 configuration flags (`--etcd-s3`, `--etcd-s3-bucket`, and so on), and give only the filename of the snapshot as the restore path.

:::note
Using the `--cluster-reset` flag without specifying a snapshot to restore simply resets the etcd cluster to a single member without restoring a snapshot.
:::

```bash
k3s server \
--cluster-reset \
--cluster-reset-restore-path=<PATH-TO-SNAPSHOT>
```
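If the snapshot is stored on S3, the reset might instead be invoked as in this sketch (bucket and credentials are placeholders; the restore path is only the snapshot filename):

```bash
k3s server \
  --cluster-reset \
  --etcd-s3 \
  --etcd-s3-bucket=<S3-BUCKET-NAME> \
  --etcd-s3-access-key=<S3-ACCESS-KEY> \
  --etcd-s3-secret-key=<S3-SECRET-KEY> \
  --cluster-reset-restore-path=<SNAPSHOT-NAME>
```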

**Result:** K3s restores the snapshot and resets cluster membership, then prints a message indicating that it is ready to be restarted:
`Managed etcd cluster membership has been reset, restart without --cluster-reset flag now.`

3. Start K3s again:
```bash
systemctl start k3s
```
</TabItem>
<TabItem value="Multiple Servers">

In this example there are 3 servers, `S1`, `S2`, and `S3`. The snapshot is located on `S1`.

1. Stop K3s on all servers:
```bash
systemctl stop k3s
```

2. On S1, run `k3s server` with the `--cluster-reset` option, and `--cluster-reset-restore-path` indicating the path to the snapshot to restore.
If the snapshot is stored on S3, provide S3 configuration flags (`--etcd-s3`, `--etcd-s3-bucket`, and so on), and give only the filename of the snapshot as the restore path.

:::note
Using the `--cluster-reset` flag without specifying a snapshot to restore simply resets the etcd cluster to a single member without restoring a snapshot.
:::

```bash
k3s server \
--cluster-reset \
--cluster-reset-restore-path=<PATH-TO-SNAPSHOT>
```

**Result:** K3s restores the snapshot and resets cluster membership, then prints a message indicating that it is ready to be restarted:
`Managed etcd cluster membership has been reset, restart without --cluster-reset flag now.`
`Backup and delete ${datadir}/server/db on each peer etcd server and rejoin the nodes.`

3. On S1, start K3s again:
```bash
systemctl start k3s
```

4. On S2 and S3, delete the data directory, `/var/lib/rancher/k3s/server/db/`:
```bash
rm -rf /var/lib/rancher/k3s/server/db/
```

5. On S2 and S3, start K3s again to join the restored cluster:
```bash
systemctl start k3s
```
</TabItem>
</Tabs>