Merge pull request #968 from schlichtanders/postinstall_exec2
adding postinstall_exec to hook restore commands
mysticaltech authored Sep 8, 2023
2 parents 0e309ad + 52d5d87 commit 214d822
Showing 4 changed files with 138 additions and 7 deletions.
122 changes: 122 additions & 0 deletions README.md
@@ -662,6 +662,128 @@ This can be helpful when you setup a mixed-architecture cluster, and there are m
</details>
<details>
<summary>Backup and restore a cluster</summary>
K3s allows for automated etcd backups to S3. Etcd is the default storage backend on kube-hetzner, even for a single control plane cluster, hence this should work for all cluster deployments.
**For backup do:**
1. Fill in the `etcd_s3_backup` setting in your kube.tf; this triggers regular automated backups to S3.
2. Add the k3s_token as an output to your kube.tf:
```tf
output "k3s_token" {
  value     = module.kube-hetzner.k3s_token
  sensitive = true
}
```
3. Make sure you can access the k3s_token via `terraform output k3s_token`.
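Since the same token is needed again at restore time, it can help to keep a copy somewhere safe. The following is a minimal sketch, assuming a Terraform version that supports the `-raw` flag of `terraform output`. The helper name `save_k3s_token` and the default file name `k3s_token.txt` are hypothetical and not part of kube-hetzner.

```shell
# Hypothetical helper: write the k3s_token output to a file that only
# the current user can read. The default file name is an assumption.
save_k3s_token() {
  out_file="${1:-k3s_token.txt}"
  # The subshell keeps the restrictive umask from leaking into the caller.
  (
    umask 077
    terraform output -raw k3s_token > "$out_file"
  )
}
```

Calling `save_k3s_token /path/to/k3s_token.txt` from the directory that holds your kube.tf writes the token to that file. Treat the file like a password and move it into a secret store of your choice.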
**For restoration do:**
1. Before cluster creation, add the following to your kube.tf. Replace the local variables to match your values.
```tf
locals {
  # ...
  k3s_token = var.k3s_token # this is secret information, hence it is passed as an environment variable

  # to get the corresponding etcd_version for a k3s version you need to
  # - start k3s or have it running
  # - run `curl -L --cacert /var/lib/rancher/k3s/server/tls/etcd/server-ca.crt --cert /var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key /var/lib/rancher/k3s/server/tls/etcd/server-client.key https://127.0.0.1:2379/version`
  # for details see https://gist.github.com/superseb/0c06164eef5a097c66e810fe91a9d408
  etcd_version = "v3.5.9"

  etcd_snapshot_name = "name-of-the-snapshot(no-path,just-the-name)"
  etcd_s3_endpoint   = "your-s3-endpoint(without-https://)"
  etcd_s3_bucket     = "your-s3-bucket"
  etcd_s3_access_key = "your-s3-access-key"
  etcd_s3_secret_key = var.etcd_s3_secret_key # this is secret information, hence it is passed as an environment variable
  # ...
}

variable "k3s_token" {
  sensitive = true
  type      = string
}

variable "etcd_s3_secret_key" {
  sensitive = true
  type      = string
}

module "kube-hetzner" {
  # ...
  k3s_token = local.k3s_token
  # ...
  postinstall_exec = [
    (
      local.etcd_snapshot_name == "" ? "" :
      <<-EOF
      export CLUSTERINIT=$(cat /etc/rancher/k3s/config.yaml | grep -i '"cluster-init": true')
      if [ -n "$CLUSTERINIT" ]; then
        echo indeed this is the first control plane node > /tmp/restorenotes
        k3s server \
          --cluster-reset \
          --etcd-s3 \
          --cluster-reset-restore-path=${local.etcd_snapshot_name} \
          --etcd-s3-endpoint=${local.etcd_s3_endpoint} \
          --etcd-s3-bucket=${local.etcd_s3_bucket} \
          --etcd-s3-access-key=${local.etcd_s3_access_key} \
          --etcd-s3-secret-key=${local.etcd_s3_secret_key}
        # renaming the k3s.yaml because it is used as a trigger for further downstream
        # changes. Better to let `k3s server` create it as expected.
        mv /etc/rancher/k3s/k3s.yaml /etc/rancher/k3s/k3s.backup.yaml

        # download etcd/etcdctl for adapting the kubernetes config before starting k3s
        ETCD_VER=${local.etcd_version}
        case "$(uname -m)" in
          aarch64) ETCD_ARCH="arm64" ;;
          x86_64) ETCD_ARCH="amd64" ;;
        esac;
        DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download
        rm -f /tmp/etcd-$ETCD_VER-linux-$ETCD_ARCH.tar.gz
        curl -L $DOWNLOAD_URL/$ETCD_VER/etcd-$ETCD_VER-linux-$ETCD_ARCH.tar.gz -o /tmp/etcd-$ETCD_VER-linux-$ETCD_ARCH.tar.gz
        tar xzvf /tmp/etcd-$ETCD_VER-linux-$ETCD_ARCH.tar.gz -C /usr/local/bin --strip-components=1
        rm -f /tmp/etcd-$ETCD_VER-linux-$ETCD_ARCH.tar.gz
        etcd --version
        etcdctl version

        # temporarily start etcd on the restored data directory
        nohup etcd --data-dir /var/lib/rancher/k3s/server/db/etcd &
        echo $! > save_pid.txt
        # delete the traefik service so that no load balancer is accidentally changed
        etcdctl del /registry/services/specs/traefik/traefik
        etcdctl del /registry/services/endpoints/traefik/traefik
        kill -9 `cat save_pid.txt`
        rm save_pid.txt
      else
        echo this is not the first control plane node > /tmp/restorenotes
      fi
      EOF
    )
  ]
  # ...
}
```
2. Set the following sensitive environment variables:
- `export TF_VAR_k3s_token="..."` (Be careful, this token is like an admin password to the entire cluster. You need to use the same k3s_token that you saved when creating the backup.)
- `export TF_VAR_etcd_s3_secret_key="..."`
3. Create the cluster as usual. You can also change the cluster name and deploy it next to the original backed-up cluster.
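Terraform reads input variables from environment variables named `TF_VAR_<name>`, so both secrets have to be exported under that prefix before the cluster is created. A minimal pre-flight check might look like the sketch below. The function name `require_tf_vars` is hypothetical and not part of kube-hetzner.

```shell
# Hypothetical pre-flight check: fail if a required TF_VAR_* variable
# is unset or empty, without printing any secret value.
require_tf_vars() {
  missing=0
  for name in TF_VAR_k3s_token TF_VAR_etcd_s3_secret_key; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be used as `require_tf_vars && terraform apply`, so that the apply only starts once both secrets are present in the environment.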
Awesome! You restored a whole cluster from a backup.
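A note on the architecture handling in the restore script: its `case` statement leaves `ETCD_ARCH` unset on any machine type other than `aarch64` or `x86_64`, which would produce a broken download URL. The sketch below shows one way to fail early instead. The function names `etcd_arch` and `etcd_download_url` are illustrative, and the snippet avoids `${...}` so it would not clash with Terraform interpolation if it were placed inside the heredoc.

```shell
# Map `uname -m` output to the architecture suffix used in etcd release
# file names; any other architecture is reported as an error.
etcd_arch() {
  case "$1" in
    aarch64) echo "arm64" ;;
    x86_64)  echo "amd64" ;;
    *)
      echo "unsupported architecture: $1" >&2
      return 1
      ;;
  esac
}

# Build the release tarball URL for a given etcd version and machine type.
etcd_download_url() {
  ver="$1"
  arch="$(etcd_arch "$2")" || return 1
  echo "https://github.com/etcd-io/etcd/releases/download/$ver/etcd-$ver-linux-$arch.tar.gz"
}
```

For example, `etcd_download_url v3.5.9 "$(uname -m)"` prints the tarball URL for the current machine, or returns a non-zero status on an unsupported architecture.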
</details>
## Debugging
First and foremost, it depends, but it's always good to be able to take a quick look at Hetzner without logging in to the UI. That is where the `hcloud` CLI comes in.
6 changes: 4 additions & 2 deletions locals.tf
@@ -67,6 +67,8 @@ locals {
["timeout 180s /bin/sh -c 'while ! ping -c 1 ${var.address_for_connectivity_test} >/dev/null 2>&1; do echo \"Ready for k3s installation, waiting for a successful connection to the internet...\"; sleep 5; done; echo Connected'"]
)

common_post_install_k3s_commands = var.postinstall_exec

kustomization_backup_yaml = yamlencode({
apiVersion = "kustomize.config.k8s.io/v1beta1"
kind = "Kustomization"
@@ -102,10 +104,10 @@ locals {

install_k3s_server = concat(local.common_pre_install_k3s_commands, [
"curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_START=true INSTALL_K3S_SKIP_SELINUX_RPM=true INSTALL_K3S_CHANNEL=${var.initial_k3s_channel} INSTALL_K3S_EXEC='server ${var.k3s_exec_server_args}' sh -"
- ], local.apply_k3s_selinux)
+ ], local.apply_k3s_selinux, local.common_post_install_k3s_commands)
install_k3s_agent = concat(local.common_pre_install_k3s_commands, [
"curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_START=true INSTALL_K3S_SKIP_SELINUX_RPM=true INSTALL_K3S_CHANNEL=${var.initial_k3s_channel} INSTALL_K3S_EXEC='agent ${var.k3s_exec_agent_args}' sh -"
- ], local.apply_k3s_selinux)
+ ], local.apply_k3s_selinux, local.common_post_install_k3s_commands)

control_plane_nodes = merge([
for pool_index, nodepool_obj in var.control_plane_nodepools : {
10 changes: 5 additions & 5 deletions modules/host/main.tf
@@ -90,11 +90,11 @@ resource "hcloud_server" "server" {
EOT
]
}
-  provisioner "remote-exec" {
-    inline = [
-      "cloud-init status --wait"
-    ]
-  }
+  # provisioner "remote-exec" {
+  #   inline = [
+  #     "cloud-init status --wait"
+  #   ]
+  # }

}

7 changes: 7 additions & 0 deletions variables.tf
@@ -706,6 +706,13 @@ variable "preinstall_exec" {
description = "Additional commands to execute before the install calls, for example fetching and installing certs."
}

variable "postinstall_exec" {
type = list(string)
default = []
description = "Additional commands to execute after the install calls, for example restoring a backup."
}


variable "extra_kustomize_deployment_commands" {
type = string
default = ""