
7. Backup

K8ssandra includes Medusa for Apache Cassandra™ to handle backup and restore for your Cassandra nodes. Medusa was recently upgraded to support all S3-compatible backends, including MinIO, the popular k8s-native object storage suite. Let’s see how to set up K8ssandra and MinIO to back up Cassandra in just a few steps.

Adjust StorageClasses for Civo Installs

✅ Step 0 (for Civo installs only)

For Civo installs only, change the default StorageClass with the following command.

kubectl patch storageclass civo-volume -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

You should now be able to verify that the default StorageClass has been changed.
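A quick check with standard kubectl (the exact output varies by cluster, but local-path should now be marked as the default):

kubectl get storageclass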

✅ Step 1: Deploy MinIO

In keeping with K8ssandra’s multi-cloud approach, MinIO can also be deployed through Helm. Add its chart repository and refresh your local cache:

helm repo add minio https://helm.min.io/
helm repo update
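If you want to confirm the chart is available before installing, you can search the repo you just added (the chart version shown will differ over time):

helm search repo minio/minio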

The MinIO Helm chart allows you to do several things at once at install time:

  • Set the credentials to access MinIO
  • Create a bucket for your backups that can be set as default

You can create a k8ssandra-medusa bucket and use minio_key / minio_secret as the credentials, and deploy MinIO in a new namespace called minio by running the following command:

helm install --set accessKey=minio_key,secretKey=minio_secret,defaultBucket.enabled=true,defaultBucket.name=k8ssandra-medusa minio minio/minio -n minio --create-namespace

After the helm install command has completed, run the following command:

kubectl get all -n minio

You should see output similar to the following in the minio namespace:

NAME                        READY   STATUS    RESTARTS   AGE
pod/minio-5fd4dd687-gzr8j   1/1     Running   0          109s

NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/minio   ClusterIP   10.96.144.61   <none>        9000/TCP   109s

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/minio   1/1     1            1           109s

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/minio-5fd4dd687   1         1         1       109s
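To double-check that the k8ssandra-medusa bucket was created, one option is to port-forward the MinIO service and open its web UI with the credentials from the install command (a quick sketch; newer MinIO charts may serve the console on a separate port):

kubectl port-forward svc/minio 9000:9000 -n minio

Then browse to http://localhost:9000 and log in with minio_key / minio_secret.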

✅ Step 2: Deploy Medusa

Now that MinIO is up and running, you can prepare to install Medusa by creating a secret it will use to access the bucket. Create a medusa_secret.yaml file with the following content:

apiVersion: v1
kind: Secret
metadata:
 name: medusa-bucket-key
type: Opaque
stringData:
 # Note that this currently has to be set to medusa_s3_credentials!
 medusa_s3_credentials: |-
   [default]
   aws_access_key_id = minio_key
   aws_secret_access_key = minio_secret

Now apply the file:

kubectl apply -f medusa_secret.yaml

You should now see the medusa-bucket-key secret. List the secrets to verify:

kubectl get secrets

The output should look similar to the following:

NAME                  TYPE                                  DATA   AGE
default-token-twk5w   kubernetes.io/service-account-token   3      4m49s
medusa-bucket-key     Opaque                                1      45s

You can then deploy Medusa. Add the following section to your values file (k8ssandra-local-civo.yaml for local and Civo installs, or k8ssandra.yaml for DataStax-provided VMs) and upgrade the deployment:

medusa:
  enabled: true
  storage: s3_compatible
  storage_properties:
      host: minio.minio.svc.cluster.local
      port: 9000
      secure: "False"
  bucketName: k8ssandra-medusa
  storageSecret: medusa-bucket-key

For local and Civo installs:

helm upgrade k8ssandra k8ssandra/k8ssandra -f k8ssandra-local-civo.yaml

OR

For Datastax provided VMs:

helm upgrade k8ssandra k8ssandra/k8ssandra -f k8ssandra.yaml
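Once the upgrade has rolled out, you can check that a Medusa container has been added alongside Cassandra in the StatefulSet pod (a rough check using the pod name from the next step; container names may vary by chart version):

kubectl get pod k8ssandra-dc1-default-sts-0 -o jsonpath='{.spec.containers[*].name}'

You should see a medusa container listed in the output.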

✅ Step 3: Create some data and back it up

Extract the username and password to access Cassandra into environment variables:

username=$(kubectl get secret k8ssandra-superuser -o jsonpath="{.data.username}" | base64 --decode)
password=$(kubectl get secret k8ssandra-superuser -o jsonpath="{.data.password}" | base64 --decode)
echo $username
echo $password

Connect through CQLSH on one of the nodes:

kubectl exec -it k8ssandra-dc1-default-sts-0 -c cassandra -- cqlsh -u $username -p $password

Create some data by entering the following statements at the CQLSH prompt:

CREATE KEYSPACE medusa_test  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE medusa_test;
CREATE TABLE users (email TEXT PRIMARY KEY, name TEXT, state TEXT);
INSERT INTO users (email, name, state) VALUES ('alice@example.com', 'Alice Smith', 'TX');
INSERT INTO users (email, name, state) VALUES ('bob@example.com', 'Bob Jones', 'VA');
INSERT INTO users (email, name, state) VALUES ('carol@example.com', 'Carol Jackson', 'CA');
INSERT INTO users (email, name, state) VALUES ('david@example.com', 'David Yang', 'NV');

Check that the rows were properly inserted:

SELECT * FROM medusa_test.users;

This should return the rows that were just inserted:

 email             | name          | state
-------------------+---------------+-------
 alice@example.com |   Alice Smith |    TX
   bob@example.com |     Bob Jones |    VA
 david@example.com |    David Yang |    NV
 carol@example.com | Carol Jackson |    CA
(4 rows)

Exit the CQL shell with the following command:

exit

Now back up this data with the following command:

helm install my-backup k8ssandra/backup --set name=backup1,cassandraDatacenter.name=dc1

Since the backup operation is asynchronous, you can monitor its completion by running the following command:

kubectl get cassandrabackup backup1 -o jsonpath={.status.finishTime}

As long as this doesn’t output a date and time, the backup is still running. Given the small amount of data and the locally accessible backend, it should complete quickly.
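If you prefer to poll until the backup finishes, the same command can be wrapped in watch, mirroring the check used for the restore later in this guide:

watch -d kubectl get cassandrabackup backup1 -o jsonpath={.status.finishTime}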

✅ Step 4: Delete the data and restore the backup

Let's enter the CQL shell again with the following command.

kubectl exec -it k8ssandra-dc1-default-sts-0 -c cassandra -- cqlsh -u $username -p $password

TRUNCATE the table and verify it is empty with the following commands.

TRUNCATE medusa_test.users;
SELECT * FROM medusa_test.users;

This should yield the following output:

 email | name | state
-------+------+-------
(0 rows)

Exit the CQL shell with the following command:

exit

Now restore the backup taken previously:

helm install restore-test k8ssandra/restore --set name=restore-backup1,backup.name=backup1,cassandraDatacenter.name=dc1

This operation will take a little longer, as it requires stopping the StatefulSet pod and performing the restore in an init container before the Cassandra container can start. You can monitor progress using this command:

watch -d kubectl get cassandrarestore restore-backup1 -o jsonpath={.status} 

The restore operation is fully completed once the finishTime value appears in the output:

{"finishTime":"2021-03-23T13:58:36Z","restoreKey":"83977399-44dd-4752-b4c4-407273f0339e","startTime":"2021-03-23T13:55:35Z"}
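Before reconnecting, it can help to confirm that the Cassandra pod has restarted and is ready again (using the same pod name as earlier; the READY count depends on how many containers your deployment runs in the pod):

kubectl get pod k8ssandra-dc1-default-sts-0

Wait until the pod reports Running with all of its containers ready.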

Check that you can read the data from the previously truncated table:

kubectl exec -it k8ssandra-dc1-default-sts-0 -c cassandra -- cqlsh -u $username -p $password

and run the following CQL query:

SELECT * FROM medusa_test.users;

This should return the data that was present before the backup:

 email             | name          | state
-------------------+---------------+-------
 alice@example.com |   Alice Smith |    TX
   bob@example.com |     Bob Jones |    VA
 david@example.com |    David Yang |    NV
 carol@example.com | Carol Jackson |    CA
(4 rows)

You’ve successfully restored your lost data in just a few commands!

Next Step(s)

Proceed to Step VIII.