In this guide, you will configure the management of Qdrant collection snapshots.
Beyond the obvious use case of backup, this can be beneficial for transferring data to a new Qdrant cluster or creating new test collections with a copy of real data.
Both instant (spec.snapshots.backupNow) and scheduled (spec.snapshots.backupSchedule) snapshots are supported. You will create a collection, add data to it, take a snapshot, and restore it to another collection using the spec.snapshots parameter of the QdrantCollection custom resource.
Before you begin, create an S3 bucket and generate an access/secret key pair for accessing it.
You must also specify the correct endpoint for the S3 service (spec.snapshots.s3EndpointURL), for example, https://storage.googleapis.com/ for GCP or https://s3.amazonaws.com/ for AWS.
Store the access/secret key pair in a Kubernetes secret and reference it in spec.snapshots.s3CredentialsSecretName.
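The endpoint choice described above can be captured in a small shell helper. This is only a sketch: the function name s3_endpoint_for and the provider labels gcp and aws are assumptions made for illustration; only the two URLs come from this guide.

```shell
# Hypothetical helper: print the S3-compatible endpoint for a provider label.
# The labels "gcp" and "aws" are illustrative; the URLs are the ones named in this guide.
s3_endpoint_for() {
  case "$1" in
    gcp) echo "https://storage.googleapis.com/" ;;
    aws) echo "https://s3.amazonaws.com/" ;;
    *)   echo "unknown provider: $1" >&2; return 1 ;;
  esac
}
```

The printed value is what goes into spec.snapshots.s3EndpointURL later in this guide.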
- Create a bucket with a unique name.
gcloud storage buckets create gs://unique-bucket-for-shapshots
- Create a service account to be used for accessing the bucket.
gcloud iam service-accounts create bucket-snapshots-sa \
--display-name="bucket-snapshots-sa"
- Save the email of the created service account in the variable SA_EMAIL.
export SA_EMAIL=$(gcloud iam service-accounts list \
--filter="displayName:bucket-snapshots-sa" --format='value(email)')
- Grant access to the bucket for the service account.
gcloud storage buckets add-iam-policy-binding gs://unique-bucket-for-shapshots \
--member="serviceAccount:$SA_EMAIL" \
--role="roles/storage.objectUser"
- Generate credentials for the service account and save the access and secret keys; you will need them later.
gcloud storage hmac create $SA_EMAIL
The output is similar to the following:
kind: storage#hmacKey
metadata:
  accessId: ACCESSKEY
  ...
secret: SECRETKEY
Save these values.
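If you prefer not to copy the keys by hand, the output can be parsed in the shell. The following is a minimal sketch that assumes the YAML-like layout shown above; the helper names hmac_access_id and hmac_secret are hypothetical.

```shell
# Hypothetical helpers: read the output of `gcloud storage hmac create` on stdin
# and print the access ID or the secret. They assume the layout shown above,
# where the value is the second whitespace-separated field on its line.
hmac_access_id() {
  awk '$1 == "accessId:" { print $2; exit }'
}

hmac_secret() {
  awk '$1 == "secret:" { print $2; exit }'
}

# Example usage (commented out because it creates a real HMAC key):
# HMAC_OUTPUT=$(gcloud storage hmac create "$SA_EMAIL")
# ACCESS_KEY=$(printf '%s\n' "$HMAC_OUTPUT" | hmac_access_id)
# SECRET_KEY=$(printf '%s\n' "$HMAC_OUTPUT" | hmac_secret)
```

Capturing the output once matters here, because each call to the create command generates a new key pair.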
- On AWS, follow the official guide to create a bucket and an IAM user.
- Save the access and secret keys when they are shown; they are displayed only once.
- Create a Kubernetes secret with credentials for accessing the bucket that will store snapshots.
kubectl create secret generic bucket-credentials \
--from-literal=ACCESS_KEY=YOURACCESSKEY \
--from-literal=SECRET_KEY=YOURSECRETKEY
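For setups that keep manifests under version control or apply them through a pipeline, the same secret can be expressed declaratively. This is a sketch, not part of the operator's documentation: the helper name render_bucket_secret is hypothetical, and it assumes the key values contain no double quotes or backslashes.

```shell
# Hypothetical helper: print a Secret manifest equivalent to the
# `kubectl create secret generic` command above.
# stringData accepts plain values; Kubernetes stores them base64-encoded.
render_bucket_secret() {
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: bucket-credentials
type: Opaque
stringData:
  ACCESS_KEY: "$1"
  SECRET_KEY: "$2"
EOF
}

# Example usage (commented out because it needs a cluster):
# render_bucket_secret YOURACCESSKEY YOURSECRETKEY | kubectl apply -f -
```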
- Create a new Qdrant cluster. In this example, we will use 3 replicas, API key authentication and TLS encryption for connections.
cat <<EOF | kubectl apply -f -
apiVersion: qdrant.operator/v1alpha1
kind: QdrantCluster
metadata:
  name: my-cluster
spec:
  replicas: 3
  image: qdrant/qdrant:v1.7.4
  apikey: 'true'
  tls:
    enabled: true
EOF
- Create a new source collection with sharding and replication enabled:
cat <<EOF | kubectl apply -f -
apiVersion: qdrant.operator/v1alpha1
kind: QdrantCollection
metadata:
  name: source-collection
spec:
  cluster: my-cluster
  vectorSize: 4
  shardNumber: 3
  replicationFactor: 2
EOF
- Start a new client pod with the API key and CA certificate mounted from the corresponding Secrets:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: qdrantclient
spec:
  containers:
  - image: curlimages/curl
    name: mycurlpod
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello; sleep 10;done"]
    env:
    - name: APIKEY
      valueFrom:
        secretKeyRef:
          name: my-cluster-apikey
          key: api-key
    volumeMounts:
    - name: cert
      readOnly: true
      mountPath: "/cert/cacert.pem"
      subPath: cacert.pem
  volumes:
  - name: cert
    secret:
      secretName: my-cluster-server-cert
      items:
      - key: cacert.pem
        path: cacert.pem
EOF
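The pod may take a moment to start, so it can help to wait for it before connecting. A minimal sketch using kubectl wait; the wrapper name wait_for_client_pod and the 120s default timeout are assumptions.

```shell
# Hypothetical helper: block until the client pod reports Ready,
# or fail once the timeout (first argument, default 120s) expires.
wait_for_client_pod() {
  kubectl wait --for=condition=Ready pod/qdrantclient --timeout="${1:-120s}"
}
```

Calling wait_for_client_pod before the kubectl exec step avoids connecting to a container that is still being created.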
- Connect to the client:
kubectl exec -it qdrantclient -- sh
- Upload some data to the source collection:
curl -L -X PUT "https://my-cluster.default:6333/collections/source-collection/points?wait=true" \
--cacert /cert/cacert.pem \
-H "api-key: ${APIKEY}" \
-H "Content-Type: application/json" \
--data-raw '{
"points": [
{"id": 1, "vector": [0.05, 0.61, 0.76, 0.74], "payload": {"city": "Berlin"}},
{"id": 2, "vector": [0.19, 0.81, 0.75, 0.11], "payload": {"city": "London"}}
]
}'
Press CTRL-D to exit from the client pod.
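To confirm the upload, the point count can be queried from inside the client pod (before exiting, or after reconnecting with kubectl exec). The following is a minimal sketch that assumes Qdrant's points/count REST endpoint; the helper name count_points is hypothetical, and the sed expression assumes the JSON response contains a single "count" field.

```shell
# Hypothetical helper: print the exact number of points in a collection.
# Uses the same host, CA certificate, and API key as the upload command above.
count_points() {
  curl -sS -L -X POST "https://my-cluster.default:6333/collections/$1/points/count" \
    --cacert /cert/cacert.pem \
    -H "api-key: ${APIKEY}" \
    -H "Content-Type: application/json" \
    --data-raw '{"exact": true}' \
  | sed -n 's/.*"count": *\([0-9]*\).*/\1/p'
}
```

Running count_points source-collection should print the number of points uploaded above, provided the collection was empty beforehand.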
- Add the .snapshots section to the source collection to create an instant backup:
cat <<EOF | kubectl apply -f -
apiVersion: qdrant.operator/v1alpha1
kind: QdrantCollection
metadata:
  name: source-collection
spec:
  cluster: my-cluster
  vectorSize: 4
  shardNumber: 3
  replicationFactor: 2
  snapshots:
    s3EndpointURL: https://storage.googleapis.com/
    s3CredentialsSecretName: bucket-credentials
    bucketName: unique-bucket-for-shapshots
    backupNow: true
EOF
Make sure to replace these settings with your actual S3 endpoint and bucket name.
- Find the newly created backup job and check its logs to confirm that it completed:
kubectl get job
kubectl logs -f job.batch/source-collection-backup-355312
Added `S3` successfully.
Saved my-cluster/source-collection/2024-01-23-17-58/my-cluster-0.snapshot
Saved my-cluster/source-collection/2024-01-23-17-58/my-cluster-1.snapshot
Saved my-cluster/source-collection/2024-01-23-17-58/my-cluster-2.snapshot
`/app/my-cluster/source-collection/2024-01-23-17-58/my-cluster-0.snapshot` -> `S3/unique-bucket-for-shapshots/my-cluster/source-collection/2024-01-23-17-58/my-cluster-0.snapshot`
`/app/my-cluster/source-collection/2024-01-23-17-58/my-cluster-1.snapshot` -> `S3/unique-bucket-for-shapshots/my-cluster/source-collection/2024-01-23-17-58/my-cluster-1.snapshot`
`/app/my-cluster/source-collection/2024-01-23-17-58/my-cluster-2.snapshot` -> `S3/unique-bucket-for-shapshots/my-cluster/source-collection/2024-01-23-17-58/my-cluster-2.snapshot`
Total: 110.67 MiB, Transferred: 110.67 MiB, Speed: 2.67 MiB/s
Successfully stored "source-collection" backup in the "unique-bucket-for-shapshots" bucket.
Snapshot name is "my-cluster/source-collection/2024-01-23-17-58".
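The snapshot name printed on the last log line is needed for the restore step, so it can be convenient to extract it rather than copy it. A minimal sketch, assuming the log format shown above; the helper name snapshot_name_from_logs is hypothetical.

```shell
# Hypothetical helper: read backup job logs on stdin and print the snapshot name.
# It relies on the 'Snapshot name is "..."' line shown in the log output above.
snapshot_name_from_logs() {
  sed -n 's/^Snapshot name is "\(.*\)"\.$/\1/p' | tail -n 1
}

# Example usage (commented out because it needs a cluster; the job name
# suffix will differ in your environment):
# SNAPSHOT_NAME=$(kubectl logs job.batch/source-collection-backup-355312 | snapshot_name_from_logs)
```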
- Create a new target collection with the .snapshots.restoreSnapshotName option to make a copy of the source collection:
cat <<EOF | kubectl apply -f -
apiVersion: qdrant.operator/v1alpha1
kind: QdrantCollection
metadata:
  name: target-collection
spec:
  cluster: my-cluster
  vectorSize: 4
  shardNumber: 3
  replicationFactor: 2
  snapshots:
    s3EndpointURL: https://storage.googleapis.com/
    s3CredentialsSecretName: bucket-credentials
    bucketName: unique-bucket-for-shapshots
    restoreSnapshotName: my-cluster/source-collection/2024-01-23-17-58
EOF
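Because the snapshot name changes with every backup, the manifest above can be generated from a variable instead of edited by hand. This is a sketch under the same settings as this example (endpoint, secret, and bucket name); the helper name render_target_collection is hypothetical.

```shell
# Hypothetical helper: print the target collection manifest with the
# snapshot name (first argument) filled into restoreSnapshotName.
render_target_collection() {
  cat <<EOF
apiVersion: qdrant.operator/v1alpha1
kind: QdrantCollection
metadata:
  name: target-collection
spec:
  cluster: my-cluster
  vectorSize: 4
  shardNumber: 3
  replicationFactor: 2
  snapshots:
    s3EndpointURL: https://storage.googleapis.com/
    s3CredentialsSecretName: bucket-credentials
    bucketName: unique-bucket-for-shapshots
    restoreSnapshotName: $1
EOF
}

# Example usage (commented out because it needs a cluster):
# render_target_collection "my-cluster/source-collection/2024-01-23-17-58" | kubectl apply -f -
```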
- Find the restore job and check its logs to confirm that the data was restored successfully:
kubectl get job
kubectl logs -f job.batch/target-collection-restore-357566
Added `S3` successfully.
`S3/unique-bucket-for-shapshots/my-cluster/source-collection/2024-01-23-17-58/my-cluster-0.snapshot` -> `my-cluster/source-collection/2024-01-23-17-58/my-cluster-0.snapshot`
`S3/unique-bucket-for-shapshots/my-cluster/source-collection/2024-01-23-17-58/my-cluster-1.snapshot` -> `my-cluster/source-collection/2024-01-23-17-58/my-cluster-1.snapshot`
`S3/unique-bucket-for-shapshots/my-cluster/source-collection/2024-01-23-17-58/my-cluster-2.snapshot` -> `my-cluster/source-collection/2024-01-23-17-58/my-cluster-2.snapshot`
Total: 110.67 MiB, Transferred: 110.67 MiB, Speed: 3.48 MiB/s
Snapshot my-cluster-0.snapshot restored successfully in time 2.408649.
Snapshot my-cluster-1.snapshot restored successfully in time 3.569833.
Snapshot my-cluster-2.snapshot restored successfully in time 1.762345.
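The log check can also be scripted. The sketch below counts the "restored successfully" lines and compares the total with an expected number (three in this example, matching the three snapshot files in the log above). The helper name restore_complete is hypothetical, and the check assumes the log wording shown here.

```shell
# Hypothetical helper: read restore job logs on stdin and succeed only if the
# number of "restored successfully" lines equals the expected count ($1).
restore_complete() {
  # grep -c exits non-zero when nothing matches, so "|| true" keeps the count of 0.
  restored=$(grep -c 'restored successfully' || true)
  [ "$restored" -eq "$1" ]
}

# Example usage (commented out because it needs a cluster; the job name
# suffix will differ in your environment):
# kubectl logs job.batch/target-collection-restore-357566 | restore_complete 3 && echo "restore OK"
```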
- Connect to the client pod and run a search query against the target collection to verify the restored data:
kubectl exec -it qdrantclient -- sh
curl -L -X POST "https://my-cluster.default:6333/collections/target-collection/points/search" \
--cacert /cert/cacert.pem \
-H "api-key: ${APIKEY}" \
-H "Content-Type: application/json" \
--data-raw '{
"vector": [0.2,0.1,0.9,0.7],
"top": 1
}'
You should see a similar answer:
{"result":[{"id":1,"version":0,"score":0.89463294,"payload":null,"vector":null}],"status":"ok","time":0.014474}
Press CTRL-D to exit from the client pod.
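For an automated check, the ID of the first hit can be pulled out of the search response inside the client pod. This is a minimal sketch that assumes the compact JSON layout shown above; the helper name first_hit_id is hypothetical, and a JSON parser such as jq would be more robust where one is available.

```shell
# Hypothetical helper: read a search response on stdin and print the ID of the first hit.
# It matches the compact layout {"result":[{"id":<number>,... shown above.
first_hit_id() {
  sed -n 's/^{"result":\[{"id":\([0-9][0-9]*\),.*/\1/p'
}
```

Piping the curl search command into first_hit_id prints only the ID, which can then be compared with the ID returned by the same query against the source collection.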