Milvus uses MinIO or S3 as object storage to persist large-scale files, such as index files and binary logs. This topic introduces how to configure object storage dependencies when you install Milvus with Milvus Operator.
This topic assumes that you have deployed Milvus Operator.
See Deploy Milvus Operator for more information.
You need to specify a configuration file for Milvus Operator to start Milvus:
kubectl apply -f https://raw.githubusercontent.com/zilliztech/milvus-operator/main/config/samples/demo.yaml
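In practice, you will usually download the sample, edit your local copy, and then apply it. A minimal sketch, assuming curl and kubectl are installed (the local file name is arbitrary):

curl -o demo.yaml https://raw.githubusercontent.com/zilliztech/milvus-operator/main/config/samples/demo.yaml
# edit demo.yaml, e.g. the spec.dependencies section, then apply it
kubectl apply -f demo.yaml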
You only need to edit the code template in demo.yaml to configure third-party dependencies. The following sections introduce how to configure object storage.
Milvus supports both external and in-cluster object storage. By default, Milvus Operator deploys an in-cluster MinIO for Milvus. You can change its configuration, or use an external object storage service instead, through the spec.dependencies.storage field in the Milvus CRD. Let's take the demo instance as an example:
apiVersion: milvus.io/v1beta1
kind: Milvus
metadata:
  name: my-release
  labels:
    app: milvus
spec:
  # Omit other fields ...
  dependencies:
    # Omit other fields ...
    storage:
      inCluster:
        values:
          mode: standalone
          resources:
            requests:
              memory: 100Mi
        deletionPolicy: Delete # Delete | Retain, default: Retain
        pvcDeletion: true # default: false
This configures the in-cluster MinIO to run in standalone mode and sets its memory request to 100Mi. The deletionPolicy field specifies the deletion policy of the in-cluster MinIO, and the pvcDeletion field specifies whether to delete the PVC (Persistent Volume Claim) when the in-cluster MinIO is deleted.
The fields under inCluster.values are the same as the values of its Helm chart; the complete list of configuration fields can be found at https://github.com/zilliztech/milvus-helm/blob/master/charts/minio/values.yaml.
You can set deletionPolicy to Retain before deleting the Milvus instance if you want to start Milvus again later without redeploying the dependency service. Or you can set deletionPolicy to Delete and pvcDeletion to false to delete the in-cluster MinIO while keeping only your data volume (PVC), as shown in the sketch below.
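A minimal sketch of a configuration that deletes the in-cluster MinIO on teardown but keeps its data volume (other fields are omitted):

spec:
  dependencies:
    storage:
      inCluster:
        # delete the MinIO deployment together with the Milvus instance ...
        deletionPolicy: Delete
        # ... but keep the PVC so the data survives
        pvcDeletion: false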
Milvus supports any S3-compatible service as external object storage, such as an externally deployed MinIO, AWS S3, Google Cloud Storage (GCS), or Azure Blob Storage. To use an external object storage service, you need to properly set the fields under spec.dependencies.storage and spec.config.minio in the Milvus CRD. Let's take AWS S3 as an example.
An S3 bucket can usually be accessed with a pair of access key and secret key. You can create a Kubernetes secret to store them:
# change the <parameters> to match your environment
apiVersion: v1
kind: Secret
metadata:
  name: my-release-s3-secret
type: Opaque
stringData:
  accesskey: <my-access-key>
  secretkey: <my-secret-key>
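Assuming you saved the manifest above as my-release-s3-secret.yaml (the file name is arbitrary), apply it to the namespace where Milvus will run:

kubectl apply -f my-release-s3-secret.yaml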
Now configure your Milvus instance to use the S3 bucket:
# change the <parameters> to match your environment
apiVersion: milvus.io/v1beta1
kind: Milvus
metadata:
  name: my-release
  labels:
    app: milvus
spec:
  # Omit other fields ...
  config:
    minio:
      # your bucket name
      bucketName: <my-bucket>
      # Optional, config the prefix of the bucket milvus will use
      rootPath: milvus/my-release
      useSSL: true
  dependencies:
    storage:
      # enable external object storage
      external: true
      type: S3 # MinIO | S3
      # the endpoint of AWS S3
      endpoint: s3.amazonaws.com:443
      # the secret storing the access key and secret key
      secretRef: "my-release-s3-secret"
Accessing AWS S3 with a fixed access key and secret key (AK/SK) is not secure enough. If you're using AWS EKS as your Kubernetes cluster, you can use AssumeRole to access S3 with temporary credentials instead. Suppose you have prepared a role that can access AWS S3; you need its ARN <my-role-arn> (usually in the pattern of arn:aws:iam::<your account id>:role/<role-name>).
First, create a ServiceAccount for Milvus to assume the role:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-release-sa
  annotations:
    eks.amazonaws.com/role-arn: <my-role-arn>
Then configure your Milvus instance to use the above ServiceAccount and enable AssumeRole by setting spec.config.minio.useIAM to true. Note that you need to use the AWS S3 regional endpoint instead of the global endpoint.
apiVersion: milvus.io/v1beta1
kind: Milvus
metadata:
  name: my-release
  labels:
    app: milvus
spec:
  # Omit other fields ...
  components:
    # use the above ServiceAccount
    serviceAccountName: my-release-sa
  config:
    minio:
      # enable AssumeRole
      useIAM: true
  # Omit other fields ...
  dependencies:
    storage:
      # Omit other fields ...
      # Note: you must use the regional endpoint here, otherwise the MinIO client that Milvus uses will fail to connect
      endpoint: s3.<my-bucket-region>.amazonaws.com:443
      secretRef: "" # we don't need to specify the secret here
The configuration for Google Cloud Storage (GCS) is very similar to that of AWS S3. You only need to change the endpoint to storage.googleapis.com:443 and set spec.config.minio.cloudProvider to gcp:
# change the <parameters> to match your environment
apiVersion: milvus.io/v1beta1
kind: Milvus
metadata:
  name: my-release
  labels:
    app: milvus
spec:
  # Omit other fields ...
  config:
    minio:
      cloudProvider: gcp
  dependencies:
    storage:
      # Omit other fields ...
      endpoint: storage.googleapis.com:443
Similar to AWS S3, you can also use Workload Identity to access GCS with temporary credentials if you're using GKE as your Kubernetes cluster. The annotation on the ServiceAccount is different from that for AWS EKS: you need to specify your GCP service account name instead of a role ARN.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-release-sa
  annotations:
    iam.gke.io/gcp-service-account: <my-gcp-service-account-name>
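For Workload Identity to work, the GCP service account also has to allow the Kubernetes ServiceAccount to impersonate it. This binding is typically created as follows, assuming Workload Identity is already enabled on the cluster (project ID, namespace, and account names are placeholders):

gcloud iam service-accounts add-iam-policy-binding <my-gcp-service-account-name>@<my-project-id>.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:<my-project-id>.svc.id.goog[<my-namespace>/my-release-sa]"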
Then configure your Milvus instance to use the above ServiceAccount and enable AssumeRole by setting spec.config.minio.useIAM to true.
apiVersion: milvus.io/v1beta1
kind: Milvus
metadata:
  name: my-release
  labels:
    app: milvus
spec:
  # Omit other fields ...
  components:
    # use the above ServiceAccount
    serviceAccountName: my-release-sa
  config:
    minio:
      cloudProvider: gcp
      # enable AssumeRole
      useIAM: true
  # Omit other fields ...
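After applying the configuration, you can verify that the storage settings were picked up, for example by inspecting the Milvus resource and its pods. A sketch, assuming the release is named my-release and deployed in the current namespace:

# check the resolved configuration and status of the Milvus resource
kubectl get milvus my-release -o yaml
# a simple way to list the pods of the release
kubectl get pods | grep my-release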