CORTX Community Object Storage is 100% open-source object storage, uniquely optimized for mass capacity storage devices. This repository provides capability and support for deploying CORTX onto the Kubernetes container orchestration platform.
- Project Overview
- Reference Architecture
- CORTX on Kubernetes Prerequisites
- Kubernetes Reference Deployments
- Getting Started
- Solution YAML Overview
- Troubleshooting
- Glossary
- License
This repository provides application-specific Helm charts and deployment scripts for deploying CORTX onto an existing Kubernetes cluster.
Deploying and managing Kubernetes is outside the scope of this repository; however, configuration guidance and best practices are offered where appropriate, along with links to reference Kubernetes cluster deployment processes.
CORTX on Kubernetes consists of five primary components:
- Prerequisite services, consisting of Consul and Apache Kafka.
- CORTX Control Pods
  - These pods maintain the CORTX control plane
  - There is a default cardinality of one pod per CORTX deployment
- CORTX Data Pods
  - These pods maintain the CORTX data plane
  - There is a default cardinality of one pod per defined CVG per CORTX node
- CORTX Server Pods
  - These pods maintain the CORTX API and user interfaces
  - There is a default cardinality of three pods per CORTX node (but scalable based on system traffic)
- CORTX HA Pods
  - These pods maintain the overall high-availability of the CORTX deployment
  - There is a default cardinality of one pod per CORTX deployment
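As a rough illustration of the cardinalities listed above, the following sketch estimates how many CORTX Pods a deployment would run. The helper names and the example inputs (3 nodes, 2 CVGs per node, 1 Server Pod per node) are hypothetical, and the arithmetic assumes only the stated defaults; any tuning that changes Pod counts is ignored.

```shell
# Hypothetical helpers: estimate CORTX Pod counts from the default
# cardinalities described above. Not part of the deployment scripts.

# Data Pods: one per defined CVG per CORTX node.
# $1 = number of nodes, $2 = CVGs defined per node
expected_data_pods() {
    echo $(( $1 * $2 ))
}

# Server Pods: a configurable number per CORTX node.
# $1 = number of nodes, $2 = Server Pods per node
expected_server_pods() {
    echo $(( $1 * $2 ))
}

# Control and HA Pods: one each per CORTX deployment.
CONTROL_PODS=1
HA_PODS=1

# Example inputs (assumed values, for illustration only).
node_count=3
data=$(expected_data_pods "$node_count" 2)
server=$(expected_server_pods "$node_count" 1)
echo "data=$data server=$server control=$CONTROL_PODS ha=$HA_PODS"
echo "total=$(( data + server + CONTROL_PODS + HA_PODS ))"
```

The prerequisite services (Consul and Kafka) run their own Pods and are not counted here.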
For additional discussion on infrastructure prerequisites in support of other Kubernetes capabilities prior to installing CORTX, please reference the Prerequisite use cases for deploying CORTX on Kubernetes guide.
CORTX on Kubernetes is provided via Helm Charts. As such, you will need Helm installed locally to deploy CORTX on Kubernetes. You can find the specific installation instructions for your local platform via the Installing Helm section of the official Helm documentation.
yq is a command-line YAML processor and must be installed for use by the deployment scripts. Version 4.25.1 or later is required.
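The version requirement above can be checked before running any deployment script. This is a minimal sketch, assuming `sort -V` (GNU coreutils version sort) is available and that `yq --version` prints a banner containing an `x.y.z` version token; the `version_at_least` helper is a hypothetical name, not something shipped with this repository.

```shell
# Minimum yq version required by the deployment scripts (from the text above).
REQUIRED_YQ_VERSION="4.25.1"

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v yq >/dev/null 2>&1; then
    # Extract the first x.y.z token from the version banner.
    installed=$(yq --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
    if version_at_least "${installed:-0.0.0}" "$REQUIRED_YQ_VERSION"; then
        echo "yq ${installed} satisfies the ${REQUIRED_YQ_VERSION} minimum"
    else
        echo "yq ${installed:-unknown} is older than ${REQUIRED_YQ_VERSION}"
    fi
else
    echo "yq is not installed"
fi
```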
All Kubernetes Nodes must have a uniform device/drive setup across the Kubernetes cluster, i.e. all nodes will have the same /dev/sdb, /dev/sdc, /dev/sdN, etc. device paths.
For configuration options in support of persistent device naming and stability across Kubernetes Node reboot support, reference the Persistent disk naming and node reboot support section of the Prerequisite use cases for deploying CORTX on Kubernetes guide.
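One way to spot-check the uniform device requirement described above is to confirm, on each node, that every expected path is a block device. The sketch below is illustrative only: `missing_block_devices` is a hypothetical helper, and the example paths `/dev/sdb` and `/dev/sdc` are placeholders for the devices in your own solution file.

```shell
# Print every path from the argument list that is not a block device
# on the local machine. Prints nothing when all paths are present.
missing_block_devices() {
    for dev in "$@"; do
        if [ ! -b "$dev" ]; then
            echo "$dev"
        fi
    done
}

# Example device list (assumed; use the paths from your own solution file).
missing=$(missing_block_devices /dev/sdb /dev/sdc)
if [ -z "$missing" ]; then
    echo "all expected devices are present"
else
    echo "missing block devices:"
    echo "$missing"
fi
```

Running the same check on every Kubernetes Node and comparing the results gives a quick view of whether the cluster's device layout is uniform.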
Kernel parameter vm.max_map_count must be set to a minimum of 30000000 (thirty million) on the Kubernetes Nodes where cortx-data Pods will run.
- The prereq-deploy-cortx-cloud.sh script will set this value prior to deployment if you choose to utilize it.
- The cortx-data Pods include an initContainer that will check for this minimum value and halt deployment if it is not met.
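If you prefer to verify the kernel parameter yourself rather than rely on the prerequisite script, a check along these lines can be run on each node. This is a sketch under the assumption that `sysctl` is available on the node; `meets_max_map_count` is a hypothetical helper name, and this is not the check the initContainer itself performs.

```shell
# Minimum value required on nodes that run cortx-data Pods (from the text above).
REQUIRED_MAX_MAP_COUNT=30000000

# Succeeds when the given value meets the required minimum.
meets_max_map_count() {
    [ "$1" -ge "$REQUIRED_MAX_MAP_COUNT" ]
}

# Read the current value; fall back to 0 if sysctl is unavailable.
current=$(sysctl -n vm.max_map_count 2>/dev/null || echo 0)
current=${current:-0}

if meets_max_map_count "$current"; then
    echo "vm.max_map_count OK ($current)"
else
    echo "vm.max_map_count too low ($current)"
    echo "raise it with: sudo sysctl -w vm.max_map_count=$REQUIRED_MAX_MAP_COUNT"
fi
```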
Rancher Local Path Provisioner is used to manage dynamic provisioning of local storage for prerequisite services. The directory it provisions from must exist on every Kubernetes Node.
- The prereq-deploy-cortx-cloud.sh script will ensure this directory exists, if you choose to utilize it.
- This directory prefix is configurable in the solution.yaml file via solution.common.storage_provisioner_path; local-path-provisioner is appended to it.
  - You can manually create this path using the default value of /mnt/fs-local-volume/local-path-provisioner on every Kubernetes Node
  - Or you can customize the value of solution.common.storage_provisioner_path and create directories on every Kubernetes Node to match (e.g. /mnt/cortx-k8s/local-volumes/local-path-provisioner).
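To make the path rule above concrete, the sketch below builds the directory name from a prefix by appending local-path-provisioner. The `provisioner_dir` helper is hypothetical and only demonstrates the naming convention; it does not create anything.

```shell
# Build the provisioner directory from a storage_provisioner_path value by
# appending "local-path-provisioner" (a single trailing slash is tolerated).
provisioner_dir() {
    printf '%s/local-path-provisioner\n' "${1%/}"
}

# Default prefix taken from the text above.
dir=$(provisioner_dir "/mnt/fs-local-volume")
echo "directory to create on every node: $dir"

# To create it, run on each Kubernetes Node:
#   sudo mkdir -p "$dir"
```

If you customize the prefix, pass that value to the helper instead so the directory you create matches what the solution file declares.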
There are numerous ways to install and configure a complete Kubernetes cluster. As long as the prerequisites in the previous step are all satisfied, CORTX on Kubernetes should deploy successfully.
For reference material, we have provided existing Kubernetes deployment models that have been verified to work with CORTX on Kubernetes. These are only provided for reference and are not meant to be explicit deployment constraints.
- Seagate Internal Jenkins Job
- CORTX on AWS and Kubernetes - Quick Start Guide
- CORTX on minikube - Quick Start Guide
Should you have trouble deploying CORTX on Kubernetes to your Kubernetes cluster, please open an Issue in this repository for further troubleshooting.
All steps in this section assume the proper prerequisites have been installed or configured as described in CORTX on Kubernetes Prerequisites above.
If you have direct access to the underlying Kubernetes Nodes in your cluster, CORTX on Kubernetes provides a prerequisite deployment script that will configure the majority of the low-level system configuration requirements prior to CORTX deployment. This is not a required step if you choose to ensure all the prerequisites mentioned above are satisfied manually.
- Copy the prereq-deploy-cortx-cloud.sh script and the solution YAML file to all worker nodes:
  scp prereq-deploy-cortx-cloud.sh <user>@<worker-node-IP-address>:<path-to-prereq-script>
  scp <solution_yaml_file> <user>@<worker-node-IP-address>:<path-to-solution-yaml>
- Run the prerequisite script on all worker nodes in the cluster, and any untainted control nodes which allow Pod scheduling. <disk> is a required input to run this script. This disk should NOT be any of the devices listed in solution.storage.cvg* in the solution.yaml file:
  sudo ./prereq-deploy-cortx-cloud.sh -d <disk> [ -s <solution-file> ]
  - The -d <disk> flag is a required flag to pass the path of the disk or device to mount for secondary storage to the prereq-deploy-cortx-cloud.sh script. This should be in the format of /dev/sdb etc.
  - The -s <solution-file> flag is an optional flag to the prereq-deploy-cortx-cloud.sh script. Make sure to use the same solution file for the prereq, deploy, and destroy scripts. The default <solution-file> is solution.yaml if the -s flag is not supplied.
- Clone this repository to a machine with connectivity to your Kubernetes cluster:
  git clone https://github.com/Seagate/cortx-k8s
  ℹ️ You can also use the latest released version of the CORTX on Kubernetes code via the Latest Releases page
- For initial deployments, copy the example solution configuration file ./k8_cortx_cloud/solution.example.yaml to ./k8_cortx_cloud/solution.yaml or to a filename of your choice.
- Update the solution configuration file to reflect your environment. The most common and expected updates are reflected below:
  - Update the namespace you want to deploy CORTX into. The default is "cortx". If the namespace does not exist, it will be created for you. The namespace is currently limited to a maximum of 20 characters.
  - Update the deployment_type with the desired deployment mode. See Global Parameters for more details.
  - Update all passwords. The csm-secret should include one special character in cortx-secret.
  - Update the images section with the cortx-data, cortx-rgw, and cortx-control image tags you want to use.
    - Each specific release of the CORTX on Kubernetes code will point to a specific predefined container image.
    - This can be overridden as desired.
  - Update SNS and DIX durability values. The default value for both parameters is 1+0+0.
  - Update storage cvg devices for data and metadata with respect to the devices in your environment.
  - Update nodes section with proper node hostnames from your Kubernetes cluster.
    - If the Kubernetes control plane nodes are required to be used for deployment, make sure to remove the taint from them before deploying CORTX.
    - For further details and reference, you can view the official Kubernetes documentation topic on Taints & Tolerations
  - For further details on the solution configuration file specifics, review the Solution YAML Overview section below.
- Run the deploy-cortx-cloud.sh script, passing in the path to your updated solution.yaml file.
  ./deploy-cortx-cloud.sh solution.yaml
- Validate CORTX on Kubernetes status
  DATA_POD=$(kubectl get pods -l cortx.io/service-type=cortx-data --no-headers | awk '{print $1}' | head -n 1)
  kubectl exec -it "$DATA_POD" -c cortx-hax -- /bin/bash -c "hctl status"
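Because the namespace is limited to 20 characters, it can be worth checking the chosen name before running the deployment script. The sketch below is a hypothetical pre-flight check, not something the deployment script is known to perform; `valid_namespace_length` and the sample names are illustrative.

```shell
# Maximum namespace length stated in the configuration steps above.
MAX_NAMESPACE_LENGTH=20

# Succeeds when the namespace is non-empty and within the length limit.
valid_namespace_length() {
    [ -n "$1" ] && [ "${#1}" -le "$MAX_NAMESPACE_LENGTH" ]
}

# Sample names (assumed, for illustration only).
for ns in "cortx" "cortx-production-cluster-01"; do
    if valid_namespace_length "$ns"; then
        echo "ok: $ns"
    else
        echo "too long (${#ns} > $MAX_NAMESPACE_LENGTH): $ns"
    fi
done
```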
ℹ️ As the CORTX on Kubernetes architecture is evolving, the upgrade path for CORTX on Kubernetes is evolving as well. As a workaround until more foundational upgrade capabilities exist, the following steps are available to manually upgrade your CORTX on Kubernetes environment to a more recent release.
This upgrade process updates all containers in all CORTX pods to the new specified image.
To upgrade a previously deployed CORTX cluster, run the upgrade-cortx-cloud.sh script to patch the CORTX on Kubernetes deployments using an updated image (ℹ️ You will want to update the TARGET_IMAGE variable below to your desired image tag). The script will stop all CORTX Pods, update the Deployments and StatefulSets, and then restart the Pods.
TARGET_IMAGE="ghcr.io/seagate/cortx-data:2.0.0-835"
./upgrade-cortx-cloud.sh -s solution.yaml -i $TARGET_IMAGE
Note: There are three separate CORTX images (cortx-data, cortx-rgw, and cortx-control). By specifying any one of these images, all images will be updated to that same version. For example, if the image ghcr.io/seagate/cortx-data:2.0.0-835 is specified, then:
- ghcr.io/seagate/cortx-data:2.0.0-835 will be applied to the cortx-data and cortx-client containers
- ghcr.io/seagate/cortx-rgw:2.0.0-835 will be applied to the cortx-server containers
- ghcr.io/seagate/cortx-control:2.0.0-835 will be applied to the cortx-control and cortx-ha containers
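The mapping in the note above can be pictured as keeping the registry path and tag while swapping the image name. The following sketch shows that idea only; `sibling_image` is a hypothetical helper, it assumes image references of the form registry/path/name:tag, and it is not claimed to be how upgrade-cortx-cloud.sh is implemented.

```shell
# Derive the image for another CORTX component from a given image reference,
# keeping the registry path and tag. Assumes "registry/path/name:tag" form.
# $1 = full image reference, $2 = replacement image name
sibling_image() {
    image=$1
    name=$2
    tag=${image##*:}      # text after the last ':'
    repo=${image%:*}      # everything before the tag
    prefix=${repo%/*}     # registry and organization, without the image name
    printf '%s/%s:%s\n' "$prefix" "$name" "$tag"
}

TARGET_IMAGE="ghcr.io/seagate/cortx-data:2.0.0-835"
for component in cortx-data cortx-rgw cortx-control; do
    sibling_image "$TARGET_IMAGE" "$component"
done
```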
To update the image for a specific CORTX Deployment or StatefulSet, use kubectl set image:
# Update image on all containers in a cortx-data statefulset
kubectl set image --namespace=${NAMESPACE} statefulset cortx-data '*=ghcr.io/seagate/cortx-data:2.0.0-835'
# Update image on all containers in a cortx-server statefulset
kubectl set image --namespace=${NAMESPACE} statefulset cortx-server '*=ghcr.io/seagate/cortx-rgw:2.0.0-835'
# Update image on all containers in a cortx-control deployment
kubectl set image --namespace=${NAMESPACE} deployment cortx-control '*=ghcr.io/seagate/cortx-control:2.0.0-835'
# Update image on all containers in a cortx-ha deployment
kubectl set image --namespace=${NAMESPACE} deployment cortx-ha '*=ghcr.io/seagate/cortx-control:2.0.0-835'
# Update image on all containers in a cortx-client statefulset (cortx-client containers use the cortx-data image)
kubectl set image --namespace=${NAMESPACE} statefulset cortx-client '*=ghcr.io/seagate/cortx-data:2.0.0-835'
TODO Port this Confluence Page here or into a linked doc/readme file.
To gather logs from a CORTX on Kubernetes deployment, run the logs-cortx-cloud.sh script, passing the solution.yaml file to it.
./logs-cortx-cloud.sh --solution-config solution.yaml
Run the destroy-cortx-cloud.sh script, passing in the path to the previously updated solution.yaml file:
./destroy-cortx-cloud.sh solution.yaml
Note: This script does not uninstall the local provisioner. If you need to uninstall the local provisioner, run:
kubectl delete -f ./cortx-cloud-3rd-party-pkg/auto-gen-rancher-provisioner/local-path-storage.yaml
The CORTX solution configuration file consists of all parameters required to deploy CORTX on Kubernetes. The pre-req, deploy, and destroy scripts parse the solution configuration file and extract information they need to deploy and destroy CORTX.
An example solution configuration is provided by solution.example.yaml.
All paths below are prefixed with solution. for fully-qualified naming and are required to have a value unless explicitly marked as (Optional) below.
Name | Description | Default Value |
---|---|---|
namespace | The Kubernetes namespace that all CORTX-related resources will be deployed into. Currently limited to a maximum of 20 characters. | |
deployment_type | The type of deployment. This determines which Kubernetes resources are created. Valid values are standard and data-only. | standard |
This section contains the CORTX and third-party authentication information used to deploy CORTX on Kubernetes.
A Kubernetes Secret is used to hold the various passwords and secret keys needed by the various components.
- If the secrets.name field is specified, then CORTX will create and populate this Secret object, using this specified name. For any secrets.content fields that are not specified or do not have a value specified, CORTX will generate a random password.
- If the secrets.external_secret field is specified, then CORTX will expect a Kubernetes Secret object to already exist with the specified name, which contains the passwords for these fields. This allows an admin to specify passwords outside of solution.yaml. Note: If a secrets.external_secret is used, then the specified Secret must define all CORTX-required passwords.
💡 To create a new Kubernetes Secret object with admin-specified values for required CORTX passwords:
kubectl create secret generic my-cortx-secret \
--from-literal=common_admin_secret=Password1@123 \
--from-literal=consul_admin_secret=Password2@123 \
--from-literal=kafka_admin_secret=Password3@123 \
--from-literal=s3_auth_admin_secret=Password4@123 \
--from-literal=csm_auth_admin_secret=Password5@123 \
--from-literal=csm_mgmt_admin_secret=Password6@123
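Since an external Secret must define every CORTX-required password, a quick completeness check can help before deploying. The sketch below compares a list of key names against the six keys used in the command above; `missing_secret_keys` is a hypothetical helper and the two-key example is made up for illustration.

```shell
# Keys used in the kubectl create secret command above.
REQUIRED_SECRET_KEYS="common_admin_secret consul_admin_secret kafka_admin_secret s3_auth_admin_secret csm_auth_admin_secret csm_mgmt_admin_secret"

# Print each required key that is absent from the given list of key names.
missing_secret_keys() {
    for required in $REQUIRED_SECRET_KEYS; do
        found=no
        for present in "$@"; do
            if [ "$present" = "$required" ]; then
                found=yes
            fi
        done
        if [ "$found" = no ]; then
            echo "$required"
        fi
    done
}

# Example: a Secret that only defines two of the six keys.
missing_secret_keys common_admin_secret kafka_admin_secret
```

The key names of an existing Secret could be fed into this helper, for instance from the output of `kubectl get secret my-cortx-secret -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'`, where my-cortx-secret is the example name used above.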
Name | Description | Default Value |
---|---|---|
secrets.name | Name for the Kubernetes Secret CORTX uses to store solution-specific secrets | cortx-secret |
secrets.content.kafka_admin_secret | Administrator password for the Kafka required service | null |
secrets.content.consul_admin_secret | Administrator password for the Consul required service | null |
secrets.content.common_admin_secret | Administrator password for the CORTX common services | null |
secrets.content.s3_auth_admin_secret | Administrator password for the S3 Auth CORTX component | null |
secrets.content.csm_auth_admin_secret | Administrator password for the CSM Auth CORTX component | null |
secrets.content.csm_mgmt_admin_secret | Administrator password for the CSM Management CORTX component | null |
secrets.external_secret | Name of previously existing Secret that contains CORTX-required secrets. Note: This field is mutually exclusive with secrets.name. | |
This section contains the CORTX and third-party images used to deploy CORTX on Kubernetes.
Name | Description | Default Value |
---|---|---|
images.cortxcontrol | Image registry, repository, & tag for the CORTX Control components | ghcr.io/seagate/cortx-control:2.0.0-{VERSION} |
images.cortxdata | Image registry, repository, & tag for the CORTX Data components | ghcr.io/seagate/cortx-data:2.0.0-{VERSION} |
images.cortxserver | Image registry, repository, & tag for the CORTX Server components | ghcr.io/seagate/cortx-rgw:2.0.0-{VERSION} |
images.cortxha | Image registry, repository, & tag for the CORTX HA components | ghcr.io/seagate/cortx-control:2.0.0-{VERSION} |
images.cortxclient | Image registry, repository, & tag for the CORTX Client components | ghcr.io/seagate/cortx-data:2.0.0-{VERSION} |
images.consul | Image registry, repository, & tag for the Consul required service | ghcr.io/seagate/consul:1.11.4 |
images.kafka | Image registry, repository, & tag for the Kafka required service | ghcr.io/seagate/kafka:3.0.0-debian-10-r97 |
images.zookeeper | Image registry, repository, & tag for the Zookeeper required service | ghcr.io/seagate/zookeeper:3.8.0-debian-10-r9 |
images.rancher | Image registry, repository, & tag for the Rancher Local Path Provisioner container | ghcr.io/seagate/local-path-provisioner:v0.0.20 |
images.busybox | Image registry, repository, & tag for the utility busybox container | ghcr.io/seagate/busybox:latest |
⚠️ This section is actively under construction!
This section contains common parameters that affect all CORTX components running on Kubernetes.
Name | Description | Default Value |
---|---|---|
common.storage_provisioner_path | TODO | /mnt/fs-local-volume |
common.s3.default_iam_users.auth_admin | Username for the default administrative user created for internal RGW interactions. Corresponds to secrets.content.s3_auth_admin_secret above. | sgiamadmin |
common.s3.default_iam_users.auth_user | Username for the default user created for internal RGW interactions. Corresponds to secrets.content.s3_auth_admin_secret above. | user_name |
common.s3.max_start_timeout | TODO | 240 |
common.s3.instances_per_node | This field determines the number of CORTX Server Pods to be deployed per Node specified in the nodes section of the solution configuration file. | 1 |
common.s3.extra_configuration | (Optional) Extra configuration settings to append to the RGW configuration. The value is a multi-line string included verbatim. | "" |
common.motr.num_client_inst | TODO | 0 |
common.motr.extra_configuration | (Optional) Extra configuration settings to append to the Motr configuration. The value is a multi-line string included verbatim. | "" |
common.hax.protocol | Protocol that is used to communicate with HAX components running across Server and Data Pods. | http |
common.hax.port_num | Port number that is used to communicate with HAX components running across Server and Data Pods. | 22003 |
common.external_services.s3.type | Kubernetes Service type for external access to S3 IO | NodePort |
common.external_services.s3.count | The number of service instances to create when service type is LoadBalancer | 1 |
common.external_services.s3.ports.http | Non-secure (http) port number used for S3 IO | 8000 |
common.external_services.s3.ports.https | Secure (https) service port number for S3 IO | 8443 |
common.external_services.s3.nodePorts.http | (Optional) Node port for non-secure (http) S3 IO | null |
common.external_services.s3.nodePorts.https | (Optional) Node port for secure (https) S3 IO | null |
common.external_services.control.type | Kubernetes Service type for external access to CSM Management API | NodePort |
common.external_services.control.ports.https | Secure (https) service port number for CSM Management API. | 8081 |
common.external_services.control.nodePorts.https | (Optional) Node port for secure (https) CSM Management API. | null |
common.resource_allocation.**.storage | The desired storage space allocated to PVCs used by that component or sub-component. | See solution.example.yaml |
common.resource_allocation.**.resources.requests.* | CPU & Memory requested for Pods managed by a specific component or sub-component. | See solution.example.yaml |
common.resource_allocation.**.resources.limits.* | CPU & Memory limits for Pods managed by a specific component or sub-component. | See solution.example.yaml |
⚠️ This section is actively under construction!
The metadata and data drives are defined in this section. All drives must be the same across all nodes on which CORTX Data will be deployed. A minimum of 1 CVG of type ios with one metadata drive and one data drive is required.
Name | Description | Default Value |
---|---|---|
storage_sets | A list of the storage defined for use by CORTX. At this time, only one storage set is supported. | See solution.example.yaml |
storage_sets[].name | The name of an individual storage set. | storage-set-1 |
storage_sets[].durability.sns | TBD | 1+0+0 |
storage_sets[].durability.dix | TBD | 1+0+0 |
storage_sets[].container_group_size | This value determines the number of Motr IO containers inside of a single CORTX Data Pod. This value can be tuned for optimal performance based upon different Kubernetes environments. | 1 |
storage_sets[].nodes | The list of Kubernetes worker nodes that CORTX will use to manage data inside the defined storage set. | See solution.example.yaml |
storage_sets[].storage | The list of CVGs (or Cylinder Volume Groups) that CORTX will use to store its data. All nodes defined in the parameter above must have all the same metadata and data drives available as defined in this parameter. | See solution.example.yaml |
storage_sets[].storage[].name | This value is used to identify the specific collection of drives CORTX will use to store data. | cvg-01 |
storage_sets[].storage[].type | TBD | ios |
storage_sets[].storage[].devices | The list of specific block devices CORTX will use to store both object metadata and data inside this CVG. | See solution.example.yaml |
storage_sets[].storage[].devices.metadata[].path | The block device path CORTX will use to store object metadata for this CVG. | /dev/sdc |
storage_sets[].storage[].devices.metadata[].size | The size of the block device CORTX will use to store object metadata for this CVG. | 5Gi |
storage_sets[].storage[].devices.data[] | The list of block devices CORTX will use to store its object data for this CVG. This list can (and most often will) have multiple devices defined in it. | See solution.example.yaml |
storage_sets[].storage[].devices.data[].path | The block device path CORTX will use to store some of its object data for this CVG. | See solution.example.yaml |
storage_sets[].storage[].devices.data[].size | The size of the block device CORTX will use to store some of its object data for this CVG. | 5Gi |
The Helm charts work with both "stub" and "CORTX ALL" containers, allowing users to deploy both placeholder Kubernetes artifacts and functioning CORTX deployments using the same code base. If you are encountering issues deploying CORTX on Kubernetes, you can utilize the stub container method by setting the necessary component in solution.yaml to use an image of ghcr.io/seagate/centos:7 instead of a CORTX-based image. This will deploy the same Kubernetes structure, except the container entrypoints will be set to sleep 3650d to allow for deployment progression and user inspection of the overall deployment.
After the CORTX Kubernetes resources are created, the deployment script will wait for those resources to finish installing and reach a ready state. This wait is guarded by a set of timeout values which can be overridden using environment variables. The values are duration strings, such as "30s" or "10m". The wait can be disabled completely by setting CORTX_DEPLOY_NO_WAIT to true.
Environment Variable | Description | Default Value |
---|---|---|
CORTX_DEPLOY_CLIENT_TIMEOUT | Client Deployment timeout duration | 10m (10 minutes) |
CORTX_DEPLOY_CONTROL_TIMEOUT | Control Deployment timeout duration | 10m (10 minutes) |
CORTX_DEPLOY_DATA_TIMEOUT | Data Deployment timeout duration | 10m (10 minutes) |
CORTX_DEPLOY_HA_TIMEOUT | HA Deployment timeout duration | 4m (4 minutes) |
CORTX_DEPLOY_SERVER_TIMEOUT | Server Deployment timeout duration | 10m (10 minutes) |
CORTX_DEPLOY_NO_WAIT | Disable all waits when true | false (wait is enabled) |
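As an illustration of overriding one of these timeouts, the sketch below sanity-checks a duration string and prints the command that would be run. The pattern accepted by `is_simple_duration` (digits followed by s, m, or h) is an assumed simplification of "duration string", not the deployment script's own validation, and the 20m value is a made-up example.

```shell
# Succeeds for simple duration strings like "30s", "10m", or "1h".
# This pattern is an illustrative simplification, not the script's own check.
is_simple_duration() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+(s|m|h)$'
}

# Hypothetical override: allow the Data Pods 20 minutes to become ready.
CORTX_DEPLOY_DATA_TIMEOUT="20m"

if is_simple_duration "$CORTX_DEPLOY_DATA_TIMEOUT"; then
    echo "would run: CORTX_DEPLOY_DATA_TIMEOUT=$CORTX_DEPLOY_DATA_TIMEOUT ./deploy-cortx-cloud.sh solution.yaml"
else
    echo "unexpected duration format: $CORTX_DEPLOY_DATA_TIMEOUT"
fi
```

Setting the variable on the same command line as the script, as in the printed command, limits the override to that single run.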
During the deployment process, the CORTX Helm Chart is installed, based on the input from the solution.yaml file. The solution file does not support all possible Chart value configuration settings. For those times you want to customize the deployment beyond what is available in solution.yaml, you can specify a custom Chart values.yaml file using the CORTX_DEPLOY_CUSTOM_VALUES_FILE environment variable. The custom file will override anything set in solution.yaml or calculated by the deployment script, so be careful when using this feature. See the Chart documentation for details on possible configuration settings.
For example, this values file will enable the Consul Web UI:
consul:
  ui:
    # Enable the Consul Web UI
    enabled: true
Set the environment variable when deploying:
CORTX_DEPLOY_CUSTOM_VALUES_FILE="myvalues.yaml" ./deploy-cortx-cloud.sh solution.yaml
During CORTX deployments, there are edge cases where the InitContainers of a CORTX pod will enter a CrashLoopBackOff state and it becomes difficult to capture the internal logs that provide necessary context for such error conditions. This command can be used to spin up a debugging container instance that has access to those same logs.
kubectl debug {crash-looping-pod-name} --copy-to=cortx-debug --container=cortx-setup -- sleep infinity;
kubectl exec -it cortx-debug -c cortx-setup -- sh
Once you are done with your debugging session, you can exit the shell session and delete the cortx-debug pod.
Note: This requires a kubectl minimum version of 1.20.
For any terms, acronyms, or phrases that are unfamiliar to you as an end-user, please consult the GLOSSARY page for a growing list of definitions and clarifications as needed.
CORTX is 100% Open Source. Most of the project is licensed under the Apache 2.0 License and the rest is under AGPLv3; check the specific License file of each CORTX submodule to determine which is which.