Setting up Rook Ceph #350
1. First, install the Rook Ceph operator
$ helm repo add rook-release https://charts.rook.io/release
$ helm upgrade --install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph -f charts/rook-values.yaml
Contents of charts/rook-values.yaml:
# Default values for rook-ceph-operator
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
image:
  # -- Image
  repository: docker.io/rook/ceph
  # -- Image tag
  # @default -- `master`
  tag: v1.16.1
  # -- Image pull policy
  pullPolicy: IfNotPresent
crds:
  # -- Whether the helm chart should create and update the CRDs. If false, the CRDs must be
  # managed independently with deploy/examples/crds.yaml.
  # **WARNING** Only set during first deployment. If later disabled the cluster may be DESTROYED.
  # If the CRDs are deleted in this case, see
  # [the disaster recovery guide](https://rook.io/docs/rook/latest/Troubleshooting/disaster-recovery/#restoring-crds-after-deletion)
  # to restore them.
  enabled: true
# -- Pod resource requests & limits
resources:
  limits:
    memory: 512Mi
  requests:
    cpu: 200m
    memory: 128Mi
# -- Kubernetes [`nodeSelector`](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector) to add to the Deployment.
# Constraint rook-ceph-operator Deployment to nodes with label `disktype: ssd`.
# For more info, see https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
# disktype: ssd
# -- List of Kubernetes [`tolerations`](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) to add to the Deployment.
# -- Delay to use for the `node.kubernetes.io/unreachable` pod failure toleration to override
# the Kubernetes default of 5 minutes
unreachableNodeTolerationSeconds: 5
# -- Whether the operator should watch cluster CRD in its own namespace or not
currentNamespaceOnly: false
# -- Pod annotations
annotations: {}
# -- Global log level for the operator.
# Options: `ERROR`, `WARNING`, `INFO`, `DEBUG`
logLevel: INFO
# -- If true, create & use RBAC resources
rbacEnable: true
# -- If true, create a ClusterRole aggregated to [user facing roles](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#user-facing-roles) for objectbucketclaims
enableOBCs: false
# -- If true, create & use PSP resources
pspEnable: false
# -- Set the priority class for the rook operator deployment if desired
# -- Set the container security context for the operator
containerSecurityContext:
  runAsNonRoot: true
  runAsUser: 2016
  runAsGroup: 2016
  capabilities:
    drop: ["ALL"]
# -- If true, loop devices are allowed to be used for osds in test clusters
allowLoopDevices: false
# Settings for whether to disable the drivers or other daemons if they are not
# needed
# -- Enable Ceph CSI RBD driver
enableRbdDriver: true
# -- Enable Ceph CSI CephFS driver
enableCephfsDriver: true
# -- Disable the CSI driver.
disableCsiDriver: "false"
# -- Enable host networking for CSI CephFS and RBD nodeplugins. This may be necessary
# in some network configurations where the SDN does not provide access to an external cluster or
# there is significant drop in read/write performance
enableCSIHostNetwork: true
# -- Enable Snapshotter in CephFS provisioner pod
enableCephfsSnapshotter: true
# -- Enable Snapshotter in NFS provisioner pod
enableNFSSnapshotter: true
# -- Enable Snapshotter in RBD provisioner pod
enableRBDSnapshotter: true
# -- Enable Host mount for `/etc/selinux` directory for Ceph CSI nodeplugins
enablePluginSelinuxHostMount: false
# -- Enable Ceph CSI PVC encryption support
enableCSIEncryption: false
# -- Enable volume group snapshot feature. This feature is
# enabled by default as long as the necessary CRDs are available in the cluster.
enableVolumeGroupSnapshot: true
# -- PriorityClassName to be set on csi driver plugin pods
pluginPriorityClassName: system-node-critical
# -- PriorityClassName to be set on csi driver provisioner pods
provisionerPriorityClassName: system-cluster-critical
# -- Policy for modifying a volume's ownership or permissions when the RBD PVC is being mounted.
# supported values are documented at https://kubernetes-csi.github.io/docs/support-fsgroup.html
rbdFSGroupPolicy: "File"
# -- Policy for modifying a volume's ownership or permissions when the CephFS PVC is being mounted.
# supported values are documented at https://kubernetes-csi.github.io/docs/support-fsgroup.html
cephFSFSGroupPolicy: "File"
# -- Policy for modifying a volume's ownership or permissions when the NFS PVC is being mounted.
# supported values are documented at https://kubernetes-csi.github.io/docs/support-fsgroup.html
nfsFSGroupPolicy: "File"
# -- OMAP generator generates the omap mapping between the PV name and the RBD image
# which helps CSI to identify the rbd images for CSI operations.
# `CSI_ENABLE_OMAP_GENERATOR` needs to be enabled when we are using rbd mirroring feature.
# By default OMAP generator is disabled and when enabled, it will be deployed as a
# sidecar with CSI provisioner pod, to enable set it to true.
enableOMAPGenerator: false
# -- Set CephFS Kernel mount options to use https://docs.ceph.com/en/latest/man/8/mount.ceph/#options.
# Set to "ms_mode=secure" when connections.encrypted is enabled in CephCluster CR
# -- Enable adding volume metadata on the CephFS subvolumes and RBD images.
# Not all users might be interested in getting volume/snapshot details as metadata on CephFS subvolume and RBD images.
# Hence enable metadata is false by default
enableMetadata: false
# -- Set replicas for csi provisioner deployment
provisionerReplicas: 2
# -- Cluster name identifier to set as metadata on the CephFS subvolume and RBD images. This will be useful
# in cases like for example, when two container orchestrator clusters (Kubernetes/OCP) are using a single ceph cluster
# -- Set logging level for cephCSI containers maintained by the cephCSI.
# Supported values from 0 to 5. 0 for general useful logs, 5 for trace level verbosity.
logLevel: 0
# -- Set logging level for Kubernetes-csi sidecar containers.
# Supported values from 0 to 5. 0 for general useful logs (the default), 5 for trace level verbosity.
# @default -- `0`
# -- CSI driver name prefix for cephfs, rbd and nfs.
# @default -- `namespace name where rook-ceph operator is deployed`
# -- CSI RBD plugin daemonset update strategy, supported values are OnDelete and RollingUpdate
# @default -- `RollingUpdate`
# -- A maxUnavailable parameter of CSI RBD plugin daemonset update strategy.
# @default -- `1`
# -- CSI CephFS plugin daemonset update strategy, supported values are OnDelete and RollingUpdate
# @default -- `RollingUpdate`
# -- A maxUnavailable parameter of CSI cephFS plugin daemonset update strategy.
# @default -- `1`
# -- CSI NFS plugin daemonset update strategy, supported values are OnDelete and RollingUpdate
# @default -- `RollingUpdate`
# -- Set GRPC timeout for csi containers (in seconds). It should be >= 120. If this value is not set or is invalid, it defaults to 150
grpcTimeoutInSeconds: 150
# -- Burst to use while communicating with the kubernetes apiserver.
# -- QPS to use while communicating with the kubernetes apiserver.
# -- The volume of the CephCSI RBD plugin DaemonSet
# - name: lib-modules
# hostPath:
# path: /run/booted-system/kernel-modules/lib/modules/
# - name: host-nix
# hostPath:
# path: /nix
# -- The volume mounts of the CephCSI RBD plugin DaemonSet
# - name: host-nix
# mountPath: /nix
# readOnly: true
# -- The volume of the CephCSI CephFS plugin DaemonSet
# - name: lib-modules
# hostPath:
# path: /run/booted-system/kernel-modules/lib/modules/
# - name: host-nix
# hostPath:
# path: /nix
# -- The volume mounts of the CephCSI CephFS plugin DaemonSet
# - name: host-nix
# mountPath: /nix
# readOnly: true
# -- CEPH CSI RBD provisioner resource requirement list
# csi-omap-generator resources will be applied only if `enableOMAPGenerator` is set to `true`
# @default -- see values.yaml
csiRBDProvisionerResource: |
  - name : csi-provisioner
    resource:
      requests:
        memory: 128Mi
        cpu: 100m
      limits:
        memory: 256Mi
  - name : csi-resizer
    resource:
      requests:
        memory: 128Mi
        cpu: 100m
      limits:
        memory: 256Mi
  - name : csi-attacher
    resource:
      requests:
        memory: 128Mi
        cpu: 100m
      limits:
        memory: 256Mi
  - name : csi-snapshotter
    resource:
      requests:
        memory: 128Mi
        cpu: 100m
      limits:
        memory: 256Mi
  - name : csi-rbdplugin
    resource:
      requests:
        memory: 512Mi
      limits:
        memory: 1Gi
  - name : csi-omap-generator
    resource:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: 1Gi
  - name : liveness-prometheus
    resource:
      requests:
        memory: 128Mi
        cpu: 50m
      limits:
        memory: 256Mi
# -- CEPH CSI RBD plugin resource requirement list
# @default -- see values.yaml
csiRBDPluginResource: |
  - name : driver-registrar
    resource:
      requests:
        memory: 128Mi
        cpu: 50m
      limits:
        memory: 256Mi
  - name : csi-rbdplugin
    resource:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: 1Gi
  - name : liveness-prometheus
    resource:
      requests:
        memory: 128Mi
        cpu: 50m
      limits:
        memory: 256Mi
# -- CEPH CSI CephFS provisioner resource requirement list
# @default -- see values.yaml
csiCephFSProvisionerResource: |
  - name : csi-provisioner
    resource:
      requests:
        memory: 128Mi
        cpu: 100m
      limits:
        memory: 256Mi
  - name : csi-resizer
    resource:
      requests:
        memory: 128Mi
        cpu: 100m
      limits:
        memory: 256Mi
  - name : csi-attacher
    resource:
      requests:
        memory: 128Mi
        cpu: 100m
      limits:
        memory: 256Mi
  - name : csi-snapshotter
    resource:
      requests:
        memory: 128Mi
        cpu: 100m
      limits:
        memory: 256Mi
  - name : csi-cephfsplugin
    resource:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: 1Gi
  - name : liveness-prometheus
    resource:
      requests:
        memory: 128Mi
        cpu: 50m
      limits:
        memory: 256Mi
# -- CEPH CSI CephFS plugin resource requirement list
# @default -- see values.yaml
csiCephFSPluginResource: |
  - name : driver-registrar
    resource:
      requests:
        memory: 128Mi
        cpu: 50m
      limits:
        memory: 256Mi
  - name : csi-cephfsplugin
    resource:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: 1Gi
  - name : liveness-prometheus
    resource:
      requests:
        memory: 128Mi
        cpu: 50m
      limits:
        memory: 256Mi
# -- CEPH CSI NFS provisioner resource requirement list
# @default -- see values.yaml
csiNFSProvisionerResource: |
  - name : csi-provisioner
    resource:
      requests:
        memory: 128Mi
        cpu: 100m
      limits:
        memory: 256Mi
  - name : csi-nfsplugin
    resource:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: 1Gi
  - name : csi-attacher
    resource:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: 1Gi
# -- CEPH CSI NFS plugin resource requirement list
# @default -- see values.yaml
csiNFSPluginResource: |
  - name : driver-registrar
    resource:
      requests:
        memory: 128Mi
        cpu: 50m
      limits:
        memory: 256Mi
  - name : csi-nfsplugin
    resource:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: 1Gi
# Set provisionerTolerations and provisionerNodeAffinity for provisioner pod.
# The CSI provisioner would be best to start on the same nodes as other ceph daemons.
# -- Array of tolerations in YAML format which will be added to CSI provisioner deployment
# - key: key
# operator: Exists
# effect: NoSchedule
# -- The node labels for affinity of the CSI provisioner deployment [^1]
provisionerNodeAffinity: #key1=value1,value2; key2=value3
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: role
# operator: In
# values:
# - ceph
# Set pluginTolerations and pluginNodeAffinity for plugin daemonset pods.
# The CSI plugins need to be started on all the nodes where the clients need to mount the storage.
# -- Array of tolerations in YAML format which will be added to CephCSI plugin DaemonSet
# - key: key
# operator: Exists
# effect: NoSchedule
# -- The node labels for affinity of the CephCSI RBD plugin DaemonSet [^1]
pluginNodeAffinity: # key1=value1,value2; key2=value3
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: role
# operator: In
# values:
# - ceph
# -- Enable Ceph CSI Liveness sidecar deployment
enableLiveness: false
# -- CSI CephFS driver metrics port
# @default -- `9081`
# -- CSI Addons server port
# @default -- `9070`
# -- Enable Ceph Kernel clients on kernel < 4.17. If your kernel does not support quotas for CephFS
# you may want to disable this setting. However, this will cause an issue during upgrades
# with the FUSE client. See the [upgrade guide](https://rook.io/docs/rook/v1.2/ceph-upgrade.html)
forceCephFSKernelClient: true
# -- Ceph CSI RBD driver metrics port
# @default -- `8080`
serviceMonitor:
  # -- Enable ServiceMonitor for Ceph CSI drivers
  enabled: false
  # -- Service monitor scrape interval
  interval: 10s
  # -- ServiceMonitor additional labels
  labels: {}
# -- Use a different namespace for the ServiceMonitor
# -- Kubelet root directory path (if the Kubelet uses a different path for the `--root-dir` flag)
# @default -- `/var/lib/kubelet`
# -- Duration in seconds that non-leader candidates will wait to force acquire leadership.
# @default -- `137s`
# -- Deadline in seconds that the acting leader will retry refreshing leadership before giving up.
# @default -- `107s`
# -- Retry period in seconds the LeaderElector clients should wait between tries of actions.
# @default -- `26s`
cephcsi:
  # -- Ceph CSI image repository
  repository: quay.io/cephcsi/cephcsi
  # -- Ceph CSI image tag
  tag: v3.13.0
registrar:
  # -- Kubernetes CSI registrar image repository
  repository: registry.k8s.io/sig-storage/csi-node-driver-registrar
  # -- Registrar image tag
  tag: v2.11.1
provisioner:
  # -- Kubernetes CSI provisioner image repository
  repository: registry.k8s.io/sig-storage/csi-provisioner
  # -- Provisioner image tag
  tag: v5.0.1
snapshotter:
  # -- Kubernetes CSI snapshotter image repository
  repository: registry.k8s.io/sig-storage/csi-snapshotter
  # -- Snapshotter image tag
  tag: v8.2.0
attacher:
  # -- Kubernetes CSI Attacher image repository
  repository: registry.k8s.io/sig-storage/csi-attacher
  # -- Attacher image tag
  tag: v4.6.1
resizer:
  # -- Kubernetes CSI resizer image repository
  repository: registry.k8s.io/sig-storage/csi-resizer
  # -- Resizer image tag
  tag: v1.11.1
# -- Image pull policy
imagePullPolicy: IfNotPresent
# -- Labels to add to the CSI CephFS Deployments and DaemonSets Pods
cephfsPodLabels: #"key1=value1,key2=value2"
# -- Labels to add to the CSI NFS Deployments and DaemonSets Pods
nfsPodLabels: #"key1=value1,key2=value2"
# -- Labels to add to the CSI RBD Deployments and DaemonSets Pods
rbdPodLabels: #"key1=value1,key2=value2"
csiAddons:
  # -- Enable CSIAddons
  enabled: false
  # -- CSIAddons sidecar image repository
  repository: quay.io/csiaddons/k8s-sidecar
  # -- CSIAddons sidecar image tag
  tag: v0.11.0
nfs:
  # -- Enable the nfs csi driver
  enabled: false
topology:
  # -- Enable topology based provisioning
  enabled: false
# NOTE: the value here serves as an example and needs to be
# updated with node labels that define domains of interest
# -- domainLabels define which node labels to use as domains
# for CSI nodeplugins to advertise their domains
# - kubernetes.io/hostname
# - topology.kubernetes.io/zone
# - topology.rook.io/rack
# -- Whether to skip any attach operation altogether for CephFS PVCs. See more details
# [here](https://kubernetes-csi.github.io/docs/skip-attach.html#skip-attach-with-csi-driver-object).
# If cephFSAttachRequired is set to false it skips the volume attachments and makes the creation
# of pods using the CephFS PVC fast. **WARNING** It's highly discouraged to use this for
# CephFS RWO volumes. Refer to this [issue](https://github.com/kubernetes/kubernetes/issues/103305) for more details.
cephFSAttachRequired: true
# -- Whether to skip any attach operation altogether for RBD PVCs. See more details
# [here](https://kubernetes-csi.github.io/docs/skip-attach.html#skip-attach-with-csi-driver-object).
# If set to false it skips the volume attachments and makes the creation of pods using the RBD PVC fast.
# **WARNING** It's highly discouraged to use this for RWO volumes as it can cause data corruption.
# csi-addons operations like Reclaimspace and PVC Keyrotation will also not be supported if set
# to false since we'll have no VolumeAttachments to determine which node the PVC is mounted on.
# Refer to this [issue](https://github.com/kubernetes/kubernetes/issues/103305) for more details.
rbdAttachRequired: true
# -- Whether to skip any attach operation altogether for NFS PVCs. See more details
# [here](https://kubernetes-csi.github.io/docs/skip-attach.html#skip-attach-with-csi-driver-object).
# If nfsAttachRequired is set to false it skips the volume attachments and makes the creation
# of pods using the NFS PVC fast. **WARNING** It's highly discouraged to use this for
# NFS RWO volumes. Refer to this [issue](https://github.com/kubernetes/kubernetes/issues/103305) for more details.
nfsAttachRequired: true
# -- Enable discovery daemon
enableDiscoveryDaemon: false
# -- Set the discovery daemon device discovery interval (default to 60m)
discoveryDaemonInterval: 60m
# -- The timeout for ceph commands in seconds
cephCommandsTimeoutSeconds: "15"
# -- If true, run rook operator on the host network
# -- If true, scale down the rook operator.
# This is useful for administrative actions where the rook operator must be scaled down, while using gitops style tooling
# to deploy your helm charts.
scaleDownOperator: false
## Rook Discover configuration
## toleration: NoSchedule, PreferNoSchedule or NoExecute
## tolerationKey: Set this to the specific key of the taint to tolerate
## tolerations: Array of tolerations in YAML format which will be added to agent deployment
## nodeAffinity: Set to labels of the node to match
# -- Toleration for the discover pods.
# Options: `NoSchedule`, `PreferNoSchedule` or `NoExecute`
# -- The specific key of the taint to tolerate
# -- Array of tolerations in YAML format which will be added to discover deployment
# - key: key
# operator: Exists
# effect: NoSchedule
# -- The node labels for affinity of `discover-agent` [^1]
# key1=value1,value2; key2=value3
# or
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: storage-node
# operator: Exists
# -- Labels to add to the discover pods
podLabels: # "key1=value1,key2=value2"
# -- Add resources to discover daemon pods
# - limits:
# memory: 512Mi
# - requests:
# cpu: 100m
# memory: 128Mi
# -- Runs Ceph Pods as privileged to be able to write to `hostPaths` in OpenShift with SELinux restrictions.
hostpathRequiresPrivileged: false
# -- Whether to create all Rook pods to run on the host network, for example in environments where a CNI is not enabled
enforceHostNetwork: false
# -- Disable automatic orchestration when new devices are discovered.
disableDeviceHotplug: false
# -- The revision history limit for all pods created by Rook. If blank, the K8s default is 10.
# -- Blacklist certain disks according to the regex provided.
# -- imagePullSecrets option allow to pull docker images from private docker registry. Option will be passed to all service accounts.
# - name: my-registry-secret
# -- Whether the OBC provisioner should watch on the operator namespace or not, if not the namespace of the cluster will be used
enableOBCWatchOperatorNamespace: true
# -- Specify the prefix for the OBC provisioner in place of the cluster namespace
# @default -- `ceph cluster namespace`
# -- Enable monitoring. Requires Prometheus to be pre-installed.
# Enabling will also create RBAC rules to allow Operator to create ServiceMonitors
enabled: false
Verify the installation:
ziyuan@pve-gmk-ubuntu:~/k8s$ kubectl get pod -n rook-ceph
rook-ceph-operator-68d9f8b984-w7tmj 1/1 Running 0 18m
2. Configure the Rook Ceph cluster
With the operator in place, we can configure the cluster through Kubernetes CRDs.
$ kubectl apply -f cluster.yaml
Contents of cluster.yaml:
# Define the settings for the rook-ceph cluster with common settings for a production cluster.
# All nodes with available raw devices will be used for the Ceph cluster. At least three nodes are required
# in this example. See the documentation for more details on storage settings available.
# For example, to create the cluster:
# kubectl create -f crds.yaml -f common.yaml -f operator.yaml
# kubectl create -f cluster.yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph # namespace:cluster
spec:
cephVersion:
  # The container image used to launch the Ceph daemon pods (mon, mgr, osd, mds, rgw).
  # v18 is Reef, v19 is Squid
  # RECOMMENDATION: In production, use a specific version tag instead of the general v19 flag, which pulls the latest release and could result in different
  # versions running within the cluster. See tags available at https://hub.docker.com/r/ceph/ceph/tags/.
  # If you want to be more precise, you can always use a timestamp tag such as quay.io/ceph/ceph:v19.2.0-20240927
  # This tag might not contain a new Ceph version, just security fixes from the underlying operating system, which will reduce vulnerabilities
  image: quay.io/ceph/ceph:v19.2.0
  # Whether to allow unsupported versions of Ceph. Currently Reef and Squid are supported.
  # Future versions such as Tentacle (v20) would require this to be set to `true`.
  # Do not set to true in production.
  allowUnsupported: false
# The path on the host where configuration files will be persisted. Must be specified. If there are multiple clusters, the directory must be unique for each cluster.
# Important: if you reinstall the cluster, make sure you delete this directory from each host or else the mons will fail to start on the new cluster.
# In Minikube, the '/data' directory is configured to persist across reboots. Use "/data/rook" in Minikube environment.
dataDirHostPath: /var/lib/rook
# Whether or not upgrade should continue even if a check fails
# This means Ceph's status could be degraded and we don't recommend upgrading but you might decide otherwise
# Use at your OWN risk
# To understand Rook's upgrade process of Ceph, read https://rook.io/docs/rook/latest/ceph-upgrade.html#ceph-version-upgrades
skipUpgradeChecks: false
# Whether or not continue if PGs are not clean during an upgrade
continueUpgradeAfterChecksEvenIfNotHealthy: false
# WaitTimeoutForHealthyOSDInMinutes defines the time (in minutes) the operator would wait before an OSD can be stopped for upgrade or restart.
# If the timeout exceeds and OSD is not ok to stop, then the operator would skip upgrade for the current OSD and proceed with the next one
# if `continueUpgradeAfterChecksEvenIfNotHealthy` is `false`. If `continueUpgradeAfterChecksEvenIfNotHealthy` is `true`, then operator would
# continue with the upgrade of an OSD even if its not ok to stop after the timeout. This timeout won't be applied if `skipUpgradeChecks` is `true`.
# The default wait timeout is 10 minutes.
waitTimeoutForHealthyOSDInMinutes: 10
# Whether or not requires PGs are clean before an OSD upgrade. If set to `true` OSD upgrade process won't start until PGs are healthy.
# This configuration will be ignored if `skipUpgradeChecks` is `true`.
# Default is false.
upgradeOSDRequiresHealthyPGs: false
mon:
  # Set the number of mons to be started. Generally recommended to be 3.
  # For highest availability, an odd number of mons should be specified.
  count: 3
  # The mons should be on unique nodes. For production, at least 3 nodes are recommended for this reason.
  # Mons should only be allowed on the same node for test environments where data loss is acceptable.
  allowMultiplePerNode: false
mgr:
  # When higher availability of the mgr is needed, increase the count to 2.
  # In that case, one mgr will be active and one in standby. When Ceph updates which
  # mgr is active, Rook will update the mgr services to match the active mgr.
  count: 2
  allowMultiplePerNode: false
  modules:
    # List of modules to optionally enable or disable.
    # Note the "dashboard" and "monitoring" modules are already configured by other settings in the cluster CR.
    - name: rook
      enabled: true
# enable the ceph dashboard for viewing cluster status
dashboard:
  enabled: true
  # serve the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
  # urlPrefix: /ceph-dashboard
  # serve the dashboard at the given port.
  # port: 8443
  # serve the dashboard using SSL
  ssl: false
  # The url of the Prometheus instance
  # prometheusEndpoint: <protocol>://<prometheus-host>:<port>
  # Whether SSL should be verified if the Prometheus server is using https
  # prometheusEndpointSSLVerify: false
# enable prometheus alerting for cluster
monitoring:
  # requires Prometheus to be pre-installed
  enabled: false
  # Whether to disable the metrics reported by Ceph. If false, the prometheus mgr module and Ceph exporter are enabled.
  # If true, the prometheus mgr module and Ceph exporter are both disabled. Default is false.
  metricsDisabled: false
  # Ceph exporter metrics config.
  exporter:
    # Specifies which performance counters are exported.
    # Corresponds to --prio-limit Ceph exporter flag
    # 0 - all counters are exported
    perfCountersPrioLimit: 5
    # Time to wait before sending requests again to exporter server (seconds)
    # Corresponds to --stats-period Ceph exporter flag
    statsPeriodSeconds: 5
network:
  connections:
    # Whether to encrypt the data in transit across the wire to prevent eavesdropping the data on the network.
    # The default is false. When encryption is enabled, all communication between clients and Ceph daemons, or between Ceph daemons will be encrypted.
    # When encryption is not enabled, clients still establish a strong initial authentication and data integrity is still validated with a crc check.
    # IMPORTANT: Encryption requires the 5.11 kernel for the latest nbd and cephfs drivers. Alternatively for testing only,
    # you can set the "mounter: rbd-nbd" in the rbd storage class, or "mounter: fuse" in the cephfs storage class.
    # The nbd and fuse drivers are *not* recommended in production since restarting the csi driver pod will disconnect the volumes.
    encryption:
      enabled: false
    # Whether to compress the data in transit across the wire. The default is false.
    # See the kernel requirements above for encryption.
    compression:
      enabled: false
    # Whether to require communication over msgr2. If true, the msgr v1 port (6789) will be disabled
    # and clients will be required to connect to the Ceph cluster with the v2 port (3300).
    # Requires a kernel that supports msgr v2 (kernel 5.11 or CentOS 8.4 or newer).
    requireMsgr2: false
# enable host networking
#provider: host
# enable the Multus network provider
#provider: multus
# The selector keys are required to be `public` and `cluster`.
# Based on the configuration, the operator will do the following:
# 1. if only the `public` selector key is specified both public_network and cluster_network Ceph settings will listen on that interface
# 2. if both `public` and `cluster` selector keys are specified the first one will point to 'public_network' flag and the second one to 'cluster_network'
# In order to work, each selector value must match a NetworkAttachmentDefinition object in Multus
# public: public-conf --> NetworkAttachmentDefinition object name in Multus
# cluster: cluster-conf --> NetworkAttachmentDefinition object name in Multus
# Provide internet protocol version. IPv6, IPv4 or empty string are valid options. Empty string would mean IPv4
#ipFamily: "IPv6"
# Ceph daemons to listen on both IPv4 and IPv6 networks
#dualStack: false
# Enable multiClusterService to export the mon and OSD services to peer cluster.
# This is useful to support RBD mirroring between two clusters having overlapping CIDRs.
# Ensure that peer clusters are connected using an MCS API compatible application, like Globalnet Submariner.
# enabled: false
# enable the crash collector for ceph daemon crash collection
crashCollector:
  disable: false
  # Uncomment daysToRetain to prune ceph crash entries older than the
  # specified number of days.
  #daysToRetain: 30
# enable log collector, daemons will log on files and rotate
logCollector:
  enabled: true
  periodicity: daily # one of: hourly, daily, weekly, monthly
  maxLogSize: 500M # SUFFIX may be 'M' or 'G'. Must be at least 1M.
# automate [data cleanup process](https://github.com/rook/rook/blob/master/Documentation/Storage-Configuration/ceph-teardown.md#delete-the-data-on-hosts) in cluster destruction.
cleanupPolicy:
  # Since cluster cleanup is destructive to data, confirmation is required.
  # To destroy all Rook data on hosts during uninstall, confirmation must be set to "yes-really-destroy-data".
  # This value should only be set when the cluster is about to be deleted. After the confirmation is set,
  # Rook will immediately stop configuring the cluster and only wait for the delete command.
  # If the empty string is set, Rook will not destroy any data on hosts during uninstall.
  confirmation: ""
  # sanitizeDisks represents settings for sanitizing OSD disks on cluster deletion
  sanitizeDisks:
    # method indicates if the entire disk should be sanitized or simply ceph's metadata
    # in both cases, re-install is possible
    # possible choices are 'complete' or 'quick' (default)
    method: quick
    # dataSource indicates where to get random bytes from to write on the disk
    # possible choices are 'zero' (default) or 'random'
    # using random sources will consume entropy from the system and will take much more time than the zero source
    dataSource: zero
    # iteration overwrites N times instead of the default (1)
    # takes an integer value
    iteration: 1
  # allowUninstallWithVolumes defines how the uninstall should be performed
  # If set to true, cephCluster deletion does not wait for the PVs to be deleted.
  allowUninstallWithVolumes: false
# To control where various services will be scheduled by kubernetes, use the placement configuration sections below.
# The example under 'all' would have all services scheduled on kubernetes nodes labeled with 'role=storage-node' and
# tolerate taints with a key of 'storage-node'.
# all:
# nodeAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: role
# operator: In
# values:
# - ceph
# The above placement information can also be specified for mon, osd, and mgr components
# mon:
# Monitor deployments may contain an anti-affinity rule for avoiding monitor
# collocation on the same node. This is a required rule when host network is used
# or when AllowMultiplePerNode is false. Otherwise this anti-affinity rule is a
# preferred rule with weight: 50.
# osd:
# prepareosd:
# mgr:
# cleanup:
# all:
# mon:
# mgr:
# osd:
# exporter:
# crashcollector:
# cleanup:
# prepareosd:
# cmdreporter is for jobs to detect ceph and csi versions, and check network status
# cmdreporter:
# clusterMetadata annotations will be applied to only `rook-ceph-mon-endpoints` configmap and the `rook-ceph-mon` and `rook-ceph-admin-keyring` secrets.
# And clusterMetadata annotations will not be merged with `all` annotations.
# clusterMetadata:
# kubed.appscode.com/sync: "true"
# If no mgr annotations are set, prometheus scrape annotations will be set by default.
# mgr:
# all:
# mon:
# osd:
# cleanup:
# mgr:
# prepareosd:
# These labels are applied to ceph-exporter servicemonitor only
# exporter:
# monitoring is a list of key-value pairs. It is injected into all the monitoring resources created by operator.
# These labels can be passed as LabelSelector to Prometheus
# monitoring:
# crashcollector:
#The requests and limits set here, allow the mgr pod to use half of one CPU core and 1 gigabyte of memory
# mgr:
# limits:
# memory: "1024Mi"
# requests:
# cpu: "500m"
# memory: "1024Mi"
# The above example requests/limits can also be added to the other components
# mon:
# osd:
# For OSD it is also possible to specify requests/limits based on device class
# osd-hdd:
# osd-ssd:
# osd-nvme:
# prepareosd:
# mgr-sidecar:
# crashcollector:
# logcollector:
# cleanup:
# exporter:
# cmd-reporter:
# The option to automatically remove OSDs that are out and are safe to destroy.
removeOSDsIfOutAndSafeToRemove: false
priorityClassNames:
  #all: rook-ceph-default-priority-class
  mon: system-node-critical
  osd: system-node-critical
  mgr: system-cluster-critical
  #crashcollector: rook-ceph-crashcollector-priority-class
storage: # cluster level storage configuration and selection
  useAllNodes: false
  useAllDevices: false
  # crushRoot: "custom-root" # specify a non-default root label for the CRUSH map
  # metadataDevice: "md0" # specify a non-rotational storage so ceph-volume will use it as block db device of bluestore.
  # databaseSizeMB: "1024" # uncomment if the disks are smaller than 100 GB
  # osdsPerDevice: "1" # this value can be overridden at the node or device level
  # encryptedDevice: "true" # the default value for this option is "false"
  # deviceClass: "myclass" # specify a device class for OSDs in the cluster
  allowDeviceClassUpdate: false # whether to allow changing the device class of an OSD after it is created
  allowOsdCrushWeightUpdate: false # whether to allow resizing the OSD crush weight after osd pvc is increased
  # Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
  # nodes below will be used as storage resources. Each node's 'name' field should match their 'kubernetes.io/hostname' label.
  nodes:
    - name: pve-ubuntu
      devices:
        - name: vda
    - name: pve-gmk-ubuntu
      devices:
        - name: vda
    - name: nhan-ubuntu
      devices:
        - name: vdb
# nodes:
# - name: ""
# devices: # specific devices to use for storage can be specified for each node
# - name: "sdb"
# - name: "nvme01" # multiple osds can be created on high performance devices
# config:
# osdsPerDevice: "5"
# - name: "/dev/disk/by-id/ata-ST4000DM004-XXXX" # devices can be specified using full udev paths
# config: # configuration can be specified at the node level which overrides the cluster level config
# - name: ""
# deviceFilter: "^sd."
# Whether to always schedule OSD pods on nodes declared explicitly in the "nodes" section, even if they are
# temporarily not schedulable. If set to true, consider adding placement tolerations for unschedulable nodes.
scheduleAlways: false
# when onlyApplyOSDPlacement is false, will merge both placement.All() and placement.osd
onlyApplyOSDPlacement: false
# Time for which an OSD pod will sleep before restarting, if it stopped due to flapping
# flappingRestartIntervalHours: 24
# The ratio at which Ceph should block IO if the OSDs are too full. The default is 0.95.
# fullRatio: 0.95
# The ratio at which Ceph should stop backfilling data if the OSDs are too full. The default is 0.90.
# backfillFullRatio: 0.90
# The ratio at which Ceph should raise a health warning if the OSDs are almost full. The default is 0.85.
# nearFullRatio: 0.85
# The section for configuring management of daemon disruptions during upgrade or fencing.
# If true, the operator will create and manage PodDisruptionBudgets for OSD, Mon, RGW, and MDS daemons. OSD PDBs are managed dynamically
# via the strategy outlined in the [design](https://github.com/rook/rook/blob/master/design/ceph/ceph-managed-disruptionbudgets.md). The operator will
# block eviction of OSDs by default and unblock them safely when drains are detected.
managePodBudgets: true
# A duration in minutes that determines how long an entire failureDomain like `region/zone/host` will be held in `noout` (in addition to the
# default DOWN/OUT interval) when it is draining. This is only relevant when `managePodBudgets` is `true`. The default value is `30` minutes.
osdMaintenanceTimeout: 30
# A duration in minutes that the operator will wait for the placement groups to become healthy (active+clean) after a drain was completed and OSDs came back up.
# Operator will continue with the next drain if the timeout exceeds. It only works if `managePodBudgets` is `true`.
# No values or 0 means that the operator will wait until the placement groups are healthy before unblocking the next drain.
pgHealthCheckTimeout: 0
# csi defines CSI Driver settings applied per cluster.
csi:
  readAffinity:
    # Enable read affinity to enable clients to optimize reads from an OSD in the same topology.
    # Enabling the read affinity may cause the OSDs to consume some extra memory.
    # For more details see this doc:
    # https://rook.io/docs/rook/latest/Storage-Configuration/Ceph-CSI/ceph-csi-drivers/#enable-read-affinity-for-rbd-volumes
    enabled: false
  # cephfs driver specific settings.
  cephfs:
    # Set CephFS Kernel mount options to use https://docs.ceph.com/en/latest/man/8/mount.ceph/#options.
    # kernelMountOptions: ""
    # Set CephFS Fuse mount options to use https://docs.ceph.com/en/latest/man/8/ceph-fuse/#options.
    # fuseMountOptions: ""
# healthChecks
# Valid values for daemons are 'mon', 'osd', 'status'
healthCheck:
  daemonHealth:
    mon:
      disabled: false
      interval: 45s
    osd:
      disabled: false
      interval: 60s
    status:
      disabled: false
      interval: 60s
  # Change pod liveness probe timing or threshold values. Works for all mon,mgr,osd daemons.
  livenessProbe:
    mon:
      disabled: false
    mgr:
      disabled: false
    osd:
      disabled: false
  # Change pod startup probe timing or threshold values. Works for all mon,mgr,osd daemons.
  startupProbe:
    mon:
      disabled: false
    mgr:
      disabled: false
    osd:
      disabled: false
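The `nearFullRatio`, `backfillFullRatio` and `fullRatio` comments in the values above are easier to reason about as absolute numbers. The sketch below converts each default ratio into used space per OSD. The 20 GiB OSD size is an assumption (60 GiB spread evenly over three OSDs, as the `ceph status` output in this issue suggests), and `ratio_to_gib` is a hypothetical helper, not part of Rook or Ceph.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a fill ratio into GiB used on a single OSD.
# Arguments: OSD size in GiB, ratio (e.g. 0.85).
ratio_to_gib() {
  local osd_size_gib="$1" ratio="$2"
  awk -v s="${osd_size_gib}" -v r="${ratio}" 'BEGIN { printf "%.1f\n", s * r }'
}

# Assumption: each OSD is 20 GiB. The ratios are the defaults quoted in the comments.
for entry in nearFullRatio=0.85 backfillFullRatio=0.90 fullRatio=0.95; do
  name="${entry%%=*}"
  ratio="${entry#*=}"
  echo "${name}: $(ratio_to_gib 20 "${ratio}") GiB used per OSD"
done
```

With OSDs this small, the gap between the health warning and blocked IO is only about 2 GiB per OSD, which is worth keeping in mind before uncommenting any of the ratios.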
Verify the installation:
ziyuan@pve-gmk-ubuntu:~/k8s$ kubectl get pod -n rook-ceph
csi-cephfsplugin-29kd9 3/3 Running 2 (3d16h ago) 3d18h
csi-cephfsplugin-k572j 3/3 Running 0 3d18h
csi-cephfsplugin-provisioner-74b95c5758-6k7w9 6/6 Running 0 3d18h
csi-cephfsplugin-provisioner-74b95c5758-ktxhw 6/6 Running 0 3d18h
csi-cephfsplugin-s77bw 3/3 Running 0 3d18h
csi-rbdplugin-9jshj 3/3 Running 0 3d18h
csi-rbdplugin-kl4sv 3/3 Running 1 (3d18h ago) 3d18h
csi-rbdplugin-provisioner-5fd9fbf6f8-225sl 6/6 Running 0 3d18h
csi-rbdplugin-provisioner-5fd9fbf6f8-w9r5s 6/6 Running 0 3d18h
csi-rbdplugin-skjc8 3/3 Running 0 3d18h
rook-ceph-crashcollector-nhan-ubuntu-ddbddd8d9-2frpp 1/1 Running 0 3d17h
rook-ceph-crashcollector-pve-gmk-ubuntu-89bf6dcf-cwcmg 1/1 Running 0 3d17h
rook-ceph-crashcollector-pve-ubuntu-68cff9fd47-dnbpf 1/1 Running 0 3d17h
rook-ceph-exporter-nhan-ubuntu-745dbcf58c-hqcq4 1/1 Running 0 3d17h
rook-ceph-exporter-pve-gmk-ubuntu-6f777d855c-fd54v 1/1 Running 0 3d17h
rook-ceph-exporter-pve-ubuntu-76f44c66fc-2hlw7 1/1 Running 0 3d17h
rook-ceph-mgr-a-c69f44568-bnvlh 3/3 Running 0 3d17h
rook-ceph-mgr-b-d66bcd6d7-6xqqg 3/3 Running 0 3d17h
rook-ceph-mon-a-54c677947c-4bhnf 2/2 Running 1 (31h ago) 3d18h
rook-ceph-mon-b-fdf46d6f8-nb9jb 2/2 Running 0 3d17h
rook-ceph-mon-c-6498cbd495-lfvp4 2/2 Running 0 3d17h
rook-ceph-operator-68d9f8b984-w7tmj 1/1 Running 0 3d18h
rook-ceph-osd-0-6dccbf576-2vq6q 2/2 Running 0 3d17h
rook-ceph-osd-1-79cd54f5bd-ppkpr 2/2 Running 0 3d17h
rook-ceph-osd-2-6d75f79768-rkmrb 2/2 Running 0 3d17h
rook-ceph-osd-prepare-nhan-ubuntu-hgrqf 0/1 Completed 0 3d17h
rook-ceph-osd-prepare-pve-gmk-ubuntu-v4gwg 0/1 Completed 0 3d17h
rook-ceph-osd-prepare-pve-ubuntu-x5vz4 0/1 Completed 0 3d17h
Finally, install the toolbox and the dashboard:
$ kubectl apply -f dashoard-ingress.yaml
$ kubectl apply -f toolbox-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rook-ceph-mgr-dashboard
  namespace: rook-ceph
spec:
  ingressClassName: nginx
  rules:
    - host: rook-ceph-dashboard.ziyuan360.host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rook-ceph-mgr-dashboard
                port:
                  number: 7000
apiVersion: apps/v1
kind: Deployment
name: rook-ceph-tools
namespace: rook-ceph # namespace:cluster
app: rook-ceph-tools
replicas: 1
app: rook-ceph-tools
app: rook-ceph-tools
dnsPolicy: ClusterFirstWithHostNet
serviceAccountName: rook-ceph-default
- name: rook-ceph-tools
image: quay.io/ceph/ceph:v19
- /bin/bash
- -c
- |
# Replicate the script from toolbox.sh inline so the ceph image
# can be run directly, instead of requiring the rook toolbox
# create a ceph config file in its default location so ceph/rados tools can be used
# without specifying any arguments
write_endpoints() {
endpoints=$(cat ${MON_CONFIG})
# filter out the mon names
# external cluster can have numbers or hyphens in mon names, handling them in regex
# shellcheck disable=SC2001
mon_endpoints=$(echo "${endpoints}"| sed 's/[a-z0-9_-]\+=//g')
echo "$DATE writing mon endpoints to ${CEPH_CONFIG}: ${endpoints}"
cat <<EOF > ${CEPH_CONFIG}
mon_host = ${mon_endpoints}
keyring = ${KEYRING_FILE}
# watch the endpoints config file and update if the mon endpoints ever change
watch_endpoints() {
# get the timestamp for the target of the soft link
real_path=$(realpath ${MON_CONFIG})
initial_time=$(stat -c %Z "${real_path}")
while true; do
real_path=$(realpath ${MON_CONFIG})
latest_time=$(stat -c %Z "${real_path}")
if [[ "${latest_time}" != "${initial_time}" ]]; then
sleep 10
# read the secret from an env var (for backward compatibility), or from the secret file
if [[ "$ceph_secret" == "" ]]; then
ceph_secret=$(cat /var/lib/rook-ceph-mon/secret.keyring)
# create the keyring file
key = ${ceph_secret}
# write the initial config file
# continuously update the mon endpoints if they fail over
imagePullPolicy: IfNotPresent
tty: true
runAsNonRoot: true
runAsUser: 2016
runAsGroup: 2016
drop: ["ALL"]
name: rook-ceph-mon
key: ceph-username
- mountPath: /etc/ceph
name: ceph-config
- name: mon-endpoint-volume
mountPath: /etc/rook
- name: ceph-admin-secret
mountPath: /var/lib/rook-ceph-mon
readOnly: true
- name: ceph-admin-secret
secretName: rook-ceph-mon
optional: false
- key: ceph-secret
path: secret.keyring
- name: mon-endpoint-volume
name: rook-ceph-mon-endpoints
- key: data
path: mon-endpoints
- name: ceph-config
emptyDir: {}
- key: "node.kubernetes.io/unreachable"
operator: "Exists"
effect: "NoExecute"
tolerationSeconds: 5
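The `sed` expression in `write_endpoints()` is the least obvious part of the toolbox script: it strips the `name=` prefix from each monitor entry so that only the addresses end up in `mon_host`. Here is a minimal sketch of that single step. The endpoint list is made up for illustration, `strip_mon_names` is a hypothetical wrapper, and the `\+` syntax assumes GNU sed, as shipped in the Ceph image.

```shell
#!/usr/bin/env bash
# Same sed expression as write_endpoints() in the toolbox script (GNU sed syntax).
strip_mon_names() {
  echo "$1" | sed 's/[a-z0-9_-]\+=//g'
}

# Hypothetical endpoint list; the real one is read from the file behind ${MON_CONFIG}.
strip_mon_names "a=10.0.0.1:6789,b=10.0.0.2:6789,c=10.0.0.3:6789"
# prints: 10.0.0.1:6789,10.0.0.2:6789,10.0.0.3:6789
```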
Use the toolbox to verify that the cluster is healthy:
ziyuan@pve-gmk-ubuntu:~/k8s$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
bash-5.1$ ceph status
id: 87554414-e7da-449e-bc2e-549a11a92c1c
mon a is low on available space
mon: 3 daemons, quorum b,a,c (age 4m)
mgr: a(active, since 3d), standbys: b
osd: 3 osds: 3 up (since 3d), 3 in (since 3d)
pools: 1 pools, 1 pgs
objects: 2 objects, 449 KiB
usage: 81 MiB used, 60 GiB / 60 GiB avail
pgs: 1 active+clean
In my case the cluster reports a WARN because the node is running low on disk space; expanding the disk resolves it.
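The "mon a is low on available space" warning is raised when free space on the filesystem holding the monitor's data falls below a percentage threshold (Ceph's `mon_data_avail_warn`, 30% by default). The sketch below checks that percentage on a node. It assumes Rook's default `dataDirHostPath` of `/var/lib/rook`, which this cluster may have overridden, and `avail_pct` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of free space on the filesystem holding a path.
# Uses POSIX df output: column 2 is the size in blocks, column 4 is the available blocks.
avail_pct() {
  df -P "$1" | awk 'NR == 2 && $2 > 0 { printf "%d\n", $4 * 100 / $2 }'
}

# Assumption: mon data lives under Rook's default dataDirHostPath.
path="/var/lib/rook"
[ -d "${path}" ] || path="/"

pct="$(avail_pct "${path}")"
if [ -z "${pct}" ]; then
  echo "${path}: could not determine free space"
elif [ "${pct}" -lt 30 ]; then
  echo "${path}: ${pct}% available, below the 30% warning threshold"
else
  echo "${path}: ${pct}% available"
fi
```

Running it on the node that hosts mon `a`, before and after growing the disk, shows whether the expansion was enough to clear the warning.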
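Create a CephFS for K8s Pods to use
First, create the filesystem:
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    - name: replicated
      replicated:
        size: 3
  # true means: keep the data when the CephFilesystem resource is deleted
  preserveFilesystemOnDelete: true
  metadataServer:
    activeCount: 1
    activeStandby: true
Then create the StorageClass:
apiVersion: storage.k8s.io/v1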
kind: StorageClass
metadata:
  name: rook-cephfs-monitoring
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  # clusterID is the namespace where the rook cluster is running
  # If you change this namespace, also change the namespace below where the secret namespaces are defined
  clusterID: rook-ceph
  # CephFS filesystem name into which the volume shall be created
  fsName: myfs
  # Ceph pool into which the volume shall be created
  # Required for provisionVolume: "true"
  pool: myfs-replicated
  # The secrets contain Ceph admin credentials. These are generated automatically by the operator
  # in the same namespace as the cluster.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
# Keep the data when the PVC is deleted
reclaimPolicy: Retain
Finally, create the PVC; in my case it is for uptime-kuma:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-uptime
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: rook-cephfs-monitoring
  resources:
    requests:
      storage: 2Gi
Then configure the pod to use it:
# other settings omitted...
containers:
  - name: uptime-kuma
    image: louislam/uptime-kuma:1.23.13
    ports:
      - containerPort: 3001
    volumeMounts:
      - name: uptime-kuma-pvc-local
        mountPath: /app/data
volumes:
  - name: uptime-kuma-pvc-local
    persistentVolumeClaim:
      claimName: cephfs-uptime
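Once the pod is running, a quick write/read probe inside the container can confirm that the CephFS volume is actually mounted and writable. This is a minimal sketch, not an upstream Rook tool: `probe_volume` is a hypothetical helper, and the script falls back to a temporary directory when `/app/data` does not exist, so it can also be tried outside the pod.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a small file into a directory, read it back, clean up.
# Returns 0 when the round trip succeeds, 1 otherwise.
probe_volume() {
  local dir="$1" probe token
  probe="${dir}/.cephfs-probe.$$"
  token="probe-$(date +%s)"
  echo "${token}" > "${probe}" || return 1
  [ "$(cat "${probe}")" = "${token}" ] || { rm -f "${probe}"; return 1; }
  rm -f "${probe}"
}

# /app/data is the mountPath from the pod spec; fall back to a temp dir elsewhere.
target="/app/data"
[ -d "${target}" ] || target="$(mktemp -d)"

if probe_volume "${target}"; then
  echo "${target}: write/read OK"
else
  echo "${target}: write/read FAILED"
fi
```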
Benchmarking Ceph
Create a utility Pod:
apiVersion: v1
kind: Pod
metadata:
  name: migration-pod
  namespace: monitoring
spec:
  volumes:
    - name: cephfs-uptime
      persistentVolumeClaim:
        claimName: cephfs-uptime
  containers:
    - name: ubuntu-container
      image: ubuntu:latest
      command: ["/bin/bash", "-c", "--"]
      args: ["while true; do sleep 5; done;"]
      volumeMounts:
        - name: cephfs-uptime
          mountPath: /data/uptime2
Start it and open a shell in the container:
ziyuan@pve-gmk-ubuntu:~/k8s$ kubectl -n monitoring exec -it migration-pod -- /bin/bash
Run the benchmark:
root@migration-pod:/data/uptime2# dd if=/dev/zero of=here bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 104.89 s, 10.2 MB/s
root@migration-pod:/data/uptime2# dd if=/dev/zero of=512 bs=512 count=1000 oflag=direct
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 66.6174 s, 7.7 kB/s
As you can see, performance is absurdly poor; the next step is to investigate why.
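The two `dd` runs say more when they are converted into operations per second and latency per write. The sketch below does that arithmetic from the numbers `dd` printed; `dd_stats` is a hypothetical helper and takes total bytes, record count and elapsed seconds.

```shell
#!/usr/bin/env bash
# Derive throughput, IOPS and mean per-write latency from dd's summary numbers.
# Arguments: total bytes, number of records (writes), elapsed seconds.
dd_stats() {
  awk -v bytes="$1" -v count="$2" -v secs="$3" 'BEGIN {
    printf "%.1f kB/s, %.1f IOPS, %.1f ms per write\n",
      bytes / secs / 1000, count / secs, secs / count * 1000
  }'
}

dd_stats 1073741824 1 104.89    # bs=1G count=1
dd_stats 512000 1000 66.6174    # bs=512 count=1000
```

For the 512-byte run this works out to roughly 15 writes per second, or about 67 ms per direct write. That suggests the small-block result is bounded by per-operation latency rather than bandwidth, which is a reasonable place to start the investigation.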
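A k8s cluster raises one question: how should storage for stateful applications be handled?
To solve this, I am setting up a Ceph cluster. Its CephFS implements the standard POSIX file system interface, and Rook-Ceph uses k8s to simplify deploying Ceph, so I am going straight to Rook-Ceph.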