diff --git a/content/en/docs/concepts/storage/storage-capacity.md b/content/en/docs/concepts/storage/storage-capacity.md
new file mode 100644
index 0000000000000..f8302bcb11972
--- /dev/null
+++ b/content/en/docs/concepts/storage/storage-capacity.md
@@ -0,0 +1,113 @@
+---
+reviewers:
+- jsafrane
+- saad-ali
+- msau42
+- xing-yang
+- pohly
+title: Storage Capacity
+content_type: concept
+weight: 45
+---
+
+
+
+Storage capacity is limited and may vary depending on the node on
+which a pod runs: network-attached storage might not be accessible by
+all nodes, or storage is local to a node to begin with.
+
+This page describes how Kubernetes keeps track of storage capacity and
+how the scheduler uses that information to schedule pods.
+
+
+
+
+## Enabling the feature
+
+Storage capacity tracking is an *alpha feature* and is only enabled when
+the `CSIStorageCapacity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled. A quick way to check
+whether a Kubernetes cluster supports the feature is to list
+`CSIStorageCapacity` objects with:
+```shell
+kubectl get csistoragecapacities --all-namespaces
+```
+
+If supported, the response is either a list of objects or:
+```
+No resources found
+```
+
+If not supported, this error is printed instead:
+```
+error: the server doesn't have a resource type "csistoragecapacities"
+```
+
+In addition to enabling the feature in the cluster, a [CSI
+driver](/docs/concepts/storage/volumes/#csi) deployment also has to
+support it. Please refer to the driver's documentation for
+details. The feature is not supported for non-CSI storage systems.
+
+Without this support, there will be no information about storage
+capacity available through the driver and the scheduler will schedule
+Pods with volumes provided by the driver without looking for capacity
+information.
+
+## API
+
+There are two API extensions for this feature:
+- [`CSIStorageCapacity` objects](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#csistoragecapacity-v1alpha1-storage-k8s-io):
+  these get produced by a CSI driver in the namespace
+  where the driver is installed. Each object contains capacity
+  information for one storage class and defines which nodes have
+  access to that storage.
+- [The `CSIDriverSpec.StorageCapacity` field](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#csidriverspec-v1-storage-k8s-io):
+  when set to `true`, the Kubernetes scheduler will consider storage
+  capacity for volumes that use the CSI driver.
+
+## Scheduling
+
+Storage capacity information is used by the Kubernetes scheduler if:
+- the `CSIStorageCapacity` feature gate is enabled,
+- a Pod uses a volume that has not been created yet,
+- that volume uses a storage class which references a CSI driver and
+  uses [`WaitForFirstConsumer` volume binding
+  mode](/docs/concepts/storage/storage-classes/#volume-binding-mode),
+  and
+- the `CSIDriver` object for the driver has `StorageCapacity` set to
+  true.
+
+In that case, the scheduler only considers nodes for the Pod which
+have enough storage available to them. This check is very
+simplistic and only compares the size of the volume against the
+capacity listed in `CSIStorageCapacity` objects with a topology that
+includes the node. Without storage capacity tracking, nodes are picked
+without this check.
+
+For volumes with `Immediate` volume binding mode, the storage driver
+decides where to create the volume, independently of Pods that will
+use the volume. The scheduler then schedules Pods onto nodes where the
+volume is available after the volume has been created.
+
+For [CSI ephemeral volumes](/docs/concepts/storage/volumes/#csi),
+scheduling always happens without considering storage capacity. This
+is based on the assumption that this volume type is only used by
+special CSI drivers which are local to a node and do not need
+significant resources there.
+
+## Rescheduling
+
+When a node has been selected for a Pod with `WaitForFirstConsumer`
+volumes, that decision is still tentative. The next step is that the
+CSI storage driver gets asked to create the volume with a hint that the
+volume is supposed to be available on the selected node.
+
+Because Kubernetes might have chosen a node based on outdated
+capacity information, it is possible that the volume cannot really be
+created. The node selection is then reset and the Kubernetes scheduler
+tries again to find a node for the Pod.
+
+## {{% heading "whatsnext" %}}
+
+- For more information on the design, see the
+[Storage Capacity Constraints for Pod Scheduling KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1472-storage-capacity-tracking/README.md).
+- For more information on further development of this feature, see the [enhancement tracking issue #1472](https://github.com/kubernetes/enhancements/issues/1472).
diff --git a/content/en/docs/concepts/storage/volumes.md b/content/en/docs/concepts/storage/volumes.md
index b7bcb370ad23e..49c4f6b2a0f80 100644
--- a/content/en/docs/concepts/storage/volumes.md
+++ b/content/en/docs/concepts/storage/volumes.md
@@ -1291,8 +1291,11 @@ Once a CSI compatible volume driver is deployed on a Kubernetes cluster, users
 may use the `csi` volume type to attach, mount, etc. the volumes exposed by the
 CSI driver.
 
-The `csi` volume type does not support direct reference from Pod and may only be
-referenced in a Pod via a `PersistentVolumeClaim` object.
+A `csi` volume can be used in a pod in three different ways:
+- through a reference to a [`persistentVolumeClaim`](#persistentvolumeclaim)
+- with a [generic ephemeral volume](/docs/concepts/storage/ephemeral-volumes/)
+- with a [CSI ephemeral volume](#csi-ephemeral-volume) if the driver
+  supports that
 
 The following fields are available to storage administrators to configure a CSI
 persistent volume:
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
index 5422fb92025e8..77a25f980965c 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
@@ -76,6 +76,7 @@ different Kubernetes components.
 | `CSIMigrationGCEComplete` | `false` | Alpha | 1.17 | |
 | `CSIMigrationOpenStack` | `false` | Alpha | 1.14 | |
 | `CSIMigrationOpenStackComplete` | `false` | Alpha | 1.17 | |
+| `CSIStorageCapacity` | `false` | Alpha | 1.19 | |
 | `ConfigurableFSGroupPolicy` | `false` | Alpha | 1.18 | |
 | `CustomCPUCFSQuotaPeriod` | `false` | Alpha | 1.12 | |
 | `CustomResourceDefaulting` | `false` | Alpha| 1.15 | 1.15 |
@@ -388,6 +389,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
 - `CSIPersistentVolume`: Enable discovering and mounting volumes provisioned through a
   [CSI (Container Storage Interface)](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md)
   compatible volume plugin.
   Check the [`csi` volume type](/docs/concepts/storage/volumes/#csi) documentation for more details.
+- `CSIStorageCapacity`: Enable CSI drivers to publish storage capacity information and the Kubernetes scheduler to use that information when scheduling pods. See [Storage Capacity](/docs/concepts/storage/storage-capacity/).
 - `CustomCPUCFSQuotaPeriod`: Enable nodes to change CPUCFSQuotaPeriod.
 - `CustomPodDNS`: Enable customizing the DNS settings for a Pod using its `dnsConfig` property.
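
The capacity comparison described in the Scheduling section of `storage-capacity.md` can be illustrated with a minimal sketch. This is not the scheduler's actual code: the `Capacity` class and the `node_has_capacity` and `filter_nodes` helpers are hypothetical names, topology is simplified to an explicit set of node names, and the sketch assumes that a node with no matching capacity object is filtered out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Capacity:
    """Hypothetical stand-in for one CSIStorageCapacity object."""
    storage_class: str
    nodes: FrozenSet[str]    # assumption: topology modeled as a plain set of node names
    capacity_bytes: int


def node_has_capacity(node: str, storage_class: str, volume_bytes: int,
                      capacities: Iterable[Capacity]) -> bool:
    """Return True if some capacity object for the storage class covers the
    node and reports at least the requested volume size."""
    return any(
        c.storage_class == storage_class
        and node in c.nodes
        and c.capacity_bytes >= volume_bytes
        for c in capacities
    )


def filter_nodes(nodes: Iterable[str], storage_class: str, volume_bytes: int,
                 capacities: Iterable[Capacity]) -> List[str]:
    """Keep only the nodes that pass the simple size comparison.
    Assumption: nodes without any matching capacity object are dropped."""
    capacities = list(capacities)  # allow multiple passes over the input
    return [n for n in nodes
            if node_has_capacity(n, storage_class, volume_bytes, capacities)]


GIB = 2 ** 30
example_capacities = [
    Capacity("fast", frozenset({"node-a"}), 10 * GIB),
    Capacity("fast", frozenset({"node-b"}), 1 * GIB),
]
print(filter_nodes(["node-a", "node-b", "node-c"], "fast", 5 * GIB, example_capacities))
```

The sketch only compares sizes, which mirrors the "very simplistic" check the page describes. It does not model stale capacity data, which is why the Rescheduling section still applies after a node has been picked.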