From 9bdfeae3396b8790122c893038a959de2bd72227 Mon Sep 17 00:00:00 2001 From: Michelle Au Date: Fri, 7 Sep 2018 16:30:51 -0700 Subject: [PATCH 1/5] Document topology aware volume binding feature --- .../concepts/storage/dynamic-provisioning.md | 7 ++ .../docs/concepts/storage/storage-classes.md | 96 ++++++++++++++++--- .../feature-gates.md | 2 +- content/en/docs/setup/multiple-zones.md | 16 ++-- 4 files changed, 102 insertions(+), 19 deletions(-) diff --git a/content/en/docs/concepts/storage/dynamic-provisioning.md b/content/en/docs/concepts/storage/dynamic-provisioning.md index ee8c0777d8d3f..3ffdab1f49a84 100644 --- a/content/en/docs/concepts/storage/dynamic-provisioning.md +++ b/content/en/docs/concepts/storage/dynamic-provisioning.md @@ -124,6 +124,13 @@ Note that there can be at most one *default* storage class on a cluster, or a `PersistentVolumeClaim` without `storageClassName` explicitly specified cannot be created. +## Topology Awareness + +In [multi-zone](/docs/setup/multiple-zones) clusters, pods can be spread across +zones and single-zone storage backends should be provisioned in the zones where +pods are scheduled. This can be accomplished by setting the [volume binding +mode](/docs/concepts/storage/storage-classes/#volume-binding-mode). + {{% /capture %}} diff --git a/content/en/docs/concepts/storage/storage-classes.md b/content/en/docs/concepts/storage/storage-classes.md index 639de22ed0194..9cc86d9df2858 100644 --- a/content/en/docs/concepts/storage/storage-classes.md +++ b/content/en/docs/concepts/storage/storage-classes.md @@ -57,6 +57,7 @@ parameters: reclaimPolicy: Retain mountOptions: - debug +volumeBindingMode: Immediate ``` ### Provisioner @@ -66,7 +67,7 @@ for provisioning PVs. This field must be specified. 
| Volume Plugin | Internal Provisioner| Config Example | | :--- | :---: | :---: | -| AWSElasticBlockStore | ✓ | [AWS](#aws) | +| AWSElasticBlockStore | ✓ | [AWS EBS](#aws-ebs) | | AzureFile | ✓ | [Azure File](#azure-file) | | AzureDisk | ✓ | [Azure Disk](#azure-disk) | | CephFS | - | - | @@ -74,7 +75,7 @@ for provisioning PVs. This field must be specified. | FC | - | - | | FlexVolume | - | - | | Flocker | ✓ | - | -| GCEPersistentDisk | ✓ | [GCE](#gce) | +| GCEPersistentDisk | ✓ | [GCE PD](#gce-pd) | | Glusterfs | ✓ | [Glusterfs](#glusterfs) | | iSCSI | - | - | | Quobyte | ✓ | [Quobyte](#quobyte) | @@ -120,6 +121,75 @@ If the volume plugin does not support mount options but mount options are specified, provisioning will fail. Mount options are not validated on either the class or PV, so mount of the PV will simply fail if one is invalid. +### Volume Binding Mode + +{{< feature-state for_k8s_version="v1.12" state="beta" >}} + +**Note:** This feature requires the `VolumeScheduling` feature gate to be +enabled. + +The `volumeBindingMode` field controls when [volume binding and dynamic +provisioning](/docs/concepts/storage/persistent-volumes/#provisioning) should occur. + +By default, the `Immediate` mode indicates that volume binding and dynamic +provisioning occurs once the PVC is created. For storage +backends that are topology-constrained and not globally accessible from all nodes +in the cluster (for example, a zonal disk, or a local volume), this causes +volumes to be bound or provisioned without knowledge of the pod's scheduling +requirements, and can result in unschedulable pods. + +To address this issue, the `WaitForFirstConsumer` mode can be specified which +will delay binding and provisioning until a pod using the PVC is created. 
+Volumes will be selected or provisioned with the appropriate topology that is +compatible with the pod's scheduling constraints, including but not limited to, [resource +requirements](/docs/concepts/configuration/manage-compute-resources-container), +[node selectors](/docs/concepts/configuration/assign-pod-node/#nodeselector), +[pod affinity and +anti-affinity](/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity), +and [taints and tolerations](/docs/concepts/configuration/taint-and-toleration). + +The following plugins support `WaitForFirstConsumer` with dynamic provisioning: + +* [AWSElasticBlockStore](#aws-ebs) +* [GCEPersistentDisk](#gce-pd) +* [AzureDisk](#azure-disk) + +The following plugins support `WaitForFirstConsumer` with pre-created PV binding: + +* All of the above +* [Local](#local) + +### Allowed Topologies +{{< feature-state for_k8s_version="v1.12" state="beta" >}} + +**Note:** This feature requires the `VolumeScheduling` feature gate to be +enabled. + +When `WaitForFirstConsumer` volume binding mode is specified, it is no longer necessary +to restrict provisioning to specific topologies in most situations. However, +if still required, `allowedTopologies` can be specified. + +This example demonstrates how to restrict topology of provisioned volumes to specific +zones and should be used as a replacement for the `zone` and `zones` parameters for the +supported plugins. + +```yaml +kind: StorageClass +apiVersion: storage.k8s.io/v1 +metadata: + name: standard +provisioner: kubernetes.io/gce-pd +parameters: + type: pd-standard +volumeBindingMode: WaitForFirstConsumer +allowedTopologies: +- matchLabelExpressions: + - key: failure-domain.beta.kubernetes.io/zone + values: + - us-central1-a + - us-cetnral1-b +``` + ## Parameters Storage classes have parameters that describe volumes belonging to the storage @@ -128,7 +198,7 @@ class. Different parameters may be accepted depending on the `provisioner`. For `iopsPerGB` are specific to EBS. 
When a parameter is omitted, some default is used. -### AWS +### AWS EBS ```yaml kind: StorageClass @@ -138,17 +208,16 @@ metadata: provisioner: kubernetes.io/aws-ebs parameters: type: io1 - zones: us-east-1d, us-east-1c iopsPerGB: "10" ``` * `type`: `io1`, `gp2`, `sc1`, `st1`. See [AWS docs](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html) for details. Default: `gp2`. -* `zone`: AWS zone. If neither `zone` nor `zones` is specified, volumes are +* `zone` (Deprecated): AWS zone. If neither `zone` nor `zones` is specified, volumes are generally round-robin-ed across all active zones where Kubernetes cluster has a node. `zone` and `zones` parameters must not be used at the same time. -* `zones`: A comma separated list of AWS zone(s). If neither `zone` nor `zones` +* `zones` (Deprecated): A comma separated list of AWS zone(s). If neither `zone` nor `zones` is specified, volumes are generally round-robin-ed across all active zones where Kubernetes cluster has a node. `zone` and `zones` parameters must not be used at the same time. @@ -164,7 +233,10 @@ parameters: encrypting the volume. If none is supplied but `encrypted` is true, a key is generated by AWS. See AWS docs for valid ARN value. -### GCE +**Note:** `zone` and `zones` parameters are deprecated and replaced with +[allowedTopologies](#allowed-topologies) + +### GCE PD ```yaml kind: StorageClass @@ -174,15 +246,14 @@ metadata: provisioner: kubernetes.io/gce-pd parameters: type: pd-standard - zones: us-central1-a, us-central1-b replication-type: none ``` * `type`: `pd-standard` or `pd-ssd`. Default: `pd-standard` -* `zone`: GCE zone. If neither `zone` nor `zones` is specified, volumes are +* `zone` (Deprecated): GCE zone. If neither `zone` nor `zones` is specified, volumes are generally round-robin-ed across all active zones where Kubernetes cluster has a node. `zone` and `zones` parameters must not be used at the same time. -* `zones`: A comma separated list of GCE zone(s). 
If neither `zone` nor `zones` +* `zones` (Deprecated): A comma separated list of GCE zone(s). If neither `zone` nor `zones` is specified, volumes are generally round-robin-ed across all active zones where Kubernetes cluster has a node. `zone` and `zones` parameters must not be used at the same time. @@ -199,6 +270,9 @@ specified, Kubernetes will arbitrarily choose among the specified zones. If the `zones` parameter is omitted, Kubernetes will arbitrarily choose among zones managed by the cluster. +**Note:** `zone` and `zones` parameters are deprecated and replaced with +[allowedTopologies](#allowed-topologies) + ### Glusterfs ```yaml @@ -678,4 +752,4 @@ Delaying volume binding allows the scheduler to consider all of a pod's scheduling constraints when choosing an appropriate PersistentVolume for a PersistentVolumeClaim. -{{% /capture %}} \ No newline at end of file +{{% /capture %}} diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index ee0669ac953a2..441188ad6fae9 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -56,7 +56,7 @@ different Kubernetes components. 
| `DevicePlugins` | `true` | Beta | 1.10 | | | `DynamicKubeletConfig` | `false` | Alpha | 1.4 | 1.10 | | `DynamicKubeletConfig` | `true` | Beta | 1.11 | | -| `DynamicProvisioningScheduling` | `false` | Alpha | 1.11 | | +| `DynamicProvisioningScheduling` | `false` | Alpha | 1.11 | 1.11 | | `DynamicVolumeProvisioning` | `true` | Alpha | 1.3 | 1.7 | | `DynamicVolumeProvisioning` | `true` | GA | 1.8 | | | `EnableEquivalenceClassCache` | `false` | Alpha | 1.8 | | diff --git a/content/en/docs/setup/multiple-zones.md b/content/en/docs/setup/multiple-zones.md index fb6c313fca2e3..6a95eb19a282e 100644 --- a/content/en/docs/setup/multiple-zones.md +++ b/content/en/docs/setup/multiple-zones.md @@ -72,18 +72,20 @@ available and can tolerate the loss of a zone, the control plane is located in a single zone. Users that want a highly available control plane should follow the [high availability](/docs/admin/high-availability) instructions. +### Volume limitations +The following limitations are addressed with [topology-aware volume binding](/docs/concepts/storage/storage-classes/#volume-binding-mode). + * StatefulSet volume zone spreading when using dynamic provisioning is currently not compatible with -pod affinity or anti-affinity policies. + pod affinity or anti-affinity policies. * If the name of the StatefulSet contains dashes ("-"), volume zone spreading -may not provide a uniform distribution of storage across zones. + may not provide a uniform distribution of storage across zones. * When specifying multiple PVCs in a Deployment or Pod spec, the StorageClass -needs to be configured for a specific, single zone, or the PVs need to be -statically provisioned in a specific zone. Another workaround is to use a -StatefulSet, which will ensure that all the volumes for a replica are -provisioned in the same zone. - + needs to be configured for a specific, single zone, or the PVs need to be + statically provisioned in a specific zone. 
Another workaround is to use a + StatefulSet, which will ensure that all the volumes for a replica are + provisioned in the same zone. ## Walkthrough From 6655d3f07beaaeae40a434fb0d24c2db05dd0dcf Mon Sep 17 00:00:00 2001 From: Zach Arnold Date: Sun, 9 Sep 2018 17:41:39 -0500 Subject: [PATCH 2/5] update for readability --- content/en/docs/concepts/storage/dynamic-provisioning.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/content/en/docs/concepts/storage/dynamic-provisioning.md b/content/en/docs/concepts/storage/dynamic-provisioning.md index 3ffdab1f49a84..cb180fb706f72 100644 --- a/content/en/docs/concepts/storage/dynamic-provisioning.md +++ b/content/en/docs/concepts/storage/dynamic-provisioning.md @@ -126,10 +126,10 @@ be created. ## Topology Awareness -In [multi-zone](/docs/setup/multiple-zones) clusters, pods can be spread across -zones and single-zone storage backends should be provisioned in the zones where -pods are scheduled. This can be accomplished by setting the [volume binding -mode](/docs/concepts/storage/storage-classes/#volume-binding-mode). +In [Multi-Zone](/docs/setup/multiple-zones) clusters, Pods can be spread across +Zones in a Region. Single-Zone storage backends should be provisioned in the Zones where +Pods are scheduled. This can be accomplished by setting the [Volume Binding +Mode](/docs/concepts/storage/storage-classes/#volume-binding-mode). 
{{% /capture %}} From c11b1deb92fafa9aea1ee7658e0a5ef9f1265ee2 Mon Sep 17 00:00:00 2001 From: Zach Arnold Date: Sun, 9 Sep 2018 18:24:54 -0500 Subject: [PATCH 3/5] Update storage-classes.md --- .../docs/concepts/storage/storage-classes.md | 21 +++++++++---------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/content/en/docs/concepts/storage/storage-classes.md b/content/en/docs/concepts/storage/storage-classes.md index 9cc86d9df2858..8cc095f6e768a 100644 --- a/content/en/docs/concepts/storage/storage-classes.md +++ b/content/en/docs/concepts/storage/storage-classes.md @@ -133,15 +133,14 @@ provisioning](/docs/concepts/storage/persistent-volumes/#provisioning) should oc By default, the `Immediate` mode indicates that volume binding and dynamic provisioning occurs once the PVC is created. For storage -backends that are topology-constrained and not globally accessible from all nodes -in the cluster (for example, a zonal disk, or a local volume), this causes -volumes to be bound or provisioned without knowledge of the pod's scheduling -requirements, and can result in unschedulable pods. - -To address this issue, the `WaitForFirstConsumer` mode can be specified which -will delay binding and provisioning until a pod using the PVC is created. -Volumes will be selected or provisioned with the appropriate topology that is -compatible with the pod's scheduling constraints, including but not limited to, [resource +backends that are topology-constrained and not globally accessible from all Nodes +in the cluster, Volumes will be bound or provisioned without knowledge of the Pod's scheduling +requirements. This may result in unschedulable Pods. + +A cluster administrator can address this issue by specifying the `WaitForFirstConsumer` mode which +will delay the binding and provisioning of a Volume until a Pod using the PVC is created. +Volumes will be selected or provisioned conforming to the topology that is +specified by the Pod's scheduling constraints. 
These include, but are not limited to, [resource requirements](/docs/concepts/configuration/manage-compute-resources-container), [node selectors](/docs/concepts/configuration/assign-pod-node/#nodeselector), [pod affinity and @@ -165,11 +164,11 @@ The following plugins support `WaitForFirstConsumer` with pre-created PV binding **Note:** This feature requires the `VolumeScheduling` feature gate to be enabled. -When `WaitForFirstConsumer` volume binding mode is specified, it is no longer necessary +When a cluster operator specifies the `WaitForFirstConsumer` volume binding mode, it is no longer necessary to restrict provisioning to specific topologies in most situations. However, if still required, `allowedTopologies` can be specified. -This example demonstrates how to restrict topology of provisioned volumes to specific +This example demonstrates how to restrict the topology of provisioned volumes to specific zones and should be used as a replacement for the `zone` and `zones` parameters for the supported plugins. From 8f7001f024c02ff34573c0999eef71a5007b02a7 Mon Sep 17 00:00:00 2001 From: Zach Arnold Date: Sun, 9 Sep 2018 18:25:44 -0500 Subject: [PATCH 4/5] comma splice --- content/en/docs/setup/multiple-zones.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/en/docs/setup/multiple-zones.md b/content/en/docs/setup/multiple-zones.md index 6a95eb19a282e..8125527931c71 100644 --- a/content/en/docs/setup/multiple-zones.md +++ b/content/en/docs/setup/multiple-zones.md @@ -82,7 +82,7 @@ The following limitations are addressed with [topology-aware volume binding](/do may not provide a uniform distribution of storage across zones. * When specifying multiple PVCs in a Deployment or Pod spec, the StorageClass - needs to be configured for a specific, single zone, or the PVs need to be + needs to be configured for a specific single zone, or the PVs need to be statically provisioned in a specific zone. 
Another workaround is to use a StatefulSet, which will ensure that all the volumes for a replica are provisioned in the same zone. From 99888660145db57ddd0de779c154a90ad1980f3c Mon Sep 17 00:00:00 2001 From: Michelle Au Date: Mon, 10 Sep 2018 10:41:32 -0700 Subject: [PATCH 5/5] don't abbreviate --- content/en/docs/concepts/storage/storage-classes.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/content/en/docs/concepts/storage/storage-classes.md b/content/en/docs/concepts/storage/storage-classes.md index 8cc095f6e768a..984ed35e14e35 100644 --- a/content/en/docs/concepts/storage/storage-classes.md +++ b/content/en/docs/concepts/storage/storage-classes.md @@ -132,14 +132,14 @@ The `volumeBindingMode` field controls when [volume binding and dynamic provisioning](/docs/concepts/storage/persistent-volumes/#provisioning) should occur. By default, the `Immediate` mode indicates that volume binding and dynamic -provisioning occurs once the PVC is created. For storage +provisioning occurs once the PersistentVolumeClaim is created. For storage backends that are topology-constrained and not globally accessible from all Nodes -in the cluster, Volumes will be bound or provisioned without knowledge of the Pod's scheduling +in the cluster, PersistentVolumes will be bound or provisioned without knowledge of the Pod's scheduling requirements. This may result in unschedulable Pods. A cluster administrator can address this issue by specifying the `WaitForFirstConsumer` mode which -will delay the binding and provisioning of a Volume until a Pod using the PVC is created. -Volumes will be selected or provisioned conforming to the topology that is +will delay the binding and provisioning of a PersistentVolume until a Pod using the PersistentVolumeClaim is created. +PersistentVolumes will be selected or provisioned conforming to the topology that is specified by the Pod's scheduling constraints. 
These include, but are not limited to, [resource requirements](/docs/concepts/configuration/manage-compute-resources-container), [node selectors](/docs/concepts/configuration/assign-pod-node/#nodeselector), @@ -153,7 +153,7 @@ The following plugins support `WaitForFirstConsumer` with dynamic provisioning: * [GCEPersistentDisk](#gce-pd) * [AzureDisk](#azure-disk) -The following plugins support `WaitForFirstConsumer` with pre-created PV binding: +The following plugins support `WaitForFirstConsumer` with pre-created PersistentVolume binding: * All of the above * [Local](#local) @@ -186,7 +186,7 @@ allowedTopologies: - key: failure-domain.beta.kubernetes.io/zone values: - us-central1-a - - us-cetnral1-b + - us-central1-b ``` ## Parameters