Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: CSIStorageCapacity #21634

Merged
merged 1 commit into from
Jul 16, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -95,3 +95,7 @@ of the scheduler:
* Learn about [configuring multiple schedulers](/docs/tasks/administer-cluster/configure-multiple-schedulers/)
* Learn about [topology management policies](/docs/tasks/administer-cluster/topology-manager/)
* Learn about [Pod Overhead](/docs/concepts/configuration/pod-overhead/)
* Learn about scheduling of Pods that use volumes in:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should volume topology be here too? I guess though, there's a lot of scheduler plugins beyond storage. Do we want to list them all out here? cc @ahg-g

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is volume topology documented, or more specifically, what should I link to here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://kubernetes.io/docs/concepts/storage/storage-classes/#volume-binding-mode and allowedtopologies below that are probably the best sections to link

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, added.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does sound like it might be worth filing an issue about volume topology. @pohly can you describe the page you were hoping to find, in an issue against k/website?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, see #22506

* [Volume Topology Support](/docs/concepts/storage/storage-classes/#volume-binding-mode)
* [Storage Capacity Tracking](/docs/concepts/storage/storage-capacity/)
* [Node-specific Volume Limits](/docs/concepts/storage/storage-limits/)
134 changes: 134 additions & 0 deletions content/en/docs/concepts/storage/storage-capacity.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,134 @@
---
reviewers:
- jsafrane
- saad-ali
- msau42
- xing-yang
- pohly
title: Storage Capacity
content_type: concept
weight: 45
---

<!-- overview -->

Storage capacity is limited and may vary depending on the node on
which a pod runs: network-attached storage might not be accessible by
all nodes, or storage is local to a node to begin with.

pohly marked this conversation as resolved.
Show resolved Hide resolved
{{< feature-state for_k8s_version="v1.19" state="alpha" >}}

This page describes how Kubernetes keeps track of storage capacity and
how the scheduler uses that information to schedule Pods onto nodes
that have access to enough storage capacity for the remaining missing
volumes. Without storage capacity tracking, the scheduler may choose a
node that doesn't have enough capacity to provision a volume and
multiple scheduling retries will be needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally I'd like to see {{< glossary_tooltip text="CSI" term_id="csi" >}} worked into the early part of this page, so that readers see an explanatory tooltip. {{< glossary_tooltip text="Container Storage Interface" term_id="csi" >}} (CSI) is fine too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added Tracking storage capacity is supported for {{< glossary_tooltip text="Container Storage Interface" term_id="csi" >}} (CSI) drivers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually made it even more complex by making it a link to /docs/concepts/storage/volumes/#csi - let's see whether that renders correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it does (checked in the preview).

Tracking storage capacity is supported for {{< glossary_tooltip
text="Container Storage Interface" term_id="csi" >}} (CSI) drivers and
[needs to be enabled](#enabling-storage-capacity-tracking) when installing a CSI driver.

<!-- body -->

## API

There are two API extensions for this feature:
- [CSIStorageCapacity](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#csistoragecapacity-v1alpha1-storage-k8s-io) objects:
Copy link
Contributor

@kbhawkey kbhawkey Jul 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume these resource links will work once v1.19 is released.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

these get produced by a CSI driver in the namespace
where the driver is installed. Each object contains capacity
information for one storage class and defines which nodes have
access to that storage.
- [The `CSIDriverSpec.StorageCapacity` field](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#csidriverspec-v1-storage-k8s-io):
when set to `true`, the Kubernetes scheduler will consider storage
capacity for volumes that use the CSI driver.

## Scheduling

Storage capacity information is used by the Kubernetes scheduler if:
- the `CSIStorageCapacity` feature gate is true,
- a Pod uses a volume that has not been created yet,
- that volume uses a {{< glossary_tooltip text="StorageClass" term_id="storage-class" >}} which references a CSI driver and
uses `WaitForFirstConsumer` [volume binding
mode](/docs/concepts/storage/storage-classes/#volume-binding-mode),
and
- the `CSIDriver` object for the driver has `StorageCapacity` set to
true.

In that case, the scheduler only considers nodes for the Pod which
have enough storage available to them. This check is very
simplistic and only compares the size of the volume against the
capacity listed in `CSIStorageCapacity` objects with a topology that
includes the node.

For volumes with `Immediate` volume binding mode, the storage driver
decides where to create the volume, independently of Pods that will
use the volume. The scheduler then schedules Pods onto nodes where the
volume is available after the volume has been created.

For [CSI ephemeral volumes](/docs/concepts/storage/volumes/#csi),
scheduling always happens without considering storage capacity. This
is based on the assumption that this volume type is only used by
special CSI drivers which are local to a node and do not need
significant resources there.

## Rescheduling
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a section for known issues/limitations with some workarounds? Things like a Pod requesting 2 volumes, # retries when capacity is close to fully utilized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a "Limitations" section. It's mostly about the "multiple volume" case. I prefer to not make too many statements about # retries because we don't have much actual experience with that situation yet.


When a node has been selected for a Pod with `WaitForFirstConsumer`
volumes, that decision is still tentative. The next step is that the
CSI storage driver gets asked to create the volume with a hint that the
volume is supposed to be available on the selected node.

Because Kubernetes might have chosen a node based on out-dated
capacity information, it is possible that the volume cannot really be
created. The node selection is then reset and the Kubernetes scheduler
tries again to find a node for the Pod.

## Limitations

Storage capacity tracking increases the chance that scheduling works
on the first try, but cannot guarantee this because the scheduler has
to decide based on potentially out-dated information. Usually, the
same retry mechanism as for scheduling without any storage capacity
information handles scheduling failures.

One situation where scheduling can fail permanently is when a Pod uses
multiple volumes: one volume might have been created already in a
topology segment which then does not have enough capacity left for
another volume. Manual intervention is necessary to recover from this,
for example by increasing capacity or deleting the volume that was
already created. [Further
work](https://github.com/kubernetes/enhancements/pull/1703) is needed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to list a link to a pull request? This link is not stable.
Could you list this issue in the release notes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the KEP hasn't even be merged as provisional yet, all we have is this link to the proposal.

I'll make sure to list it in the release notes.

to handle this automatically.

## Enabling storage capacity tracking

Storage capacity tracking is an *alpha feature* and only enabled when
the `CSIStorageCapacity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled. A quick check
whether a Kubernetes cluster supports the feature is to list
CSIStorageCapacity objects with:
```shell
kubectl get csistoragecapacities --all-namespaces
```

If your cluster supports CSIStorageCapacity, the response is either a list of CSIStorageCapacity objects or:
```
No resources found
```

If not supported, this error is printed instead:
```
error: the server doesn't have a resource type "csistoragecapacities"
```

In addition to enabling the feature in the cluster, a CSI
driver also has to
support it. Please refer to the driver's documentation for
details.

## {{% heading "whatsnext" %}}

sftim marked this conversation as resolved.
Show resolved Hide resolved
- For more information on the design, see the
[Storage Capacity Constraints for Pod Scheduling KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1472-storage-capacity-tracking/README.md).
- For more information on further development of this feature, see the [enhancement tracking issue #1472](https://github.com/kubernetes/enhancements/issues/1472).
Copy link
Contributor

@kbhawkey kbhawkey Jul 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a concept page, linking to the enhancement tracking issue does not add much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I beg to differ here. For alpha features, one key question for users is when the feature will graduate or how it will change during future development. The enhancement issue is where they can find answers to those questions.

I can remove it in a follow-up PR if this argument is convincing, but let's merge this PR first, okay?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

- Learn about [Kubernetes Scheduler](/docs/concepts/scheduling-eviction/kube-scheduler/)
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,7 @@ different Kubernetes components.
| `CSIMigrationOpenStackComplete` | `false` | Alpha | 1.17 | |
| `CSIMigrationvSphere` | `false` | Beta | 1.19 | |
| `CSIMigrationvSphereComplete` | `false` | Beta | 1.19 | |
| `CSIStorageCapacity` | `false` | Alpha | 1.19 | |
| `ConfigurableFSGroupPolicy` | `false` | Alpha | 1.18 | |
| `CustomCPUCFSQuotaPeriod` | `false` | Alpha | 1.12 | |
| `CustomResourceDefaulting` | `false` | Alpha| 1.15 | 1.15 |
Expand Down Expand Up @@ -397,6 +398,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
- `CSIPersistentVolume`: Enable discovering and mounting volumes provisioned through a
[CSI (Container Storage Interface)](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md)
compatible volume plugin.
- `CSIStorageCapacity`: Enables CSI drivers to publish storage capacity information and the Kubernetes scheduler to use that information when scheduling pods. See [Storage Capacity](/docs/concepts/storage/storage-capacity/).
Check the [`csi` volume type](/docs/concepts/storage/volumes/#csi) documentation for more details.
- `CustomCPUCFSQuotaPeriod`: Enable nodes to change CPUCFSQuotaPeriod.
- `CustomPodDNS`: Enable customizing the DNS settings for a Pod using its `dnsConfig` property.
Expand Down