Add initial OpenShift swap enhancement
Showing 1 changed file with 265 additions and 0 deletions.
---
title: node-swap
authors:
  - "@ehashman"
reviewers:
  - "@rphilips"
  - "@sjenning"
  - "???"
approvers:
  - "@mrunalp"
creation-date: "2021-06-23"
status: implementable
---

# OpenShift Node Swap Support

## Release Signoff Checklist

- [X] Enhancement is `implementable`
- [X] Design details are appropriately documented from clear requirements
- [X] Test plan is defined
- [X] Operational readiness criteria is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

The upstream Kubernetes 1.22 release introduced alpha support for configuring
swap memory usage for Kubernetes workloads on a per-node basis.

Now that swap use on nodes is supported in upstream, there are a number of use
cases that would benefit from OpenShift nodes supporting swap, including
improved node stability, better support for applications with high memory
overhead but smaller working sets, the use of memory-constrained devices, and
memory flexibility.

## Motivation

See [KEP-2400: Motivation].

[KEP-2400: Motivation]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#motivation

### Goals

- Swap can be provisioned and configured for nodes to use in an OpenShift
  cluster.

### Non-Goals

- Workload-specific swap accounting.
- Any of the non-goals in [KEP-2400: Non-goals].

[KEP-2400: Non-goals]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#non-goals

## Proposal

### User Stories

See [KEP-2400: User Stories](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#user-stories).

### Implementation Details/Notes/Constraints [optional]

See [KEP-2400: Notes/Constraints/Caveats](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#notesconstraintscaveats-optional).

### Risks and Mitigations

See [KEP-2400: Risks and Mitigations].

[KEP-2400: Risks and Mitigations]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#risks-and-mitigations

## Design Details

### Enable `NodeSwap` feature gate

We will add a new FeatureSet `NodeSwapNoUpgrade` with feature gate `NodeSwap`
enabled for [Tech Preview].

We will not add `NodeSwap` to the existing `TechPreviewNoUpgrade` feature set,
because we do not want users who are testing swap to have to turn on other
tech preview features at the same time. Swap cannot be enabled or used without
additional configuration, so adding it to the existing Tech Preview feature
set would not necessarily be a risk; the separate feature set exists only to
keep swap testing independent of other features.

Even without this change, the feature gate can be enabled on any 4.9+ cluster
by adding `NodeSwap` to a `CustomNoUpgrade` configuration:

```yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: CustomNoUpgrade
  customNoUpgrade:
    enabled:
      - NodeSwap
```

Once the `NodeSwap` feature flag is enabled, a cluster admin can enable swap
usage on the cluster by following the steps in the sections below.

### Ensure component versions support swap

While swap support has been available in Kubernetes since the 1.22.0 release,
the container runtimes also need support in order for Kubernetes workloads to
be able to use the feature correctly.

Therefore, a user of OpenShift version 4.9.0 or later can enable the feature
flag, but for best results they will also require a version of CRI-O that
supports the `MemorySwapLimitInBytes` field. CRI-O supports this field only in
releases 1.23 and later, so it should be available in the first release of
OpenShift 4.10.
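
As a minimal sketch of how an admin might check the CRI-O requirement above, the helper below compares a runtime version string against 1.23. The function name `crio_supports_swap` is hypothetical and not part of any OpenShift tooling; the sketch assumes the version string has the form `cri-o://<major>.<minor>...`, as reported in a node's `.status.nodeInfo.containerRuntimeVersion`.

```bash
# crio_supports_swap RUNTIME_VERSION   (hypothetical helper, for illustration)
# Succeeds when RUNTIME_VERSION (e.g. "cri-o://1.23.0") is CRI-O 1.23 or newer.
crio_supports_swap() {
  # Reject anything that is not a CRI-O version string.
  case "$1" in
    cri-o://*) ;;
    *) return 1 ;;
  esac
  version="${1#cri-o://}"      # "1.23.0-rc1" from "cri-o://1.23.0-rc1"
  major="${version%%.*}"       # text before the first dot
  rest="${version#*.}"         # text after the first dot
  minor="${rest%%.*}"          # text before the next dot
  minor="${minor%%[!0-9]*}"    # drop any non-numeric suffix
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 23 ]; }
}
```

The version string for each node appears in the `CONTAINER-RUNTIME` column of `oc get nodes -o wide` and can be passed to the helper as its single argument.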

### Configure worker Kubelets

Swap behaviour on a node can be configured with
[`memorySwap.swapBehavior`](https://kubernetes.io/docs/concepts/architecture/nodes/#swap-memory).

The most straightforward way to configure the kubelets with this feature is
with a custom `KubeletConfig` that will automatically be applied by the MCO:

```bash
# Enable the custom kubelet config on the worker pool
oc label machineconfigpool worker custom-kubelet=enabled
```

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: custom-config
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: enabled
  kubeletConfig:
    failSwapOn: false
    memorySwap:
      swapBehavior: UnlimitedSwap # LimitedSwap is also supported
```

Note that enabling swap on control plane nodes is possible but **not** recommended.

### Add swap to nodes

There are a few different ways this can be accomplished. The most
straightforward is to use a custom machine config to add a systemd unit that
creates and enables a swapfile at startup, the same way we enable swap in
upstream CI:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 90-worker-swap
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - contents: |
            [Unit]
            Description=Enable swap on CoreOS
            Before=crio-install.service
            ConditionFirstBoot=no
            [Service]
            Type=oneshot
            ExecStart=/bin/sh -c "sudo dd if=/dev/zero of=/var/swapfile count=1024 bs=1MiB && sudo chmod 600 /var/swapfile && sudo mkswap /var/swapfile && sudo swapon /var/swapfile && free -h"
            [Install]
            WantedBy=multi-user.target
          enabled: true
          name: swap-enable.service
```
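
Once a machine config like the one above has rolled out, swap can be checked from the node itself. The following is a minimal sketch and not part of the proposal: the helper names `swap_total_kib` and `swap_is_enabled` are hypothetical, and the sketch assumes the standard Linux `/proc/meminfo` layout, in which the `SwapTotal:` line reports a value in kB.

```bash
# swap_total_kib [MEMINFO_FILE]   (hypothetical helper, for illustration)
# Prints the SwapTotal value from a meminfo-style file (default: /proc/meminfo).
swap_total_kib() {
  awk '$1 == "SwapTotal:" { print $2 }' "${1:-/proc/meminfo}"
}

# swap_is_enabled [MEMINFO_FILE]   (hypothetical helper, for illustration)
# Succeeds when the reported swap total is greater than zero.
swap_is_enabled() {
  total="$(swap_total_kib "$@")"
  [ -n "$total" ] && [ "$total" -gt 0 ]
}
```

On a cluster, these could be run from a node debug shell (for example, one started with `oc debug node/<node-name>`), where `swap_is_enabled` with no argument reads `/proc/meminfo`.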

It is also possible to use [Ignition
configs](https://coreos.github.io/ignition/configuration-v3_3/) to add swap
partitions to worker nodes with `filesystems.format = swap`.
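
A rough sketch of what the Ignition-based approach could look like is shown below. It is an illustration under stated assumptions, not a tested configuration: the machine config name and the device path `/dev/disk/by-partlabel/swap` are hypothetical, and the partition is assumed to exist already. Ignition runs only at first boot, so this would apply to newly provisioned nodes, and formatting the partition does not by itself activate swap; a `swapon` step or a systemd swap unit would still be needed.

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 90-worker-swap-partition # hypothetical name
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      filesystems:
        # Hypothetical device path; assumes a partition labelled "swap" exists.
        - device: /dev/disk/by-partlabel/swap
          format: swap
          wipeFilesystem: true
          label: swap
```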

Beyond tech preview, we may want to look into adding swap support to the
installer, and consider adding a partition on the root node, or perhaps
[provisioning and mounting an NVMe
volume](https://github.com/openshift/machine-config-operator/issues/1619).

### Open Questions [optional]

- Will we eventually want to enable swap on all OpenShift nodes by default?
- Should swap just be limited to worker nodes, or should we consider adding it
  to control plane nodes too?

### Test Plan

In addition to the upstream e2e tests, we will need to add e2e suites to
OpenShift in order to exercise provisioning and use of swap. This may include
unit tests where appropriate, such as in the MCO.

### Graduation Criteria

#### Dev Preview -> Tech Preview

Requires alpha support in upstream Kubernetes (1.22+) and support in CRI-O
(1.23+).

- Support provisioning OpenShift nodes with swap enabled for all available
  upstream swap configurations (currently `LimitedSwap`, `UnlimitedSwap`).

JIRA: https://issues.redhat.com/browse/OCPNODE-470

_Graduation criteria below are tentative._

#### Tech Preview -> GA

Requires beta/GA support in upstream Kubernetes (1.25+?).

- More testing (upgrade, downgrade, scale)
- Sufficient time for feedback
- Available by default
- Backhaul SLI telemetry
- Document SLOs for the component
- Conduct load testing

### Upgrade / Downgrade Strategy

The `NodeSwap` feature flag is not supported in Kubernetes versions prior to
1.22/OpenShift 4.9. We will add the upstream `NodeSwap` feature flag as a
"NoUpgrade" feature flag to prevent upgrades.

Note that swap support does not require coordination between components and the
configuration is limited to individual nodes/machine pools.

See also [KEP-2400: Upgrade/Downgrade Strategy].

[Tech Preview]: https://github.com/openshift/enhancements/blob/ce4d303db807622687159eb9d3248285a003fabb/guidelines/techpreview.md#official-processmechanism-for-delivering-a-tp-feature
[KEP-2400: Upgrade/Downgrade Strategy]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#upgrade--downgrade-strategy

### Version Skew Strategy

N/A, this is a compatible API change limited to the Kubelet that does not
require coordination with the API Server.

## Implementation History

- [Upstream alpha swap support] completed in 1.22.

[Upstream alpha swap support]: https://github.com/kubernetes/enhancements/issues/2400#issuecomment-884327938

## Drawbacks and Alternatives

See [KEP-2400: Drawbacks] and [KEP-2400: Alternatives].

[KEP-2400: Drawbacks]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#drawbacks
[KEP-2400: Alternatives]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#alternatives

## Infrastructure Needed [optional]

- We will need to configure periodic e2e tests on VMs with swap enabled.
- We will need to enable swap on a [reliability cluster] to gauge long-term
  stability.

[reliability cluster]: https://issues.redhat.com/browse/OCPNODE-619