Add initial OpenShift swap enhancement
ehashman committed Sep 21, 2021
1 parent 4412b51 commit afcc24e
Showing 1 changed file with 142 additions and 0 deletions.
142 changes: 142 additions & 0 deletions enhancements/kubelet/node-swap.md
@@ -0,0 +1,142 @@
---
title: node-swap
authors:
- "@ehashman"
reviewers:
- "@rphilips"
- "@sjenning"
- "???"
approvers:
- "@mrunalp"
creation-date: "2021-06-23"
status: provisional
---

# OpenShift Node Swap Support

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Operational readiness criteria are defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

The upstream Kubernetes 1.22 release introduced alpha support for configuring swap memory usage for Kubernetes workloads on a per-node basis.

Now that swap use on nodes is supported in upstream, there are a number of use cases that would benefit from OpenShift nodes supporting swap, including improved node stability, better support for applications with high memory overhead but smaller working sets, the use of memory-constrained devices, and memory flexibility.

## Motivation

See [KEP-2400: Motivation].

[KEP-2400: Motivation]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#motivation

### Goals

- Swap can be provisioned and configured for nodes to use in an OpenShift cluster.

### Non-Goals

- Workload-specific swap accounting.
- Any of the non-goals in [KEP-2400: Non-goals].

[KEP-2400: Non-goals]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#non-goals

## Proposal

### User Stories

See [KEP-2400: User Stories](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#user-stories).

### Implementation Details/Notes/Constraints [optional]

See [KEP-2400: Notes/Constraints/Caveats](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#notesconstraintscaveats-optional).

### Risks and Mitigations

See [KEP-2400: Risks and Mitigations].

[KEP-2400: Risks and Mitigations]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#risks-and-mitigations

## Design Details

_very drafty_

- We will add `NodeSwap` to the [`TechPreviewNoUpgrade`] feature gate list.
- We can use [Ignition configs](https://coreos.github.io/ignition/configuration-v3_3/) to add swap partitions to worker nodes (a `storage.filesystems` entry with `format` set to `swap`).
  - Options: a partition on the node's root disk, or perhaps provisioning and mounting an NVMe volume? (https://github.com/openshift/machine-config-operator/issues/1619)
- We will need to ensure swap is enabled on the node (`swapon`) before the kubelet starts.
- The kubelet needs an appropriate KubeletConfiguration (e.g. `NodeSwap` feature gate enabled, `failSwapOn: false`, and [`memorySwap.swapBehavior` set](https://kubernetes.io/docs/concepts/architecture/nodes/#swap-memory)).
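
As a rough illustration of the node-level configuration listed above, the following Python sketch builds the two fragments as plain dictionaries and prints them as JSON. It is not the proposed implementation. The disk path `/dev/vdb`, the partition label `swap`, the Ignition spec version `3.3.0` (taken from the linked spec page), and the function names are assumptions made for the example. Formatting a partition as swap does not activate it, so a separate step (for example a systemd swap unit) would still be needed to run `swapon` before the kubelet starts.

```python
import json

# Swap behaviors named in this proposal's graduation criteria.
SWAP_BEHAVIORS = ("LimitedSwap", "UnlimitedSwap")


def kubelet_swap_config(swap_behavior="LimitedSwap"):
    """Return a KubeletConfiguration fragment that lets the kubelet run with swap."""
    if swap_behavior not in SWAP_BEHAVIORS:
        raise ValueError(f"unsupported swap behavior: {swap_behavior!r}")
    return {
        "apiVersion": "kubelet.config.k8s.io/v1beta1",
        "kind": "KubeletConfiguration",
        "featureGates": {"NodeSwap": True},  # alpha gate in Kubernetes 1.22
        "failSwapOn": False,  # do not refuse to start when swap is on
        "memorySwap": {"swapBehavior": swap_behavior},
    }


def ignition_swap_storage(disk="/dev/vdb", label="swap"):
    """Return an Ignition fragment that formats one partition as swap.

    The disk path and partition label are hypothetical placeholders.
    """
    return {
        "ignition": {"version": "3.3.0"},
        "storage": {
            "disks": [
                {"device": disk, "partitions": [{"number": 1, "label": label}]}
            ],
            "filesystems": [
                {"device": f"/dev/disk/by-partlabel/{label}", "format": "swap"}
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(kubelet_swap_config(), indent=2))
    print(json.dumps(ignition_swap_storage(), indent=2))
```

How these fragments would actually be delivered to nodes (for example through the MCO) is exactly the open question in the TODO below, so the sketch deliberately stops at generating the data.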

TODO: how will we modify MCO to roll this out? Do we need to at all?

### Open Questions [optional]

- Will we eventually want to enable swap on all OpenShift nodes by default?
- Should swap just be limited to worker nodes, or should we consider adding it to control plane nodes too?

### Test Plan

In addition to the upstream e2e tests, we will need to add e2e suites to OpenShift to exercise provisioning and use of swap. This may also include unit tests where appropriate, for example in the MCO.
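
One check such a suite might perform is confirming that swap is actually active on a provisioned node. The sketch below is only an illustration of that idea, not part of any existing OpenShift test suite: it parses text in the format of the Linux `/proc/swaps` file, and the function names and the sample device `/dev/vdb1` are hypothetical.

```python
def parse_proc_swaps(text):
    """Parse /proc/swaps content into a list of dicts, one per swap device."""
    devices = []
    lines = text.strip().splitlines()
    for line in lines[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue  # skip malformed or truncated rows
        devices.append(
            {
                "filename": fields[0],
                "type": fields[1],
                "size_kib": int(fields[2]),
                "used_kib": int(fields[3]),
                "priority": int(fields[4]),
            }
        )
    return devices


def swap_is_active(text):
    """Return True if at least one swap device with non-zero size is listed."""
    return any(dev["size_kib"] > 0 for dev in parse_proc_swaps(text))


if __name__ == "__main__":
    # Hypothetical sample; on a real node, read the file /proc/swaps instead.
    sample = (
        "Filename\t\t\t\tType\t\tSize\t\tUsed\t\tPriority\n"
        "/dev/vdb1                               partition\t1048572\t\t0\t\t-2\n"
    )
    print(parse_proc_swaps(sample))
    print(swap_is_active(sample))
```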

### Graduation Criteria

#### Dev Preview -> Tech Preview

Requires alpha support in upstream Kubernetes (1.22+).

- Support provisioning OpenShift nodes with swap enabled for all available upstream swap configurations (currently `LimitedSwap`, `UnlimitedSwap`).

JIRA: https://issues.redhat.com/browse/OCPNODE-470

_Graduation criteria below are tentative._

#### Tech Preview -> GA

Requires beta/GA support in upstream Kubernetes (1.25+?).

- More testing (upgrade, downgrade, scale)
- Sufficient time for feedback
- Available by default
- Backhaul SLI telemetry
- Document SLOs for the component
- Conduct load testing

### Upgrade / Downgrade Strategy

The `NodeSwap` feature flag is not supported in Kubernetes versions prior to 1.22/OpenShift 4.9. We will add the upstream `NodeSwap` feature flag to the set of [`TechPreviewNoUpgrade`] flags to prevent upgrades.
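
For context, a cluster opts into the `TechPreviewNoUpgrade` feature set through the cluster-scoped `FeatureGate` resource named `cluster`. The Python sketch below only builds that object as a dictionary and prints it as JSON; the function name is hypothetical, and whether `NodeSwap` ends up in this feature set depends on the change proposed here.

```python
import json


def tech_preview_feature_gate():
    """Return a FeatureGate object selecting the TechPreviewNoUpgrade feature set."""
    return {
        "apiVersion": "config.openshift.io/v1",
        "kind": "FeatureGate",
        "metadata": {"name": "cluster"},
        # Enabling this feature set blocks cluster upgrades and cannot be undone.
        "spec": {"featureSet": "TechPreviewNoUpgrade"},
    }


if __name__ == "__main__":
    print(json.dumps(tech_preview_feature_gate(), indent=2))
```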

Note that swap support does not require coordination between components and the configuration is limited to individual nodes.

See also [KEP-2400: Upgrade/Downgrade Strategy].

[`TechPreviewNoUpgrade`]: https://github.com/openshift/enhancements/blob/ce4d303db807622687159eb9d3248285a003fabb/guidelines/techpreview.md#official-processmechanism-for-delivering-a-tp-feature
[KEP-2400: Upgrade/Downgrade Strategy]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#upgrade--downgrade-strategy

### Version Skew Strategy

N/A, this is a compatible API change limited to the Kubelet that does not require coordination with the API Server.

## Implementation History

- [Upstream alpha swap support] completed in 1.22.

[Upstream alpha swap support]: https://github.com/kubernetes/enhancements/issues/2400#issuecomment-884327938

## Drawbacks and Alternatives

See [KEP-2400: Drawbacks] and [KEP-2400: Alternatives].

[KEP-2400: Drawbacks]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#drawbacks
[KEP-2400: Alternatives]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#alternatives

## Infrastructure Needed [optional]

- We will need to configure periodic e2e tests on VMs with swap enabled.
- We will need to enable swap on a [reliability cluster] to gauge long-term stability.

[reliability cluster]: https://issues.redhat.com/browse/OCPNODE-619
