|
| 1 | +# KEP-3015 PreferSameNode Traffic Distribution |
| 2 | + |
| 3 | +<!-- toc --> |
| 4 | +- [Release Signoff Checklist](#release-signoff-checklist) |
| 5 | +- [Summary](#summary) |
| 6 | +- [Motivation](#motivation) |
| 7 | + - [Goals](#goals) |
| 8 | + - [Non-Goals](#non-goals) |
| 9 | +- [Proposal](#proposal) |
| 10 | + - [User Stories](#user-stories) |
| 11 | + - [DNS](#dns) |
| 12 | + - [Risks and Mitigations](#risks-and-mitigations) |
| 13 | +- [Design Details](#design-details) |
| 14 | + - [Test Plan](#test-plan) |
| 15 | + - [Graduation Criteria](#graduation-criteria) |
| 16 | + - [Alpha](#alpha) |
| 17 | + - [Beta](#beta) |
| 18 | + - [GA](#ga) |
| 19 | + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) |
| 20 | + - [Version Skew Strategy](#version-skew-strategy) |
| 23 | +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) |
| 24 | + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) |
| 25 | + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) |
| 26 | + - [Monitoring Requirements](#monitoring-requirements) |
| 27 | + - [Dependencies](#dependencies) |
| 28 | + - [Scalability](#scalability) |
| 29 | + - [Troubleshooting](#troubleshooting) |
| 30 | +- [Implementation History](#implementation-history) |
| 31 | +- [Drawbacks](#drawbacks) |
| 32 | +- [Alternatives](#alternatives) |
| 33 | +<!-- /toc --> |
| 34 | + |
| 35 | +## Release Signoff Checklist |
| 36 | + |
| 37 | +<!-- |
| 38 | +**ACTION REQUIRED:** In order to merge code into a release, there must be an |
| 39 | +issue in [kubernetes/enhancements] referencing this KEP and targeting a release |
| 40 | +milestone **before the [Enhancement Freeze](https://git.k8s.io/sig-release/releases) |
| 41 | +of the targeted release**. |
| 42 | +
|
| 43 | +For enhancements that make changes to code or processes/procedures in core |
| 44 | +Kubernetes—i.e., [kubernetes/kubernetes], we require the following Release |
| 45 | +Signoff checklist to be completed. |
| 46 | +
|
| 47 | +Check these off as they are completed for the Release Team to track. These |
| 48 | +checklist items _must_ be updated for the enhancement to be released. |
| 49 | +--> |
| 50 | + |
| 51 | +Items marked with (R) are required *prior to targeting to a milestone / release*. |
| 52 | + |
| 53 | +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) |
| 54 | +- [ ] (R) KEP approvers have approved the KEP status as `implementable` |
| 55 | +- [ ] (R) Design details are appropriately documented |
| 56 | +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) |
| 57 | + - [ ] e2e Tests for all Beta API Operations (endpoints) |
| 58 | + - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) |
| 59 | + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free |
| 60 | +- [ ] (R) Graduation criteria is in place |
| 61 | + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) |
| 62 | +- [ ] (R) Production readiness review completed |
| 63 | +- [ ] (R) Production readiness review approved |
| 64 | +- [ ] "Implementation History" section is up-to-date for milestone |
| 65 | +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] |
| 66 | +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes |
| 67 | + |
| 68 | +<!-- |
| 69 | +**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone. |
| 70 | +--> |
| 71 | + |
| 72 | +[kubernetes.io]: https://kubernetes.io/ |
| 73 | +[kubernetes/enhancements]: https://git.k8s.io/enhancements |
| 74 | +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes |
| 75 | +[kubernetes/website]: https://git.k8s.io/website |
| 76 | + |
| 77 | +## Summary |
| 78 | + |
| 79 | +This KEP extends KEP-4444 `TrafficDistribution` with a new value, |
| 80 | +`PreferSameNode`, indicating traffic for a service should |
| 81 | +preferentially be routed to endpoints on the same node as the client. |
| 82 | + |
| 83 | +(This is the third attempt at this feature, which was previously |
| 84 | +suggested as [`internalTrafficPolicy: PreferLocal`] and [Node-level |
| 85 | +topology].) |
| 86 | + |
| 87 | +[`internalTrafficPolicy: PreferLocal`]: https://github.com/kubernetes/enhancements/pull/3016 |
| 88 | +[Node-level topology]: https://github.com/kubernetes/enhancements/pull/3293 |
| 89 | + |
| 90 | +## Motivation |
| 91 | + |
| 92 | +### Goals |
| 93 | + |
| 94 | +- Allow configuring a service so that connections will be delivered to |
| 95 | + a local endpoint when possible, and a remote endpoint if not. |
| 96 | + |
| 97 | +### Non-Goals |
| 98 | + |
| 99 | +N/A |
| 100 | + |
| 101 | +## Proposal |
| 102 | + |
| 103 | +### User Stories |
| 104 | + |
| 105 | +#### DNS |
| 106 | + |
| 107 | +As a cluster administrator, I plan to run a DNS pod on each node, and |
| 108 | +would like DNS requests from other pods to always go to the local DNS |
| 109 | +pod, for efficiency. However, if no local DNS pod is available, DNS |
| 110 | +should just go to a remote pod instead so it keeps working. There |
| 111 | +should never be enough DNS traffic to overload any one endpoint, so |
| 112 | +it's safe to use a TrafficDistribution mode that doesn't worry about |
| 113 | +endpoint overload. |
| 114 | + |
| 115 | +### Risks and Mitigations |
| 116 | + |
| 117 | +This is similar to the existing `PreferClose` mode (possibly to be |
| 118 | +renamed `PreferSameZone`) and has the same sorts of risks. |
| 119 | +We only use the new traffic distribution mode if the user explicitly |
| 120 | +requests it, and in that case, the user is responsible for ensuring |
| 121 | +that clients and servers are distributed in a way such that the |
| 122 | +traffic distribution mode makes sense. |
| 123 | + |
| 124 | +## Design Details |
| 125 | + |
| 126 | +We will add a new field to `discoveryv1.EndpointHints`: |
| 127 | + |
| 128 | +```golang |
| 129 | +// EndpointHints provides hints describing how an endpoint should be consumed. |
| 130 | +type EndpointHints struct { |
| 131 | + ... |
| 132 | + |
| 133 | + // forNodes indicates the node(s) this endpoint should be targeted by. |
| 134 | + // +listType=atomic |
| 135 | + ForNodes []string `json:"forNodes,omitempty" protobuf:"bytes,2,name=forNodes"` |
| 136 | +} |
| | +``` |
| 137 | + |
| 138 | +When updating EndpointSlices, if the EndpointSlice controller sees a |
| 139 | +service with `PreferSameNode` traffic distribution, then for each |
| 140 | +endpoint in the slice, it will add a `ForNodes` hint including the |
| 141 | +name of the endpoint's node. (The field is an array for future |
| 142 | +extensibility, but initially it will always have either 0 or 1 |
| 143 | +elements.) In addition, it will set the `ForZones` hint as it would |
| 144 | +with `TrafficDistribution: PreferClose`, to allow older service |
| 145 | +proxies to fall back to at least same-zone behavior. |
| 146 | +
|
| 147 | +When kube-proxy sees an Endpoint with the `ForNodes` hint set, it will |
| 148 | +use that endpoint if the hint includes its own node name, and ignore |
| 149 | +it otherwise, similarly to the `ForZones` hint. |
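
The two steps above (the controller setting hints, the proxy filtering on them) can be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: `Endpoint` is a simplified stand-in for `discoveryv1.Endpoint`, the helper names `setPreferSameNodeHints` and `selectForNode` are hypothetical, and the fallback to all endpoints when no local endpoint exists is an assumption drawn from the stated goal of using a remote endpoint when no local one is available. Any intermediate same-zone step is left out for brevity.

```go
package main

import "fmt"

// Endpoint is a simplified stand-in for discoveryv1.Endpoint; only the
// fields needed for this sketch are modeled.
type Endpoint struct {
	Address  string
	NodeName string
	Zone     string
	ForNodes []string // hint proposed by this KEP
	ForZones []string // existing zone hint
}

// setPreferSameNodeHints (hypothetical name) mirrors the controller behavior
// described above: each endpoint is hinted for its own node, and also for
// its own zone so that older proxies still get same-zone behavior.
func setPreferSameNodeHints(eps []Endpoint) {
	for i := range eps {
		ep := &eps[i]
		if ep.NodeName != "" {
			ep.ForNodes = []string{ep.NodeName}
		}
		if ep.Zone != "" {
			ep.ForZones = []string{ep.Zone}
		}
	}
}

// selectForNode (hypothetical name) keeps the endpoints whose ForNodes hint
// names nodeName. ASSUMPTION: if no endpoint is hinted for this node, all
// endpoints are used so that the service keeps working.
func selectForNode(eps []Endpoint, nodeName string) []Endpoint {
	var local []Endpoint
	for _, ep := range eps {
		for _, n := range ep.ForNodes {
			if n == nodeName {
				local = append(local, ep)
				break
			}
		}
	}
	if len(local) > 0 {
		return local
	}
	return eps
}

func main() {
	eps := []Endpoint{
		{Address: "10.0.1.5", NodeName: "node-a", Zone: "zone-1"},
		{Address: "10.0.2.7", NodeName: "node-b", Zone: "zone-1"},
	}
	setPreferSameNodeHints(eps)
	for _, node := range []string{"node-a", "node-c"} {
		fmt.Println(node, "->", len(selectForNode(eps, node)), "endpoint(s)")
	}
}
```

In this sketch a proxy on `node-a` selects only the local endpoint, while a proxy on `node-c`, which hosts no endpoint, falls back to both.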
| 150 | +
|
| 151 | +### Test Plan |
| 152 | +
|
| 153 | +[X] I/we understand the owners of the involved components may require updates to |
| 154 | +existing tests to make this code solid enough prior to committing the changes necessary |
| 155 | +to implement this enhancement. |
| 156 | +
|
| 157 | +##### Prerequisite testing updates |
| 158 | +
|
| 159 | +N/A |
| 160 | +
|
| 161 | +##### Unit tests |
| 162 | +
|
| 163 | +Tests of validation, endpointslice-controller, and kube-proxy will be |
| 164 | +updated. |
| 165 | +
|
| 166 | +<!-- |
| 167 | +Additionally, for Alpha try to enumerate the core package you will be touching |
| 168 | +to implement this enhancement and provide the current unit coverage for those |
| 169 | +in the form of: |
| 170 | +- <package>: <date> - <current test coverage> |
| 171 | +The data can be easily read from: |
| 172 | +https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit |
| 173 | +
|
| 174 | +This can inform certain test coverage improvements that we want to do before |
| 175 | +extending the production code to implement this enhancement. |
| 176 | +--> |
| 177 | +
|
| 178 | +- `<package>`: `<date>` - `<test coverage>` |
| 179 | +
|
| 180 | +##### Integration tests |
| 181 | +
|
| 182 | +N/A |
| 183 | +
|
| 184 | +##### e2e tests |
| 185 | +
|
| 186 | +E2E tests similar to the existing traffic distribution tests will be
| 187 | +added to cover the new value.
| 188 | +
|
| 189 | +- <test>: <link to test coverage> |
| 190 | +
|
| 191 | +### Graduation Criteria |
| 192 | +
|
| 193 | +#### Alpha |
| 194 | +
|
| 195 | +- Feature implemented behind a feature gate.
| 196 | +
|
| 197 | +- Unit tests for API enablement and endpoint selection. |
| 198 | +
|
| 199 | +#### Beta |
| 200 | +
|
| 201 | +- E2E tests completed and enabled. |
| 202 | +
|
| 203 | +- Enough time has passed since Alpha to avoid version skew issues. |
| 204 | +
|
| 205 | +#### GA |
| 206 | +
|
| 207 | +- Time passes, no major objections |
| 208 | +
|
| 209 | +### Upgrade / Downgrade Strategy |
| 210 | +
|
| 211 | +No real issues, other than dealing with skew. |
| 212 | +
|
| 213 | +### Version Skew Strategy |
| 214 | +
|
| 215 | +In skewed clusters, it may not be possible for kube-controller-manager |
| 216 | +to set the new EndpointSlice hint, or else kube-proxy may not be able |
| 217 | +to see the hint. In this case, the service will fall back to |
| 218 | +prefer-same-zone semantics rather than prefer-same-node. Users can
| 219 | +avoid problems with this by not using the feature until their cluster |
| 220 | +is fully upgraded to a version that supports the feature. |
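
To make the skew fallback concrete, here is a rough sketch of how a proxy that predates the `ForNodes` hint might behave. The `Endpoint` type is a simplified stand-in, `selectLegacy` is a hypothetical name, and the fallback to all endpoints when no zone matches is an assumption rather than something this KEP specifies.

```go
package main

import "fmt"

// Endpoint is a simplified stand-in for discoveryv1.Endpoint.
type Endpoint struct {
	Address  string
	ForNodes []string // ignored by the older proxy modeled here
	ForZones []string
}

// selectLegacy (hypothetical name) models a proxy that does not know about
// ForNodes: it filters on ForZones only. ASSUMPTION: when no endpoint is
// hinted for the proxy's zone, it uses every endpoint.
func selectLegacy(eps []Endpoint, zone string) []Endpoint {
	var sameZone []Endpoint
	for _, ep := range eps {
		for _, z := range ep.ForZones {
			if z == zone {
				sameZone = append(sameZone, ep)
				break
			}
		}
	}
	if len(sameZone) > 0 {
		return sameZone
	}
	return eps
}

func main() {
	eps := []Endpoint{
		{Address: "10.0.1.5", ForNodes: []string{"node-a"}, ForZones: []string{"zone-1"}},
		{Address: "10.0.2.7", ForNodes: []string{"node-b"}, ForZones: []string{"zone-1"}},
		{Address: "10.0.9.9", ForNodes: []string{"node-z"}, ForZones: []string{"zone-2"}},
	}
	// A legacy proxy in zone-1 spreads traffic over every zone-1 endpoint
	// instead of pinning it to the endpoint on its own node.
	for _, ep := range selectLegacy(eps, "zone-1") {
		fmt.Println(ep.Address)
	}
}
```

The same outcome applies when the controller is the older component: with no `ForNodes` hint present, only the `ForZones` hint is available to filter on.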
| 221 | +
|
| 222 | +## Production Readiness Review Questionnaire |
| 223 | +
|
| 224 | +### Feature Enablement and Rollback |
| 225 | +
|
| 226 | +###### How can this feature be enabled / disabled in a live cluster? |
| 227 | +
|
| 228 | +- [X] Feature gate (also fill in values in `kep.yaml`) |
| 229 | + - Feature gate name: PreferSameNodeTrafficDistribution |
| 230 | + - Components depending on the feature gate: |
| 231 | + - kube-apiserver |
| 232 | + - kube-controller-manager |
| 233 | + - kube-proxy |
| 234 | +
|
| 235 | +###### Does enabling the feature change any default behavior? |
| 236 | +
|
| 237 | +No |
| 238 | +
|
| 239 | +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? |
| 240 | +
|
| 241 | +Yes. |
| 242 | +
|
| 243 | +###### What happens if we reenable the feature if it was previously rolled back? |
| 244 | +
|
| 245 | +It starts working again. |
| 246 | +
|
| 247 | +###### Are there any tests for feature enablement/disablement? |
| 248 | +
|
| 249 | +No. |
| 250 | +
|
| 251 | +### Rollout, Upgrade and Rollback Planning |
| 252 | +
|
| 253 | +###### How can a rollout or rollback fail? Can it impact already running workloads? |
| 254 | +
|
| 255 | +An initial rollout cannot fail and won't impact already-running |
| 256 | +workloads, because at the time of the initial rollout, there cannot |
| 257 | +already be any `TrafficDistribution: PreferSameNode` services. |
| 258 | + |
| 259 | +A rollback has reasonable fallback behavior (as with downgrades), and |
| 260 | +a re-rollout just updates the behavior of existing `PreferSameNode` |
| 261 | +services in the expected way. |
| 262 | + |
| 263 | +###### What specific metrics should inform a rollback? |
| 264 | + |
| 265 | +There are no metrics that would inform anyone that the feature was |
| 266 | +failing, but since the feature is opt-in, individual users can simply |
| 267 | +stop using the feature if it is not working for them. |
| 268 | + |
| 269 | +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? |
| 270 | + |
| 271 | +No |
| 272 | + |
| 273 | +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? |
| 274 | + |
| 275 | +No |
| 276 | + |
| 277 | +### Monitoring Requirements |
| 278 | + |
| 279 | +###### How can an operator determine if the feature is in use by workloads? |
| 280 | + |
| 281 | +By checking if any Service has `TrafficDistribution: PreferSameNode`. |
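
One way to perform that check, sketched under the assumption that the operator has saved a JSON Service list (for example the output of `kubectl get services -A -o json`) and wants to scan it offline. The function name `preferSameNodeServices` is hypothetical, and only the `metadata` and `spec.trafficDistribution` fields are decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceList decodes only the parts of a v1 ServiceList needed here.
type serviceList struct {
	Items []struct {
		Metadata struct {
			Namespace string `json:"namespace"`
			Name      string `json:"name"`
		} `json:"metadata"`
		Spec struct {
			TrafficDistribution *string `json:"trafficDistribution"`
		} `json:"spec"`
	} `json:"items"`
}

// preferSameNodeServices (hypothetical name) returns "namespace/name" for
// every Service in the list whose trafficDistribution is PreferSameNode.
func preferSameNodeServices(data []byte) ([]string, error) {
	var list serviceList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, err
	}
	var out []string
	for _, svc := range list.Items {
		td := svc.Spec.TrafficDistribution
		if td != nil && *td == "PreferSameNode" {
			out = append(out, svc.Metadata.Namespace+"/"+svc.Metadata.Name)
		}
	}
	return out, nil
}

func main() {
	// Sample input with made-up Service names, for illustration only.
	sample := []byte(`{"items":[
		{"metadata":{"namespace":"kube-system","name":"dns"},
		 "spec":{"trafficDistribution":"PreferSameNode"}},
		{"metadata":{"namespace":"default","name":"web"},"spec":{}}
	]}`)
	names, err := preferSameNodeServices(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```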
| 282 | + |
| 283 | +###### How can someone using this feature know that it is working for their instance? |
| 284 | + |
| 285 | +As with other topology features, there is no easy way for an end user |
| 286 | +to reliably confirm that it is working correctly other than by |
| 287 | +sniffing the network traffic, or else looking at the logs of each |
| 288 | +endpoint to confirm that they are receiving the expected connections |
| 289 | +and not receiving unexpected connections. |
| 290 | + |
| 291 | +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? |
| 292 | + |
| 293 | +The implementation of the feature itself has no SLOs. The effect it |
| 294 | +has on the performance of end user workloads that use the feature |
| 295 | +depends on those workloads. |
| 296 | + |
| 297 | +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? |
| 298 | + |
| 299 | +The implementation of the feature itself has no SLIs, other than the |
| 300 | +generic kube-proxy metrics. User workloads that use the feature may |
| 301 | +expose SLI information that the user can examine to determine how well |
| 302 | +the feature is working for their workload. |
| 303 | + |
| 304 | +###### Are there any missing metrics that would be useful to have to improve observability of this feature? |
| 305 | + |
| 306 | +Not really; we don't know how fast the user's services are supposed to |
| 307 | +be, so we can't really tell if we are improving them as much as they |
| 308 | +hoped or not. |
| 309 | +
|
| 310 | +### Dependencies |
| 311 | +
|
| 312 | +###### Does this feature depend on any specific services running in the cluster? |
| 313 | +
|
| 314 | +It depends on a service proxy that recognizes the new traffic
| 315 | +distribution value. We will update `kube-proxy` ourselves, but network plugins /
| 316 | +Kubernetes distributions that ship their own alternative service
| 317 | +proxies will also need to be updated to support the new value before
| 318 | +their users can make use of it. (Until then, `TrafficDistribution:
| 319 | +PreferSameNode` would be implemented as `TrafficDistribution:
| 320 | +PreferClose`.)
| 321 | +
|
| 322 | +### Scalability |
| 323 | +
|
| 324 | +###### Will enabling / using this feature result in any new API calls? |
| 325 | +
|
| 326 | +No |
| 327 | +
|
| 328 | +###### Will enabling / using this feature result in introducing new API types? |
| 329 | +
|
| 330 | +No |
| 331 | +
|
| 332 | +###### Will enabling / using this feature result in any new calls to the cloud provider? |
| 333 | +
|
| 334 | +No |
| 335 | +
|
| 336 | +###### Will enabling / using this feature result in increasing size or count of the existing API objects? |
| 337 | +
|
| 338 | +No (other than that it means people may set `TrafficDistribution` on |
| 339 | +Services where they were not previously setting it). |
| 340 | +
|
| 341 | +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? |
| 342 | +
|
| 343 | +No |
| 344 | +
|
| 345 | +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? |
| 346 | +
|
| 347 | +No |
| 348 | +
|
| 349 | +### Troubleshooting |
| 350 | +
|
| 351 | +###### How does this feature react if the API server and/or etcd is unavailable? |
| 352 | +
|
| 353 | +No change from existing service/proxy behavior. |
| 354 | +
|
| 355 | +###### What are other known failure modes? |
| 356 | +
|
| 357 | +None known |
| 358 | +
|
| 359 | +###### What steps should be taken if SLOs are not being met to determine the problem? |
| 360 | +
|
| 361 | +N/A |
| 362 | +
|
| 363 | +## Implementation History |
| 364 | +
|
| 365 | +- Initial proposal as `InternalTrafficPolicy: PreferLocal`: 2021-10-21 |
| 366 | +- Initial proposal as "Node-level topology": 2022-01-15 |
| 367 | +- Initial proposal as `TrafficDistribution: PreferSameNode`: 2025-02-06 |