kep-1672: update beta milestones for v1.22
Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com>
andrewsykim committed May 12, 2021
1 parent a6909c2 commit 2ca35a2
Showing 3 changed files with 147 additions and 2 deletions.
5 changes: 5 additions & 0 deletions keps/prod-readiness/sig-network/1672.yaml
@@ -0,0 +1,5 @@
kep-number: 1672
alpha:
  approver: "@wojtek-t"
beta:
  approver: "@wojtek-t"
121 changes: 121 additions & 0 deletions keps/sig-network/1672-tracking-terminating-endpoints/README.md
@@ -17,6 +17,13 @@
- [Alpha](#alpha)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
- [Monitoring Requirements](#monitoring-requirements)
- [Dependencies](#dependencies)
- [Scalability](#scalability)
- [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
<!-- /toc -->
@@ -148,9 +155,123 @@ of the [EndpointSlice API work](/keps/sig-network/20190603-endpointslices/README
Since this is an addition to the EndpointSlice API, the version skew strategy will follow that
of the [EndpointSlice API work](/keps/sig-network/20190603-endpointslices/README.md).

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

###### How can this feature be enabled / disabled in a live cluster?

- [X] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: EndpointSliceTerminatingCondition
  - Components depending on the feature gate: kube-apiserver and kube-controller-manager

###### Does enabling the feature change any default behavior?

Yes. Terminating endpoints are now included in the EndpointSlice API. The `ready` condition of a terminating endpoint will always be `false`, so consumers that check only `ready` do not send traffic to terminating endpoints. Using terminating endpoints requires checking the new `serving` and `terminating` conditions.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes. On rollback, terminating endpoints will no longer be included in EndpointSlice and the `terminating` and `serving` conditions will not be set.
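
Because the `serving` and `terminating` conditions can be unset (for example after a rollback), a consumer needs defaults for missing values. The sketch below is one hedged interpretation and not part of this KEP's implementation. It assumes endpoints arrive as JSON-decoded dictionaries, and the helper name `interpret_conditions` is hypothetical. The defaults follow our reading of the EndpointSlice API field documentation: an unset `ready` is treated as ready, an unset `serving` defers to `ready`, and an unset `terminating` is treated as not terminating.

```python
def interpret_conditions(conditions):
    """Fill in defaults for unset EndpointSlice endpoint conditions.

    `conditions` is the JSON-decoded `conditions` object of one endpoint,
    or None when the object is missing entirely (assumed input shape).
    """
    conditions = conditions or {}
    ready = conditions.get("ready")
    serving = conditions.get("serving")
    terminating = conditions.get("terminating")

    # Unset `ready` is interpreted as ready.
    ready = True if ready is None else ready
    # Unset `serving` defers to `ready` (e.g. feature gate disabled).
    serving = ready if serving is None else serving
    # Unset `terminating` is interpreted as not terminating.
    terminating = False if terminating is None else terminating

    return {"ready": ready, "serving": serving, "terminating": terminating}
```

With this normalization a consumer can apply the same logic whether or not the feature gate is enabled on the control plane.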

###### What happens if we reenable the feature if it was previously rolled back?

EndpointSlice will continue to have the `terminating` and `serving` conditions set.

###### Are there any tests for feature enablement/disablement?

Yes, there will be integration and e2e tests validating whether EndpointSlice contains endpoints for pods that are terminating.

### Rollout, Upgrade and Rollback Planning

###### How can a rollout fail? Can it impact already running workloads?

If there are consumers of EndpointSlice that do not check the `ready` condition, then they may unexpectedly start sending traffic to terminating endpoints.
It is assumed that almost all consumers of EndpointSlice check the `ready` condition prior to allowing traffic to a pod.
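
The failure mode described above can be illustrated with a minimal sketch that compares a consumer ignoring conditions with one honoring `ready`. Both function names and the sample addresses are hypothetical, and endpoints are assumed to be JSON-decoded dictionaries shaped like EndpointSlice endpoints.

```python
def naive_backends(endpoints):
    """Consumer that ignores conditions; it also picks up terminating endpoints."""
    return [addr for ep in endpoints for addr in ep["addresses"]]


def ready_backends(endpoints):
    """Consumer that honors `ready`; an unset value is treated as ready."""
    return [
        addr
        for ep in endpoints
        if (ep.get("conditions") or {}).get("ready") is not False
        for addr in ep["addresses"]
    ]


# Illustrative data: one ready endpoint and one terminating endpoint.
endpoints = [
    {"addresses": ["10.0.0.1"],
     "conditions": {"ready": True, "serving": True, "terminating": False}},
    {"addresses": ["10.0.0.2"],
     "conditions": {"ready": False, "serving": True, "terminating": True}},
]

print(naive_backends(endpoints))  # includes the terminating endpoint
print(ready_backends(endpoints))  # excludes it
```

Only the first kind of consumer changes behavior when the feature is enabled, which is why the rollout risk is limited to consumers that skip the `ready` check.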

###### What specific metrics should inform a rollback?

Application-level traffic metrics indicating packet loss or elevated error rates.

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

Not yet, but manual upgrade and rollback testing will be done prior to graduating the feature to Beta.

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

No.

### Monitoring Requirements

###### How can an operator determine if the feature is in use by workloads?

The conditions will always be set for terminating pods, but consumers may choose to ignore them. It is up to consumers of the API to provide metrics
on how the new conditions are being used.
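
As a rough aid for operators, the sketch below tallies endpoints by state across a set of EndpointSlices. It shows only whether terminating endpoints are being published, not whether any consumer uses them. It assumes the input is the `items` list from `kubectl get endpointslices --all-namespaces -o json`, and the function name `tally_endpoint_states` and the state labels are hypothetical.

```python
from collections import Counter


def tally_endpoint_states(slices):
    """Count endpoints per state across JSON-decoded EndpointSlice objects."""
    counts = Counter()
    for endpoint_slice in slices:
        for ep in endpoint_slice.get("endpoints") or []:
            cond = ep.get("conditions") or {}
            if cond.get("terminating"):
                # Terminating endpoints are split by whether they still serve.
                state = "terminating-serving" if cond.get("serving") else "terminating-not-serving"
            elif cond.get("ready") is False:
                state = "not-ready"
            else:
                # Unset `ready` is treated as ready.
                state = "ready"
            counts[state] += 1
    return counts
```

A nonzero count for either terminating state suggests the feature gate is enabled and the controller is publishing terminating endpoints.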

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

N/A

###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs?

N/A

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

N/A

### Dependencies

###### Does this feature depend on any specific services running in the cluster?

N/A

### Scalability

###### Will enabling / using this feature result in any new API calls?

Yes, there will be more writes to EndpointSlice for every pod when it begins terminating.

###### Will enabling / using this feature result in introducing new API types?

No.

###### Will enabling / using this feature result in any new calls to the cloud provider?

No.

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

Yes, it will increase the size of EndpointSlice by adding two boolean fields for each endpoint.
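
To get a feel for the per-endpoint growth, the sketch below compares the compact JSON size of one endpoint with and without the two new fields. This is illustrative only: the sample address is made up, the helper name `serialized_size` is hypothetical, and the size actually stored or sent depends on the encoding in use (for example protobuf), which this sketch does not model.

```python
import json


def serialized_size(endpoint):
    """Size in bytes of the compact JSON encoding of one endpoint."""
    return len(json.dumps(endpoint, separators=(",", ":")).encode("utf-8"))


# Same endpoint, before and after the `serving` and `terminating` fields are added.
before = {"addresses": ["10.0.0.2"], "conditions": {"ready": False}}
after = {
    "addresses": ["10.0.0.2"],
    "conditions": {"ready": False, "serving": True, "terminating": True},
}

extra_bytes = serialized_size(after) - serialized_size(before)
print(f"extra bytes per endpoint in compact JSON: {extra_bytes}")
```

The per-endpoint overhead is a small constant, so the total growth scales linearly with the number of endpoints in a slice.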

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No.

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

More writes to EndpointSlice could result in more resource usage from etcd disk IO and network bandwidth for all watchers.

### Troubleshooting

###### How does this feature react if the API server and/or etcd is unavailable?

EndpointSlice conditions will get stale.

###### What are other known failure modes?

* Consumers of EndpointSlice that do not check the `ready` condition may unexpectedly use terminating endpoints.

###### What steps should be taken if SLOs are not being met to determine the problem?

* Disable the feature gate
* Check if consumers of EndpointSlice are using the `serving` or `terminating` condition
* Check etcd disk usage

## Implementation History

- [x] 2020-04-23: KEP accepted as implementable for v1.19
- [x] 2020-07-01: initial PR with alpha implementation merged for v1.20
- [x] 2021-05-12: KEP accepted as implementable for v1.22

## Drawbacks

23 changes: 21 additions & 2 deletions keps/sig-network/1672-tracking-terminating-endpoints/kep.yaml
@@ -19,5 +19,24 @@ see-also:
- /kep/sig-network/20190603-EndpointSlice-API.md
replaces: []

-latest-milestone: "0.0"
-stage: "alpha"
# The target maturity stage in the current dev cycle for this KEP.
stage: beta

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.22"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
  alpha: "v1.20"
  beta: "v1.22"

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
  - name: EndpointSliceTerminatingCondition
    components:
      - kube-apiserver
      - kube-controller-manager
disable-supported: true
