From bbe8178f353983856e52785e5fa614b87538269c Mon Sep 17 00:00:00 2001
From: Andrew Sy Kim
Date: Wed, 12 May 2021 10:59:08 -0400
Subject: [PATCH] kep-1672: update beta milestones for v1.22

Signed-off-by: Andrew Sy Kim
---
 keps/prod-readiness/sig-network/1672.yaml     |   5 +
 .../README.md                                 | 114 ++++++++++++++++++
 .../kep.yaml                                  |  23 +++-
 3 files changed, 140 insertions(+), 2 deletions(-)
 create mode 100644 keps/prod-readiness/sig-network/1672.yaml

diff --git a/keps/prod-readiness/sig-network/1672.yaml b/keps/prod-readiness/sig-network/1672.yaml
new file mode 100644
index 000000000000..f3c8307c0668
--- /dev/null
+++ b/keps/prod-readiness/sig-network/1672.yaml
@@ -0,0 +1,5 @@
+kep-number: 1672
+alpha:
+  approver: "@wojtek-t"
+beta:
+  approver: "@wojtek-t"
diff --git a/keps/sig-network/1672-tracking-terminating-endpoints/README.md b/keps/sig-network/1672-tracking-terminating-endpoints/README.md
index a879d95f04cb..f27067e8a6d0 100644
--- a/keps/sig-network/1672-tracking-terminating-endpoints/README.md
+++ b/keps/sig-network/1672-tracking-terminating-endpoints/README.md
@@ -148,9 +148,123 @@ of the [EndpointSlice API work](/keps/sig-network/20190603-endpointslices/README
 Since this is an addition to the EndpointSlice API, the version skew strategy will follow that
 of the [EndpointSlice API work](/keps/sig-network/20190603-endpointslices/README.md).
 
+## Production Readiness Review Questionnaire
+
+### Feature Enablement and Rollback
+
+###### How can this feature be enabled / disabled in a live cluster?
+
+- [X] Feature gate (also fill in values in `kep.yaml`)
+  - Feature gate name: EndpointSliceTerminatingCondition
+  - Components depending on the feature gate: kube-apiserver and kube-controller-manager
+
+###### Does enabling the feature change any default behavior?
+
+Yes, terminating endpoints are now included as part of EndpointSlice API.
The `ready` condition of a terminating endpoint will always be `false` to ensure consumers do not send traffic to terminating endpoints unless new conditions are checked.
+
+###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
+
+Yes. On rollback, terminating endpoints will no longer be included in EndpointSlice and the `terminating` and `serving` conditions will not be set.
+
+###### What happens if we reenable the feature if it was previously rolled back?
+
+EndpointSlice will continue to have the `terminating` and `serving` conditions set.
+
+###### Are there any tests for feature enablement/disablement?
+
+Yes, there will be integration and e2e tests validating whether EndpointSlice contains endpoints for pods that are terminating.
+
+### Rollout, Upgrade and Rollback Planning
+
+###### How can a rollout fail? Can it impact already running workloads?
+
+If there are consumers of EndpointSlice that do not check the `ready` condition, then they may unexpectedly start sending traffic to terminating endpoints.
+It is assumed that almost all consumers of EndpointSlice check the `ready` condition prior to allowing traffic to a pod.
+
+###### What specific metrics should inform a rollback?
+
+Application-level metrics indicating increased packet loss or error rates.
+
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+
+Not yet, but manual upgrade and rollback testing will be done prior to graduating the feature to Beta.
+
+###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+
+No.
+
+### Monitoring Requirements
+
+###### How can an operator determine if the feature is in use by workloads?
+
+The conditions will always be set for terminating pods but consumers may choose to ignore them. It is up to consumers of the API to provide metrics
+on how the new conditions are being used.
+
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+
+N/A
+
+###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs?
+
+N/A
+
+###### Are there any missing metrics that would be useful to have to improve observability of this feature?
+
+N/A
+
+### Dependencies
+
+###### Does this feature depend on any specific services running in the cluster?
+
+N/A
+
+### Scalability
+
+###### Will enabling / using this feature result in any new API calls?
+
+Yes, there will be more writes to EndpointSlice for every pod when it begins terminating.
+
+###### Will enabling / using this feature result in introducing new API types?
+
+No.
+
+###### Will enabling / using this feature result in any new calls to the cloud provider?
+
+No.
+
+###### Will enabling / using this feature result in increasing size or count of the existing API objects?
+
+Yes, it will increase the size of EndpointSlice by adding two boolean fields for each endpoint.
+
+###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
+
+No.
+
+###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
+
+More writes to EndpointSlice could result in more resource usage from etcd disk IO and network bandwidth for all watchers.
+
+### Troubleshooting
+
+###### How does this feature react if the API server and/or etcd is unavailable?
+
+EndpointSlice conditions will get stale.
+
+###### What are other known failure modes?
+
+* Consumers of EndpointSlice that do not check the `ready` condition may unexpectedly use terminating endpoints.
+
+###### What steps should be taken if SLOs are not being met to determine the problem?
+
+* Disable the feature gate
+* Check if consumers of EndpointSlice are using the `serving` or `terminating` condition
+* Check etcd disk usage
+
 ## Implementation History
 
 - [x] 2020-04-23: KEP accepted as implementable for v1.19
+- [x] 2020-07-01: initial PR with alpha implementation merged for v1.20
+- [x] 2021-05-12: KEP accepted as implementable for v1.22
 
 ## Drawbacks
 
diff --git a/keps/sig-network/1672-tracking-terminating-endpoints/kep.yaml b/keps/sig-network/1672-tracking-terminating-endpoints/kep.yaml
index 68632764f0f7..196b2717f8e5 100644
--- a/keps/sig-network/1672-tracking-terminating-endpoints/kep.yaml
+++ b/keps/sig-network/1672-tracking-terminating-endpoints/kep.yaml
@@ -19,5 +19,24 @@ see-also:
   - /kep/sig-network/20190603-EndpointSlice-API.md
 replaces: []
 
-latest-milestone: "0.0"
-stage: "alpha"
+# The target maturity stage in the current dev cycle for this KEP.
+stage: beta
+
+# The most recent milestone for which work toward delivery of this KEP has been
+# done. This can be the current (upcoming) milestone, if it is being actively
+# worked on.
+latest-milestone: "v1.22"
+
+# The milestone at which this feature was, or is targeted to be, at each stage.
+milestone:
+  alpha: "v1.20"
+  beta: "v1.22"
+
+# The following PRR answers are required at alpha release
+# List the feature gate name and the components for which it must be enabled
+feature-gates:
+  - name: EndpointSliceTerminatingCondition
+    components:
+      - kube-apiserver
+      - kube-controller-manager
+disable-supported: true
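
Reviewer note (illustrative only, not part of the patch): the questionnaire says `ready` stays `false` for terminating endpoints and that consumers must opt in by checking the new `serving` and `terminating` conditions. A minimal sketch of what such a consumer check might look like follows. The `Endpoint` class and `pick_endpoints` function are hypothetical names, and the fallback policy (use serving-but-terminating endpoints only when no ready endpoint exists) is an assumption for illustration, not something this KEP prescribes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    """Hypothetical stand-in for one endpoint entry in an EndpointSlice."""
    address: str
    # Conditions are optional booleans; None means "not set",
    # e.g. when the feature gate is disabled.
    ready: Optional[bool] = None
    serving: Optional[bool] = None
    terminating: Optional[bool] = None


def pick_endpoints(endpoints: List[Endpoint]) -> List[str]:
    """Return addresses a consumer could send traffic to.

    Assumed policy: prefer endpoints whose `ready` condition is true.
    Only if none exist, fall back to endpoints that are terminating
    but still serving.
    """
    ready = [e.address for e in endpoints if e.ready]
    if ready:
        return ready
    # Consumers that never look at `serving`/`terminating` would stop here
    # with an empty list, which matches the pre-feature behavior.
    return [e.address for e in endpoints if e.serving and e.terminating]
```

A consumer that ignores the new conditions keeps its existing behavior because it only ever sees `ready: false` for terminating endpoints, which is the compatibility argument made in the rollout section.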