From c1323be2ba90ef6d94ed5cc12486f110b039ca32 Mon Sep 17 00:00:00 2001 From: David Eads Date: Thu, 13 Jan 2022 13:29:51 -0500 Subject: [PATCH 1/6] beta apis off by default --- .../3136-beta-off-by-default/README.md | 585 ++++++++++++++++++ .../3136-beta-off-by-default/kep.yaml | 46 ++ 2 files changed, 631 insertions(+) create mode 100644 keps/sig-architecture/3136-beta-off-by-default/README.md create mode 100644 keps/sig-architecture/3136-beta-off-by-default/kep.yaml diff --git a/keps/sig-architecture/3136-beta-off-by-default/README.md b/keps/sig-architecture/3136-beta-off-by-default/README.md new file mode 100644 index 00000000000..c84541af3c7 --- /dev/null +++ b/keps/sig-architecture/3136-beta-off-by-default/README.md @@ -0,0 +1,585 @@ + +# KEP-3136: Beta APIs Are Off by Default + + + + + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [User Stories (Optional)](#user-stories-optional) + - [Story 1](#story-1) + - [Story 2](#story-2) + - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [Test Plan](#test-plan) + - [Graduation Criteria](#graduation-criteria) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) + - [Monitoring Requirements](#monitoring-requirements) + - [Dependencies](#dependencies) + - [Scalability](#scalability) + - [Troubleshooting](#troubleshooting) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) +- [Infrastructure Needed 
(Optional)](#infrastructure-needed-optional) + + +## Release Signoff Checklist + + + +Items marked with (R) are required *prior to targeting to a milestone / release*. + +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [ ] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [ ] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free +- [ ] (R) Graduation criteria is in place + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) +- [ ] (R) Production readiness review completed +- [ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + + + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: https://git.k8s.io/website + +## Summary + +New beta APIs will not be enabled in clusters by default. +Existing beta APIs and new versions of existing beta APIs, will continue to be enabled by default. 
+ +## Motivation + +Beta APIs are not considered stable and reliance upon APIs in this state leads to exposure to bugs, +guaranteed migration pain for users when the APIs move to stable, and the risk that dependencies will +grow around unfinished APIs. +Enabling beta APIs by default, exacerbates these problems by making them on in nearly every cluster. +We observed these problems as we removed long-standing beta APIs and the PRR survey tells us that over +90% of production clusters leave these APIs enabled. +By disabling beta APIs by default, a cluster-admin can opt-in for specific APIs without having every +incomplete API present in the cluster. + +### Goals + +1. Disable new beta APIs by default. +2. Continue enabling existing beta APIs and new version of existing beta APIs by default. +3. Allow enabling specific resources in beta. Enable cronjob.v1beta1.batch.k8s.io without enabling other-cool-job.v1beta1.batch.k8s.io + +### Non-Goals + +1. Change featuregate defaults. + +## Proposal + +New beta APIs will be placed into the `DisableVersions` stanza instead of the `EnableVersions` stanza (see [DefaultAPIResourceConfigSource](https://github.com/kubernetes/kubernetes/blob/0669da445fa8c1ae07c15c0827f0e83da11cbe58/pkg/controlplane/instance.go#L643)). +The `--runtime-config` flag will be extended to allow `group/version/resource=true`, to enable specific resources. +To enable a beta API, a cluster-admin will have to add the appropriate `--runtime-config` flags. + +### User Stories (Optional) + +#### Story 1 + +As a cluster-admin I want to enable the cronjob.v1beta1.batch.k8s.io API in my cluster. + +To do this I call `kube-apiserver --runtime-config=batch.k8s.io/v1beta1/cronjob`. + +#### Story 1 + +As a cluster-admin I want to enable all beta APIs as in past releases. + +To do this I call `kube-apiserver --runtime-config=api/beta=true`. + + +### Notes/Constraints/Caveats (Optional) + +### Risks and Mitigations + +Adoption of beta features will slow. 
+Given how kubernetes is now treated, this is a good thing, not a bad thing. +Those users that want to move quickly and get new features can do so by enabling all beta feature +or just enabling those that are important for their workload. + +## Design Details + + + +### Test Plan + +Integration tests will be written to ensure that no new beta APIs are enabled in the kube-apiserver by default. +Unit tests will be written to ensure that the new flag functionality works as expected. + +### Graduation Criteria + +This KEP is a policy KEP, not a feature KEP. It will start as GA. + +#### GA + +- Integration and unit tests from above. + +### Upgrade / Downgrade Strategy + +The additional command line flag format for `--runtime-config` will not be recognized on older levels of kubernetes. +This means that when downgrading, cluster-admins will have to adjust their CLI arguments if they opted into a new beta API. +This is congruent to flag handling for new features today. +Because this only impacts new beta APIs, there is no behavior change for existing APIs on upgrade. + +### Version Skew Strategy + +Because this only impacts new beta APIs, there is no novel skew risk. + +## Production Readiness Review Questionnaire + +Not applicable. + + + +### Feature Enablement and Rollback + + + +###### How can this feature be enabled / disabled in a live cluster? + + + +- [ ] Feature gate (also fill in values in `kep.yaml`) + - Feature gate name: + - Components depending on the feature gate: +- [ ] Other + - Describe the mechanism: + - Will enabling / disabling the feature require downtime of the control + plane? + - Will enabling / disabling the feature require downtime or reprovisioning + of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled). + +###### Does enabling the feature change any default behavior? + + + +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? 
+ + + +###### What happens if we reenable the feature if it was previously rolled back? + +###### Are there any tests for feature enablement/disablement? + + + +### Rollout, Upgrade and Rollback Planning + + + +###### How can a rollout or rollback fail? Can it impact already running workloads? + + + +###### What specific metrics should inform a rollback? + + + +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? + + + +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? + + + +### Monitoring Requirements + + + +###### How can an operator determine if the feature is in use by workloads? + + + +###### How can someone using this feature know that it is working for their instance? + + + +- [ ] Events + - Event Reason: +- [ ] API .status + - Condition name: + - Other field: +- [ ] Other (treat as last resort) + - Details: + +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? + + + +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? + + + +- [ ] Metrics + - Metric name: + - [Optional] Aggregation method: + - Components exposing the metric: +- [ ] Other (treat as last resort) + - Details: + +###### Are there any missing metrics that would be useful to have to improve observability of this feature? + + + +### Dependencies + + + +###### Does this feature depend on any specific services running in the cluster? + + + +### Scalability + + + +###### Will enabling / using this feature result in any new API calls? + + + +###### Will enabling / using this feature result in introducing new API types? + + + +###### Will enabling / using this feature result in any new calls to the cloud provider? + + + +###### Will enabling / using this feature result in increasing size or count of the existing API objects? 
+ + + +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? + + + +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? + + + +### Troubleshooting + + + +###### How does this feature react if the API server and/or etcd is unavailable? + +###### What are other known failure modes? + + + +###### What steps should be taken if SLOs are not being met to determine the problem? + +## Implementation History + + + +## Drawbacks + + + +## Alternatives + + + +## Infrastructure Needed (Optional) + + diff --git a/keps/sig-architecture/3136-beta-off-by-default/kep.yaml b/keps/sig-architecture/3136-beta-off-by-default/kep.yaml new file mode 100644 index 00000000000..7e000223da9 --- /dev/null +++ b/keps/sig-architecture/3136-beta-off-by-default/kep.yaml @@ -0,0 +1,46 @@ +title: Beta APIs Are Off by Default +kep-number: 3136 +authors: + - "@deads2k" +owning-sig: sig-architecture +participating-sigs: + - sig-api-machinery +status: implementable +creation-date: 2022-01-13 +reviewers: +approvers: + - "@liggitt" + - "@johnbelamaric" + - "@derekwaynecarr" + - "@dims" + +##### WARNING !!! ###### +# prr-approvers has been moved to its own location +# You should create your own in keps/prod-readiness +# Please make a copy of keps/prod-readiness/template/nnnn.yaml +# to keps/prod-readiness/sig-xxxxx/00000.yaml (replace with kep number) +#prr-approvers: + +see-also: + - "/keps/sig-architecture/1635-prevent-permabeta" +replaces: + +# The target maturity stage in the current dev cycle for this KEP. +stage: stable + +# The most recent milestone for which work toward delivery of this KEP has been +# done. This can be the current (upcoming) milestone, if it is being actively +# worked on. +latest-milestone: "v1.24" + +# The milestone at which this feature was, or is targeted to be, at each stage. 
+milestone: + stable: "v1.24" + +# The following PRR answers are required at alpha release +# List the feature gate name and the components for which it must be enabled +feature-gates: +disable-supported: false + +# The following PRR answers are required at beta release +metrics: From e55c925f889bef3c663c88f09399c62555617cf0 Mon Sep 17 00:00:00 2001 From: David Eads Date: Fri, 14 Jan 2022 10:16:59 -0500 Subject: [PATCH 2/6] clarify beta apis off by default --- .../prod-readiness/sig-architecture/3136.yaml | 3 + .../3136-beta-apis-off-by-default/README.md | 286 +++++++++ .../kep.yaml | 0 .../3136-beta-off-by-default/README.md | 585 ------------------ 4 files changed, 289 insertions(+), 585 deletions(-) create mode 100644 keps/prod-readiness/sig-architecture/3136.yaml create mode 100644 keps/sig-architecture/3136-beta-apis-off-by-default/README.md rename keps/sig-architecture/{3136-beta-off-by-default => 3136-beta-apis-off-by-default}/kep.yaml (100%) delete mode 100644 keps/sig-architecture/3136-beta-off-by-default/README.md diff --git a/keps/prod-readiness/sig-architecture/3136.yaml b/keps/prod-readiness/sig-architecture/3136.yaml new file mode 100644 index 00000000000..091381a82cd --- /dev/null +++ b/keps/prod-readiness/sig-architecture/3136.yaml @@ -0,0 +1,3 @@ +kep-number: 3136 +stable: + approver: "@wojtek-t" diff --git a/keps/sig-architecture/3136-beta-apis-off-by-default/README.md b/keps/sig-architecture/3136-beta-apis-off-by-default/README.md new file mode 100644 index 00000000000..ecaeaa93c79 --- /dev/null +++ b/keps/sig-architecture/3136-beta-apis-off-by-default/README.md @@ -0,0 +1,286 @@ + +# KEP-3136: Beta APIs Are Off by Default + + + + + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [User Stories (Optional)](#user-stories-optional) + - [Story 1](#story-1) + - [Story 2](#story-2) + - [Notes/Constraints/Caveats 
(Optional)](#notesconstraintscaveats-optional) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [Test Plan](#test-plan) + - [Graduation Criteria](#graduation-criteria) + - [GA](#ga) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) +- [Infrastructure Needed (Optional)](#infrastructure-needed-optional) + + +## Release Signoff Checklist + + + +Items marked with (R) are required *prior to targeting to a milestone / release*. + +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [ ] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [ ] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free +- [ ] (R) Graduation criteria is in place + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) +- [ ] (R) Production readiness review completed +- [ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] 
Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + + + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: https://git.k8s.io/website + +## Summary + +New beta APIs will not be enabled in clusters by default. +Existing beta APIs and new versions of existing beta APIs will continue to be enabled by default: +if v1beta.some.group is currently enabled by default and we create v1beta2.some.group, v1beta2.some.group will still be enabled by default. + +## Motivation + +Beta APIs are not considered stable and reliance upon APIs in this state leads to exposure to bugs, +guaranteed migration pain for users when the APIs move to stable, and the risk that dependencies will +grow around unfinished APIs. +Enabling beta APIs by default exacerbates these problems by turning them on in nearly every cluster. +We observed these problems as we removed long-standing beta APIs and the PRR survey tells us that over +90% of production clusters leave these APIs enabled. +Unsuitability for production use is documented at https://kubernetes.io/docs/reference/using-api/#api-versioning +("The software is not recommended for production uses"), but defaulting on means they are present in nearly every +production cluster. +By disabling beta APIs by default, a cluster-admin can opt in to specific APIs without having every +incomplete API present in the cluster. + +### Goals + +1. Disable new beta APIs by default. +2. Continue enabling existing beta APIs and new versions of existing beta APIs by default: + if v1beta.some.group is currently enabled by default and we create v1beta2.some.group, v1beta2.some.group will still be enabled by default. +3. Allow enabling specific resources in beta. 
Enable coolnewjobtype.v1beta1.batch.k8s.io without enabling other-neat-job.v1beta1.batch.k8s.io + +### Non-Goals + +1. Change featuregate defaults. + +## Proposal + +New beta APIs will be placed into the `DisableVersions` stanza instead of the `EnableVersions` stanza (see [DefaultAPIResourceConfigSource](https://github.com/kubernetes/kubernetes/blob/0669da445fa8c1ae07c15c0827f0e83da11cbe58/pkg/controlplane/instance.go#L643)). +The `--runtime-config` flag will be extended to allow `group/version/resource=true`, to enable specific resources. +To enable a beta API, a cluster-admin will have to add the appropriate `--runtime-config` flags. + +### User Stories (Optional) + +#### Story 1 + +As a cluster-admin I want to enable the coolnewjobtype.v1beta1.batch.k8s.io API in my cluster. + +To do this I call `kube-apiserver --runtime-config=batch.k8s.io/v1beta1/coolnewjobtype`. + +#### Story 2 + +As a cluster-admin I want to enable all beta APIs as in past releases. + +To do this I call `kube-apiserver --runtime-config=api/beta=true`. +This already exists and will continue to function. + + +### Notes/Constraints/Caveats (Optional) + +### Risks and Mitigations + +Adoption of beta features will slow. +Given how kubernetes is now treated, this is a good thing, not a bad thing. +Those users that want to move quickly and get new features can do so by enabling all beta feature +or just enabling those that are important for their workload. +The [PRR survey](https://datastudio.google.com/reporting/2e9c7439-202b-48a9-8c57-4459e0d69c8d/page/Cv5HB) shows that +over 30% of production clusters have alpha features enabled, so clsuter-admins are willing and able to enable features +that are not on by default when they are desired. + +## Design Details + + + +### Test Plan + +Integration tests will be written to ensure that no new beta APIs are enabled in the kube-apiserver by default. +Unit tests will be written to ensure that the new flag functionality works as expected. 
+ +### Graduation Criteria + +This KEP is a policy KEP, not a feature KEP. It will start as GA. + +#### GA + +- Integration and unit tests from above. +- Updating the enablement docs for beta: + - https://kubernetes.io/docs/reference/using-api/#api-versioning + - https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/#using-a-feature + (Even though that page is about feature gates, it is likely worth calling out there that new beta REST APIs are no + longer enabled by default.) + +### Upgrade / Downgrade Strategy + +The additional command line flag format for `--runtime-config` will not be recognized on older versions of Kubernetes. +This means that when downgrading, cluster-admins will have to adjust their CLI arguments if they opted into a new beta API. +This is consistent with flag handling for new features today. +Because this only impacts new beta APIs, there is no behavior change for existing APIs on upgrade. + +### Version Skew Strategy + +Because this only impacts new beta APIs, there is no novel skew risk. + +## Production Readiness Review Questionnaire + +Not applicable because this is a policy KEP. 
+ +## Implementation History + + + +## Drawbacks + + + +## Alternatives + + + +## Infrastructure Needed (Optional) + + diff --git a/keps/sig-architecture/3136-beta-off-by-default/kep.yaml b/keps/sig-architecture/3136-beta-apis-off-by-default/kep.yaml similarity index 100% rename from keps/sig-architecture/3136-beta-off-by-default/kep.yaml rename to keps/sig-architecture/3136-beta-apis-off-by-default/kep.yaml diff --git a/keps/sig-architecture/3136-beta-off-by-default/README.md b/keps/sig-architecture/3136-beta-off-by-default/README.md deleted file mode 100644 index c84541af3c7..00000000000 --- a/keps/sig-architecture/3136-beta-off-by-default/README.md +++ /dev/null @@ -1,585 +0,0 @@ - -# KEP-3136: Beta APIs Are Off by Default - - - - - - -- [Release Signoff Checklist](#release-signoff-checklist) -- [Summary](#summary) -- [Motivation](#motivation) - - [Goals](#goals) - - [Non-Goals](#non-goals) -- [Proposal](#proposal) - - [User Stories (Optional)](#user-stories-optional) - - [Story 1](#story-1) - - [Story 2](#story-2) - - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) - - [Risks and Mitigations](#risks-and-mitigations) -- [Design Details](#design-details) - - [Test Plan](#test-plan) - - [Graduation Criteria](#graduation-criteria) - - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) - - [Version Skew Strategy](#version-skew-strategy) -- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) - - [Feature Enablement and Rollback](#feature-enablement-and-rollback) - - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) - - [Monitoring Requirements](#monitoring-requirements) - - [Dependencies](#dependencies) - - [Scalability](#scalability) - - [Troubleshooting](#troubleshooting) -- [Implementation History](#implementation-history) -- [Drawbacks](#drawbacks) -- [Alternatives](#alternatives) -- [Infrastructure Needed (Optional)](#infrastructure-needed-optional) - - -## 
Release Signoff Checklist - - - -Items marked with (R) are required *prior to targeting to a milestone / release*. - -- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) -- [ ] (R) KEP approvers have approved the KEP status as `implementable` -- [ ] (R) Design details are appropriately documented -- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) - - [ ] e2e Tests for all Beta API Operations (endpoints) - - [ ] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) - - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free -- [ ] (R) Graduation criteria is in place - - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) -- [ ] (R) Production readiness review completed -- [ ] (R) Production readiness review approved -- [ ] "Implementation History" section is up-to-date for milestone -- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] -- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes - - - -[kubernetes.io]: https://kubernetes.io/ -[kubernetes/enhancements]: https://git.k8s.io/enhancements -[kubernetes/kubernetes]: https://git.k8s.io/kubernetes -[kubernetes/website]: https://git.k8s.io/website - -## Summary - -New beta APIs will not be enabled in clusters by default. -Existing beta APIs and new versions of existing beta APIs, will continue to be enabled by default. 
- -## Motivation - -Beta APIs are not considered stable and reliance upon APIs in this state leads to exposure to bugs, -guaranteed migration pain for users when the APIs move to stable, and the risk that dependencies will -grow around unfinished APIs. -Enabling beta APIs by default, exacerbates these problems by making them on in nearly every cluster. -We observed these problems as we removed long-standing beta APIs and the PRR survey tells us that over -90% of production clusters leave these APIs enabled. -By disabling beta APIs by default, a cluster-admin can opt-in for specific APIs without having every -incomplete API present in the cluster. - -### Goals - -1. Disable new beta APIs by default. -2. Continue enabling existing beta APIs and new version of existing beta APIs by default. -3. Allow enabling specific resources in beta. Enable cronjob.v1beta1.batch.k8s.io without enabling other-cool-job.v1beta1.batch.k8s.io - -### Non-Goals - -1. Change featuregate defaults. - -## Proposal - -New beta APIs will be placed into the `DisableVersions` stanza instead of the `EnableVersions` stanza (see [DefaultAPIResourceConfigSource](https://github.com/kubernetes/kubernetes/blob/0669da445fa8c1ae07c15c0827f0e83da11cbe58/pkg/controlplane/instance.go#L643)). -The `--runtime-config` flag will be extended to allow `group/version/resource=true`, to enable specific resources. -To enable a beta API, a cluster-admin will have to add the appropriate `--runtime-config` flags. - -### User Stories (Optional) - -#### Story 1 - -As a cluster-admin I want to enable the cronjob.v1beta1.batch.k8s.io API in my cluster. - -To do this I call `kube-apiserver --runtime-config=batch.k8s.io/v1beta1/cronjob`. - -#### Story 1 - -As a cluster-admin I want to enable all beta APIs as in past releases. - -To do this I call `kube-apiserver --runtime-config=api/beta=true`. - - -### Notes/Constraints/Caveats (Optional) - -### Risks and Mitigations - -Adoption of beta features will slow. 
-Given how kubernetes is now treated, this is a good thing, not a bad thing. -Those users that want to move quickly and get new features can do so by enabling all beta feature -or just enabling those that are important for their workload. - -## Design Details - - - -### Test Plan - -Integration tests will be written to ensure that no new beta APIs are enabled in the kube-apiserver by default. -Unit tests will be written to ensure that the new flag functionality works as expected. - -### Graduation Criteria - -This KEP is a policy KEP, not a feature KEP. It will start as GA. - -#### GA - -- Integration and unit tests from above. - -### Upgrade / Downgrade Strategy - -The additional command line flag format for `--runtime-config` will not be recognized on older levels of kubernetes. -This means that when downgrading, cluster-admins will have to adjust their CLI arguments if they opted into a new beta API. -This is congruent to flag handling for new features today. -Because this only impacts new beta APIs, there is no behavior change for existing APIs on upgrade. - -### Version Skew Strategy - -Because this only impacts new beta APIs, there is no novel skew risk. - -## Production Readiness Review Questionnaire - -Not applicable. - - - -### Feature Enablement and Rollback - - - -###### How can this feature be enabled / disabled in a live cluster? - - - -- [ ] Feature gate (also fill in values in `kep.yaml`) - - Feature gate name: - - Components depending on the feature gate: -- [ ] Other - - Describe the mechanism: - - Will enabling / disabling the feature require downtime of the control - plane? - - Will enabling / disabling the feature require downtime or reprovisioning - of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled). - -###### Does enabling the feature change any default behavior? - - - -###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? 
- - - -###### What happens if we reenable the feature if it was previously rolled back? - -###### Are there any tests for feature enablement/disablement? - - - -### Rollout, Upgrade and Rollback Planning - - - -###### How can a rollout or rollback fail? Can it impact already running workloads? - - - -###### What specific metrics should inform a rollback? - - - -###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? - - - -###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? - - - -### Monitoring Requirements - - - -###### How can an operator determine if the feature is in use by workloads? - - - -###### How can someone using this feature know that it is working for their instance? - - - -- [ ] Events - - Event Reason: -- [ ] API .status - - Condition name: - - Other field: -- [ ] Other (treat as last resort) - - Details: - -###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? - - - -###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? - - - -- [ ] Metrics - - Metric name: - - [Optional] Aggregation method: - - Components exposing the metric: -- [ ] Other (treat as last resort) - - Details: - -###### Are there any missing metrics that would be useful to have to improve observability of this feature? - - - -### Dependencies - - - -###### Does this feature depend on any specific services running in the cluster? - - - -### Scalability - - - -###### Will enabling / using this feature result in any new API calls? - - - -###### Will enabling / using this feature result in introducing new API types? - - - -###### Will enabling / using this feature result in any new calls to the cloud provider? - - - -###### Will enabling / using this feature result in increasing size or count of the existing API objects? 
- - - -###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? - - - -###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? - - - -### Troubleshooting - - - -###### How does this feature react if the API server and/or etcd is unavailable? - -###### What are other known failure modes? - - - -###### What steps should be taken if SLOs are not being met to determine the problem? - -## Implementation History - - - -## Drawbacks - - - -## Alternatives - - - -## Infrastructure Needed (Optional) - - From c99f1042504745d0895e58128c93926db4c9e07b Mon Sep 17 00:00:00 2001 From: David Eads Date: Thu, 20 Jan 2022 15:35:47 -0500 Subject: [PATCH 3/6] add additional GA criteria for beta off by default --- .../3136-beta-apis-off-by-default/README.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/keps/sig-architecture/3136-beta-apis-off-by-default/README.md b/keps/sig-architecture/3136-beta-apis-off-by-default/README.md index ecaeaa93c79..fef640e48f6 100644 --- a/keps/sig-architecture/3136-beta-apis-off-by-default/README.md +++ b/keps/sig-architecture/3136-beta-apis-off-by-default/README.md @@ -205,9 +205,13 @@ Given how kubernetes is now treated, this is a good thing, not a bad thing. Those users that want to move quickly and get new features can do so by enabling all beta feature or just enabling those that are important for their workload. The [PRR survey](https://datastudio.google.com/reporting/2e9c7439-202b-48a9-8c57-4459e0d69c8d/page/Cv5HB) shows that -over 30% of production clusters have alpha features enabled, so clsuter-admins are willing and able to enable features +over 30% of production clusters have alpha features enabled, so cluster-admins are willing and able to enable features that are not on by default when they are desired. 
+If two or more APIs are tightly coupled, it will now be possible to enable them independently. +This can lead to unanticipated failure modes, but should only impact beta APIs with beta dependencies. +While this is a risk, such coupling is uncommon and components should fail safe as a general principle. + ## Design Details