-
Notifications
You must be signed in to change notification settings - Fork 474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
config ClusterResourceOverrides as mutating admission webhook via OLM #13
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,185 @@ | ||
--- | ||
title: Cluster-Resource-Overrides-Enablement | ||
authors: | ||
- "@deads2k" | ||
reviewers: | ||
- "@sttts" | ||
- "@derekwaynecarr" | ||
approvers: | ||
- "@derekwaynecarr" | ||
creation-date: 2019-09-11 | ||
last-updated: 2019-09-11 | ||
status: provisional | ||
see-also: | ||
replaces: | ||
superseded-by: | ||
--- | ||
|
||
# ClusterResourceOverrides Enablement | ||
|
||
The `autoscaling.openshift.io/ClusterResourceOverride` cannot be enabled in 4.x. The plugin already exists, this design | ||
is about how we make it possible for a customer to enable the feature. | ||
|
||
## Release Signoff Checklist | ||
|
||
- [ ] Enhancement is `implementable` | ||
- [ ] Design details are appropriately documented from clear requirements | ||
- [ ] Test plan is defined | ||
- [ ] Graduation criteria for dev preview, tech preview, GA | ||
- [ ] User-facing documentation is created in [openshift/docs] | ||
|
||
## Summary | ||
|
||
The `autoscaling.openshift.io/ClusterResourceOverride` admission plugin is an uncommonly used admission plugin with configuration | ||
values. Because it is uncommonly used, it doesn't fit well with our targeted configuration which aims to avoid adding | ||
lots of intricately documented knobs. Instead of wiring the admission plugin via a kube-apiserver operator, we can create | ||
a mutating admission webhook based on the [generic-admission-server](https://github.com/openshift/generic-admission-server) | ||
and install it via OLM. | ||
|
||
## Motivation | ||
|
||
The `autoscaling.openshift.io/ClusterResourceOverride` admission plugin is used for over-commit, let's stipulate that it is | ||
important enough to enable. The kube-apiserver is designed to be extended using mutating admission webhooks, we have the | ||
technology to easily build one, and we have the ability to create a simple operator to manage it. We want to enable the | ||
feature using a pattern that we can extend to other admission plugins that can scale beyond the small team that maintains | ||
the kube-apiserver. | ||
|
||
### Goals | ||
|
||
1. Enable the `autoscaling.openshift.io/ClusterResourceOverride` admission plugin that is used for overcommit. | ||
2. Use existing extension points, libraries, and installation mechanisms in the manner we would recommend to | ||
external teams. | ||
3. Have a fairly straightforward way to install and enable this admission plugin. | ||
4. Rebootstrapping must be possible. | ||
|
||
### Non-Goals | ||
|
||
1. Revisit how `autoscaling.openshift.io/ClusterResourceOverride` works. We're lifting it as-is. | ||
2. Couple a slow moving admission plugin to a fast moving kube-apiserver. | ||
|
||
### Open Questions | ||
|
||
1. Do we need to protect openshift resources from being overcommitted? Perhaps the cluster-admin's intent is exactly that. | ||
2. We cannot uniformly apply protection just to our payload resources, how do we position this? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we could have an opt-in per namespace behavior. the world has evolved around us since this original plug-in was conceived, it is possible that operators that take resource requirements as input could attempt to reconcile a overriden resource, but that is not much different than limit ranges already. |
||
External teams may be surprised that their resource requirements are not respect, but ultimately the cluster-admin is in | ||
control of his cluster. This is what running self-hosted means. | ||
3. How are OLM operators tested against OpenShift levels? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One can test with OLM during openshift CI by creating a config map catalog source of the CSV, using a testing script to get the CSV installed and running, and then run any smoke tests that by the user. We have put together a small script to help with this testing. |
||
4. How do we build and distribute this OLM operator using OpenShift CI? | ||
5. How do we describe version skew limitations to OLM so our operator gets uninstalled *before* an illegal downgrade or upgrade? | ||
This is a concrete case of the API we want to use isn't available before 1.16 and after 1.18, the previous API could be gone. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To clarify, this is not a strict requirement for 1.16, just an API graduated to a new version. |
||
|
||
## Proposal | ||
|
||
1. Create a mutating admission webhook server that provides `autoscaling.openshift.io/ClusterResourceOverride`. | ||
2. Create an operator that can install, maintain, and configure this mutating admission webhook. | ||
3. Ensure that we consistently label all prereq namespaces (we attempted runlevel before so this may work), to be sure | ||
that we re-bootstrap. | ||
4. Expose the new operator via OLM and integrate our docs that way. | ||
|
||
### User Stories [optional] | ||
|
||
#### Story 1 | ||
|
||
#### Story 2 | ||
|
||
### Implementation Details/Notes/Constraints [optional] | ||
|
||
We *must* be able to re-bootstrap the cluster. This means that a cluster with this admission plugin created must be able | ||
to be completely shut down and subsequently restarted. | ||
|
||
### Risks and Mitigations | ||
|
||
External teams may be surprised that their resource requirements are not respect, but ultimately the cluster-admin is in | ||
control of his cluster. This is what running self-hosted means. | ||
|
||
## Design Details | ||
|
||
### Test Plan | ||
|
||
**Note:** *Section not required until targeted at a release.* | ||
|
||
TBD, see open questions. | ||
|
||
### Graduation Criteria | ||
|
||
**Note:** *Section not required until targeted at a release.* | ||
|
||
Define graduation milestones. | ||
|
||
These may be defined in terms of API maturity, or as something else. Initial proposal | ||
should keep this high-level with a focus on what signals will be looked at to | ||
determine graduation. | ||
|
||
Consider the following in developing the graduation criteria for this | ||
enhancement: | ||
- Maturity levels - `Dev Preview`, `Tech Preview`, `GA` | ||
- Deprecation | ||
|
||
Clearly define what graduation means. | ||
|
||
#### Examples | ||
|
||
These are generalized examples to consider, in addition to the aforementioned | ||
[maturity levels][maturity-levels]. | ||
|
||
##### Dev Preview -> Tech Preview | ||
|
||
- Ability to utilize the enhancement end to end | ||
- End user documentation, relative API stability | ||
- Sufficient test coverage | ||
- Gather feedback from users rather than just developers | ||
|
||
##### Tech Preview -> GA | ||
|
||
- More testing (upgrade, downgrade, scale) | ||
- Sufficient time for feedback | ||
- Available by default | ||
|
||
**For non-optional features moving to GA, the graduation criteria must include | ||
end to end tests.** | ||
|
||
##### Removing a deprecated feature | ||
|
||
- Announce deprecation and support policy of the existing feature | ||
- Deprecate the feature | ||
|
||
### Upgrade / Downgrade Strategy | ||
|
||
See open questions. | ||
|
||
### Version Skew Strategy | ||
|
||
How will the component handle version skew with other components? | ||
What are the guarantees? Make sure this is in the test plan. | ||
|
||
Consider the following in developing a version skew strategy for this | ||
enhancement: | ||
- During an upgrade, we will always have skew among components, how will this impact your work? | ||
- Does this enhancement involve coordinating behavior in the control plane and | ||
in the kubelet? How does an n-2 kubelet without this feature available behave | ||
when this feature is used? | ||
- Will any other components on the node change? For example, changes to CSI, CRI | ||
or CNI may require updating that component before the kubelet. | ||
|
||
## Implementation History | ||
|
||
Major milestones in the life cycle of a proposal should be tracked in `Implementation | ||
History`. | ||
|
||
## Drawbacks | ||
|
||
The idea is to find the best form of an argument why this enhancement should _not_ be implemented. | ||
|
||
## Alternatives | ||
|
||
Similar to the `Drawbacks` section the `Alternatives` section is used to | ||
highlight and record other possible approaches to delivering the value proposed | ||
by an enhancement. | ||
|
||
## Infrastructure Needed [optional] | ||
|
||
Use this section if you need things from the project. Examples include a new | ||
subproject, repos requested, github details, and/or testing infrastructure. | ||
|
||
Listing these here allows the community to get the process for these resources | ||
started right away. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would not recommend that to any cluster-admin.