Allow installer to include/exclude components based on user-selected install solution (#784)
---
title: component-selection-during-install
authors:
- "@bparees"
reviewers:
- "@decarr"
- "@staebler"
approvers:
- "@decarr"
creation-date: 2021-05-04
last-updated: 2021-05-04
status: provisional
---

# User Selectable Install Solutions

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Operational readiness criteria is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)
## Summary

This enhancement proposes a mechanism that lets cluster installers exclude one or more optional
components from their installation, determining which payload components are and are not installed
in their cluster. Core components are defined as the set of Second Level Operators managed by the
Cluster Version Operator, which today cannot be disabled without either editing a CVO override
after completing the install, or editing the CVO overrides as part of rendering and editing manifests.

The proposed UX makes this a first-class part of the install config API, with the implementation
being arguments supplied to the CVO to filter out the user-excluded manifests.
## Motivation

There is an increasing desire to move away from "one size fits all" cluster installations and
towards flexibility about what should and should not exist in a new cluster out of the box. This
can be seen in efforts such as hypershift, single node, and code-ready-containers. Each of these
efforts has done some amount of one-off work to enable its requirements. This EP proposes a
mechanism, exposed by the installer, that allows components to be disabled in a first-class way.

### Goals

* Users can easily and explicitly exclude specific "optional" components from their cluster at install time.

### Non-Goals

* Making control-plane critical components optional (k8s apiserver, openshift apiserver, openshift controller,
  networking, etc.)
* Defining which components should be disable-able (this will be up to component teams, which classify themselves
  as `addons` or not)

## Proposal

### User Stories

* As a user creating a new cluster that will be managed programmatically, I do not want the additional
  security exposure and resource overhead of running the web console. I would like a way to install
  a cluster that has no console out of the box, rather than having to disable it post-install or
  modify rendered manifests in a way that requires deep understanding of the OCP components/resources.

* As a team scaffolding a managed service based on openshift, I want to minimize the footprint of my
  clusters to the components I need for the service.

* As a user creating a cluster that will never run an image registry, I do not want the additional overhead
  of running the image registry operator, or to have to remove the default registry that is created.

### Implementation Details/Notes/Constraints [optional]

The CVO already has the ability to respect annotations on resources, as can be seen
[here](https://github.com/openshift/cluster-kube-apiserver-operator/blob/c03c9edf5fddf4e3fb1bc6d7afcd2a2284ca03d8/manifests/0000_20_kube-apiserver-operator_06_deployment.yaml#L10) and leveraged [here](https://github.com/openshift/hypershift/blob/main/control-plane-operator/controllers/hostedcontrolplane/assets/cluster-version-operator/cluster-version-operator-deployment.yaml#L47-L48).
This proposal consists of two parts:

1) Formalizing the concept of an "addon" annotation which allows a given resource to be excluded based
on installer input. For example, the console-related resources could be annotated as:

```yaml
annotations:
  addon.openshift.io/console: "true"
```
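
As an illustrative sketch only (this is not the actual console payload manifest, and the `addon.openshift.io` scheme is the proposal of this EP, not an existing API), a component team could combine the existing CVO profile annotation with the proposed addon annotation on a payload manifest like this:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: console
  namespace: openshift-console
  annotations:
    # Existing include-style annotation the CVO already evaluates today:
    include.release.openshift.io/self-managed-high-availability: "true"
    # Proposed annotation: the CVO would skip this manifest when the
    # "console" addon is excluded at install time.
    addon.openshift.io/console: "true"
# (spec elided)
```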

2) Defining an install config mechanism whereby the user can opt out of specific addons:

   InstallConfig.ExcludeAddons
   - console
   - samples

Which resources ultimately get installed for a given cluster would be the set of resources encompassed
by the CLUSTER_PROFILE (if any), minus any resources explicitly excluded by the excluded-addons configuration.
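
A sketch of how this could surface in `install-config.yaml` (the `excludeAddons` field name and placement are assumptions of this EP, not an existing installer API):

```yaml
apiVersion: v1
baseDomain: example.com
metadata:
  name: mycluster
# Hypothetical field proposed by this EP:
excludeAddons:
- console
- samples
```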

Examples of candidate components to be treated as addons:

* console
* imageregistry
* samples
* baremetal operator
* olm
* ???
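
To make the intended filtering behavior concrete, here is a minimal sketch in Go (the CVO's language); the `manifest` type and `excluded` function are illustrative only, not the CVO's actual API:

```go
package main

import "fmt"

// manifest models the small subset of a CVO payload manifest this EP
// cares about: a name plus its metadata.annotations. Illustrative type
// only; the real CVO has its own manifest structures.
type manifest struct {
	name        string
	annotations map[string]string
}

// excluded reports whether a manifest is annotated as belonging to any
// addon in the user's exclusion list, using the annotation scheme
// proposed above (addon.openshift.io/<name>: "true").
func excluded(m manifest, excludeAddons []string) bool {
	for _, addon := range excludeAddons {
		if m.annotations["addon.openshift.io/"+addon] == "true" {
			return true
		}
	}
	return false
}

func main() {
	payload := []manifest{
		{name: "console-deployment", annotations: map[string]string{"addon.openshift.io/console": "true"}},
		{name: "samples-config", annotations: map[string]string{"addon.openshift.io/samples": "true"}},
		{name: "kube-apiserver-deployment", annotations: map[string]string{}},
	}
	for _, m := range payload {
		if excluded(m, []string{"console", "samples"}) {
			fmt.Println("skip:", m.name)
		} else {
			fmt.Println("apply:", m.name)
		}
	}
}
```

Unannotated (core) manifests are always applied; only manifests carrying an annotation matching an excluded addon name are skipped.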

### Risks and Mitigations

The primary risk is whether teams understand how to use these new annotations and apply them
correctly to the full set of resources that make up their addon. Inconsistent or
partial labeling will result in inconsistent or partially deployed resources.

Another risk is that this introduces more deployment configurations, which might
have unforeseen consequences (e.g. not installing the imageregistry causes some
other component to break because it assumes there is always an imageregistry, or assumes
the presence of some CRD api that is installed with the imageregistry).

There was some discussion of the pros/cons of allowing each component to be enabled/disabled independent
of that component explicitly opting into a particular (presumably well tested) configuration/topology
[here](https://github.com/openshift/enhancements/pull/200#discussion_r375837903). The position of this EP is that
we should only recommend the exclusion of fully independent "addon" components that are not depended on by
other components. Further, the assumption is that it will be reasonable to tell a customer who disabled
something and ended up with a non-functional cluster that their chosen exclusions are simply not supported
currently.

## Design Details

### Open Questions

1. Do we want to constrain this functionality to turning off individual components? We could
   also use it to:

   a) Turn on/off groups of components as defined by "solutions" (e.g. a "headless" solution
   which might turn off the console but also some other components). This is what CLUSTER_PROFILES
   sort of enable, but there seems to be reluctance to expand the cluster profile use case to include
   these sorts of things.

   b) Enable/disable specific configurations such as "HA", where components could contribute multiple
   deployment definitions for different configurations and the installer/CVO would select the correct
   one based on the chosen install configuration (HA vs single node) instead of having components
   read/reconcile the infrastructure resource.

2. How does the admin enable a component post-install if they change their mind about which components
   they want enabled? Do we need/want to allow this?

3. What are the implications for upgrades if a future upgrade would add a component which would have
   been filtered out at install time? The install-time choices need to be stored somewhere in
   the cluster and used to filter applied resources during upgrades as well. My understanding is that today
   this is handled with CLUSTER_PROFILES and EXCLUDE_ANNOTATIONS by setting the env vars on the CVO
   pod, but if we want to allow the set to be changed (see (2)), we need a more first-class config
   resource that is admin-editable.

### Test Plan

1) Install clusters with the various add-on components included/excluded and confirm
that the cluster is functional but only running the expected add-ons.

2) Upgrade a cluster to a new version that includes new resources that belong to
an addon that was included in the original install. The new resources should be
created.

3) Upgrade a cluster to a new version that includes new resources that belong to
an addon that was excluded in the original install. The new resources should *not* be
created.

4) After installing a cluster, change the set of addons that are included/excluded.
Newly included addon resources should be created; newly excluded ones should be
deleted. (Do we want to support this?)

### Graduation Criteria

Would expect this to go directly to GA once a design is agreed upon/approved.

#### Dev Preview -> Tech Preview

N/A

#### Tech Preview -> GA

N/A

#### Removing a deprecated feature

N/A

### Upgrade / Downgrade Strategy

For upgrades, any new resources should have the same exclusion filters applied by the CVO.
For downgrades, if downgrading below the version of the CVO that supports this logic,
previously excluded components will get created on the cluster. This is likely
not a concern, since you can't downgrade below the version you started at, and if
you're using this feature that means you started at a version of the CVO that supports it.

If we allow enabling filters post-install, then we need to revisit the implications of
downgrades.

There is also some risk that if a particular resource has different annotations in different
versions, upgrading/downgrading could change whether that resource is excluded by
the CVO or not. Once created, the CVO never deletes resources, so some manual cleanup
might be needed to achieve the desired state. For downgrades this is probably acceptable;
for upgrades this could be a concern (resource A wasn't excluded in v1, but is excluded
in v2; clusters that upgrade from v1 to v2 will still have resource A, but clusters
installed at v2 will not have it). Technically this situation can already arise today
if a resource is deleted from the payload between versions.

### Version Skew Strategy

N/A

## Implementation History

N/A

## Drawbacks

The primary drawback is that this increases the matrix of cluster configurations/topologies and
the behavior that is expected from each permutation.

## Alternatives

* The CVO already supports a CLUSTER_PROFILE env variable. We could define specific profiles like "headless"
  that disable the console. CLUSTER_PROFILE isn't a great fit, because the idea there is to define a relatively
  small set of profiles specifying particular sets of components to be included, not to allow a user to fully
  pick and choose individual components. We would have to define a large set of profiles to encompass all the
  possible combinations of components to be enabled/disabled.

* The CVO already supports an EXCLUDE_MANIFESTS env variable, which is used to implement the ROKS deployment
  topology. Unfortunately it only allows a single annotation to be specified, so even if we want to use it for
  this purpose it needs to be extended to support multiple annotations, so that multiple individual components
  can be excluded independently rather than requiring all excluded components to share a single common annotation.

Regardless, we need a way to expose this configuration as a first-class part of the install config provided by the
user creating the cluster, so at a minimum we need to add a mechanism to wire an install config value into
the CVO arguments and allow the CVO to consume more than a single exclusion annotation.
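
As a sketch of what that wiring might look like (hypothetical only — today's EXCLUDE_MANIFESTS accepts a single annotation, and the multi-valued form below is an assumed extension, not an existing CVO interface), the installer could render the CVO deployment with one exclusion entry per excluded addon:

```yaml
# Fragment of a hypothetical rendered CVO Deployment spec.
containers:
- name: cluster-version-operator
  env:
  # Assumed comma-separated extension of the existing single-valued
  # variable; the CVO would skip any manifest carrying one of these
  # annotations:
  - name: EXCLUDE_MANIFESTS
    value: "addon.openshift.io/console,addon.openshift.io/samples"
```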

* Allow the installer to specify additional resources to `include` in addition to ones to `exclude`. This has the
  challenge of potentially conflicting with the specific set of resources that a cluster_profile defines. There are
  some components that should never be deployed in a particular cluster_profile, and so we do not want to allow the
  user to add them. Examples would be resources that should only be created in standalone installs, not
  hypershift-managed ones, because hypershift has its own versions of those resources.

## Infrastructure Needed

N/A