Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow installer to include/exclude components based on user select install solution #784

Closed
wants to merge 1 commit into from

Conversation

bparees
Copy link
Contributor

@bparees bparees commented May 17, 2021

No description provided.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 17, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from bparees after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bparees
Copy link
Contributor Author

bparees commented May 17, 2021

/assign @derekwaynecarr
cc @ecordell @staebler

@bparees bparees force-pushed the installdomains branch 3 times, most recently from 5f325f0 to 145856a Compare May 17, 2021 20:43
the CVO overrides as part of rendering+editing manifests.

The proposed UX is to make this a first class part of the install config api with the implementation
being arguments supplied to the CVO to filter the user-selected manifests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is more involved in the installer implementation of this than simply passing along arguments to the CVO.

  1. Presumably the installer would need to know which addons can be excluded so that the installer can do validation on the user input.
  2. Some addons may require the installer to omit manifests that it would normally create. For example, if the baremetal-operator addons were omitted, then the installer would need to omit the manifest that configures the metal provisioning.
  3. There would probably be an expectation by the user that the installer would make some opinionated decisions on behalf of the user about which addons to exclude. Again with the baremetal-operator addon, my expectation as a user would be that the installer would configure the exclusions automatically so that the baremetal-operator addon was excluded when using a platform other than "baremetal".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. maybe? specifying things to exclude that don't actually exclude anything isn't a great UX but i dunno that it's a requirement that we tell you your exclusion didn't match anything.

  2. not sure i follow this one. I'd expect all related manifests to be annotated such that they get excluded as a group. Could the manifest that configures the metal provisioning not also be annotated? Is it instantiated by the CVO or by the installer itself?

  3. i'd classify that as a potential future goal/possibility, but not a requirement here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the same topic - will the user be able to say exclude bm-operator but install bm platform? If not, how the installer will know to block it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to cover these sort of scenarios here:
https://github.com/openshift/enhancements/pull/784/files#diff-4b412132f6f354bd03403cc85e104b445d5ff581abebae3d0b6fb69c58f094e7R120-R122

so we can discuss if that's "sufficient/acceptable" or not, but the position of this EP was to punt on such things.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe? specifying things to exclude that don't actually exclude anything isn't a great UX but i dunno that it's a requirement that we tell you your exclusion didn't match anything.
I think it will be a bad experience if the user will have a typo. Maybe, we can find a way for the CVO to verify it during bootstrap and fail early. kinda cvo-render --verify-components

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2. not sure i follow this one.  I'd expect all related manifests to be annotated such that they get excluded as a group.  Could the manifest that configures the metal provisioning not also be annotated?   Is it instantiated by the CVO or by the installer itself?

It is ultimately added to the cluster via CVO. So the manifest could have the annotation. The installer would need to add that annotation since the installer creates the manifest.

Copy link
Contributor

@hardys hardys May 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the same topic - will the user be able to say exclude bm-operator but install bm platform? If not, how the installer will know to block it?

Seems like that is the scope of this proposal, but as @staebler mentions it implies some additional validation in the installer, e.g taking the baremetal example - it's reasonable to do create manifests or create install-configs with the baremetal SLO disabled (perhaps to deploy via UPI or assisted where those components may not be required), but a user should not be allowed to do create cluster with that configuration, because deployment of any worker hosts will fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have more recent requirements to be able to install the metal tools on platforms other than baremetal, so that's probably no longer a good example of something we would expect the installer to exclude automatically.

That said, I think we can ignore the automated guidance angle, at least in a first iteration. We can make it fancier/smarter later, but the first version should have simple rules: The default guidance is to install everything. If you want less than everything, you tell the CVO and it leaves out the things you don't want.

b) enable/disable specific configurations such as "HA", where components could contribute multiple
deployment definitions for different configurations and then the installer/CVO would select the correct
one based on the chosen install configuration (HA vs single node) instead of having components read/reconcile
the infrastructure resource.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it would be useful to expand the scope to include opt-in to optional components, for example see the discussion on #747 where we'd like to make the kubernetes-nmstate operator optionally available on any platform (and probably enabled by default on some where it's commonly used, e.g baremetal)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now i want to constrain this to "how do we make the set of things the CVO currently installs be dynamic instead of static"

mechanisms for having OLM-based components be a first class part of an install experience is going to be a follow-on item.

enhancements/installer/component-selection.md Show resolved Hide resolved
enhancements/installer/component-selection.md Show resolved Hide resolved
of that component explicitly opting into a particular (presumably well tested) configuration/topology
[here](https://github.com/openshift/enhancements/pull/200#discussion_r375837903). The position of this EP is that
we should only recommend the exclusion of fully independent "addon" components that are not depended on by
other components. Further the assumption is that it will be reasonable to tell a customer who disabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that removing the CBO from hypershift clusters is desirable, but I think there's a misunderstanding that we need to address.

Most of the examples given above are "easy" because nothing else depends on having them present. That's not true for the baremetal, unfortunately. The whole reason it's in the payload at all is that it underpins IPI deployment and the machine-api. As hardys points out, when the platform is baremetal, if the implementation isn't present, the cluster won't form and deployment will fail in a not obvious way. (Either the installer would try to create BareMetalHost resources using a CRD that isn't installed and installation would time out, or the installer would not try to create BareMetalHost resources and then the nodes and machines wouldn't be linked and nodes wouldn't be allowed to join the cluster as it is forming.)

Maybe baremetal is a one-off case, and we can do the validation for it in the place where we validate the other platform inputs. Can we identify other similar cases, though? Do we need something more complex than annotations to support cross-component dependencies? Or are there similar places in the installer where we could put dependency-specific validation checks for other cases, one at a time as they come up?

the infrastructure resource.

2. How does the admin enable a component post-install if they change their mind about what components
they want enabled? Do we need/want to allow this?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we produce OLM versions of the add-ons, and let them use OLM to install the components later if they aren't present from the start?

If we do that, could we move the add-ons out of the release payload and always manage them via OLM, even if it means the installer defaults to generating the OLM manifests to trigger installing the add-on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's plausible, but olm versions of things that are deemed "essential" brings its own set of challenges:

  1. disconnected installs (you need to mirror the olm catalog also, or we need to include an olm catalog in the payload)
  2. independent versioning/release cadence and ensuring version-matrix compatibility
  3. install ordering (no olm component can possibly come up until OLM itself comes up). - likely not a concern for things we have deemed "Addons" but just noting it.

these are definitely things we're going to be starting to explore (looks like cert-manager may be the first candidate) but at least right now it raises more questions than it answers, for me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a better post-install solution is to just let the user modify the CVO resource that records the list of skipped components? Then when the CVO reconciles, it would (un)install components based on the change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then when the CVO reconciles, it would (un)install components based on the change.

that would be a substantially new capability for the CVO (and also requires the operators themselves get a chance to clean up any changes they've made to the cluster directly). The other things in this EP are fairly small/incremental changes from what the CVO does today, but removal would be a big new capability.

enhancements/installer/component-selection.md Outdated Show resolved Hide resolved
@openshift-bot
Copy link

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 17, 2021
@openshift-bot
Copy link

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 17, 2021
@openshift-bot
Copy link

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 24, 2021

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot closed this Sep 24, 2021
a) turn on/off groups of components as defined by "solutions" (e.g. a "headless" solution
which might turn off the console but also some other components). This is what CLUSTER_PROFILES
sort of enable, but there seems to be reluctance to expand the cluster profile use case to include
these sorts of things.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the issue with profiles is managing the proliferation of definitions of groupings of individual components as we try to predict what users will or won't prioritize including in their clusters. We may have the same challenge with defining solutions or configurations. Starting with the low level components individually will give users complete control, and we can layer some sort of grouping concept on top later if we find it necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at this point the EP is directed towards this approach, basically grouping resources at the second-level-operator level (technically people can annotate the resources however they want, but the expected usage will be to annotate component-related resources with a common name that the admin can then opt in/out of).

slightly more complicated is the case where 2 or more components have interdependencies and what we do if the admin only disables one of them. I'm not sure we've identified any examples of this that need to be disable-able in the first pass, though.

the infrastructure resource.

2. How does the admin enable a component post-install if they change their mind about what components
they want enabled? Do we need/want to allow this?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a better post-install solution is to just let the user modify the CVO resource that records the list of skipped components? Then when the CVO reconciles, it would (un)install components based on the change.

the cluster and used to filter applied resources during upgrades also. My understanding is today
this is handled with CLUSTER_PROFILES and EXCLUDE_ANNOTATIONS by setting the env vars on the CVO
pod, but if we want to allow the set to be changed (see (2), we need a more first class config
resource that is admin editable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would a new component have been filtered out if the user didn't know to do so because it didn't exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the new component (or new resource for an existing component) uses the same "component name" that was part of the initial filter list during install.


4) After installing a cluster, change the set of addons that are included/excluded.
Newly included addon resources should be created, newly excluded ones should be
deleted. (Do we want to support this?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it cost us anything extra to support it, other than the testing effort?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allowing previously excluded components to be added to the cluster is relatively easy.

allowing previously included components to be removed is hard, per the other comment thread (CVO doesn't know how to clean stuff up today, and can't be sure it knows everything an operator may have done)

And allowing it at all means we need a user editable config resource that the CVO watches to update its filter list. If you can't change it after install time, you don't need any of that.

the CVO overrides as part of rendering+editing manifests.

The proposed UX is to make this a first class part of the install config api with the implementation
being arguments supplied to the CVO to filter the user-selected manifests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have more recent requirements to be able to install the metal tools on platforms other than baremetal, so that's probably no longer a good example of something we would expect the installer to exclude automatically.

That said, I think we can ignore the automated guidance angle, at least in a first iteration. We can make it fancier/smarter later, but the first version should have simple rules: The default guidance is to install everything. If you want less than everything, you tell the CVO and it leaves out the things you don't want.

@bparees
Copy link
Contributor Author

bparees commented Oct 5, 2021

new PR:
#922

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants