Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add: upgrade-blocker-operator #624

Conversation

michaelgugino
Copy link
Contributor

No description provided.

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign dustymabe after the PR has been reviewed.
You can assign the PR to them by writing /assign @dustymabe in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

as severity.

The CRD should have a field in spec "preventUpgrades" with a boolean
value to clearly indicate if a given CR is block an upgrade or not.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: boolean will not address minor (4.y -> 4.y') vs. patch (4.y.z -> 4.y.z') bumps. Probably better to use an enumerated type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an open question if we need to add such an interface. Probably we should, and so we might need something like
"blockAllUpgrades" and "blockMinorUpgrades" I dislike that we use the term 'minor' because they're IMO not really 'minor' upgrades, but the wording can be ironed out I think.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this enhancement was ARO request. We will be doing this without this (just by creating CO CRD for ARO operator) and managing states there.
What we could really use is
Upgradable, and UpgradableMinor or something like this.

So if this enhancement will get closed we would not get sad :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike that we use the term 'minor' because they're IMO not really 'minor' upgrades...

SemVer spec:

Given a version number MAJOR.MINOR.PATCH...

Easier to not invent new, OpenShift-specific words for the different SemVer components.

What we could really use is Upgradable, and UpgradableMinor or something like this.

When would you allow minor bumps but block patch bumps? I can see cases for "allow everything", "block minor bumps, allow patch bumps", and "block everything". You can handle that with a single knob and an enumerated list of values. You can't handle that with two separate knobs, without also exposing the nonsensical "allow minor bumps, block patch bumps".

And yeah, we can probably close this now that ARO is going with an admission webhook in front of ClusterVersion, and revisit if/when we get another use-case where a generic operator would be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ARO is only one user, and IMO, webhooks as a way to block upgrades is not the best design. We should build an API that allows any user to block an upgrade if they choose.

User intent being translated into actions on the cluster, that's an API. We use CRDs to build those APIs today. We can use the existing ClusterOperator API (upgradeable=False) with a new user-facing API to create this functionality.

@fabiand
Copy link

fabiand commented Feb 24, 2021

This change is also helpful for CNV scenarios.

@ashcrow
Copy link
Member

ashcrow commented Feb 24, 2021

This seems reasonable to me. /cc @yuqi-zhang


Sometimes clusters shouldn't be upgraded. Currently, the only signal is
ClusterOperator statuses monitored by the CVO. Cluster administrators need
to be able to optionally inform the cluster when upgrades aren't appropriate.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naive question, but doesn't upgrade require an action from a cluster administrator? So is this enhancement about adding a pluggable model that would allow administrators to encode arbitrary pre-upgrade checks into the tool itself, rather than having the administrator manually do checks before triggering the update?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're managing a fleet of say, a dozen clusters, you don't necessarily want to stop and ensure each one meets some arbitrary criteria, or worse, start an upgrade when the criteria has not been met.

Also, the end goal is to have cluster upgrades be automatic. Having an API like this would potentially allow someone to turn on automatic upgrades, and still have a large degree over when and whether those upgrades take place.

@dofinn
Copy link
Contributor

dofinn commented Feb 24, 2021

Very high level thoughts:

Why not:

@michaelgugino
Copy link
Contributor Author

Very high level thoughts:

Why not:

* add this functionality to https://github.com/openshift/managed-upgrade-operator

* make all the functionalities configurbale via CR

* make MUO apart of OCP.

MUO seems useful. upgrade-blocker-operator would be a second level operator that blocks upgrades based on the state of one or more CR's. MUO would be a client of this API, it would create/update the CRs to affect the block upgrades, based on what I read about MUO.

upgrade-blocker-operator doesn't need to know the why/how a decision is made to block upgrades, it's simply the CRD spec and the ClusterOperator status.

@openshift-bot
Copy link

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 26, 2021
@openshift-bot
Copy link

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 25, 2021
@openshift-bot
Copy link

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Jul 26, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 26, 2021

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants