
Safer Cluster Upgrades for OLM Managed Operators #592

Merged
merged 1 commit into from
Feb 5, 2021

Conversation

awgreene
Contributor

Introduces proposal to update OLM to consider if the collection of
Operators on cluster will continue to work on the next Minor OpenShift
version before reporting whether or not it is upgradeable to CVO.

Contributor

@gallettilance gallettilance left a comment

I would prefer if OLM could inspect the bundle labels to see what it supports. All bundles that don't support a specific OCP version will have a max version set moving forward. The limitation is OLM won't be able to tell when a bundle has been deprecated or if it will be able to continue to receive updates after cluster upgrade. But it sounds like that's not the goal of the proposal.


The bulk of this enhancement focuses on defining the steps that OLM will take to determine upgrade safety based on the collection of installed operators.

### Allowing Operator Authors to define supported OpenShift Versions
Contributor

operators that publish to the redhat operators, isv and community operator catalogs that are on cluster use a label on their bundle image to indicate which OCP versions they support (see: https://docs.engineering.redhat.com/display/CFC/Delivery). It seems redundant to have them also add a field in the CSV.
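For reference, a sketch of what such a bundle-level declaration looks like. The label key `com.redhat.openshift.versions` is the one used by the pipeline referenced above; the values shown are illustrative, not taken from any real bundle:

```yaml
# metadata/annotations.yaml inside a bundle image (illustrative sketch)
annotations:
  # Pinned to a single OCP version: only shipped to 4.6 catalogs.
  com.redhat.openshift.versions: "v4.6"
  # Alternatively, a closed range: shipped to 4.5 through 4.7 catalogs.
  # com.redhat.openshift.versions: "v4.5-v4.7"
```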

Contributor Author

@awgreene awgreene Jan 19, 2021

@gallettilance thanks for sharing that delivery document. We can absolutely pursue that approach since this is an OpenShift only feature. However, there are two concerns that I would like to address before moving forward:

  1. The "Max OpenShift Version" could only be inferred when the bundle image is shipped to a specific version of OpenShift (v4.6) or a range of OpenShift versions (v4.5-v4.7, where we would assume the end of the range, 4.7 in this case, as the Max OpenShift Version), as all other forms are shipped to future OpenShift versions. This means that the "Max OpenShift Version" could not be inferred for a set of labels that are currently in use, so OLM would still need to provide the ability for this subset of operators to specify a "Max OpenShift Version".
  2. Currently, these labels simply identify which OpenShift versions the bundles should be available on. With this proposed change, labels would also specify when to prevent the cluster from upgrading to the next version. I suspect that a number of operators would unexpectedly prevent cluster upgrades if we made this change. Consider a 4.6 cluster where a bundle is shipped with the =4.6 label and the next version of the bundle is shipped with the =4.7 label. The 4.6 cluster would not be able to upgrade to 4.7, as the existing bundle would set the max OpenShift Version to 4.6. We could instead assume that the Max OpenShift Version is equal to 1 more than the provided version (example: a bundle labeled with =4.6 implies a "Max OpenShift Version" of 4.7), but that would not handle the use case where a bundle made available on 4.6 is known not to run on 4.7, in which case a new channel should be made available that includes bundles that would unblock the upgrade.

Overall, I am concerned that if we commandeer these labels we might make unfair assumptions about what was originally intended by the team shipping the operator.

Contributor

  1. Operator authors can't modify their CSV after they are released. In both cases (label and CSV annotation) they have to use their best guess (which is why there is a deprecation mechanism in the pipeline) and that guess will be wrong exactly in the same way for both right? let me know if I misunderstood what you were trying to say
  2. I think we need to distinguish "installable" and "upgradeable". In the case where I set a max version in the labels, the intention is for the version to be no longer installable on OCP versions after that max version. However, it may still be able to run and upgrade on OCP versions after that max version - if it cannot, maybe this is where deprecation comes into play (given that the resolver has access to that information via properties on the bundle, it could surface that somehow). To be honest we haven't explicitly made that distinction in the pipeline, but it definitely feels like a reasonable ask given this enhancement. What do you think?

Contributor Author

Thanks for the detailed response @gallettilance

  1. Operator authors can't modify their CSV after they are released. In both cases (label and CSV annotation) they have to use their best guess (which is why there is a deprecation mechanism in the pipeline) and that guess will be wrong exactly in the same way for both right? let me know if I misunderstood what you were trying to say

There is no need to modify the CSV after it is released, consider the following example:

  • Operator Foo has been developed for and tested against a 4.7 cluster.
  • The Author sets the maxOpenShiftVersion annotation on Operator Foo's CSV to 4.7
  • OpenShift 4.8 becomes available.
  • Clusters with Operator Foo are unable to upgrade to 4.8
  • A new version of Operator Foo is released in the existing 4.7 index / channel. This version of Operator Foo has been tested on OpenShift Version 4.8 and has had the maxOpenShiftVersion annotation set to 4.8.
  • (Optional Step) All earlier versions of Operator Foo are marked as deprecated. (unsure if this is needed).
  • Cluster Admins can now upgrade Operator Foo to the latest version, unblocking Cluster Upgrades.

I think we need to distinguish "installable" and "upgradeable"

Agreed!

In the case where I set a max version in the labels, the intention is for the version to be no longer installable on OCP versions after that max version. However, it may still be able to run and upgrade on OCP versions after that max version - if it cannot maybe this is where deprecation comes into play (given that the resolver has access to that information via properties on the bundle, it could surface that somehow). To be honest we haven't explicitly made that distinction in the pipeline but it definitely feels like a reasonable ask given this enhancement. What do you think?

I believe that you are suggesting that we could prevent cluster upgrades based on whether or not the CSV has been deprecated. This is certainly possible, but I am concerned that:

  • Operator Authors would be forced to mark a number of CSVs as deprecated before each OpenShift release to prevent cluster upgrades. Missing this step could allow Cluster Upgrades that disrupt operator services. Note that the workflow above does not include this downside.
  • Upgrades would only be disabled if the index on the cluster is at the earliest version that includes all the deprecation changes.

If you want to pull in eyes from the pipeline team to discuss please do, I welcome any and all reviews :)

Contributor

I think we should plan to make use of declarative index for this.

  • A team publishes an operator v1 during openshift 4.6. It is tested on 4.6 and stamped with maxOCPVersion: 4.6
  • 4.7 is released. Some combination of QE / automated tests / static analysis determines to some degree of confidence that the operator should or does run on 4.7.
  • The index is updated to overwrite maxOCPVersion to be 4.7 for v1.

The goal of the declarative config is to allow us to overwrite metadata (like maxOCPVersion) for operators after release.

Certainly what @awgreene said will allow us to make use of maxOCPVersion without declarative config - but we know that we want to deliver declarative config for 4.8 already.
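To make the overwrite step above concrete, a sketch of what a bundle entry with a rewritable property might look like in a declarative index config. The schema and property names here are illustrative (the exact shape was still being defined by the declarative-index-config enhancement), and the operator name is hypothetical:

```yaml
# Sketch of one bundle entry in a declarative index config.
# The property value can be rewritten in the index after release,
# without republishing the bundle image itself.
schema: olm.bundle
name: example-operator.v1.0.0
image: registry.example.com/example-operator-bundle:v1.0.0
properties:
  - type: olm.maxOpenShiftVersion   # property type name is illustrative
    value: "4.7"                    # bumped from "4.6" once 4.7 compatibility is confirmed
```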

Member

Can we just include a bit about dropping properties into the bundle and making sure that this workflow works that way as well?

Contributor Author

I will bump the Declaring Max OpenShift Version via the Declarative Index Config section into the main proposal and outline the advantages that @ecordell discussed.

Contributor

are you proposing that adopting declarative config is the only way operator authors will be able to specify maxVersion?

Nope - just that with the declarative config, there is no need to publish a new bundle to support a new OCP version. Existing bundles can be annotated with new support statements as we discover them.

Contributor

Nope - just that with the declarative config, there is no need to publish a new bundle to support a new OCP version. Existing bundles can be annotated with new support statements as we discover them.

ack, ok.

Contributor Author

Updated this portion of the document to outline the benefits of declaring the operators Max OpenShift Version using the Declarative Index Config.

| Term | Description |
| ------------------- | -------------------------------------------------------------------------------------------------- |
| Upgradeable CSV | A CSV with a valid upgradeable annotation that will run on the next Minor version of OpenShift |
| Unupgradeable CSV | A CSV with a valid upgradeable annotation that will not run on the next Minor version of OpenShift |
| Undeterminable CSV | A CSV that does not include a valid upgradeable annotation |
Contributor

how will these conditions be presented to the admin? In particular the "indeterminate" state needs to be clear to the admin since we are not(?) going to block upgrades when we are in that state.

Contributor Author

There are two ways to initiate a Cluster Upgrade, via the CLI and OpenShift Console UI.

CLI

I am unsure how we could notify users that there is an operator that is "indeterminate". I believe the Cluster Admin would need to look at the OLM ClusterOperator.

OpenShift Console UI

The message in the ClusterOperator is already available in the Cluster Admin panel:

(screenshot: ClusterOperators status panel)

The UI Team could work to make this more visible.

Contributor

yeah i don't think there's much we can do about the cli. Which is all the more reason we need to ensure that if we are in that state, the warning is very clearly presented to the admin. (We can also document as part of the upgrade process that we recommend you check the OLM status before starting an upgrade).

For the UI, I was thinking we might want an explicit alert when we are in this state, rather than relying on the user to go look at the operator status. But perhaps that is too aggressive/noisy.

Contributor

Another option for the UI is to present/warn the user when they are on the upgrade panel, that OLM is not happy.

Contributor Author

For the UI, I was thinking we might want an explicit alert when we are in this state, rather than relying on the user to go look at the operator status. But perhaps that is too aggressive/noisy.

By "alert" do you mean a warning box or a prometheus alert? I don't know that we need this information picked up by telemeter.

Another option for the UI is to present/warn the user when they are on the upgrade panel, that OLM is not happy.

I think this is the best option. We could define a specific reason that alerts the UI to show a warning box, perhaps @spadgett or someone from his team can chime in.

Contributor

By "alert" do you mean a warning box or a prometheus alert? I don't know that we need this information picked up by telemeter.

i meant a prometheus alert since those get presented to the admin on the console. i don't know that we care that it will be picked up by telemeter, but i do agree that it may be too big of a hammer in general. We can definitely start with a lighter touch (something on the upgrade UI page, and docs that tell the admin to check the status before starting an upgrade. I think we already have docs that tell admins to upgrade all their OLM operators before starting an upgrade)

Member

Isn't this a bit ugly for the EUS->EUS case? So every intermediate upgrade will need to stop, check the UI to see if there is anything that's showing the warning, then resume assuming nothing is wrong? Or could we potentially have something more explicit in the case of EUS->EUS?

Contributor

There is no "EUS->EUS" upgrade, it's just a series of upgrades from 4.6->4.7->4.8->4.9->4.10.

So the EUS->EUS experience is already "make sure you're ready to upgrade. run an upgrade to the next minor. make sure it succeeded. make sure you're ready for the next one. run the next minor. rinse/repeat until you get to 4.10".

So while this adds an extra step in what it means to make sure you're ready to upgrade, it's not fundamentally changing the fact that EUS->EUS is still a series of independent upgrades that have to be managed properly by an admin.

What we (as an org) have been asked to do in support of it is:

  1. do everything we can to ensure those upgrades will be successful/inform the admins when the upgrade may not be successful
  2. do everything we can to ensure those upgrades are smooth/fast where possible

@awgreene awgreene force-pushed the safe-upgrades branch 2 times, most recently from 6eb2419 to 89ed84d Compare January 21, 2021 17:11

## Proposal

The [ClusterVersionOperator](https://github.com/openshift/cluster-version-operator) provides OLM with the means to prevent Minor Semantic Version cluster upgrades by setting the `operator-lifecycle-manager` [ClusterOperator's](https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusteroperator.md) [upgradeable condition](https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusteroperator.md#what-should-an-operator-report-with-clusteroperator-custom-resource) to `False`.
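To illustrate the mechanism, a hedged sketch of the ClusterOperator status OLM would set to block a minor upgrade. The resource kind and condition type are from the CVO docs linked above; the `reason` and `message` values, and the CSV name, are hypothetical:

```yaml
# Sketch: the operator-lifecycle-manager ClusterOperator reporting
# that the cluster should not take a minor upgrade.
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  name: operator-lifecycle-manager
status:
  conditions:
    - type: Upgradeable
      status: "False"
      reason: IncompatibleOperatorsInstalled          # illustrative reason
      message: "ClusterServiceVersion foo-operator.v1.0.0 declares a max OpenShift version of 4.7"
```

The CVO aggregates this condition into ClusterVersion, which is what `oc adm upgrade` and the console surface to the admin.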
Member

nit: master links are unreliable, because things might be changed or moved. Probably best to pin to a specific commit, and also probably better to link the official API instead of CVO's dev docs. Like this.

Given that this is an OpenShift specific feature, OLM would rather not expand the Spec of the [ClusterServiceVersion (CSV)](https://olm.operatorframework.io/docs/concepts/crds/clusterserviceversion/), used on vanilla Kubernetes clusters, to include a `maxOpenShiftVersion` field. Instead, OLM will rely on the presence of an annotation on the CSVs to specify the max OpenShift Version that the operator may run on. This annotation can be added in one of two ways:

1. By defining the Max OpenShift Version as a property of the bundle using the [Declarative Index Config](https://github.com/operator-framework/enhancements/blob/1cf4a0363d918e810f638036539388622265d466/enhancements/declarative-index-config.md) (recommended approach)
2. By adding the annotation to the CSV Directly
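As a sketch of option 2, the annotation set directly on the CSV. The annotation key `operators.coreos.com/maxOpenShiftVersion` is the one discussed in this proposal; the operator name and version value are illustrative:

```yaml
# Sketch: a CSV opting into upgrade gating via the proposed annotation.
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: foo-operator.v1.0.0
  annotations:
    # Highest OpenShift minor this operator is known to support.
    operators.coreos.com/maxOpenShiftVersion: "4.8"
```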
Member

Is this the only option for #357, where the compatibility with the next OpenShift version is less about the OLM-managed operator and more about that OLM-child's operands (in the case of #357, the kernel modules being installed)? I guess the special-resource operator would check version compat of the kernel modules it managed vs. some compiled-into-the-SRO knowledge of the kernel version that was likely coming with the next OCP minor, and then manage the annotation in its own CSV to pass the word on to OLM?

Contributor

I guess the special-resource operator would check version compat of the kernel modules it managed vs. some compiled-into-the-SRO knowledge of the kernel version that was likely coming with the next OCP minor, and then manage the annotation in its own CSV to pass the word on to OLM?

Not sure if you're suggesting the operator updates its own CSV at runtime to change its maxVersion. If so, i don't think we had that in mind (not even sure if you can do that, or if OLM will stomp your attempted changes).

if you're just saying the operator team would set the annotation on their CSV prior to shipping the operator bundle, then yes, that's the expectation.

Contributor Author

if you're just saying the operator team would set the annotation on their CSV prior to shipping the operator bundle, then yes, that's the expectation.

Or the annotation could be set using the declarative index config.

| OpenShift Cluster Version | CSV's Max OpenShift Version | Is Upgradeable?|
| --------------------------- | --------------------------- | -------------- |
| 4.6.z | 4.8.0 | True |
| 4.7.z | 4.8.0 | True |
Member

This enhancement proposal claims to be asking for patch-level operators.coreos.com/maxOpenShiftVersion values above ("By allowing Major.Minor.Patch versions to be specified..."). But if you're running 4.7.30 (or whatever), your only 4.8 update targets might be 4.8.4 (or whatever). That would be over the claimed 4.8.0 max. If you want to keep the door open for patch-level specificity (because you don't trust the OpenShift core to preserve the required backwards compatibility?), you should probably give folks a way to say "I'm ok floating without claiming a particular patch will break me" like 4.8.*. If you're ok having everyone float, you can restrict the maxOpenShiftVersion to just {major}.{minor}.

Contributor

This is a good point, @wking. Given we can't (won't) actually enforce the z-stream level, we should probably not allow it to be specified in the field.

if that changes someday, we can loosen the validation on the field.

Member

Yeah, above it's stated that this is meant to allow a future state where we'd allow inhibition of patch upgrades without changing existing annotations, but that's only true if they chose an inflated value from the start, i.e. 4.8.999999.

While there have been past proposals for UpgradeableMicro or UpgradeablePatch to inhibit patch version upgrades, those have been rejected because we always want to allow ourselves the ability to alter the logic for setting Upgradeable in a later patch version if necessary. If we ever inhibit patch versions, we are then pushing admins toward forcing an upgrade, and we never want to encourage that.

Contributor Author

Understood and thank you for the information. I will update the PR to only allow {major}.{minor} declarations.

Contributor Author

Addressed in the latest version of the enhancement.

Contributor Author

Please mark as resolved if applicable.


#### Cons

- This makes upgrade safety opt-in for each Operator, meaning that OLM cannot guarantee upgrade safety unless each installed operator specifies a maxOpenShiftVersion.
Member

And even then, these are declarations about compatibility with a version you are not yet running. Folks can get declarations wrong. Even if we test operator A on OpenShift B, and it works under testing, that does not guarantee it will work on B for every customer. Bugs happen. I'd soften "guarantee" -> "calculate" or something.


### Open Questions

- How will OLM rejecting an upgrade be surfaced in both the UI and the CLI?
Member

Should be no need for anything special. The message OLM sets in its ClusterOperator will be picked up by the CVO and stored in ClusterVersion. And both oc adm upgrade ... and the web console expose ClusterVersion's Upgradeable condition today, and there's an alert for Upgradeable=False.

Contributor

Not sure which scenario this bullet was aimed at, but there's not just "rejecting" updates, there's also the case where we are warning the admin that we can't be sure (because one or more operators doesn't make any declaration at all).

Contributor Author

@awgreene awgreene Feb 3, 2021

Rewording for clarity:

  • When a CSV is present that does not include a maxOpenShiftVersion annotation, how will this information be surfaced by the UI and CLI?


### Determine upgrade safety using OLM Managed Operator RBAC

In addition to giving Operator Authors the ability to provide the Max OpenShift Version, OLM could determine upgrade readiness based on default OpenShift GVKs and those required by installed Operators. At a high level, the following workflow could be implemented:
Member

@wking wking Jan 22, 2021

Something like this exists today, where operators which will remove a particular API in the next OpenShift minor will set Upgradeable=False if they see any current consumers of that API. If the only compat issue you're worried about is Kube-objects, maybe we don't need this enhancement? You would need this enhancement for non-Kube things like RHCOS-kernel compat (#357), so if you grow OLM-side Kube API checks in the future, there should be ways for OLM-child operators to opt-into declaring additional restrictions beyond the ones the Kube API checks turn up.

And there's also a chance that an OLM-child operator isn't using a particular, vulnerable Kube API right now, but will try to start using it once a user touches some knob. So there is a benefit to declaring a complete set of possible consumed APIs, instead of relying entirely on current-use detection.

Contributor Author

Thanks for the review @wking!

Something like this exists today, where operators which will remove a particular API in the next OpenShift minor will set Upgradeable=False if they see any current consumers of that API. If the only compat issue you're worried about is Kube-objects, maybe we don't need this enhancement? You would need this enhancement for non-Kube things like RHCOS-kernel compat (#357), so if you grow OLM-side Kube API checks in the future, there should be ways for OLM-child operators to opt-into declaring additional restrictions beyond the ones the Kube API checks turn up.

The existing enhancement is focused on providing operator authors or index owners a way to specify a maximum OpenShift version per bundle, and is required to prevent a cluster upgrade from disabling a service provided by an operator. Based on the frequency of conversations on this topic with OLM customers, this feature is highly requested.

The possible expansion of this feature to have OLM inspect the RBAC required by installed operators is simply a nice-to-have that allows OLM to derive when a cluster upgrade may disable services offered by operators due to the deprecation of GVKs required by those operators.

And there's also a chance that an OLM-child operator isn't using a particular, vulnerable Kube API right now, but will try to start using it once a user touches some knob. So there is a benefit to declaring a complete set of possible consumed APIs, instead of relying entirely on current-use detection.

In general, the RBAC required by an operator is declared in the permissions and clusterPermissions fields of its CSV. It's possible that a knob exists which causes the operator to perform privilege escalation if it has role/rolebinding RBAC, but that seems like an abusive use case.
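For context, a sketch of where that RBAC declaration lives in a CSV, per the CSV install spec. The service account name and the specific rules are illustrative, not from any real operator:

```yaml
# Sketch: RBAC an operator declares up front in its CSV, which OLM
# could inspect to predict whether a deprecated GVK is required.
spec:
  install:
    strategy: deployment
    spec:
      permissions:                  # namespace-scoped Roles
        - serviceAccountName: foo-operator
          rules:
            - apiGroups: ["apps"]
              resources: ["deployments"]
              verbs: ["get", "list", "watch"]
      clusterPermissions:           # cluster-scoped ClusterRoles
        - serviceAccountName: foo-operator
          rules:
            - apiGroups: ["config.openshift.io"]
              resources: ["clusterversions"]
              verbs: ["get"]
```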

### Goals

- Prevent Minor OpenShift Upgrades when an installed operator declares that it does not support the next ocp minor version
- Warn the Cluster Admin when an installed operator does not explicitly declare support for the next ocp minor version
Contributor

it seems like we should also prevent installation of operators that specify a maxKube/OCP version that is lower than the current cluster version. Can we add that as a goal here? (similar to what minKubeVersion does)

Member

Do we want to do that though? I understand the reasoning of minkubeversion because it is easy to be explicit about, but I could imagine a world where an operator is just not maintained anymore and the maxkubeversion isn't accurate but a user really wants to install it, and we do not have a way today of overriding a constraint like that.

Contributor

we do not have a way today of overriding a constraint like that.

i think that's the actual problem, right? We need to start introducing ways for admins to override these blocking requirements.

Contributor Author

This enhancement is focused on preventing unsafe cluster upgrades, not preventing installation of operators onto an OpenShift version that is higher than their defined max OpenShift version. I would rather not dilute this enhancement by expanding its scope. If this feature is needed, could we create an enhancement that focuses on expanding the resolver to include it?

Contributor

This enhancement is focused on preventing unsafe cluster upgrades, not preventing installation of operators onto an OpenShift version that is higher than their defined max OpenShift version.

I have mixed feelings. I agree that's what this EP is focused on, but i wonder if the EP should be focused on "adding a maxocpversion metadata" and all that implies (blocking upgrades, blocking operator installs).

I'm generally happy with iterative development and i don't want to make it harder to get this landed than it needs to be, but it feels weird/confusing to introduce a "maxocp/kubeversion" field for operator authors to specify that doesn't actually enforce that restriction on install, especially when that is exactly what the existing minKubeVersion api field does. It's also slightly problematic in my mind to add future restrictions based on that field after operator teams start adopting it (we generally try to avoid changing the behavior of apis).

Net being:

  1. it can certainly be a separate EP
  2. but we need to think long and hard about what it means if we only land the implementation of one of those EPs and not the other, when we introduce the api.

From an implementation perspective, am I naive to think it's pretty trivial to add this constraint to the resolver? Adding the ability for admins to override the constraint i'm sure is harder, but i don't think we need that on day 1.

Member

You are right to insinuate that implementing the constraint is trivial, but the override is a much more hotly debated and nebulous thing that will require significantly more work. I think you do have a good point that introducing this without actually blocking is going to be an unexpected experience for operator devs though.

IMO I think it's reasonable to include that constraint in this enhancement proposal and deal with how to override it in a future enhancement. We aren't actually going to see too many folks with an explicit need to override this until at least one or two releases after we even ship a feature like this.

Contributor

We aren't actually going to see too many folks with an explicit need to override this until at least one or two releases after we even ship a feature like this.

yeah, that was my thinking in saying it'd be ok to have the constraint w/o a way to override, initially.

Contributor

@bparees bparees left a comment

looks good to me.

- OpenShift users might be upset when a minor OpenShift upgrade is blocked by OLM if they are unfamiliar with this feature.
- There are instances where an operator might not be supported on the targeted version that the cluster is being upgraded to. If the customer is unable to remove the existing operator, this may place them in a position where they must choose between upgrading the cluster and removing the operator. This is arguably better than upgrading the cluster only to find out that a core service is no longer available on the cluster, but I suspect that multiple tickets will be opened against OLM and the operators it installs when customers find themselves unable to upgrade.

## Possible Future Work
Contributor

Another bit of future work that's at least related to this is providing the ability to override the install requirement.


### Open Questions

- When a CSV is present that does not include a `maxOpenShiftVersion` annotation, how will this information be surfaced by the UI and CLI?
Contributor

Has the UI team been looped in on this so they know work is coming for them?
cc @spadgett

@bparees
Contributor

bparees commented Feb 4, 2021

/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 4, 2021
@kevinrizza
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 4, 2021
@sdodson
Member

sdodson commented Feb 4, 2021

/approve
This looks good and meets the needs I have for EUS, thank you all.

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, sdodson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

This commit introduces a proposal that enables OLM managed operators to
specify a maximum OpenShift version that they support. OLM can then use
this information to prevent minor OpenShift upgrades when an installed
operator will not run on the next minor OpenShift version. Additionally,
OLM can use this information so it does not install an operator on an
unsupported OpenShift version.
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 4, 2021
- The steps that OLM will take to determine upgrade safety based on the collection of installed operators.
- The steps that OLM will take to prevent the operator from being installed on OpenShift clusters whose version is greater than the `maxOpenShiftVersion` specified by the operator.
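The first bullet — deciding whether the installed collection of operators blocks the next minor upgrade — might look roughly like the sketch below. The names and the blocking rule here are illustrative assumptions on my part, not OLM's actual implementation:

```python
# Hedged sketch: OLM inspects every installed operator's declared
# maxOpenShiftVersion and reports Upgradeable=False to the CVO if any
# operator would not run on the next minor OpenShift version.

def parse(v):
    """Turn a version string like 'v4.6' into a (major, minor) tuple."""
    return tuple(int(p) for p in v.lstrip("v").split("."))[:2]

def next_minor(ocp):
    major, minor = parse(ocp)
    return (major, minor + 1)

def upgradeable(current_ocp, installed_max_versions):
    """installed_max_versions: one maxOpenShiftVersion per CSV (None = unset)."""
    target = next_minor(current_ocp)
    blockers = [v for v in installed_max_versions
                if v is not None and parse(v) < target]
    return (len(blockers) == 0, blockers)

# On a 4.6 cluster: an operator capped at 4.6 blocks the 4.7 upgrade,
# while one capped at 4.8, or with no cap, does not.
print(upgradeable("4.6", ["4.6", "4.8", None]))  # (False, ['4.6'])
```

Note that an unset annotation is treated here as "no constraint"; how OLM surfaces that case is one of the open questions below.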

### Allowing Operator Authors to Define a Maximum OpenShift Version
Contributor

Can I specify both minKubeVersion and maxOpenShiftVersion? If yes, what happens if minKubeVersion is v1.19 and maxOpenShiftVersion is v4.6? I suggest you call out the various permutations/combinations and the expected result.

Contributor

They're independent constraints; I would expect both of them to be met in order to install, along with any other constraints.
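To make the "independent constraints" point concrete, here is a minimal sketch (helper names are my own, and this is not OLM's resolver code) in which installation requires every declared constraint to hold:

```python
# Illustrative sketch: minKubeVersion and maxOpenShiftVersion are checked
# independently, and installation requires all declared constraints to pass.

def parse(v):
    """Parse a version string like 'v4.6' or '1.19' into a tuple of ints."""
    return tuple(int(p) for p in v.lstrip("v").split("."))

def installable(cluster_kube, cluster_ocp, min_kube=None, max_ocp=None):
    """Both constraints must hold when present; an unset one is skipped."""
    if min_kube is not None and parse(cluster_kube) < parse(min_kube):
        return False
    if max_ocp is not None and parse(cluster_ocp)[:2] > parse(max_ocp)[:2]:
        return False
    return True

# A 4.6 cluster ships Kubernetes 1.19, so this bundle installs there:
print(installable("1.19", "4.6", min_kube="1.19", max_ocp="4.6"))  # True
# ...but not on a 4.7 cluster, which exceeds maxOpenShiftVersion:
print(installable("1.20", "4.7", min_kube="1.19", max_ocp="4.6"))  # False
```

So a combination like minKubeVersion v1.19 plus maxOpenShiftVersion v4.6 is not contradictory; it simply narrows the set of clusters on which both checks pass.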

@kevinrizza
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 5, 2021
@openshift-merge-robot openshift-merge-robot merged commit 795d06f into openshift:master Feb 5, 2021
@@ -0,0 +1,284 @@
---
title: Allow OLM Managed Operators to Specify a Max OpenShift Version


Hi @awgreene,

I know that this EP is already merged; however, it only came to my attention now.
I raised some concerns/questions about it in the channel which might be worth addressing here.

  1. We already have the OCP label LABEL com.redhat.openshift.versions: v4.5-v4.7 that can be used in the index image. What happens if a user adds the com.redhat.openshift.versions label to the image and the maxOCPversion annotation to the CSV? Also, as described in https://github.com/openshift/enhancements/pull/592/files#r570641713, how will we ensure that the value in the annotation is compatible with minKubeVersion?
  • Should CVP not verify it?
  • Should this annotation and its compatibility with minKubeVersion not be validated via operator-framework/api, so that it is also checked via operator-sdk bundle validate, etc.?
  2. Could we also add a maxKubeVersion annotation to make this symmetric? Also, would it not make sense to deprecate the minKubeVersion attribute and add it via an annotation as well? And then have a minOCPVersion too?

  3. Would it be possible to make this annotation generic and valid for any vendor instead of being specific to OpenShift?

c/c @kevinrizza @dmesser

Contributor Author

Hello @camilamacedo86 - thanks for the review!

We have the OCP label LABEL com.redhat.openshift.versions: v4.5-v4.7 that can be used in the image index already. What happens if a user adds the label com.redhat.openshift.versions in the image and the maxOCPversion annotation in the CSV?

Can you elaborate on this label and its impact? I believe that @gallettilance and I discussed this earlier and I had some concerns with overloading the existing field.

Also, as described in the https://github.com/openshift/enhancements/pull/592/files#r570641713 how will we ensure that the value in the annotation is compatible with minKubeversion? Should not CVP verify if it?

There is no guarantee that these annotations will not conflict. If either of these constraints is not met, the operator will not be installed.

Should not this annotation and its compatibility with minKubeversion be validated via the operator-framework/api and then, in this way, it is checked via operator-sdk bundle validate and etc?

Again, these are independent constraints. Also, if this approach were pursued, are you proposing that the validation library hardcode the relationship between OCP and K8s versions?

Could also we add the annotation for maxKubeVersion in order to make it compatible?

This could certainly be added.

Also, would it not make sense to deprecate the minKubeVersion attribute and add it via an annotation as well? And then have a minOCPVersion too?

We could remove this field from existing CRDs, but I do not believe it is worth doing so until we release a new version of the CRD, since existing CRs would need to be migrated or OLM would need to ship a conversion webhook.

Would it be possible to make this annotation generic and valid for any vendor instead of being specific to OpenShift?

While I agree with this in principle, I am not sure how users could provide specific versions for different Kubernetes variants without unique annotations. For example, how could an Operator Author generically specify that their operator's maxVersion is 4.7 for OpenShift but 1.21 for vanilla k8s?

@camilamacedo86 camilamacedo86 Feb 10, 2021

Hi @awgreene,

Sorry if I was not clear. My main questions/concerns are:

Is it aligned with what already exists, or are we planning to change what exists so that it aligns?

Otherwise, how it works and how it should be used can end up causing confusion and not being intuitive to end users, or to us. (e.g. skipRange (a runtime check), which does not work exactly like skips (removal from the index catalogue))

How will users know that they did not do something wrong? How will we know what the expected behaviour should be? How can we prevent an operator from being published with an OCP label and a max version or minKubeVersion that are incompatible, such that it can never be installed?

Example A:

  • LABEL com.redhat.openshift.versions: v4.5-v4.7
  • maxOCPversion=4.6

Will the operator be installed on 4.7 or not?

Example B:

  • LABEL com.redhat.openshift.versions: v4.5-v4.7
  • maxOCPversion=4.8

Will the operator be installed on 4.8 or not?

  • And then, will maxOCPversion prevent the operator from being in the index catalogue, as is done for skips, or will it work like skipRange (a runtime check)?
  • How will users know, before publishing, that they added an invalid value for this field, rather than only finding out when the operator fails to install? Could we prevent this scenario? Would it be possible to add a check to bundle validate for that? E.g. ensure that the value in maxOCPVersion is in the range of com.redhat.openshift.versions, OR raise a warning that com.redhat.openshift.versions will be overridden by maxOCPVersion.

Could we provide a solution that is valid for upstream and downstream?

Do we really need to create an OCP-specific annotation, or could we have, for example, a maxVendorVersion and compare it against the vendor's own version? I am not sure if it would be possible; however, did we check whether we could use the k8s API to get the vendor version and do the check regardless of the vendor, or can this information not be obtained via the k8s API?

WDYT? Does that clarify my concerns?

Contributor

@camilamacedo86 The label decides which catalogs the bundle will be in; the min/maxVersion field decides which clusters it can be installed on.

They are related, but independent. There is nothing invalid about saying "put this bundle in the v4.8 catalog" and then saying "this bundle can't be installed on v4.8". It's a poor choice by the operator team, but there's no need to explicitly forbid it. (Among other things, a cluster admin could still reference the v4.8 catalog from another cluster version which might be able to install the operator. Also someday we're going to have the ability for admins to force the installation despite requirements not being met).

Do we really need to create a specific annotation for OCP or could we have for example maxVendorVersion and compare that version with the vendor version one? I am not sure if it would be possible, however, did we check if we could use k8s api to get the vendor version and do the check no matter the vendor kind or this information cannot be obtained by k8s api?

What would operator teams set "maxVendorVersion" to? OpenShift has both an OCP version and a kube version. Both versions can be retrieved and compared.
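Under the semantics described above — the label governs catalog membership, the annotation governs installability — the thread's Examples A and B play out as in this hedged sketch (function names and the range parsing are my assumptions, not how OLM or the catalog pipeline is actually implemented):

```python
# Sketch under assumed semantics from this thread: the image label range
# (com.redhat.openshift.versions) decides which catalogs carry the bundle,
# while the max-version annotation decides whether it can be installed.
# The two checks are related but independent.

def parse(v):
    return tuple(int(p) for p in v.lstrip("v").split("."))

def in_catalog(label_range, ocp):
    """True if the OCP version falls inside the label's inclusive range."""
    lo, hi = (parse(p) for p in label_range.split("-"))
    return lo <= parse(ocp) <= hi

def can_install(max_ocp, ocp):
    """True if the cluster version does not exceed the annotation's cap."""
    return parse(ocp) <= parse(max_ocp)

# Example A: label v4.5-v4.7, maxOCPversion 4.6 -> present in the 4.7
# catalog, but the resolver would refuse to install it on a 4.7 cluster.
print(in_catalog("v4.5-v4.7", "4.7"), can_install("4.6", "4.7"))  # True False

# Example B: label v4.5-v4.7, maxOCPversion 4.8 -> installable on 4.8,
# but never published to the 4.8 catalog in the first place.
print(in_catalog("v4.5-v4.7", "4.8"), can_install("4.8", "4.8"))  # False True
```

Neither combination is invalid, per the comment above: Example A is merely a poor choice by the operator team, and Example B only matters if an admin points a 4.8 cluster at an older catalog.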

@camilamacedo86 camilamacedo86 Feb 10, 2021

Thank you for the clarifications. Feel free to close this one :-)
