Bug 1822844: Block z level upgrades when ClusterVersionOverridesSet is set #364
@jottofar: This pull request references Bugzilla bug 1822844, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
be0728a to 27cea86
/test e2e-aws-upgrade
/test e2e-aws
/retest
/retest
/test e2e-aws
a27e4e2 to f63f3a2
/test e2e-aws
5fddab0 to 0079459
Looks good to me, but unit failed, including:
Possibly needs some CI updates for the |
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jottofar, wking
The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest
/retest Please review the full test history for this PR and help us cut down flakes.
@jottofar: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest Please review the full test history for this PR and help us cut down flakes. |
@jottofar: All pull requests linked via external trackers have merged: openshift/cluster-version-operator#364. Bugzilla bug 1822844 has been moved to the MODIFIED state. In response to this:
Address a bug introduced by cc1921d (pkg/start: Release leader lease on graceful shutdown, 2020-08-03, openshift#424), where canceling the Operator.Run context would leave the operator with no time to attempt the final sync [1]:

  E0119 22:24:15.924216 1 cvo.go:344] unable to perform final sync: context canceled

With this commit, I'm piping through shutdownContext, which gets a two-minute grace period beyond runContext, to give the operator time to push out that final status (which may include important information like the fact that the incoming release image has completed verification).

---

This commit picks c4ddf03 (pkg/cvo: Use shutdownContext for final status synchronization, 2021-01-19, openshift#517) back to 4.5. It's not a clean pick, because we're missing changes like:

* b72e843 (Bug 1822844: Block z level upgrades if ClusterVersionOverridesSet set, 2020-04-30, openshift#364).
* 1d1de3b (Use context to add timeout to cincinnati HTTP request, 2019-01-15, openshift#410).

which also touched these lines. But we've gotten this far without backporting rhbz#1822844, and openshift#410 was never associated with a bug in the first place, so instead of pulling back more of 4.6 to get a clean pick, I've just manually reconciled the pick conflicts.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1916384#c10
Address a bug introduced by cc1921d (pkg/start: Release leader lease on graceful shutdown, 2020-08-03, openshift#424), where canceling the Operator.Run context would leave the operator with no time to attempt the final sync [1]:

  E0119 22:24:15.924216 1 cvo.go:344] unable to perform final sync: context canceled

With this commit, I'm piping through shutdownContext, which gets a two-minute grace period beyond runContext, to give the operator time to push out that final status (which may include important information like the fact that the incoming release image has completed verification).

---

This commit picks c4ddf03 (pkg/cvo: Use shutdownContext for final status synchronization, 2021-01-19, openshift#517) back to 4.5. It's not a clean pick, because we're missing changes like:

* b72e843 (Bug 1822844: Block z level upgrades if ClusterVersionOverridesSet set, 2020-04-30, openshift#364).
* 1d1de3b (Use context to add timeout to cincinnati HTTP request, 2019-01-15, openshift#410).

which also touched these lines. But we've gotten this far without backporting rhbz#1822844, and openshift#410 was never associated with a bug in the first place, so instead of pulling back more of 4.6 to get a clean pick, I've just manually reconciled the pick conflicts.

Removing Start from pkg/start (again) fixes a buggy re-introduction in the manually-backported 20421b6 (*: Add lots of Context and options arguments, 2020-07-24, openshift#470).

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1916384#c10
The argument landed in b72e843 (Bug 1822844: Block z level upgrades if ClusterVersionOverridesSet set, 2020-04-30, openshift#364) for use by Upgradeable.Run. But even then, that method opens by retrieving a (possibly cached) ClusterVersion resource from the configured lister, so there's no need to pass the explicit argument.

We should save explicit inputs for things that need to be passed in from memory at call-time, and not use them for information that can be retrieved from precondition-creation-time callbacks. And even for things that need to come from memory at call time, we should be using ReleaseContext so we can add and remove properties without having to touch function signatures for precondition implementations that don't care about the properties we're touching.

While I'm touching the Run call site, I replaced a context.TODO with a context.Background. As pointed out in the docs [1], Background is preferred for tests.

[1]: https://pkg.go.dev/context#Background
…mp scoping

Godocs for Upgradeable [1]:

  Upgradeable indicates whether the component (operator and all configured operands) is safe to upgrade based on the current cluster state. When Upgradeable is False, the cluster-version operator will prevent the cluster from performing impacted updates unless forced. When set on ClusterVersion, the message will explain which updates (minor or patch) are impacted. When set on ClusterOperator, False will block minor OpenShift updates. The message field should contain a human readable description of what the administrator should do to allow the cluster or component to successfully update. The cluster-version operator will allow updates when this condition is not False, including when it is missing, True, or Unknown.

So we specifically doc it as only about 4.y -> 4.(y+1) minor updates when seen on ClusterOperator. But we leave it unclear on ClusterVersion because when you set some ClusterVersion overrides, it can break patch updates, so QE asked us to also block patch updates in that case [2,3].

With this patch, I'm using availableUpdates and conditionalUpdates to look up a version associated with the proposed target release pullspec. That's a bit less reliable than the current cluster-version operator behavior, which is extracting the proposed target version from the proposed release image itself (e.g. see [4]). But it's probably sufficient for now, with the odds that the OpenShift Update Service serves bad data low. And we can refine further in the future if we want.

[1]: https://github.com/openshift/api/blob/cce310ad2932f6de24491052d506926e484c082c/config/v1/types_cluster_operator.go#L179-L190
[2]: openshift/cluster-version-operator#364
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1822844
[4]: openshift/cluster-version-operator#431
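The pullspec-to-version lookup described above might look something like the sketch below. The Release type, the function names, and the example pullspecs are all hypothetical; the real availableUpdates and conditionalUpdates entries carry more structure than this. Versions are compared with a plain "X.Y" parse rather than a semver library, which is an assumption made to keep the example standard-library only.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Release is a hypothetical, pared-down update entry: a version plus its pullspec.
type Release struct {
	Version string
	Image   string
}

// targetVersion searches both lists for the proposed target pullspec.
func targetVersion(image string, available, conditional []Release) (string, bool) {
	for _, list := range [][]Release{available, conditional} {
		for _, r := range list {
			if r.Image == image {
				return r.Version, true
			}
		}
	}
	return "", false
}

// majorMinor parses the leading "X.Y" of a version such as "4.6.3".
func majorMinor(v string) (int, int, error) {
	parts := strings.SplitN(v, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", v)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// isPatchUpdate reports whether current -> target stays within the same X.Y.
func isPatchUpdate(current, target string) (bool, error) {
	cMajor, cMinor, err := majorMinor(current)
	if err != nil {
		return false, err
	}
	tMajor, tMinor, err := majorMinor(target)
	if err != nil {
		return false, err
	}
	return cMajor == tMajor && cMinor == tMinor, nil
}

func main() {
	available := []Release{{Version: "4.6.3", Image: "example.com/release@sha256:aaa"}}
	conditional := []Release{{Version: "4.7.0", Image: "example.com/release@sha256:bbb"}}
	for _, image := range []string{"example.com/release@sha256:aaa", "example.com/release@sha256:bbb"} {
		version, ok := targetVersion(image, available, conditional)
		if !ok {
			continue // pullspec unknown to the update service data
		}
		patch, err := isPatchUpdate("4.6.1", version)
		fmt.Println(version, patch, err)
	}
}
```

A pullspec that appears in neither list yields no version at all, which is the reliability trade-off the commit message mentions relative to reading the version out of the release image.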
CVO Upgradeable=False should block all upgrades including z level.
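One way to picture the resulting gate is the sketch below. It is an illustration under assumptions, not the merged implementation: blocksUpdate and the ConditionStatus type are hypothetical, the reason string is taken from the PR title, and the "unless forced" escape hatch from the Godoc is left out. The rule that only a False status blocks (missing, True, and Unknown all allow) comes from the Upgradeable Godoc quoted earlier.

```go
package main

import "fmt"

// ConditionStatus mirrors the usual True/False/Unknown condition values;
// the empty string stands for a missing condition in this sketch.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

// overridesReason is the reason named in the PR title.
const overridesReason = "ClusterVersionOverridesSet"

// blocksUpdate reports whether an Upgradeable condition blocks an update.
// Hypothetical helper: minor updates are blocked whenever the status is
// False, and patch (z level) updates are blocked only when the reason is
// that ClusterVersion overrides are set.
func blocksUpdate(status ConditionStatus, reason string, patchUpdate bool) bool {
	if status != ConditionFalse {
		return false // missing, True, and Unknown all allow the update
	}
	if !patchUpdate {
		return true
	}
	return reason == overridesReason
}

func main() {
	fmt.Println(blocksUpdate(ConditionFalse, overridesReason, true)) // z level update, overrides set
	fmt.Println(blocksUpdate(ConditionFalse, "SomeOtherReason", true))
	fmt.Println(blocksUpdate(ConditionFalse, "SomeOtherReason", false))
	fmt.Println(blocksUpdate(ConditionUnknown, overridesReason, false))
}
```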