Bug 1822844: Block z level upgrades when ClusterVersionOverridesSet is set #364

Merged: openshift-merge-robot merged 1 commit into openshift:master from bug-1822844 on Jul 24, 2020

Conversation

jottofar
Contributor

CVO Upgradeable=False should block all upgrades, including z-level (patch) upgrades.

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Apr 30, 2020
@openshift-ci-robot
Contributor

@jottofar: This pull request references Bugzilla bug 1822844, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.0) matches configured target release for branch (4.5.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

WIP: Bug 1822844: Block z level upgrades

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 30, 2020
@jottofar jottofar force-pushed the bug-1822844 branch 2 times, most recently from be0728a to 27cea86 Compare April 30, 2020 20:20
@jottofar
Contributor Author

jottofar commented May 1, 2020

/test e2e-aws-upgrade

@jottofar
Contributor Author

jottofar commented May 6, 2020

/test e2e-aws

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 15, 2020
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 15, 2020
@jottofar
Contributor Author

/retest

@jottofar
Contributor Author

jottofar commented Jun 1, 2020

/retest

@jottofar
Contributor Author

jottofar commented Jun 1, 2020

/test e2e-aws

@jottofar jottofar force-pushed the bug-1822844 branch 4 times, most recently from a27e4e2 to f63f3a2 Compare June 1, 2020 21:25
@jottofar
Contributor Author

jottofar commented Jun 2, 2020

/test e2e-aws

@jottofar jottofar force-pushed the bug-1822844 branch 2 times, most recently from 5fddab0 to 0079459 Compare June 2, 2020 13:44
@jottofar jottofar changed the title WIP: Bug 1822844: Block z level upgrades Bug 1822844: Block z level upgrades Jun 2, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 2, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 23, 2020
@wking
Member

wking commented Jul 23, 2020

Looks good to me, but the unit tests failed, including:

 --- FAIL: TestCVO_UpgradePreconditionFailing (0.00s)
    cvo_scenarios_test.go:1429: ([]testing.Action) (len=3 cap=3) {
         (testing.GetActionImpl) {
...

Possibly needs some CI updates for the Completed pivot like this one?

@jottofar
Contributor Author

/retest

@wking
Member

wking commented Jul 23, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 23, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jottofar, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jottofar
Contributor Author

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.


@openshift-ci-robot
Contributor

openshift-ci-robot commented Jul 24, 2020

@jottofar: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-gcp-upgrade | a862696 | link | /test e2e-gcp-upgrade |
| ci/prow/e2e-gcp | a862696 | link | /test e2e-gcp |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.


@openshift-merge-robot openshift-merge-robot merged commit b658b42 into openshift:master Jul 24, 2020
@openshift-ci-robot
Contributor

@jottofar: All pull requests linked via external trackers have merged: openshift/cluster-version-operator#364. Bugzilla bug 1822844 has been moved to the MODIFIED state.

In response to this:

Bug 1822844: Block z level upgrades when ClusterVersionOverridesSet is set


wking added a commit to wking/cluster-version-operator that referenced this pull request Feb 20, 2021
Address a bug introduced by cc1921d (pkg/start: Release leader
lease on graceful shutdown, 2020-08-03, openshift#424), where canceling the
Operator.Run context would leave the operator with no time to attempt
the final sync [1]:

  E0119 22:24:15.924216       1 cvo.go:344] unable to perform final sync: context canceled

With this commit, I'm piping through shutdownContext, which gets a
two-minute grace period beyond runContext, to give the operator time
to push out that final status (which may include important information
like the fact that the incoming release image has completed
verification).

---

This commit picks c4ddf03 (pkg/cvo: Use shutdownContext for final
status synchronization, 2021-01-19, openshift#517) back to 4.5.  It's not a
clean pick, because we're missing changes like:

* b72e843 (Bug 1822844: Block z level upgrades if
  ClusterVersionOverridesSet set, 2020-04-30, openshift#364).
* 1d1de3b (Use context to add timeout to cincinnati HTTP request,
  2019-01-15, openshift#410).

which also touched these lines.  But we've gotten this far without
backporting rhbz#1822844, and openshift#410 was never associated with a bug in
the first place, so instead of pulling back more of 4.6 to get a clean
pick, I've just manually reconciled the pick conflicts.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1916384#c10
wking added a commit to wking/cluster-version-operator that referenced this pull request Feb 23, 2021
Address a bug introduced by cc1921d (pkg/start: Release leader
lease on graceful shutdown, 2020-08-03, openshift#424), where canceling the
Operator.Run context would leave the operator with no time to attempt
the final sync [1]:

  E0119 22:24:15.924216       1 cvo.go:344] unable to perform final sync: context canceled

With this commit, I'm piping through shutdownContext, which gets a
two-minute grace period beyond runContext, to give the operator time
to push out that final status (which may include important information
like the fact that the incoming release image has completed
verification).

---

This commit picks c4ddf03 (pkg/cvo: Use shutdownContext for final
status synchronization, 2021-01-19, openshift#517) back to 4.5.  It's not a
clean pick, because we're missing changes like:

* b72e843 (Bug 1822844: Block z level upgrades if
  ClusterVersionOverridesSet set, 2020-04-30, openshift#364).
* 1d1de3b (Use context to add timeout to cincinnati HTTP request,
  2019-01-15, openshift#410).

which also touched these lines.  But we've gotten this far without
backporting rhbz#1822844, and openshift#410 was never associated with a bug in
the first place, so instead of pulling back more of 4.6 to get a clean
pick, I've just manually reconciled the pick conflicts.

Removing Start from pkg/start (again) fixes a buggy re-introduction in
the manually-backported 20421b6 (*: Add lots of Context and options
arguments, 2020-07-24, openshift#470).

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1916384#c10
wking added a commit to wking/cluster-version-operator that referenced this pull request Dec 8, 2021
The argument landed in b72e843 (Bug 1822844: Block z level upgrades
if ClusterVersionOverridesSet set, 2020-04-30, openshift#364) for use by
Upgradeable.Run.  But even then, that method opens by retrieving a
(possibly cached) ClusterVersion resource from the configured lister,
so there's no need to pass the explicit argument.  We should save
explicit inputs for things that need to be passed in from memory at
call-time, and not use them for information that can be retrieved from
precondition-creation-time callbacks.  And even for things that need
to come from memory at call time, we should be using ReleaseContext so
we can add and remove properties without having to touch function
signatures for precondition implementations that don't care about the
properties we're touching.

While I'm touching the Run call site, I replaced a context.TODO with a
context.Background.  As pointed out in the docs [1], Background is
preferred for tests.

[1]: https://pkg.go.dev/context#Background
wking added a commit to wking/hypershift that referenced this pull request Mar 28, 2023
…mp scoping

Godocs for Upgradeable [1]:

  Upgradeable indicates whether the component (operator and all
  configured operands) is safe to upgrade based on the current cluster
  state. When Upgradeable is False, the cluster-version operator will
  prevent the cluster from performing impacted updates unless forced.
  When set on ClusterVersion, the message will explain which updates
  (minor or patch) are impacted. When set on ClusterOperator, False
  will block minor OpenShift updates. The message field should contain
  a human readable description of what the administrator should do to
  allow the cluster or component to successfully update. The
  cluster-version operator will allow updates when this condition is
  not False, including when it is missing, True, or Unknown.

So we specifically doc it as only about 4.y -> 4.(y+1) minor updates
when seen on ClusterOperator.  But we leave it unclear on
ClusterVersion because when you set some ClusterVersion overrides, it
can break patch updates, so QE asked us to also block patch updates in
that case [2,3].

With this patch, I'm using availableUpdates and conditionalUpdates to
look up a version associated with the proposed target release
pullspec.  That's a bit less reliable than the current cluster-version
operator behavior, which is extracting the proposed target version
from the proposed release image itself (e.g. see [4]).  But it's
probably sufficient for now, with the odds that the OpenShift Update
Service serves bad data low.  And we can refine further in the future
if we want.

[1]: https://github.com/openshift/api/blob/cce310ad2932f6de24491052d506926e484c082c/config/v1/types_cluster_operator.go#L179-L190
[2]: openshift/cluster-version-operator#364
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1822844
[4]: openshift/cluster-version-operator#431
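
The approach in the commit message above (look up the target version from the update lists by pullspec, then decide whether Upgradeable=False blocks the update) can be sketched as follows. This is a simplified illustration under stated assumptions: the `Release` type, `versionForImage`, `majorMinor`, `blocked`, and the `patchToo` flag are hypothetical names, and versions are assumed to start with `MAJOR.MINOR`. It is not the hypershift or cluster-version operator implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Release pairs a version string with its release image pullspec.
// (Hypothetical simplification of an update entry.)
type Release struct {
	Version string
	Image   string
}

// versionForImage searches the available updates, then the conditional
// updates, for an entry whose pullspec matches the proposed target.
func versionForImage(image string, available, conditional []Release) (string, bool) {
	for _, list := range [][]Release{available, conditional} {
		for _, r := range list {
			if r.Image == image {
				return r.Version, true
			}
		}
	}
	return "", false
}

// majorMinor extracts the leading MAJOR.MINOR numbers from a version.
func majorMinor(v string) (int, int, error) {
	parts := strings.SplitN(v, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", v)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("malformed version %q: %w", v, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("malformed version %q: %w", v, err)
	}
	return major, minor, nil
}

// blocked reports whether an update from current to target is blocked.
// upgradeableFalse is true only when the Upgradeable condition is False;
// missing, True, or Unknown all allow the update. patchToo models the
// ClusterVersion overrides case, where patch updates are blocked as well.
func blocked(current, target string, upgradeableFalse, patchToo bool) (bool, error) {
	if !upgradeableFalse {
		return false, nil
	}
	if patchToo {
		return true, nil
	}
	cMajor, cMinor, err := majorMinor(current)
	if err != nil {
		return false, err
	}
	tMajor, tMinor, err := majorMinor(target)
	if err != nil {
		return false, err
	}
	// Otherwise only moves to a later minor (or major) version are blocked.
	return tMajor > cMajor || (tMajor == cMajor && tMinor > cMinor), nil
}

func main() {
	available := []Release{{Version: "4.5.4", Image: "example.com/release@sha256:aaa"}}
	conditional := []Release{{Version: "4.6.0", Image: "example.com/release@sha256:bbb"}}

	target, ok := versionForImage("example.com/release@sha256:bbb", available, conditional)
	if !ok {
		fmt.Println("target pullspec not found in the update lists")
		return
	}
	isBlocked, err := blocked("4.5.3", target, true, false)
	fmt.Println(target, isBlocked, err)
}
```

As the commit message notes, a lookup like this trusts the update service's data; extracting the version from the release image itself is the more reliable option.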