Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/cvo/sync_worker: Make expected/actual version mismatch fatal #431

Conversation

wking
Copy link
Member

@wking wking commented Aug 9, 2020

If, for example, the configured ClusterVersion spec.upstream advertised a given image as 4.0.1, but the version baked into the release's own metadata was 1.0.0-abc, report VerifyPayloadVersionFailed and continue to apply the previous release image, instead of optimistically moving on to apply the new release image. This protects users from compromised or man-in-the-middled upstreams who attempt downgrade and similar attacks by misrepresenting a recommended update.

Addressing a FIXME from 58356de (#419).

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 9, 2020
If, for example, the configured ClusterVersion spec.upstream
advertised a given image as 4.0.1, but the version baked into the
release's own metadata was 1.0.0-abc, report
VerifyPayloadVersionFailed and continue to apply the previous release
image, instead of optimistically moving on to apply the new release
image.  This protects users from compromised or man-in-the-middled
upstreams who attempt downgrade and similar attacks by misrepresenting
a recommended update.

Addressing a FIXME from 58356de (*: Port from Update to Release for
ClusterVersion status, 2020-07-24, openshift#419).
@wking wking force-pushed the fatal-cincinnati-version-mismatch branch from c53b9b2 to 9be6175 Compare August 9, 2020 20:46
@openshift-ci-robot
Copy link
Contributor

@wking: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e 9be6175 link /test e2e
ci/prow/e2e-upgrade 9be6175 link /test e2e-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

wking added a commit to wking/cincinnati-operator that referenced this pull request Sep 15, 2020
We had used InsecureEdgeTerminationPolicyAllow since the route landed
in 1fdf865 (Create a route for Cincinnati service, 2020-05-01,
commit message, but from discussion in the GitHub pull request [1], it
was:

* InsecureEdgeTerminationPolicyAllow is the default termination
  policy.
* Cincinnati's docs have no preference [2].

However, we really, really want HTTPS security for cluster-version
operators making upstream requests for update recommendations.  There
are long-term plans for tightening down guards against malicious,
compromised, or man-in-the-middled update recommendation services, but
today we have yet to land even guards as basic as "upstream is lying
about the version string associated with a given release image" [3].

By removing HTTP termination [4], we force consumers to configure
their clients, including the cluster-version operator, with https://
URIs (or do something else explicit like setting up their own HTTP
termination) before they can access the policy-engine output, which
reduces the risk that they will recieve and trust compromised update
graphs.  This may be a breaking change, but:

* We're still in beta, and not yet in general-availability with
  backwards-compatability requirements.
* Folks who have configured their cluster-version operators and other
  clients with http:// upstreams should *want* to be broken.  We are
  protecting them from all sorts of compromised-upstream failure
  modes.
* The cluster-version operator, and other well-behaved clients, will
  report understandable error messages for "I tried to connect over
  HTTP and there was nobody there", which will lead users into
  auditing and fixing their upstream URIs, so recovering from the
  breakage should not be to onerous.

[1]: openshift#30 (comment)
[2]: https://github.com/openshift/cincinnati/blame/0bb5f6f3228858f9e5d1807bd6f45f46e537cdea/docs/user/running-cincinnati.md#L87-L88
[3]: openshift/cluster-version-operator#431
[4]: https://github.com/openshift/api/blob/346618ed7d5e6396191efe6f10b2c36f1e95d8b7/route/v1/types.go#L258-L259
wking added a commit to wking/cincinnati-operator that referenced this pull request Sep 15, 2020
We had used InsecureEdgeTerminationPolicyAllow since the route landed
in 1fdf865 (Create a route for Cincinnati service, 2020-05-01, openshift#30).
The motivation for that value didn't make it into the Git commit
message, but from discussion in the GitHub pull request [1], it was:

* InsecureEdgeTerminationPolicyAllow is the default termination
  policy.
* Cincinnati's docs have no preference [2].

However, we really, really want HTTPS security for cluster-version
operators making upstream requests for update recommendations.  There
are long-term plans for tightening down guards against malicious,
compromised, or man-in-the-middled update recommendation services, but
today we have yet to land even guards as basic as "upstream is lying
about the version string associated with a given release image" [3].

By removing HTTP termination [4], we force consumers to configure
their clients, including the cluster-version operator, with https://
URIs (or do something else explicit like setting up their own HTTP
termination) before they can access the policy-engine output, which
reduces the risk that they will recieve and trust compromised update
graphs.  This may be a breaking change, but:

* We're still in beta, and not yet in general-availability with
  backwards-compatability requirements.
* Folks who have configured their cluster-version operators and other
  clients with http:// upstreams should *want* to be broken.  We are
  protecting them from all sorts of compromised-upstream failure
  modes.
* The cluster-version operator, and other well-behaved clients, will
  report understandable error messages for "I tried to connect over
  HTTP and there was nobody there", which will lead users into
  auditing and fixing their upstream URIs, so recovering from the
  breakage should not be to onerous.

[1]: openshift#30 (comment)
[2]: https://github.com/openshift/cincinnati/blame/0bb5f6f3228858f9e5d1807bd6f45f46e537cdea/docs/user/running-cincinnati.md#L87-L88
[3]: openshift/cluster-version-operator#431
[4]: https://github.com/openshift/api/blob/346618ed7d5e6396191efe6f10b2c36f1e95d8b7/route/v1/types.go#L258-L259
@openshift-merge-robot
Copy link
Contributor

@wking: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-agnostic-upgrade 9be6175 link /test e2e-agnostic-upgrade
ci/prow/e2e-agnostic 9be6175 link /test e2e-agnostic
ci/prow/e2e-agnostic-operator 9be6175 link /test e2e-agnostic-operator

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jottofar
Copy link
Contributor

jottofar commented Feb 3, 2021

/retest

@jottofar
Copy link
Contributor

jottofar commented Feb 3, 2021

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 3, 2021
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jottofar, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit ac702a5 into openshift:master Feb 9, 2021
@wking wking deleted the fatal-cincinnati-version-mismatch branch April 6, 2021 22:23
wking added a commit to wking/hypershift that referenced this pull request Mar 28, 2023
…mp scoping

Godocs for Upgradeable [1]:

  Upgradeable indicates whether the component (operator and all
  configured operands) is safe to upgrade based on the current cluster
  state. When Upgradeable is False, the cluster-version operator will
  prevent the cluster from performing impacted updates unless forced.
  When set on ClusterVersion, the message will explain which updates
  (minor or patch) are impacted. When set on ClusterOperator, False
  will block minor OpenShift updates. The message field should contain
  a human readable description of what the administrator should do to
  allow the cluster or component to successfully update. The
  cluster-version operator will allow updates when this condition is
  not False, including when it is missing, True, or Unknown.

So we specifically doc it as only about 4.y -> 4.(y+1) minor updates
when seen on ClusterOperator.  But we leave it unclear on
ClusterVersion because when you set some ClusterVersion overrides, it
can break patch updates, so QE asked us to also block patch updates in
that case [2,3].

With this patch, I'm using availableUpdates and conditionalUpdates to
look up a version associated with the proposed target release
pullspec.  That's a bit less reliable than the current cluster-version
operator behavior, which is extracting the proposed target version
from the proposed release image itself (e.g. see [4]).  But it's
probably sufficient for now, with the odds that the OpenShift Update
Service serves bad data low.  And we can refine further in the future
if we want.

[1]: https://github.com/openshift/api/blob/cce310ad2932f6de24491052d506926e484c082c/config/v1/types_cluster_operator.go#L179-L190 :
[2]: openshift/cluster-version-operator#364
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1822844
[4]: openshift/cluster-version-operator#431
wking added a commit to wking/hypershift that referenced this pull request Mar 28, 2023
…mp scoping

Godocs for Upgradeable [1]:

  Upgradeable indicates whether the component (operator and all
  configured operands) is safe to upgrade based on the current cluster
  state. When Upgradeable is False, the cluster-version operator will
  prevent the cluster from performing impacted updates unless forced.
  When set on ClusterVersion, the message will explain which updates
  (minor or patch) are impacted. When set on ClusterOperator, False
  will block minor OpenShift updates. The message field should contain
  a human readable description of what the administrator should do to
  allow the cluster or component to successfully update. The
  cluster-version operator will allow updates when this condition is
  not False, including when it is missing, True, or Unknown.

So we specifically doc it as only about 4.y -> 4.(y+1) minor updates
when seen on ClusterOperator.  But we leave it unclear on
ClusterVersion because when you set some ClusterVersion overrides, it
can break patch updates, so QE asked us to also block patch updates in
that case [2,3].

With this patch, I'm using availableUpdates and conditionalUpdates to
look up a version associated with the proposed target release
pullspec.  That's a bit less reliable than the current cluster-version
operator behavior, which is extracting the proposed target version
from the proposed release image itself (e.g. see [4]).  But it's
probably sufficient for now, with the odds that the OpenShift Update
Service serves bad data low.  And we can refine further in the future
if we want.

[1]: https://github.com/openshift/api/blob/cce310ad2932f6de24491052d506926e484c082c/config/v1/types_cluster_operator.go#L179-L190 :
[2]: openshift/cluster-version-operator#364
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1822844
[4]: openshift/cluster-version-operator#431
wking added a commit to wking/hypershift that referenced this pull request Mar 28, 2023
…mp scoping

Godocs for Upgradeable [1]:

  Upgradeable indicates whether the component (operator and all
  configured operands) is safe to upgrade based on the current cluster
  state. When Upgradeable is False, the cluster-version operator will
  prevent the cluster from performing impacted updates unless forced.
  When set on ClusterVersion, the message will explain which updates
  (minor or patch) are impacted. When set on ClusterOperator, False
  will block minor OpenShift updates. The message field should contain
  a human readable description of what the administrator should do to
  allow the cluster or component to successfully update. The
  cluster-version operator will allow updates when this condition is
  not False, including when it is missing, True, or Unknown.

So we specifically doc it as only about 4.y -> 4.(y+1) minor updates
when seen on ClusterOperator.  But we leave it unclear on
ClusterVersion because when you set some ClusterVersion overrides, it
can break patch updates, so QE asked us to also block patch updates in
that case [2,3].

With this patch, I'm using availableUpdates and conditionalUpdates to
look up a version associated with the proposed target release
pullspec.  That's a bit less reliable than the current cluster-version
operator behavior, which is extracting the proposed target version
from the proposed release image itself (e.g. see [4]).  But it's
probably sufficient for now, with the odds that the OpenShift Update
Service serves bad data low.  And we can refine further in the future
if we want.

[1]: https://github.com/openshift/api/blob/cce310ad2932f6de24491052d506926e484c082c/config/v1/types_cluster_operator.go#L179-L190 :
[2]: openshift/cluster-version-operator#364
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1822844
[4]: openshift/cluster-version-operator#431
wking added a commit to wking/hypershift that referenced this pull request Mar 28, 2023
…mp scoping

Godocs for Upgradeable [1]:

  Upgradeable indicates whether the component (operator and all
  configured operands) is safe to upgrade based on the current cluster
  state. When Upgradeable is False, the cluster-version operator will
  prevent the cluster from performing impacted updates unless forced.
  When set on ClusterVersion, the message will explain which updates
  (minor or patch) are impacted. When set on ClusterOperator, False
  will block minor OpenShift updates. The message field should contain
  a human readable description of what the administrator should do to
  allow the cluster or component to successfully update. The
  cluster-version operator will allow updates when this condition is
  not False, including when it is missing, True, or Unknown.

So we specifically doc it as only about 4.y -> 4.(y+1) minor updates
when seen on ClusterOperator.  But we leave it unclear on
ClusterVersion because when you set some ClusterVersion overrides, it
can break patch updates, so QE asked us to also block patch updates in
that case [2,3].

With this patch, I'm using availableUpdates and conditionalUpdates to
look up a version associated with the proposed target release
pullspec.  That's a bit less reliable than the current cluster-version
operator behavior, which is extracting the proposed target version
from the proposed release image itself (e.g. see [4]).  But it's
probably sufficient for now, with the odds that the OpenShift Update
Service serves bad data low.  And we can refine further in the future
if we want.

[1]: https://github.com/openshift/api/blob/cce310ad2932f6de24491052d506926e484c082c/config/v1/types_cluster_operator.go#L179-L190 :
[2]: openshift/cluster-version-operator#364
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1822844
[4]: openshift/cluster-version-operator#431
wking added a commit to wking/hypershift that referenced this pull request Mar 28, 2023
…mp scoping

Godocs for Upgradeable [1]:

  Upgradeable indicates whether the component (operator and all
  configured operands) is safe to upgrade based on the current cluster
  state. When Upgradeable is False, the cluster-version operator will
  prevent the cluster from performing impacted updates unless forced.
  When set on ClusterVersion, the message will explain which updates
  (minor or patch) are impacted. When set on ClusterOperator, False
  will block minor OpenShift updates. The message field should contain
  a human readable description of what the administrator should do to
  allow the cluster or component to successfully update. The
  cluster-version operator will allow updates when this condition is
  not False, including when it is missing, True, or Unknown.

So we specifically doc it as only about 4.y -> 4.(y+1) minor updates
when seen on ClusterOperator.  But we leave it unclear on
ClusterVersion because when you set some ClusterVersion overrides, it
can break patch updates, so QE asked us to also block patch updates in
that case [2,3].

With this patch, I'm using availableUpdates and conditionalUpdates to
look up a version associated with the proposed target release
pullspec.  That's a bit less reliable than the current cluster-version
operator behavior, which is extracting the proposed target version
from the proposed release image itself (e.g. see [4]).  But it's
probably sufficient for now, with the odds that the OpenShift Update
Service serves bad data low.  And we can refine further in the future
if we want.

[1]: https://github.com/openshift/api/blob/cce310ad2932f6de24491052d506926e484c082c/config/v1/types_cluster_operator.go#L179-L190 :
[2]: openshift/cluster-version-operator#364
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1822844
[4]: openshift/cluster-version-operator#431
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants