-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/cvo/sync_worker: Make expected/actual version mismatch fatal #431
pkg/cvo/sync_worker: Make expected/actual version mismatch fatal #431
Conversation
If, for example, the configured ClusterVersion spec.upstream advertised a given image as 4.0.1, but the version baked into the release's own metadata was 1.0.0-abc, report VerifyPayloadVersionFailed and continue to apply the previous release image, instead of optimistically moving on to apply the new release image. This protects users from compromised or man-in-the-middled upstreams who attempt downgrade and similar attacks by misrepresenting a recommended update. Addressing a FIXME from 58356de (*: Port from Update to Release for ClusterVersion status, 2020-07-24, openshift#419).
c53b9b2
to
9be6175
Compare
@wking: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
We had used InsecureEdgeTerminationPolicyAllow since the route landed in 1fdf865 (Create a route for Cincinnati service, 2020-05-01, commit message, but from discussion in the GitHub pull request [1], it was: * InsecureEdgeTerminationPolicyAllow is the default termination policy. * Cincinnati's docs have no preference [2]. However, we really, really want HTTPS security for cluster-version operators making upstream requests for update recommendations. There are long-term plans for tightening down guards against malicious, compromised, or man-in-the-middled update recommendation services, but today we have yet to land even guards as basic as "upstream is lying about the version string associated with a given release image" [3]. By removing HTTP termination [4], we force consumers to configure their clients, including the cluster-version operator, with https:// URIs (or do something else explicit like setting up their own HTTP termination) before they can access the policy-engine output, which reduces the risk that they will recieve and trust compromised update graphs. This may be a breaking change, but: * We're still in beta, and not yet in general-availability with backwards-compatability requirements. * Folks who have configured their cluster-version operators and other clients with http:// upstreams should *want* to be broken. We are protecting them from all sorts of compromised-upstream failure modes. * The cluster-version operator, and other well-behaved clients, will report understandable error messages for "I tried to connect over HTTP and there was nobody there", which will lead users into auditing and fixing their upstream URIs, so recovering from the breakage should not be to onerous. [1]: openshift#30 (comment) [2]: https://github.com/openshift/cincinnati/blame/0bb5f6f3228858f9e5d1807bd6f45f46e537cdea/docs/user/running-cincinnati.md#L87-L88 [3]: openshift/cluster-version-operator#431 [4]: https://github.com/openshift/api/blob/346618ed7d5e6396191efe6f10b2c36f1e95d8b7/route/v1/types.go#L258-L259
We had used InsecureEdgeTerminationPolicyAllow since the route landed in 1fdf865 (Create a route for Cincinnati service, 2020-05-01, openshift#30). The motivation for that value didn't make it into the Git commit message, but from discussion in the GitHub pull request [1], it was: * InsecureEdgeTerminationPolicyAllow is the default termination policy. * Cincinnati's docs have no preference [2]. However, we really, really want HTTPS security for cluster-version operators making upstream requests for update recommendations. There are long-term plans for tightening down guards against malicious, compromised, or man-in-the-middled update recommendation services, but today we have yet to land even guards as basic as "upstream is lying about the version string associated with a given release image" [3]. By removing HTTP termination [4], we force consumers to configure their clients, including the cluster-version operator, with https:// URIs (or do something else explicit like setting up their own HTTP termination) before they can access the policy-engine output, which reduces the risk that they will recieve and trust compromised update graphs. This may be a breaking change, but: * We're still in beta, and not yet in general-availability with backwards-compatability requirements. * Folks who have configured their cluster-version operators and other clients with http:// upstreams should *want* to be broken. We are protecting them from all sorts of compromised-upstream failure modes. * The cluster-version operator, and other well-behaved clients, will report understandable error messages for "I tried to connect over HTTP and there was nobody there", which will lead users into auditing and fixing their upstream URIs, so recovering from the breakage should not be to onerous. [1]: openshift#30 (comment) [2]: https://github.com/openshift/cincinnati/blame/0bb5f6f3228858f9e5d1807bd6f45f46e537cdea/docs/user/running-cincinnati.md#L87-L88 [3]: openshift/cluster-version-operator#431 [4]: https://github.com/openshift/api/blob/346618ed7d5e6396191efe6f10b2c36f1e95d8b7/route/v1/types.go#L258-L259
@wking: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jottofar, wking The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Please review the full test history for this PR and help us cut down flakes. |
1 similar comment
/retest Please review the full test history for this PR and help us cut down flakes. |
…mp scoping Godocs for Upgradeable [1]: Upgradeable indicates whether the component (operator and all configured operands) is safe to upgrade based on the current cluster state. When Upgradeable is False, the cluster-version operator will prevent the cluster from performing impacted updates unless forced. When set on ClusterVersion, the message will explain which updates (minor or patch) are impacted. When set on ClusterOperator, False will block minor OpenShift updates. The message field should contain a human readable description of what the administrator should do to allow the cluster or component to successfully update. The cluster-version operator will allow updates when this condition is not False, including when it is missing, True, or Unknown. So we specifically doc it as only about 4.y -> 4.(y+1) minor updates when seen on ClusterOperator. But we leave it unclear on ClusterVersion because when you set some ClusterVersion overrides, it can break patch updates, so QE asked us to also block patch updates in that case [2,3]. With this patch, I'm using availableUpdates and conditionalUpdates to look up a version associated with the proposed target release pullspec. That's a bit less reliable than the current cluster-version operator behavior, which is extracting the proposed target version from the proposed release image itself (e.g. see [4]). But it's probably sufficient for now, with the odds that the OpenShift Update Service serves bad data low. And we can refine further in the future if we want. [1]: https://github.com/openshift/api/blob/cce310ad2932f6de24491052d506926e484c082c/config/v1/types_cluster_operator.go#L179-L190 : [2]: openshift/cluster-version-operator#364 [3]: https://bugzilla.redhat.com/show_bug.cgi?id=1822844 [4]: openshift/cluster-version-operator#431
…mp scoping Godocs for Upgradeable [1]: Upgradeable indicates whether the component (operator and all configured operands) is safe to upgrade based on the current cluster state. When Upgradeable is False, the cluster-version operator will prevent the cluster from performing impacted updates unless forced. When set on ClusterVersion, the message will explain which updates (minor or patch) are impacted. When set on ClusterOperator, False will block minor OpenShift updates. The message field should contain a human readable description of what the administrator should do to allow the cluster or component to successfully update. The cluster-version operator will allow updates when this condition is not False, including when it is missing, True, or Unknown. So we specifically doc it as only about 4.y -> 4.(y+1) minor updates when seen on ClusterOperator. But we leave it unclear on ClusterVersion because when you set some ClusterVersion overrides, it can break patch updates, so QE asked us to also block patch updates in that case [2,3]. With this patch, I'm using availableUpdates and conditionalUpdates to look up a version associated with the proposed target release pullspec. That's a bit less reliable than the current cluster-version operator behavior, which is extracting the proposed target version from the proposed release image itself (e.g. see [4]). But it's probably sufficient for now, with the odds that the OpenShift Update Service serves bad data low. And we can refine further in the future if we want. [1]: https://github.com/openshift/api/blob/cce310ad2932f6de24491052d506926e484c082c/config/v1/types_cluster_operator.go#L179-L190 : [2]: openshift/cluster-version-operator#364 [3]: https://bugzilla.redhat.com/show_bug.cgi?id=1822844 [4]: openshift/cluster-version-operator#431
…mp scoping Godocs for Upgradeable [1]: Upgradeable indicates whether the component (operator and all configured operands) is safe to upgrade based on the current cluster state. When Upgradeable is False, the cluster-version operator will prevent the cluster from performing impacted updates unless forced. When set on ClusterVersion, the message will explain which updates (minor or patch) are impacted. When set on ClusterOperator, False will block minor OpenShift updates. The message field should contain a human readable description of what the administrator should do to allow the cluster or component to successfully update. The cluster-version operator will allow updates when this condition is not False, including when it is missing, True, or Unknown. So we specifically doc it as only about 4.y -> 4.(y+1) minor updates when seen on ClusterOperator. But we leave it unclear on ClusterVersion because when you set some ClusterVersion overrides, it can break patch updates, so QE asked us to also block patch updates in that case [2,3]. With this patch, I'm using availableUpdates and conditionalUpdates to look up a version associated with the proposed target release pullspec. That's a bit less reliable than the current cluster-version operator behavior, which is extracting the proposed target version from the proposed release image itself (e.g. see [4]). But it's probably sufficient for now, with the odds that the OpenShift Update Service serves bad data low. And we can refine further in the future if we want. [1]: https://github.com/openshift/api/blob/cce310ad2932f6de24491052d506926e484c082c/config/v1/types_cluster_operator.go#L179-L190 : [2]: openshift/cluster-version-operator#364 [3]: https://bugzilla.redhat.com/show_bug.cgi?id=1822844 [4]: openshift/cluster-version-operator#431
…mp scoping Godocs for Upgradeable [1]: Upgradeable indicates whether the component (operator and all configured operands) is safe to upgrade based on the current cluster state. When Upgradeable is False, the cluster-version operator will prevent the cluster from performing impacted updates unless forced. When set on ClusterVersion, the message will explain which updates (minor or patch) are impacted. When set on ClusterOperator, False will block minor OpenShift updates. The message field should contain a human readable description of what the administrator should do to allow the cluster or component to successfully update. The cluster-version operator will allow updates when this condition is not False, including when it is missing, True, or Unknown. So we specifically doc it as only about 4.y -> 4.(y+1) minor updates when seen on ClusterOperator. But we leave it unclear on ClusterVersion because when you set some ClusterVersion overrides, it can break patch updates, so QE asked us to also block patch updates in that case [2,3]. With this patch, I'm using availableUpdates and conditionalUpdates to look up a version associated with the proposed target release pullspec. That's a bit less reliable than the current cluster-version operator behavior, which is extracting the proposed target version from the proposed release image itself (e.g. see [4]). But it's probably sufficient for now, with the odds that the OpenShift Update Service serves bad data low. And we can refine further in the future if we want. [1]: https://github.com/openshift/api/blob/cce310ad2932f6de24491052d506926e484c082c/config/v1/types_cluster_operator.go#L179-L190 : [2]: openshift/cluster-version-operator#364 [3]: https://bugzilla.redhat.com/show_bug.cgi?id=1822844 [4]: openshift/cluster-version-operator#431
…mp scoping Godocs for Upgradeable [1]: Upgradeable indicates whether the component (operator and all configured operands) is safe to upgrade based on the current cluster state. When Upgradeable is False, the cluster-version operator will prevent the cluster from performing impacted updates unless forced. When set on ClusterVersion, the message will explain which updates (minor or patch) are impacted. When set on ClusterOperator, False will block minor OpenShift updates. The message field should contain a human readable description of what the administrator should do to allow the cluster or component to successfully update. The cluster-version operator will allow updates when this condition is not False, including when it is missing, True, or Unknown. So we specifically doc it as only about 4.y -> 4.(y+1) minor updates when seen on ClusterOperator. But we leave it unclear on ClusterVersion because when you set some ClusterVersion overrides, it can break patch updates, so QE asked us to also block patch updates in that case [2,3]. With this patch, I'm using availableUpdates and conditionalUpdates to look up a version associated with the proposed target release pullspec. That's a bit less reliable than the current cluster-version operator behavior, which is extracting the proposed target version from the proposed release image itself (e.g. see [4]). But it's probably sufficient for now, with the odds that the OpenShift Update Service serves bad data low. And we can refine further in the future if we want. [1]: https://github.com/openshift/api/blob/cce310ad2932f6de24491052d506926e484c082c/config/v1/types_cluster_operator.go#L179-L190 : [2]: openshift/cluster-version-operator#364 [3]: https://bugzilla.redhat.com/show_bug.cgi?id=1822844 [4]: openshift/cluster-version-operator#431
If, for example, the configured ClusterVersion
spec.upstream
advertised a given image as 4.0.1, but the version baked into the release's own metadata was 1.0.0-abc, report VerifyPayloadVersionFailed and continue to apply the previous release image, instead of optimistically moving on to apply the new release image. This protects users from compromised or man-in-the-middled upstreams who attempt downgrade and similar attacks by misrepresenting a recommended update.Addressing a FIXME from 58356de (#419).