-
Notifications
You must be signed in to change notification settings - Fork 475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhancements/update/targeted-update-edge-blocking: Propose a new enhancement #821
enhancements/update/targeted-update-edge-blocking: Propose a new enhancement #821
Conversation
95efba9
to
e664c70
Compare
e664c70
to
61c6c45
Compare
61c6c45
to
a034a69
Compare
@wking I'm unsure if we should add this to alternatives section or where you'd like to address it, however I think it'd be great to think through how we can enable future cluster agnostic graph viewers. I'm a bit worried that the suggested implementation relies a bit too much on client side merging of multiple graphs which would complicate our ability to visualize edges into |
a034a69
to
97c2bf1
Compare
97c2bf1
to
66ecf6a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still a few open threads with questions but I think this is probably ready to have WIP removed. Would like to see affirmative support from other CVO maintainers that the level of complexity in the CVO is necessary.
I don't find the suggested CVO changes overly complex. Seems to me a reasonable outgrowth of what we already have in CVO. |
f853d4a
to
fe5073a
Compare
We need a section for what it means for a disconnected cluster (as they might not have telemetry), when we release a cincinnati-graph-data image for OSUS operator setup. |
fe5073a
to
beeee98
Compare
Implementing targeted edge blocking as described in [1]. Except: * Some properties which are optional in the enhancement are required here. I think this is strictly worse handling for the cluster-version operator, because we are just getting this data from the upstream update service's Cincinnati JSON, and if the upstream says "here's a risk, but I am not linking a URI", and the CVO subsequently rejects the entire update payload as invalid, then downstream consumers are not hearing about the reason/message/matchingRules that the upstream may have been passing along. But I'm not an API approver, and David is asking for the +required in [2,3,4,5]. I dunno why he isn't finding my argument's convincing, but I'm out of time to continue arguing my position. * Similarly, the ClusterCondition.Type property has an enum, because David requires it [6]. This means that the CVO will need to self-censor if there are unrecognized types (because we don't want our ClusterVersion writes rejected on API-server validation failures). That means that instead of saying "... and there was another rule $UNRECOGNIZED_TYPE that I don't know how to evaluate", the CVO will have to silently drop that rule. And if that's the only rule, the CVO will have to silently drop that risk and conditional update. David is fine with that, even if I'm not on board, and he's the API approver, so that's what we'll do. Swagger, deepcopy, and CRD changes were generated with: $ make update [1]: openshift/enhancements#821 [2]: openshift#1011 (comment) [3]: openshift#1011 (comment) [4]: openshift#1011 (comment) [5]: openshift#1011 (comment) [6]: openshift#1011 (comment)
Implementing targeted edge blocking as described in [1]. Except: * Some properties which are optional in the enhancement are required here. I think this is strictly worse handling for the cluster-version operator, because we are just getting this data from the upstream update service's Cincinnati JSON, and if the upstream says "here's a risk, but I am not linking a URI", and the CVO subsequently rejects the entire update payload as invalid, then downstream consumers are not hearing about the reason/message/matchingRules that the upstream may have been passing along. But I'm not an API approver, and David is asking for the +required in [2,3,4,5]. I dunno why he isn't finding my argument's convincing, but I'm out of time to continue arguing my position. * Similarly, the ClusterCondition.Type property has an enum, because David requires it [6]. This means that the CVO will need to self-censor if there are unrecognized types (because we don't want our ClusterVersion writes rejected on API-server validation failures). That means that instead of saying "... and there was another rule $UNRECOGNIZED_TYPE that I don't know how to evaluate", the CVO will have to silently drop that rule. And if that's the only rule, the CVO will have to silently drop that risk and conditional update. David is fine with that, even if I'm not on board, and he's the API approver, so that's what we'll do. I've also upcased the types to Always and PromQL; from David: Always and PromQL. We use CamelCase for enumerated values Swagger, deepcopy, and CRD changes were generated with: $ make update [1]: openshift/enhancements#821 [2]: openshift#1011 (comment) [3]: openshift#1011 (comment) [4]: openshift#1011 (comment) [5]: openshift#1011 (comment) [6]: openshift#1011 (comment)
Implementing targeted edge blocking as described in [1]. Except: * Some properties which are optional in the enhancement are required here. I think this is strictly worse handling for the cluster-version operator, because we are just getting this data from the upstream update service's Cincinnati JSON, and if the upstream says "here's a risk, but I am not linking a URI", and the CVO subsequently rejects the entire update payload as invalid, then downstream consumers are not hearing about the reason/message/matchingRules that the upstream may have been passing along. But I'm not an API approver, and David is asking for the +required in [2,3,4,5]. I dunno why he isn't finding my argument's convincing, but I'm out of time to continue arguing my position. * Similarly, the ClusterCondition.Type property has an enum, because David requires it [6]. This means that the CVO will need to self-censor if there are unrecognized types (because we don't want our ClusterVersion writes rejected on API-server validation failures). That means that instead of saying "... and there was another rule $UNRECOGNIZED_TYPE that I don't know how to evaluate", the CVO will have to silently drop that rule. And if that's the only rule, the CVO will have to silently drop that risk and conditional update. David is fine with that, even if I'm not on board, and he's the API approver, so that's what we'll do. I've also upcased the types to Always and PromQL; from David: Always and PromQL. We use CamelCase for enumerated values * history[].overrides became history[].acceptedRisks to satisfy David's concerns about misinterpretation [7]. Swagger, deepcopy, and CRD changes were generated with: $ make update [1]: openshift/enhancements#821 [2]: openshift#1011 (comment) [3]: openshift#1011 (comment) [4]: openshift#1011 (comment) [5]: openshift#1011 (comment) [6]: openshift#1011 (comment) [7]: openshift#1011 (comment)
Implementing targeted edge blocking as described in [1]. Except: * Some properties which are optional in the enhancement are required here. I think this is strictly worse handling for the cluster-version operator, because we are just getting this data from the upstream update service's Cincinnati JSON, and if the upstream says "here's a risk, but I am not linking a URI", and the CVO subsequently rejects the entire update payload as invalid, then downstream consumers are not hearing about the reason/message/matchingRules that the upstream may have been passing along. But I'm not an API approver, and David is asking for the +required in [2,3,4,5]. I dunno why he isn't finding my argument's convincing, but I'm out of time to continue arguing my position. * Similarly, the ClusterCondition.Type property has an enum, because David requires it [6]. This means that the CVO will need to self-censor if there are unrecognized types (because we don't want our ClusterVersion writes rejected on API-server validation failures). That means that instead of saying "... and there was another rule $UNRECOGNIZED_TYPE that I don't know how to evaluate", the CVO will have to silently drop that rule. And if that's the only rule, the CVO will have to silently drop that risk and conditional update. David is fine with that, even if I'm not on board, and he's the API approver, so that's what we'll do. I've also upcased the types to Always and PromQL; from David: Always and PromQL. We use CamelCase for enumerated values * history[].overrides became history[].acceptedRisks to satisfy David's concerns about misinterpretation [7]. Swagger, deepcopy, and CRD changes were generated with: $ make update [1]: openshift/enhancements#821 [2]: openshift#1011 (comment) [3]: openshift#1011 (comment) [4]: openshift#1011 (comment) [5]: openshift#1011 (comment) [6]: openshift#1011 (comment) [7]: openshift#1011 (comment)
Implementing targeted edge blocking as described in [1]. Except: * Some properties which are optional in the enhancement are required here. I think this is strictly worse handling for the cluster-version operator, because we are just getting this data from the upstream update service's Cincinnati JSON, and if the upstream says "here's a risk, but I am not linking a URI", and the CVO subsequently rejects the entire update payload as invalid, then downstream consumers are not hearing about the reason/message/matchingRules that the upstream may have been passing along. But I'm not an API approver, and David is asking for the +required in [2,3,4,5]. I dunno why he isn't finding my argument's convincing, but I'm out of time to continue arguing my position. * Similarly, the ClusterCondition.Type property has an enum, because David requires it [6]. This means that the CVO will need to self-censor if there are unrecognized types (because we don't want our ClusterVersion writes rejected on API-server validation failures). That means that instead of saying "... and there was another rule $UNRECOGNIZED_TYPE that I don't know how to evaluate", the CVO will have to silently drop that rule. And if that's the only rule, the CVO will have to silently drop that risk and conditional update. David is fine with that, even if I'm not on board, and he's the API approver, so that's what we'll do. I've also upcased the types to Always and PromQL; from David: Always and PromQL. We use CamelCase for enumerated values * history[].overrides became history[].acceptedRisks to satisfy David's concerns about misinterpretation [7]. Swagger, deepcopy, and CRD changes were generated with: $ make update [1]: openshift/enhancements#821 [2]: openshift#1011 (comment) [3]: openshift#1011 (comment) [4]: openshift#1011 (comment) [5]: openshift#1011 (comment) [6]: openshift#1011 (comment) [7]: openshift#1011 (comment)
Implementing targeted edge blocking as described in [1]. Except: * Some properties which are optional in the enhancement are required here. I think this is strictly worse handling for the cluster-version operator, because we are just getting this data from the upstream update service's Cincinnati JSON, and if the upstream says "here's a risk, but I am not linking a URI", and the CVO subsequently rejects the entire update payload as invalid, then downstream consumers are not hearing about the reason/message/matchingRules that the upstream may have been passing along. But I'm not an API approver, and David is asking for the +required in [2,3,4,5]. I dunno why he isn't finding my argument's convincing, but I'm out of time to continue arguing my position. * Similarly, the ClusterCondition.Type property has an enum, because David requires it [6]. This means that the CVO will need to self-censor if there are unrecognized types (because we don't want our ClusterVersion writes rejected on API-server validation failures). That means that instead of saying "... and there was another rule $UNRECOGNIZED_TYPE that I don't know how to evaluate", the CVO will have to silently drop that rule. And if that's the only rule, the CVO will have to silently drop that risk and conditional update. David is fine with that, even if I'm not on board, and he's the API approver, so that's what we'll do. I've also upcased the types to Always and PromQL; from David: Always and PromQL. We use CamelCase for enumerated values * history[].overrides became history[].acceptedRisks to satisfy David's concerns about misinterpretation [7]. Swagger, deepcopy, and CRD changes were generated with: $ make update [1]: openshift/enhancements#821 [2]: openshift#1011 (comment) [3]: openshift#1011 (comment) [4]: openshift#1011 (comment) [5]: openshift#1011 (comment) [6]: openshift#1011 (comment) [7]: openshift#1011 (comment)
Implementing targeted edge blocking as described in [1]. Except: * Some properties which are optional in the enhancement are required here. I think this is strictly worse handling for the cluster-version operator, because we are just getting this data from the upstream update service's Cincinnati JSON, and if the upstream says "here's a risk, but I am not linking a URI", and the CVO subsequently rejects the entire update payload as invalid, then downstream consumers are not hearing about the reason/message/matchingRules that the upstream may have been passing along. But I'm not an API approver, and David is asking for the +required in [2,3,4,5]. I dunno why he isn't finding my argument's convincing, but I'm out of time to continue arguing my position. * Similarly, the ClusterCondition.Type property has an enum, because David requires it [6]. This means that the CVO will need to self-censor if there are unrecognized types (because we don't want our ClusterVersion writes rejected on API-server validation failures). That means that instead of saying "... and there was another rule $UNRECOGNIZED_TYPE that I don't know how to evaluate", the CVO will have to silently drop that rule. And if that's the only rule, the CVO will have to silently drop that risk and conditional update. David is fine with that, even if I'm not on board, and he's the API approver, so that's what we'll do. I've also upcased the types to Always and PromQL; from David: Always and PromQL. We use CamelCase for enumerated values * history[].overrides became history[].acceptedRisks to satisfy David's concerns about misinterpretation [7]. Swagger, deepcopy, and CRD changes were generated with: $ make update [1]: openshift/enhancements#821 [2]: openshift#1011 (comment) [3]: openshift#1011 (comment) [4]: openshift#1011 (comment) [5]: openshift#1011 (comment) [6]: openshift#1011 (comment) [7]: openshift#1011 (comment)
Implementing targeted edge blocking as described in [1]. Except: * Some properties which are optional in the enhancement are required here. I think this is strictly worse handling for the cluster-version operator, because we are just getting this data from the upstream update service's Cincinnati JSON, and if the upstream says "here's a risk, but I am not linking a URI", and the CVO subsequently rejects the entire update payload as invalid, then downstream consumers are not hearing about the reason/message/matchingRules that the upstream may have been passing along. But I'm not an API approver, and David is asking for the +required in [2,3,4,5]. I dunno why he isn't finding my argument's convincing, but I'm out of time to continue arguing my position. * Similarly, the ClusterCondition.Type property has an enum, because David requires it [6]. This means that the CVO will need to self-censor if there are unrecognized types (because we don't want our ClusterVersion writes rejected on API-server validation failures). That means that instead of saying "... and there was another rule $UNRECOGNIZED_TYPE that I don't know how to evaluate", the CVO will have to silently drop that rule. And if that's the only rule, the CVO will have to silently drop that risk and conditional update. David is fine with that, even if I'm not on board, and he's the API approver, so that's what we'll do. I've also upcased the types to Always and PromQL; from David: Always and PromQL. We use CamelCase for enumerated values * history[].overrides became history[].acceptedRisks to satisfy David's concerns about misinterpretation [7]. Swagger, deepcopy, and CRD changes were generated with: $ make update [1]: openshift/enhancements#821 [2]: openshift#1011 (comment) [3]: openshift#1011 (comment) [4]: openshift#1011 (comment) [5]: openshift#1011 (comment) [6]: openshift#1011 (comment) [7]: openshift#1011 (comment)
Implementing targeted edge blocking as described in [1]. Except: * Some properties which are optional in the enhancement are required here. I think this is strictly worse handling for the cluster-version operator, because we are just getting this data from the upstream update service's Cincinnati JSON, and if the upstream says "here's a risk, but I am not linking a URI", and the CVO subsequently rejects the entire update payload as invalid, then downstream consumers are not hearing about the reason/message/matchingRules that the upstream may have been passing along. But I'm not an API approver, and David is asking for the +required in [2,3,4,5]. I dunno why he isn't finding my argument's convincing, but I'm out of time to continue arguing my position. * Similarly, the ClusterCondition.Type property has an enum, because David requires it [6]. This means that the CVO will need to self-censor if there are unrecognized types (because we don't want our ClusterVersion writes rejected on API-server validation failures). That means that instead of saying "... and there was another rule $UNRECOGNIZED_TYPE that I don't know how to evaluate", the CVO will have to silently drop that rule. And if that's the only rule, the CVO will have to silently drop that risk and conditional update. David is fine with that, even if I'm not on board, and he's the API approver, so that's what we'll do. I've also upcased the types to Always and PromQL; from David: Always and PromQL. We use CamelCase for enumerated values * history[].overrides became history[].acceptedRisks to satisfy David's concerns about misinterpretation [7]. Swagger, deepcopy, and CRD changes were generated with: $ make update [1]: openshift/enhancements#821 [2]: openshift#1011 (comment) [3]: openshift#1011 (comment) [4]: openshift#1011 (comment) [5]: openshift#1011 (comment) [6]: openshift#1011 (comment) [7]: openshift#1011 (comment)
Implementing targeted edge blocking as described in [1]. Except: * Some properties which are optional in the enhancement are required here. I think this is strictly worse handling for the cluster-version operator, because we are just getting this data from the upstream update service's Cincinnati JSON, and if the upstream says "here's a risk, but I am not linking a URI", and the CVO subsequently rejects the entire update payload as invalid, then downstream consumers are not hearing about the reason/message/matchingRules that the upstream may have been passing along. But I'm not an API approver, and David is asking for the +required in [2,3,4,5]. I dunno why he isn't finding my argument's convincing, but I'm out of time to continue arguing my position. * Similarly, the ClusterCondition.Type property has an enum, because David requires it [6]. This means that the CVO will need to self-censor if there are unrecognized types (because we don't want our ClusterVersion writes rejected on API-server validation failures). That means that instead of saying "... and there was another rule $UNRECOGNIZED_TYPE that I don't know how to evaluate", the CVO will have to silently drop that rule. And if that's the only rule, the CVO will have to silently drop that risk and conditional update. David is fine with that, even if I'm not on board, and he's the API approver, so that's what we'll do. I've also upcased the types to Always and PromQL; from David: Always and PromQL. We use CamelCase for enumerated values * history[].overrides became history[].acceptedRisks to satisfy David's concerns about misinterpretation [7]. Swagger, deepcopy, and CRD changes were generated with: $ make update [1]: openshift/enhancements#821 [2]: openshift#1011 (comment) [3]: openshift#1011 (comment) [4]: openshift#1011 (comment) [5]: openshift#1011 (comment) [6]: openshift#1011 (comment) [7]: openshift#1011 (comment)
Implementing targeted edge blocking as described in [1]. Except: * Some properties which are optional in the enhancement are required here. I think this is strictly worse handling for the cluster-version operator, because we are just getting this data from the upstream update service's Cincinnati JSON, and if the upstream says "here's a risk, but I am not linking a URI", and the CVO subsequently rejects the entire update payload as invalid, then downstream consumers are not hearing about the reason/message/matchingRules that the upstream may have been passing along. But I'm not an API approver, and David is asking for the +required in [2,3,4,5]. I dunno why he isn't finding my argument's convincing, but I'm out of time to continue arguing my position. * Similarly, the ClusterCondition.Type property has an enum, because David requires it [6]. This means that the CVO will need to self-censor if there are unrecognized types (because we don't want our ClusterVersion writes rejected on API-server validation failures). That means that instead of saying "... and there was another rule $UNRECOGNIZED_TYPE that I don't know how to evaluate", the CVO will have to silently drop that rule. And if that's the only rule, the CVO will have to silently drop that risk and conditional update. David is fine with that, even if I'm not on board, and he's the API approver, so that's what we'll do. I've also upcased the types to Always and PromQL; from David: Always and PromQL. We use CamelCase for enumerated values * history[].overrides became history[].acceptedRisks to satisfy David's concerns about misinterpretation [7]. Swagger, deepcopy, and CRD changes were generated with: $ make update [1]: openshift/enhancements#821 [2]: openshift#1011 (comment) [3]: openshift#1011 (comment) [4]: openshift#1011 (comment) [5]: openshift#1011 (comment) [6]: openshift#1011 (comment) [7]: openshift#1011 (comment)
Implementing targeted edge blocking as described in [1]. Except: * Some properties which are optional in the enhancement are required here. I think this is strictly worse handling for the cluster-version operator, because we are just getting this data from the upstream update service's Cincinnati JSON, and if the upstream says "here's a risk, but I am not linking a URI", and the CVO subsequently rejects the entire update payload as invalid, then downstream consumers are not hearing about the reason/message/matchingRules that the upstream may have been passing along. But I'm not an API approver, and David is asking for the +required in [2,3,4,5]. I dunno why he isn't finding my argument's convincing, but I'm out of time to continue arguing my position. * Similarly, the ClusterCondition.Type property has an enum, because David requires it [6]. This means that the CVO will need to self-censor if there are unrecognized types (because we don't want our ClusterVersion writes rejected on API-server validation failures). That means that instead of saying "... and there was another rule $UNRECOGNIZED_TYPE that I don't know how to evaluate", the CVO will have to silently drop that rule. And if that's the only rule, the CVO will have to silently drop that risk and conditional update. David is fine with that, even if I'm not on board, and he's the API approver, so that's what we'll do. I've also upcased the types to Always and PromQL; from David: Always and PromQL. We use CamelCase for enumerated values * history[].overrides became history[].acceptedRisks to satisfy David's concerns about misinterpretation [7]. Swagger, deepcopy, and CRD changes were generated with: $ make update [1]: openshift/enhancements#821 [2]: openshift#1011 (comment) [3]: openshift#1011 (comment) [4]: openshift#1011 (comment) [5]: openshift#1011 (comment) [6]: openshift#1011 (comment) [7]: openshift#1011 (comment)
Implementing targeted edge blocking as described in [1]. Except: * Some properties which are optional in the enhancement are required here. I think this is strictly worse handling for the cluster-version operator, because we are just getting this data from the upstream update service's Cincinnati JSON, and if the upstream says "here's a risk, but I am not linking a URI", and the CVO subsequently rejects the entire update payload as invalid, then downstream consumers are not hearing about the reason/message/matchingRules that the upstream may have been passing along. But I'm not an API approver, and David is asking for the +required in [2,3,4,5]. I dunno why he isn't finding my argument's convincing, but I'm out of time to continue arguing my position. * Similarly, the ClusterCondition.Type property has an enum, because David requires it [6]. This means that the CVO will need to self-censor if there are unrecognized types (because we don't want our ClusterVersion writes rejected on API-server validation failures). That means that instead of saying "... and there was another rule $UNRECOGNIZED_TYPE that I don't know how to evaluate", the CVO will have to silently drop that rule. And if that's the only rule, the CVO will have to silently drop that risk and conditional update. David is fine with that, even if I'm not on board, and he's the API approver, so that's what we'll do. I've also upcased the types to Always and PromQL; from David: Always and PromQL. We use CamelCase for enumerated values * history[].overrides became history[].acceptedRisks to satisfy David's concerns about misinterpretation [7]. * The server-side apply annotations documented in [8]: listType, listMapKey, patchMergeStrategy, and patchMergeKey. Swagger, deepcopy, and CRD changes were generated with: $ make update [1]: openshift/enhancements#821 [2]: openshift#1011 (comment) [3]: openshift#1011 (comment) [4]: openshift#1011 (comment) [5]: openshift#1011 (comment) [6]: openshift#1011 (comment) [7]: openshift#1011 (comment) [8]: https://kubernetes.io/docs/reference/using-api/server-side-apply/
…anded Following up on openshift#821 with the changes made in order to land openshift/api#1011.
…anded Following up on openshift#821 with the changes made in order to land openshift/api#1011. The graph-data properties remain optional, because graph-data has been used for unconditional blocks that do not declare risks today. The update service will continue to read those blocks, but will implement them by pruning 'edges'. So things that end up in 'conditionalEdges' will still need all the metadata populated to keep David happy ;).
So we can explain why we're blocking the different edges [1] (the promql -> PromQL type change is in flight with [2]). The zz in the filename for the vSphere hostname block ensures that one sorts last alphabetically, because it's the broadest block, and legacy Cincinnati services will prefer the final regular expression they load for a given 'to' target. This is basically a second attempt at my earlier 39bc2fb (blocked-edges/4.7.4*: Targeted edge blocking and version 1.1.0, 2021-09-01, openshift#1056), which ended up getting reverted in da1254a (Revert "blocked-edges/4.7.4*: Targeted edge blocking and version 1.1.0", 2021-09-21, openshift#1078) because the production Cincinnati was mad about the 1.1.0 version string. [3] taught Cincinnati to relax, and now that's live (and we never shipped any versions that would be mad about 1.1.0 to customers, the 4.6.0 Update Service operator went out before [4]). [1]: openshift/enhancements#821 [2]: openshift/enhancements#910 [3]: openshift/cincinnati#538 [4]: openshift/cincinnati#314
To keep us from forgetting to set 'url', etc. [1,2]. We'll probably want to have PromQL parser validation later [3], but I write Python a lot faster than I write Go. This should avoid obvious mistakes while the PromQL parser gets wired up to a validator. [1]: openshift/enhancements#821 [2]: openshift/enhancements#910 [3]: https://github.com/openshift/enhancements/blob/4412b51182ba6e007f1308a73e290e81fd66d092/enhancements/update/targeted-update-edge-blocking.md#promql-validation
…anded Following up on openshift#821 with the changes made in order to land openshift/api#1011. The graph-data properties remain optional, because graph-data has been used for unconditional blocks that do not declare risks today. The update service will continue to read those blocks, but will implement them by pruning 'edges'. So things that end up in 'conditionalEdges' will still need all the metadata populated to keep David happy ;).
So we can explain why we're blocking the different edges [1] (the promql -> PromQL type change is in flight with [2]). The zz in the filename for the vSphere hostname block ensures that one sorts last alphabetically, because it's the broadest block, and legacy Cincinnati services will prefer the final regular expression they load for a given 'to' target. This is basically a second attempt at my earlier 39bc2fb (blocked-edges/4.7.4*: Targeted edge blocking and version 1.1.0, 2021-09-01, openshift#1056), which ended up getting reverted in da1254a (Revert "blocked-edges/4.7.4*: Targeted edge blocking and version 1.1.0", 2021-09-21, openshift#1078) because the production Cincinnati was mad about the 1.1.0 version string. [3] taught Cincinnati to relax, and now that's live (and we never shipped any versions that would be mad about 1.1.0 to customers, the 4.6.0 Update Service operator went out before [4]). [1]: openshift/enhancements#821 [2]: openshift/enhancements#910 [3]: openshift/cincinnati#538 [4]: openshift/cincinnati#314
So we can explain why we're blocking the different edges [1] (the promql -> PromQL type change is in flight with [2]). The zz in the filename for the vSphere hostname block ensures that one sorts last alphabetically, because it's the broadest block, and legacy Cincinnati services will prefer the final regular expression they load for a given 'to' target. This is basically a second attempt at my earlier 39bc2fb (blocked-edges/4.7.4*: Targeted edge blocking and version 1.1.0, 2021-09-01, openshift#1056), which ended up getting reverted in da1254a (Revert "blocked-edges/4.7.4*: Targeted edge blocking and version 1.1.0", 2021-09-21, openshift#1078) because the production Cincinnati was mad about the 1.1.0 version string. [3] taught Cincinnati to relax, and now that's live (and we never shipped any versions that would be mad about 1.1.0 to customers, the 4.6.0 Update Service operator went out before [4]). [1]: openshift/enhancements#821 [2]: openshift/enhancements#910 [3]: openshift/cincinnati#538 [4]: openshift/cincinnati#314
Implementing targeted edge blocking as described in [1]. Except: * Some properties which are optional in the enhancement are required here. I think this is strictly worse handling for the cluster-version operator, because we are just getting this data from the upstream update service's Cincinnati JSON, and if the upstream says "here's a risk, but I am not linking a URI", and the CVO subsequently rejects the entire update payload as invalid, then downstream consumers are not hearing about the reason/message/matchingRules that the upstream may have been passing along. But I'm not an API approver, and David is asking for the +required in [2,3,4,5]. I dunno why he isn't finding my argument's convincing, but I'm out of time to continue arguing my position. * Similarly, the ClusterCondition.Type property has an enum, because David requires it [6]. This means that the CVO will need to self-censor if there are unrecognized types (because we don't want our ClusterVersion writes rejected on API-server validation failures). That means that instead of saying "... and there was another rule $UNRECOGNIZED_TYPE that I don't know how to evaluate", the CVO will have to silently drop that rule. And if that's the only rule, the CVO will have to silently drop that risk and conditional update. David is fine with that, even if I'm not on board, and he's the API approver, so that's what we'll do. I've also upcased the types to Always and PromQL; from David: Always and PromQL. We use CamelCase for enumerated values * history[].overrides became history[].acceptedRisks to satisfy David's concerns about misinterpretation [7]. * The server-side apply annotations documented in [8]: listType, listMapKey, patchMergeStrategy, and patchMergeKey. Swagger, deepcopy, and CRD changes were generated with: $ make update [1]: openshift/enhancements#821 [2]: openshift#1011 (comment) [3]: openshift#1011 (comment) [4]: openshift#1011 (comment) [5]: openshift#1011 (comment) [6]: openshift#1011 (comment) [7]: openshift#1011 (comment) [8]: https://kubernetes.io/docs/reference/using-api/server-side-apply/
…endations" Users are occasionally confused by the "blocked edges" wording, which is internal shorthand. The fact that update recommendations are tracked as a directed graph (hence "edges") isn't all that relevant for most users, so talk about "updates" instead of "edges". While we occasionally drop update recommendations to reduce the chances that users update into serious issues, from [1]: When an update recommendation is supported, it remains supported for the life of 4.10, even if the update recommendation is later dropped or made conditional. Updating is still possible, we just don't think it's a good idea (unless maybe you're elbow deep in a support case, and a support tech decides to recommend an update based on context that is not available to the update service). So replace "blocked" with "not recommended", to convey the "still possible" aspect. Also ask for some more specifics in the impact statement template, as we gear up to support conditional updates fff9d3c (enhancements/update/targeted-update-edge-blocking: Propose a new enhancement, 2020-08-07, openshift#821). I haven't gone as far as "tell us what PromQL to use in our conditional recommendation", but I'm easing things in that direction. [1]: https://docs.openshift.com/container-platform/4.10/updating/understanding-upgrade-channels-release.html#upgrade-version-paths_understanding-upgrade-channels-releases
Because he approved the original pull request creating the directory [1]. [1]: openshift#821 (comment)
Because he approved the original pull request creating the directory [1]. [1]: openshift#821 (comment)
…endations" Users are occasionally confused by the "blocked edges" wording, which is internal shorthand. The fact that update recommendations are tracked as a directed graph (hence "edges") isn't all that relevant for most users, so talk about "updates" instead of "edges". While we occasionally drop update recommendations to reduce the chances that users update into serious issues, from [1]: When an update recommendation is supported, it remains supported for the life of 4.10, even if the update recommendation is later dropped or made conditional. Updating is still possible, we just don't think it's a good idea (unless maybe you're elbow deep in a support case, and a support tech decides to recommend an update based on context that is not available to the update service). So replace "blocked" with "not recommended", to convey the "still possible" aspect. Also ask for some more specifics in the impact statement template, as we gear up to support conditional updates fff9d3c (enhancements/update/targeted-update-edge-blocking: Propose a new enhancement, 2020-08-07, openshift#821). I haven't gone as far as "tell us what PromQL to use in our conditional recommendation", but I'm easing things in that direction. [1]: https://docs.openshift.com/container-platform/4.10/updating/understanding-upgrade-channels-release.html#upgrade-version-paths_understanding-upgrade-channels-releases
…endations" Users are occasionally confused by the "blocked edges" wording, which is internal shorthand. The fact that update recommendations are tracked as a directed graph (hence "edges") isn't all that relevant for most users, so talk about "updates" instead of "edges". While we occasionally drop update recommendations to reduce the chances that users update into serious issues, from [1]: When an update recommendation is supported, it remains supported for the life of 4.10, even if the update recommendation is later dropped or made conditional. Updating is still possible, we just don't think it's a good idea (unless maybe you're elbow deep in a support case, and a support tech decides to recommend an update based on context that is not available to the update service). So replace "blocked" with "not recommended", to convey the "still possible" aspect. Also ask for some more specifics in the impact statement template, as we gear up to support conditional updates fff9d3c (enhancements/update/targeted-update-edge-blocking: Propose a new enhancement, 2020-08-07, openshift#821). [1]: https://docs.openshift.com/container-platform/4.10/updating/understanding-upgrade-channels-release.html#upgrade-version-paths_understanding-upgrade-channels-releases
This enhancement proposes a mechanism for blocking edges for the subset of clusters considered vulnerable to known issues with a particular update or target release.
This is the third iteration of this proposal, replacing #426 (update-service-side decisions based on Telemetry queries) and #737 (update-service-side decisions based on client-supplied query parameters). I'm now recommending cluster-side decisions, with the update service declaring conditional edges and telling the cluster-side tools how to decide if the cluster is vulnerable to the known bugs impacting its update decisions.