From 5fbae31e5af64e66d040194cf83b4ab0dea1d450 Mon Sep 17 00:00:00 2001
From: "W. Trevor King"
Date: Wed, 1 Sep 2021 15:24:55 -0700
Subject: [PATCH] blocked-edges/4.7.4*: Targeted edge blocking and version 1.1.0

So we can explain why we're blocking the different edges [1] (the
promql -> PromQL type change is in flight with [2]).

The zz in the filename for the vSphere hostname block ensures that one
sorts last alphabetically, because it's the broadest block, and legacy
Cincinnati services will prefer the final regular expression they load
for a given 'to' target.

This is basically a second attempt at my earlier 39bc2fb686
(blocked-edges/4.7.4*: Targeted edge blocking and version 1.1.0,
2021-09-01, #1056), which ended up getting reverted in da1254a39f
(Revert "blocked-edges/4.7.4*: Targeted edge blocking and version
1.1.0", 2021-09-21, #1078) because the production Cincinnati was mad
about the 1.1.0 version string.  [3] taught Cincinnati to relax, and
now that's live (and we never shipped any versions that would be mad
about 1.1.0 to customers, the 4.6.0 Update Service operator went out
before [4]).
[1]: https://github.com/openshift/enhancements/pull/821
[2]: https://github.com/openshift/enhancements/pull/910
[3]: https://github.com/openshift/cincinnati/pull/538
[4]: https://github.com/openshift/cincinnati/pull/314
---
 README.md                                     | 29 +++++++++++++++++--
 blocked-edges/4.7.4-auth-connection-leak.yaml |  9 ++++++
 ...4-vsphere-hw-17-cross-node-networking.yaml | 12 ++++++++
 .../4.7.4-zz-vsphere-hostnames-changing.yaml  | 12 ++++++++
 blocked-edges/4.7.4.yaml                      |  5 ----
 version                                       |  2 +-
 6 files changed, 61 insertions(+), 8 deletions(-)
 create mode 100644 blocked-edges/4.7.4-auth-connection-leak.yaml
 create mode 100644 blocked-edges/4.7.4-vsphere-hw-17-cross-node-networking.yaml
 create mode 100644 blocked-edges/4.7.4-zz-vsphere-hostnames-changing.yaml
 delete mode 100644 blocked-edges/4.7.4.yaml

diff --git a/README.md b/README.md
index be78dc35c..302f12f86 100644
--- a/README.md
+++ b/README.md
@@ -98,9 +98,27 @@ declaring that, while 4.1.18 and 4.1.20 are in `candidate-4.2`, they should not
 ### Block Edges
 
 Create/edit an appropriate file in `blocked_edges/`.
-- `to` is the release which has the existing incoming edges.
-- `from` is a regex for the previous release versions.
+
+* `to` (required, [string][json-string]) is the release which has the existing incoming edges.
+* `from` (required, [string][json-string]) is a regex for the previous release versions.
   If you want to require `from` to match the full version string (and not just a substring), you must include explicit `^` and `$` anchors.
+  Release version strings will receive [the architecture-suffix](#release-names) before being compared to this regular expression.
+* `url` (optional, [string][json-string]), with a URI documenting the blocking reason.
+  For example, this could link to a bug's impact statement or knowledge-base article.
+* `name` (optional, [string][json-string]), with a CamelCase reason suitable for [a `ClusterOperatorStatusCondition` `reason` property][api-reason].
+* `message` (optional, [string][json-string]), with a human-oriented message describing the blocking reason, suitable for [a `ClusterOperatorStatusCondition` `message` property][api-message].
+* `matchingRules` (optional, [array][json-array]), defining conditions for deciding which clusters have the update recommended and which do not.
+  The array is ordered by decreasing precedence.
+  Consumers should walk the array in order.
+  For a given entry, if a condition type is unrecognized, or fails to evaluate, consumers should proceed to the next entry.
+  If a condition successfully evaluates (either as a match or as an explicit does-not-match), that result is used, and no further entries should be attempted.
+  If no condition can be successfully evaluated, the update should not be recommended.
+  Each entry must be an [object][json-object] with at least the following property:
+
+  * `type` (required, [string][json-string]), defining the type in [the condition type registry][cluster-condition-type-registry].
+    For example, `type: PromQL` identifies the condition as [the `promql` type][cluster-condition-type-registry-promql].
+
+  Additional properties for each entry are defined in [the cluster-condition type registry][cluster-condition-type-registry].
 
 For example: to block all incoming edges to a release create a file such as `blocked-edges/4.2.11.yaml` containing:
 
@@ -118,10 +136,17 @@ from: ^4\.1\.(18|20)[+].*$
 
 The `[+].*` portion absorbs [the architecture-suffix](#release-names) from the release name that consumers will use for comparisons.
 
+[api-message]: https://github.com/openshift/api/blob/67c28690af52a69e0b8fa565916fe1b9b7f52f10/config/v1/types_cluster_operator.go#L135-L139
+[api-reason]: https://github.com/openshift/api/blob/67c28690af52a69e0b8fa565916fe1b9b7f52f10/config/v1/types_cluster_operator.go#L131-L133
 [channel-semantics]: https://docs.openshift.com/container-platform/4.3/updating/updating-cluster-between-minor.html#understanding-upgrade-channels_updating-cluster-between-minor
 [Cincinnati]: https://github.com/openshift/cincinnati/
+[cluster-condition-type-registry]: https://github.com/openshift/enhancements/pull/821#FIXME
+[cluster-condition-type-registry-promql]: https://github.com/openshift/enhancements/pull/821#FIXME
 [image-arch]: https://github.com/opencontainers/image-spec/blame/v1.0.1/config.md#L103
 [iso-8601-durations]: https://en.wikipedia.org/wiki/ISO_8601#Durations
+[json-array]: https://datatracker.ietf.org/doc/html/rfc8259#section-5
+[json-object]: https://datatracker.ietf.org/doc/html/rfc8259#section-4
+[json-string]: https://datatracker.ietf.org/doc/html/rfc8259#section-7
 [rfc-3339-p13]: https://tools.ietf.org/html/rfc3339#page-13
 [semver]: https://semver.org/spec/v2.0.0.html
 [semver-build]: https://semver.org/spec/v2.0.0.html#spec-item-10
diff --git a/blocked-edges/4.7.4-auth-connection-leak.yaml b/blocked-edges/4.7.4-auth-connection-leak.yaml
new file mode 100644
index 000000000..3bd12cf97
--- /dev/null
+++ b/blocked-edges/4.7.4-auth-connection-leak.yaml
@@ -0,0 +1,9 @@
+to: 4.7.4
+from: 4\.6\..*
+url: https://bugzilla.redhat.com/show_bug.cgi?id=1941840#c33
+name: AuthOAuthProxyLeakedConnections
+message: On clusters with a Proxy configured, the authentication operator may keep many oauth-server connections open, resulting in high memory consumption by the authentication operator and router pods.
+matchingRules:
+- type: PromQL
+  promql:
+    promql: max(cluster_proxy_enabled{type=~"https?"})
diff --git a/blocked-edges/4.7.4-vsphere-hw-17-cross-node-networking.yaml b/blocked-edges/4.7.4-vsphere-hw-17-cross-node-networking.yaml
new file mode 100644
index 000000000..d8f34c078
--- /dev/null
+++ b/blocked-edges/4.7.4-vsphere-hw-17-cross-node-networking.yaml
@@ -0,0 +1,12 @@
+to: 4.7.4
+from: 4\.6\..*
+url: https://access.redhat.com/solutions/5896081
+name: VSphereHW14CrossNodeNetworkingError
+message: Clusters on vSphere Virtual Hardware Version 14 and later may experience cross-node networking issues.
+matchingRules:
+- type: PromQL
+  promql:
+    promql: |
+      cluster_infrastructure_provider{type=~"VSphere|None"}
+      or
+      0 * cluster_infrastructure_provider
diff --git a/blocked-edges/4.7.4-zz-vsphere-hostnames-changing.yaml b/blocked-edges/4.7.4-zz-vsphere-hostnames-changing.yaml
new file mode 100644
index 000000000..90dbf02ca
--- /dev/null
+++ b/blocked-edges/4.7.4-zz-vsphere-hostnames-changing.yaml
@@ -0,0 +1,12 @@
+to: 4.7.4
+from: .*
+url: https://bugzilla.redhat.com/show_bug.cgi?id=1942207#c3
+name: VSphereNodeNameChanges
+message: vSphere clusters leveraging the vSphere cloud provider may lose node names which can have serious impacts on the stability of the control plane and workloads.
+matchingRules:
+- type: PromQL
+  promql:
+    promql: |
+      cluster_infrastructure_provider{type=~"VSphere|None"}
+      or
+      0 * cluster_infrastructure_provider
diff --git a/blocked-edges/4.7.4.yaml b/blocked-edges/4.7.4.yaml
deleted file mode 100644
index 54e7001c0..000000000
--- a/blocked-edges/4.7.4.yaml
+++ /dev/null
@@ -1,5 +0,0 @@
-to: 4.7.4
-from: .*
-# 4.7.4 introduced a node-hostname clear on vSphere: https://bugzilla.redhat.com/show_bug.cgi?id=1942207
-# 4.7 has cross-node SDN issues on vSphere Virtual Hardware Version 14 and later: https://bugzilla.redhat.com/show_bug.cgi?id=1935539
-# Authentication operator's endpoints controller fails to close connections, causing intermittent apiserver and authentication cluster operator instability and high memory usage by the router. A regression since 4.6, eventually fixed in 4.7.11 https://bugzilla.redhat.com/show_bug.cgi?id=1941840
diff --git a/version b/version
index 3eefcb9dd..9084fa2f7 100644
--- a/version
+++ b/version
@@ -1 +1 @@
-1.0.0
+1.1.0
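
The consumer-side behavior that the README hunk describes (substring versus anchored `from` matching, and the ordered `matchingRules` walk) can be sketched in Python. This is only an illustration under stated assumptions, not the code Cincinnati or any cluster-side consumer actually runs: the helper names `from_matches` and `update_recommended`, the `evaluators` mapping, the `amd64` suffix value, and the reading that a successful "match" means the update is not recommended are all hypothetical choices made for the example. The `14.6.1` version is likewise made up, purely to show the substring pitfall.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical evaluator contract: return True for a match, False for an
# explicit does-not-match, and None (or raise) when evaluation fails.
Evaluator = Callable[[dict], Optional[bool]]


def from_matches(pattern: str, version: str, arch: str = "amd64") -> bool:
    """Test a `from` regex against a version with the architecture suffix.

    The "+<arch>" suffix form follows the README's `[+].*` example; the
    default "amd64" value is an assumption for illustration.
    re.search is used so an unanchored pattern matches substrings.
    """
    return re.search(pattern, f"{version}+{arch}") is not None


def update_recommended(rules: List[dict],
                       evaluators: Dict[str, Evaluator]) -> bool:
    """Walk matchingRules in order; the first successful evaluation wins."""
    for rule in rules:
        evaluator = evaluators.get(rule.get("type"))
        if evaluator is None:
            continue  # unrecognized condition type: try the next entry
        try:
            result = evaluator(rule)
        except Exception:
            continue  # failed to evaluate: try the next entry
        if result is None:
            continue  # evaluator reported it could not decide
        # Assumed reading: a match means the cluster is exposed to the
        # risk, so the update is not recommended for it.
        return not result
    # No condition could be evaluated: do not recommend the update.
    return False


if __name__ == "__main__":
    print(from_matches(r"4\.6\..*", "14.6.1"))     # unanchored pattern
    print(from_matches(r"^4\.6\..*$", "14.6.1"))   # anchored pattern
    rules = [{"type": "PromQL"}]
    print(update_recommended(rules, {"PromQL": lambda rule: False}))
```

Keeping the "nothing could be evaluated" case as not recommended mirrors the README's fail-closed wording, so a consumer that does not understand `PromQL` still treats the edge as blocked.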