OCPNODE-3659: Not fail upgrade checks if all nodes are ready #30318
Conversation
Signed-off-by: Qi Wang <qiwan@redhat.com>
Skipping CI for Draft Pull Request.

/test all

/retest-required
@QiWang19: This pull request references OCPNODE-3659 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
@QiWang19: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
```go
framework.Logf("Waiting on pools to be upgraded")
if err := wait.PollImmediate(10*time.Second, 30*time.Minute, func() (bool, error) {
	// ...
	nodes, err := kubeClient.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
	// ...
```
It's hard to imagine a MachineConfigPool update that cordons and drains control-plane nodes where we never see a single master Node that's Unschedulable=True in this 10s poll loop, so looks good to me.
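To make the check concrete, here is a minimal Go sketch of the idea under review, assuming client-go; the helper name `allNodesReadyAndSchedulable` and its wiring into the poll loop are illustrative assumptions, not the actual origin code:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// allNodesReadyAndSchedulable reports whether every node is Ready and not
// cordoned. A non-draining update never sets Unschedulable=True, so if this
// holds throughout the poll loop, the pool check need not fail.
func allNodesReadyAndSchedulable(ctx context.Context, kubeClient kubernetes.Interface) (bool, error) {
	nodes, err := kubeClient.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return false, err
	}
	for _, node := range nodes.Items {
		// A cordoned node means a drain is in progress; keep waiting.
		if node.Spec.Unschedulable {
			return false, nil
		}
		ready := false
		for _, cond := range node.Status.Conditions {
			if cond.Type == corev1.NodeReady && cond.Status == corev1.ConditionTrue {
				ready = true
				break
			}
		}
		if !ready {
			return false, nil
		}
	}
	return true, nil
}
```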
wking left a comment:
/lgtm
/assign @neisw

/verified by @QiWang19

@QiWang19: This PR has been marked as verified by @QiWang19.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/approve |
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: neisw, QiWang19, wking. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.

/retest-required
Job Failure Risk Analysis for sha: d3af51e
/retest-required |
Merged commit 01a7cc0 into openshift:main
…atus

The previous PR (openshift#30318) allowed non-drained updates but also required the node annotation "machineconfiguration.openshift.io/state"="Done". This condition is too strict, as the MCD may not be in the "Done" state while the nodes remain schedulable and fully functional.

Signed-off-by: Qi Wang <qiwan@redhat.com>
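For illustration, a hedged sketch of the relaxed per-node predicate this commit describes; the function name `nodeAcceptable` is hypothetical and the real origin check may differ:

```go
package main

import corev1 "k8s.io/api/core/v1"

const mcdStateAnnotation = "machineconfiguration.openshift.io/state"

// nodeAcceptable treats a node as healthy when it is schedulable and Ready,
// without also requiring the MCD "Done" annotation that the earlier check
// demanded. The stricter condition additionally required:
//
//	node.Annotations[mcdStateAnnotation] == "Done"
//
// which can lag behind even though the node stays fully functional.
func nodeAcceptable(node corev1.Node) bool {
	if node.Spec.Unschedulable {
		return false
	}
	for _, cond := range node.Status.Conditions {
		if cond.Type == corev1.NodeReady {
			return cond.Status == corev1.ConditionTrue
		}
	}
	return false
}
```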
…-openshift-cip""

This reverts commit 7a5dcee. This one has taken us some time:

* 2025-08-27, 94f7582, openshift#82 was our first attempt at enabling the ClusterImagePolicy.
* ...but it tripped up the origin test suite, so it was reverted in 2025-08-28, c40e7b9, openshift#83.
* Qi then hardened the test suite with openshift/origin@d3af51e4acb (not fail upgrade checks if all nodes are ready, 2025-09-29, openshift/origin#30318) and openshift/origin@2fd0d8e242 (Upgrade test add 2min grace period allow non-drain updates to complete, 2025-11-12, openshift/origin#30480).
* With the tougher CI in place, we tried a second time with 2025-11-17, 1f89a67, openshift#85.
* ...but still tripped up origin, with runs like [1] taking 2.25m (more than the 2m grace period):

      I1119 17:26:21.890667 1511 upgrade.go:629] Waiting on pools to be upgraded
      I1119 17:26:21.939178 1511 upgrade.go:792] Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
      I1119 17:26:21.939259 1511 upgrade.go:666] Invariant violation detected: master pool requires update but nodes not ready. Waiting up to 2m0s for non-draining updates to complete
      I1119 17:26:31.984116 1511 upgrade.go:792] Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
      ...
      I1119 17:28:21.981438 1511 upgrade.go:792] Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
      I1119 17:28:21.981514 1511 upgrade.go:673] Invariant violation detected: the "master" pool should be updated before the CVO reports available at the new version

  and:

      $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade/1991158541779472384/artifacts/e2e-gcp-ovn-rt-upgrade/gather-extra/artifacts/inspect/cluster-scoped-resources/machineconfiguration.openshift.io/machineconfigpools/master.yaml | yaml2json | jq -r '.status.conditions[] | select(.type == "Updating") | .lastTransitionTime + " " + .status'
      2025-11-19T17:28:36Z False

  28:36 - 26:21 = 135s = 2.25m, which overshot the 2m grace period. The second attempt was reverted in 7a5dcee, openshift#87.
* Qi then hardened the test suite further with openshift/origin@c17e560263 (Update grace period for cluster upgrade to 10 minutes, 2025-11-19, openshift/origin#30506).
* This commit is taking a third attempt at enabling the ClusterImagePolicy.

[1]: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade/1991158541779472384
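As a rough illustration of the grace-period pattern this history describes (assumed names; not the exact origin implementation):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// gracePeriod mirrors the value raised from 2 minutes to 10 minutes in
// openshift/origin#30506 (the constant name here is an assumption).
const gracePeriod = 10 * time.Minute

// waitForPoolSettled polls a caller-supplied poolUpdated check every 10s and
// only reports the invariant violation if the pool is still updating after
// the grace period, tolerating slow non-draining updates like the 2.25m run
// quoted above.
func waitForPoolSettled(ctx context.Context, poolUpdated func(context.Context) (bool, error)) error {
	deadline := time.Now().Add(gracePeriod)
	for time.Now().Before(deadline) {
		done, err := poolUpdated(ctx)
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(10 * time.Second):
		}
	}
	return fmt.Errorf("invariant violation: pool still updating after %s grace period", gracePeriod)
}
```

The design choice is simply to trade a longer worst-case wait for fewer false failures on updates that never cordon a node.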
Adjust the test to not fail upgrade checks if all nodes are ready. This allows updates that do not require a node drain, such as shipping a default ClusterImagePolicy during upgrade (openshift/cluster-update-keys#85).