@QiWang19
Member

Set the grace period to 10 minutes. Current CI jobs indicate that the existing 2-minute grace period causes failures:
gcp-ovn-rt-upgrade
aws-ovn-upgrade-fips
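
For illustration, a grace period like this can be pictured as a bounded polling loop. The following is a minimal Python sketch under stated assumptions, not the actual origin test code: the names `wait_for_pool_updated`, `FakeClock`, and `simulate` are hypothetical, and the 10-second poll interval and the 135-second pool are chosen for the example only.

```python
import time
from typing import Callable


def wait_for_pool_updated(
    pool_is_updated: Callable[[], bool],
    grace_period_s: float,
    poll_interval_s: float = 10.0,  # assumed interval, for illustration only
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the pool reports updated or the grace period expires."""
    deadline = now() + grace_period_s
    while True:
        if pool_is_updated():
            return True
        if now() >= deadline:
            return False
        sleep(poll_interval_s)


class FakeClock:
    """Deterministic clock so the example runs instantly."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


def simulate(grace_period_s: float, ready_after_s: float) -> bool:
    """Run the wait loop against a pool that becomes updated at ready_after_s."""
    clock = FakeClock()
    return wait_for_pool_updated(
        pool_is_updated=lambda: clock.t >= ready_after_s,
        grace_period_s=grace_period_s,
        now=clock.now,
        sleep=clock.sleep,
    )


# A hypothetical pool that needs 135 s to finish a non-draining update.
with_2m_grace = simulate(grace_period_s=120, ready_after_s=135)
with_10m_grace = simulate(grace_period_s=600, ready_after_s=135)
print(with_2m_grace, with_10m_grace)  # False True
```

With these assumed numbers, the 2-minute window expires before the pool settles, while the 10-minute window leaves ample headroom.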

@openshift-ci-robot

Pipeline controller notification
This repository is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. Review these jobs and use /test <job> to manually trigger optional jobs most likely to be impacted by the proposed changes.

@openshift-ci openshift-ci bot requested review from deads2k and sjenning November 19, 2025 21:04
Set the grace period to 10 minutes. Current CI jobs indicate that the existing 2-minute grace period causes failures.

Signed-off-by: Qi Wang <qiwan@redhat.com>
Member

@wking wking left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) Nov 19, 2025
@wking
Member

wking commented Nov 19, 2025

/payload-aggregate periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade 5
/payload-aggregate periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-upgrade-fips 5

@openshift-ci
Contributor

openshift-ci bot commented Nov 19, 2025

@wking: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-upgrade-fips

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/41e8c5c0-c58d-11f0-88f8-96baaed0c118-0

wking added a commit to wking/cluster-update-keys that referenced this pull request Nov 19, 2025
…-openshift-cip""

This reverts commit 7a5dcee.

This one has taken us some time:

* 2025-08-27, 94f7582, openshift#82 was our first attempt at enabling the
  ClusterImagePolicy.
* ...but it tripped up the origin test suite, so it was reverted in
  2025-08-28, c40e7b9, openshift#83.
* Qi then hardened the test suite with openshift/origin@d3af51e4acb
  (not fail upgrade checks if all nodes are ready, 2025-09-29,
  openshift/origin#30318) and openshift/origin@2fd0d8e242 (Upgrade
  test add 2min grace period allow non-drain updates to complete,
  2025-11-12, openshift/origin#30480).
* With the tougher CI in place, we tried a second time with
  2025-11-17, 1f89a67, openshift#85.
* ...but still tripped up origin, with runs like [1] taking 2.25m
  (more than the 2m grace period):

    I1119 17:26:21.890667 1511 upgrade.go:629] Waiting on pools to be upgraded
    I1119 17:26:21.939178 1511 upgrade.go:792] Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
    I1119 17:26:21.939259 1511 upgrade.go:666] Invariant violation detected: master pool requires update but nodes not ready. Waiting up to 2m0s for non-draining updates to complete
    I1119 17:26:31.984116 1511 upgrade.go:792] Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
    ...
    I1119 17:28:21.981438 1511 upgrade.go:792] Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
    I1119 17:28:21.981514 1511 upgrade.go:673] Invariant violation detected: the "master" pool should be updated before the CVO reports available at the new version

  and:

    $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade/1991158541779472384/artifacts/e2e-gcp-ovn-rt-upgrade/gather-extra/artifacts/inspect/cluster-scoped-resources/machineconfiguration.openshift.io/machineconfigpools/master.yaml | yaml2json | jq -r '.status.conditions[] | select(.type == "Updating") | .lastTransitionTime + " " + .status'
    2025-11-19T17:28:36Z False

  28:36 - 26:21 = 135s = 2.25m, which overshot the 2m grace period.
  The second attempt was reverted in 7a5dcee, openshift#87.

* Qi then hardened the test suite further with
  openshift/origin@c17e560263 (Update grace period for cluster upgrade
  to 10 minutes, 2025-11-19, openshift/origin#30506).
* This commit is taking a third attempt at enabling the
  ClusterImagePolicy.

[1]: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade/1991158541779472384
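
The overshoot arithmetic in the commit message above can be checked with a short Python sketch. It assumes the klog timestamp `17:26:21` is UTC on 2025-11-19 (the log line carries neither year nor zone) and drops its fractional seconds; the helper name `parse_rfc3339_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

OLD_GRACE_PERIOD = timedelta(minutes=2)
NEW_GRACE_PERIOD = timedelta(minutes=10)


def parse_rfc3339_utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' timestamp as an aware UTC datetime."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Start of the grace period, from the upgrade.go:666 log line (assumed UTC).
grace_started = parse_rfc3339_utc("2025-11-19T17:26:21Z")
# lastTransitionTime of the master pool's Updating=False condition.
updating_cleared = parse_rfc3339_utc("2025-11-19T17:28:36Z")

elapsed = updating_cleared - grace_started
overshoot = elapsed - OLD_GRACE_PERIOD

print(f"elapsed: {elapsed.total_seconds():.0f}s")              # elapsed: 135s
print(f"overshoot of 2m: {overshoot.total_seconds():.0f}s")    # overshoot of 2m: 15s
print(f"fits in 10m: {elapsed <= NEW_GRACE_PERIOD}")           # fits in 10m: True
```
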
@neisw
Contributor

neisw commented Nov 19, 2025

/approve

@openshift-ci
Contributor

openshift-ci bot commented Nov 19, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neisw, QiWang19, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Nov 19, 2025
@QiWang19 QiWang19 changed the title Update grace period for cluster upgrade to 10 minutes OCPNODE-3877: Update grace period for cluster upgrade to 10 minutes Nov 19, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) Nov 19, 2025
@openshift-ci-robot

openshift-ci-robot commented Nov 19, 2025

@QiWang19: This pull request references OCPNODE-3877 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@neisw
Contributor

neisw commented Nov 20, 2025

/verified by payload-jobs

@openshift-ci-robot openshift-ci-robot added the verified label (Signifies that the PR passed pre-merge verification criteria) Nov 20, 2025
@openshift-ci-robot

@neisw: This PR has been marked as verified by payload-jobs.

@wking
Member

wking commented Nov 20, 2025

/retest-required

@wking
Member

wking commented Nov 20, 2025

Hmm, trying to wiggle Prow out of the Waiting for status to be reported — Pipeline controller will trigger this test state. Maybe:

/reset
/skip
/refresh

@jmguzik

jmguzik commented Nov 20, 2025

If for whatever reason the second stage was not triggered, you can manually trigger it using:
/pipeline required

Please see https://docs.ci.openshift.org/docs/how-tos/creating-a-pipeline/ for more info

@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-ovn-upgrade-rollback

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 484f6c0 and 2 for PR HEAD c17e560 in total

@neisw
Contributor

neisw commented Nov 20, 2025

/override ci/prow/e2e-gcp-ovn
/override ci/prow/e2e-vsphere-ovn

Both failed on "Services should be rejected for evicted pods", which is a known flake.

@openshift-ci
Contributor

openshift-ci bot commented Nov 20, 2025

@neisw: Overrode contexts on behalf of neisw: ci/prow/e2e-gcp-ovn, ci/prow/e2e-vsphere-ovn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@neisw
Contributor

neisw commented Nov 20, 2025

/test verify
/test verify-deps

@neisw
Contributor

neisw commented Nov 20, 2025

/override ci/prow/e2e-metal-ipi-ovn-ipv6

passed previously and the changes are isolated

@openshift-ci
Contributor

openshift-ci bot commented Nov 20, 2025

@neisw: Overrode contexts on behalf of neisw: ci/prow/e2e-metal-ipi-ovn-ipv6

@neisw
Contributor

neisw commented Nov 20, 2025

/override ci/prow/e2e-vsphere-ovn
/override ci/prow/verify

@openshift-ci
Contributor

openshift-ci bot commented Nov 20, 2025

@neisw: Overrode contexts on behalf of neisw: ci/prow/e2e-vsphere-ovn, ci/prow/verify

@openshift-merge-bot openshift-merge-bot bot merged commit 96ce99d into openshift:main Nov 20, 2025
20 of 21 checks passed
@openshift-ci
Contributor

openshift-ci bot commented Nov 20, 2025

@QiWang19: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-upgrade-rollback | c17e560 | link | false | /test e2e-aws-ovn-upgrade-rollback |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
lgtm: Indicates that a PR is ready to be merged.
verified: Signifies that the PR passed pre-merge verification criteria
