OCPQE-29735: Adding shard option to actual run command to work #65460
@miyadav: This pull request references OCPQE-29735 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.
/pj-rehearse pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-1of3 pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-2of3 pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-3of3
@miyadav: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
damdo
left a comment
/approve
/lgtm
You'll need to add the rehearsal ack when happy
@miyadav you'll also need
    documentation: |-
      Indicates if the active cluster is an OpenShift cluster or a derivative (e.g., Hypershift, Microshift).
      A value of "true" means the cluster is OpenShift or a derivative, while "false" means it is not (e.g., AKS).
  - name: SHARD_ARGS
Where is the arg used?
I got this failure in an earlier run, which is why I tried this. I had tested it in one workflow where the step was openshift-extended-test, and it passed. The chain used here also uses openshift-extended-test, so at first I did not think it needed the variable, but I added it after the run failed:
SHARD_ARGS="--shard-count 3 --shard-id 1"
error: unknown flag: --shard-count
/hold
Holding this again while I review it further. Apparently one of the runs passed, so I will figure out what the difference is.
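The failure quoted above suggests that the shard flags reached a command that does not accept them. As a minimal sketch, and not the contents of openshift-extended-test-commands.sh, one way a step script could forward `SHARD_ARGS` only when it is set looks like this. The function name `build_test_args`, the placeholder command `my-test-binary run`, and the suite name are all hypothetical; only the `--shard-count` and `--shard-id` flag names come from the log above.

```shell
#!/bin/sh
# Hypothetical sketch: forward optional shard flags to a test command.
# SHARD_ARGS is expected to look like "--shard-count 3 --shard-id 1".
# When it is empty or unset, no shard flags are appended.

build_test_args() {
  # $1 is a hypothetical suite name; "my-test-binary run" is a placeholder command.
  set -- my-test-binary run "$1"
  if [ -n "${SHARD_ARGS:-}" ]; then
    # Word splitting is intentional: each flag and value becomes its own argument.
    # shellcheck disable=SC2086
    set -- "$@" ${SHARD_ARGS}
  fi
  # Print the resulting command line instead of executing it.
  printf '%s\n' "$*"
}

SHARD_ARGS="--shard-count 3 --shard-id 1" build_test_args example-suite
SHARD_ARGS="" build_test_args example-suite
```

With this shape, a step that never sets `SHARD_ARGS` runs the unsharded command unchanged, so the variable only has to be declared on the step whose script actually consumes it.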
it needs to go to clusterinfra-qe/regression/openshift-e2e-test-clusterinfra-qe-regression-chain.yaml
The test regression-clusterinfra-azure-ipi-mapi contains the steps below, so which step we add the option to depends on which step we want to split based on running time. Do you mean that if we don't set the option in the step, it errors out? We need to figure out why; it doesn't make much sense to declare it but never use it.
steps:
- chain: cucushift-installer-check-cluster-health
- ref: idp-htpasswd
- ref: cucushift-pre
- ref: openshift-extended-test
- ref: cucushift-e2e
- ref: openshift-e2e-test-clusterinfra-qe
- ref: openshift-e2e-test-qe-report
Yes @shellyyang1989, you are correct. I reviewed it and checked with @liangxia as well: the chain does not need to be aware of it if the step knows it. I have made all the changes now and the tests are looking good; I will wait for them to finish and report the status.
Without sharding, the openshift-tests-private tests took a little over 3 hrs to run:
INFO[2025-05-22T04:56:41Z] Running step regression-clusterinfra-azure-ipi-mapi-openshift-extended-test.
INFO[2025-05-22T08:11:19Z] Step regression-clusterinfra-azure-ipi-mapi-openshift-extended-test succeeded after 3h14m37s.
With sharding, the three shards (1, 2 and 3) took approximately 1 hr 10 mins each on average:
INFO[2025-05-30T08:08:14Z] Running step regression-clusterinfra-azure-ipi-mapi-openshift-extended-test.
INFO[2025-05-30T09:24:01Z] Step regression-clusterinfra-azure-ipi-mapi-openshift-extended-test succeeded after 1h15m47s.
INFO[2025-05-30T08:12:19Z] Running step regression-clusterinfra-azure-ipi-mapi-openshift-extended-test.
INFO[2025-05-30T09:09:01Z] Step regression-clusterinfra-azure-ipi-mapi-openshift-extended-test succeeded after 56m41s.
INFO[2025-05-30T08:22:45Z] Running step regression-clusterinfra-azure-ipi-mapi-openshift-extended-test.
INFO[2025-05-30T09:41:40Z] Step regression-clusterinfra-azure-ipi-mapi-openshift-extended-test succeeded after 1h18m54s.
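To put the timings above in perspective, the step durations can be converted to seconds and compared. The sketch below is only arithmetic on the four durations quoted in the logs; the helper name `to_seconds` is hypothetical, and it assumes durations in the `XhYmZs` form without leading zeros.

```shell
#!/bin/sh
# Convert a duration such as "3h14m37s" or "56m41s" to seconds.
to_seconds() {
  d=$1; h=0; m=0; s=0
  case $d in *h*) h=${d%%h*}; d=${d#*h} ;; esac
  case $d in *m*) m=${d%%m*}; d=${d#*m} ;; esac
  case $d in *s*) s=${d%%s*} ;; esac
  echo $(( h * 3600 + m * 60 + s ))
}

# Duration of the unsharded openshift-extended-test step.
unsharded=$(to_seconds 3h14m37s)

# The sharded step is only done once its slowest shard is done.
slowest=0
total=0
for shard in 1h15m47s 56m41s 1h18m54s; do
  secs=$(to_seconds "$shard")
  total=$(( total + secs ))
  if [ "$secs" -gt "$slowest" ]; then slowest=$secs; fi
done

# Integer arithmetic only, so the speedup is kept in hundredths.
speedup_x100=$(( unsharded * 100 / slowest ))

echo "unsharded step: ${unsharded}s"
echo "slowest shard:  ${slowest}s"
echo "sum of shards:  ${total}s"
printf 'step speedup:   %d.%02dx\n' $(( speedup_x100 / 100 )) $(( speedup_x100 % 100 ))
```

On these numbers the step completes roughly 2.4 to 2.5 times sooner with three shards, while the summed shard time (12682s) is somewhat higher than the unsharded run (11677s). The runs were made on different days, so part of that difference may simply be run-to-run variation.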
Thanks Milind. Yeah, for the openshift-tests-private tests we have the shard option in ci-operator/step-registry/openshift-extended/test/openshift-extended-test-ref.yaml, which is fine. I meant: do we need the shard option to be defined for ci-operator/step-registry/openshift/e2e/test/clusterinfra-qe/openshift-e2e-test-clusterinfra-qe-ref.yaml? It isn't used in openshift-e2e-test-clusterinfra-qe-commands.sh, and we need to think about whether we need to shard the step that runs the tests in cluster-api-actuator-pkg. Am I misunderstanding anything?
No, you didn't misunderstand. We don't need it unless we want to shard the cluster-api-actuator tests as well.
force-pushed from 3c61379 to de0af77
/pj-rehearse pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-1of3 pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-2of3 pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-3of3
@miyadav: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
force-pushed from de0af77 to fa7d363
/pj-rehearse pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-1of3 pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-2of3 pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-3of3
@miyadav: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
force-pushed from fa7d363 to 81e6e43
/pj-rehearse pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-1of3 pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-2of3 pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-3of3
@miyadav: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
/lgtm
/pj-rehearse ack
@miyadav: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
/unhold
@miyadav: The following tests failed, say
test the changes informing main chain as well of the param in the step
Update ci-operator/step-registry/openshift-extended/test/openshift-extended-test-ref.yaml
Update ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh
Update ci-operator/step-registry/openshift/e2e/test/clusterinfra-qe/openshift-e2e-test-clusterinfra-qe-ref.yaml
Co-authored-by: Penghao <pewang@redhat.com>
force-pushed from a59e7d7 to bfc4da8
[REHEARSALNOTIFIER]
A total of 2740 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.
/pj-rehearse ack
@miyadav: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: damdo, liangxia, miyadav, shellyyang1989
…hift#65460)
* OCPQE-29735: Adding shard option to actual run command to work
  test the changes informing main chain as well of the param in the step
  Update ci-operator/step-registry/openshift-extended/test/openshift-extended-test-ref.yaml
  Update ci-operator/step-registry/openshift-extended/test/openshift-extended-test-commands.sh
  Update ci-operator/step-registry/openshift/e2e/test/clusterinfra-qe/openshift-e2e-test-clusterinfra-qe-ref.yaml
  Co-authored-by: Penghao <pewang@redhat.com>
* removing shard_args from chain as not doing sharding for cluster-api-actuator-pkg
---------
Co-authored-by: Penghao <pewang@redhat.com>
Fixes #64959 for now.
@sunzhaohua2 @huali9 @shellyyang1989 PTAL.
Validation looks good (done for the longduration cases). We are discussing this in Slack; we can add it to other steps/chains as well once the productivity team is good with the changes.