ci-operator/config/openshift/cluster-version-operator: Generic e2e-gcp -> e2e for 4.4+ #10152

wking · 2020-07-09T17:19:47Z

In 7ee21db (#10046), I'd mentioned generic names as a way for component teams to say "we don't care which platform, other folks can pick whatever they want to balance platform volume vs. capacity". @sdodson pushed back based on lack of precedent. And we currently do rely on platforms in job-names for monitored success rates. But these are presubmits, and presubmits are noisy (e.g. if you pull-request some buggy code, your presubmits will fail, but the delivered product is not affected by your in-flight bugs). So in the presubmit case we are not constrained by job-name-based health reporting. And we run a lot of presubmit volume, so even if this approach only works in the presubmit case, it gives us a lot of rebalancing flexibility.

Benefits for component teams (the CVO maintainers in the case of this commit) include not having to care about where their jobs get scheduled. And with the job names not changing during rebalances, they don't have to remember to /skip or /refresh or whatever, because Prow would understand that it's the same effective test regardless of the underlying platform.

The benefit to rebalancing admins is that they don't have to ask component teams to opt in at rebalance time. Teams can opt in with this pattern, and rebalancing admins can just search for e2e steps that do not include a platform slug in their 'as' config name.
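The search for e2e steps without a platform slug could be sketched roughly as follows. This is only an illustration, not tooling from this repository: the slug list and the function names are assumptions, and the input is assumed to be the already-parsed entries from the `tests:` list of a ci-operator config.

```python
# Hypothetical list of platform slugs; extend it to match the platforms in use.
PLATFORM_SLUGS = ("aws", "azure", "gcp", "metal", "openstack", "ovirt", "vsphere")


def is_platform_agnostic(test_name, slugs=PLATFORM_SLUGS):
    """Return True when no hyphen-separated token of the name is a platform slug."""
    return not any(token in slugs for token in test_name.split("-"))


def agnostic_e2e_tests(tests, slugs=PLATFORM_SLUGS):
    """Select e2e tests whose 'as' name carries no platform slug.

    `tests` is a list of dicts shaped like the entries under `tests:` in a
    ci-operator config, for example {"as": "e2e-gcp-upgrade"}.
    """
    names = [t["as"] for t in tests if "as" in t]
    return [
        name
        for name in names
        if name.split("-")[0] == "e2e" and is_platform_agnostic(name, slugs)
    ]


if __name__ == "__main__":
    sample = [
        {"as": "unit"},
        {"as": "e2e"},
        {"as": "e2e-gcp-upgrade"},
        {"as": "e2e-upgrade"},
        {"as": "e2e-aws-operator"},
    ]
    print(agnostic_e2e_tests(sample))
```

Matching on whole hyphen-separated tokens rather than substrings avoids false positives for names that merely contain a slug's letters.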

Generated by manually changing ci-operator/config/... and then running:

$ make update

…p -> e2e for 4.4+

In 7ee21db (ci-operator/jobs/openshift/cluster-version-operator: Move 4.4 and later presubmits to GCP, 2020-07-02, openshift#10046), I'd mentioned generic names as a way for component teams to say "we don't care which platform, other folks can pick whatever they want to balance platform volume vs. capacity". Scott pushed back based on lack of precedent [1,2]. And we currently do rely on platforms in job-names for monitored success rates. But these are presubmits, and presubmits are noisy (e.g. if you pull-request some buggy code, your presubmits will fail, but the delivered product is not affected by your in-flight bugs). So in the presubmit case we are not constrained by job-name-based health reporting. And we run a lot of presubmit volume, so even if this approach only works in the presubmit case, it gives us a lot of rebalancing flexibility.

Benefits for component teams (the CVO maintainers in the case of this commit) include not having to care about where their jobs get scheduled. And with the job names not changing during rebalances, they don't have to remember to /skip or /refresh or whatever, because Prow would understand that it's the same effective test regardless of the underlying platform.

Benefits to rebalancing admins is that they don't have to ask component teams to opt-in at rebalance time. Teams can opt in with this pattern, and rebalancing admins can just search for e2e steps that do not include a platform slug in their 'as' config name.

Generated by manually changing ci-operator/config/... and then running:

$ make update

[1]: openshift#10046 (comment)
[2]: openshift#10046 (comment)

openshift-ci-robot · 2020-07-09T17:21:17Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stevekuznetsov, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~ci-operator/config/openshift/cluster-version-operator/OWNERS~~ [stevekuznetsov,wking]
~~ci-operator/jobs/openshift/cluster-version-operator/OWNERS~~ [stevekuznetsov,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2020-07-09T17:29:12Z

@wking: The following test failed, say /retest to rerun all failed tests:

Test name	Commit	Details	Rerun command
ci/rehearse/openshift/cluster-version-operator/master/e2e	`e7bb102`	link	`/test pj-rehearse`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

wking · 2020-07-09T17:31:50Z

e2e presubmit:

error: unable to connect to image repository registry.svc.ci.openshift.org/ci-op-gg96rlwy/stable@sha256:d3da4a0e896cf5fcf1302a8edd52264fd4a76cd8b4e17976c7cd314bb9524e03: endpoint "https://registry.svc.ci.openshift.org" does not support v2 API (got 503 Service Unavailable)

Seems unrelated to my change.

openshift-ci-robot · 2020-07-09T17:34:04Z

@wking: Updated the following 15 configmaps:

ci-operator-master-configs configmap in namespace ci at cluster app.ci using the following files:
- key openshift-cluster-version-operator-master.yaml using file ci-operator/config/openshift/cluster-version-operator/openshift-cluster-version-operator-master.yaml
ci-operator-4.6-configs configmap in namespace ci at cluster app.ci using the following files:
- key openshift-cluster-version-operator-release-4.6.yaml using file ci-operator/config/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.6.yaml
job-config-master configmap in namespace ci at cluster app.ci using the following files:
- key openshift-cluster-version-operator-master-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-master-presubmits.yaml
job-config-4.5 configmap in namespace ci at cluster api.ci using the following files:
- key openshift-cluster-version-operator-release-4.5-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.5-presubmits.yaml
job-config-4.6 configmap in namespace ci at cluster api.ci using the following files:
- key openshift-cluster-version-operator-release-4.6-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.6-presubmits.yaml
job-config-4.7 configmap in namespace ci at cluster app.ci using the following files:
- key openshift-cluster-version-operator-release-4.7-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.7-presubmits.yaml
job-config-4.4 configmap in namespace ci at cluster app.ci using the following files:
- key openshift-cluster-version-operator-release-4.4-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.4-presubmits.yaml
job-config-4.5 configmap in namespace ci at cluster app.ci using the following files:
- key openshift-cluster-version-operator-release-4.5-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.5-presubmits.yaml
job-config-4.7 configmap in namespace ci at cluster api.ci using the following files:
- key openshift-cluster-version-operator-release-4.7-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.7-presubmits.yaml
ci-operator-4.5-configs configmap in namespace ci at cluster app.ci using the following files:
- key openshift-cluster-version-operator-release-4.5.yaml using file ci-operator/config/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.5.yaml
ci-operator-4.4-configs configmap in namespace ci at cluster app.ci using the following files:
- key openshift-cluster-version-operator-release-4.4.yaml using file ci-operator/config/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.4.yaml
ci-operator-4.7-configs configmap in namespace ci at cluster app.ci using the following files:
- key openshift-cluster-version-operator-release-4.7.yaml using file ci-operator/config/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.7.yaml
job-config-master configmap in namespace ci at cluster api.ci using the following files:
- key openshift-cluster-version-operator-master-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-master-presubmits.yaml
job-config-4.4 configmap in namespace ci at cluster api.ci using the following files:
- key openshift-cluster-version-operator-release-4.4-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.4-presubmits.yaml
job-config-4.6 configmap in namespace ci at cluster app.ci using the following files:
- key openshift-cluster-version-operator-release-4.6-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.6-presubmits.yaml


Following the pattern from e7bb102 (ci-operator/config/openshift/cluster-version-operator: Generic e2e-gcp -> e2e for 4.4+, 2020-07-09, openshift#10152), the "e2e-aws" -> "e2e" change declares the router presubmits to be platform-agnostic. The AWS -> GCP change takes advantage of the platform-agnosticism to shift CI load from AWS (where we're currently pegging Boskos lease capacity) to GCP (where we have some spare lease capacity).

… e2e-aws -> e2e for 4.6+ Following the pattern from e7bb102 (ci-operator/config/openshift/cluster-version-operator: Generic e2e-gcp -> e2e for 4.4+, 2020-07-09, openshift#10152), the "e2e-aws" -> "e2e" change declares the router presubmits to be platform-agnostic (although David has specifically requested to be kept off Azure based on CI-success stability concerns). The AWS -> GCP change takes advantage of the platform-agnosticism to shift CI load from AWS (where we're currently pegging Boskos lease capacity) to GCP (where we have some spare lease capacity).

…-aws -> e2e for 4.6+

Following the pattern from e7bb102 (ci-operator/config/openshift/cluster-version-operator: Generic e2e-gcp -> e2e for 4.4+, 2020-07-09, openshift#10152), the "e2e-aws" -> "e2e" change declares the monitoring-operator presubmits to be platform-agnostic. I'm leaving e2e-aws-operator alone, because it relies on AWS-specific gp2 storage configuration [1,2].

The AWS -> GCP change takes advantage of the platform-agnosticism to shift CI load from AWS (where we're currently pegging Boskos lease capacity) to GCP (where we have some spare lease capacity).

Generated by manually changing ci-operator/config/... and then running:

$ make update

[1]: openshift#10377 (comment)
[2]: https://github.com/openshift/cluster-monitoring-operator/blob/e1caabda745caba6e4784095aebd75f561e1244d/test/e2e/alertmanager_test.go#L51-L61

Following the pattern from e7bb102 (ci-operator/config/openshift/cluster-version-operator: Generic e2e-gcp -> e2e for 4.4+, 2020-07-09, openshift#10152), this commit is using platform-agnostic names for the required tests. The role of platform-agnostic tests is discussed in 4212d7d (ci-operator/README: Discuss platform job rebalancing, 2020-07-09, openshift#10166).

The platform-specific jobs are retained, in case folks want to ask for them explicitly with '/test e2e-aws', etc. while performing any platform-specific tuning logic. Making them on-demand reduces our load in platforms where we are near capacity. With this change:

* We grow a new, platform-agnostic e2e that is always_run=true and optional=false.
* e2e-gcp-upgrade becomes the platform-agnostic e2e-upgrade.
* e2e-aws-operator becomes the platform-agnostic e2e-operator.
* e2e-aws-disruptive becomes the platform-agnostic e2e-disruptive.
* e2e-aws and e2e-gcp become always_run=false and optional=true.
* e2e-azure and e2e-metal-ipi become always_run=false (they were already optional=true).

I've also ordered the configs to place the platform-agnostic stuff first and shunt the platform-specific stuff towards the end.

Generated by manually changing `ci-operator/config/...`, running:

$ make update

and tuning the configurable [1] always_run and optional.

[1]: https://github.com/openshift/ci-tools/blob/7fb6fd8b3802e47162442c9a5e10807952ba12eb/GENERATOR.md#hand-edited-prow-configuration
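As a rough illustration of what the always_run and optional settings listed above imply, the sketch below models only the jobs whose flags the commit message states explicitly. The `Presubmit` class and helper names are hypothetical, not part of Prow or ci-tools, and the renamed jobs (e2e-upgrade, e2e-operator, e2e-disruptive) are left out because their flags are not given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Presubmit:
    """Hypothetical stand-in for a Prow presubmit entry (name plus two flags)."""
    name: str
    always_run: bool
    optional: bool


# Flags as stated in the commit message above.
JOBS = [
    Presubmit("e2e", always_run=True, optional=False),
    Presubmit("e2e-aws", always_run=False, optional=True),
    Presubmit("e2e-gcp", always_run=False, optional=True),
    Presubmit("e2e-azure", always_run=False, optional=True),
    Presubmit("e2e-metal-ipi", always_run=False, optional=True),
]


def required_by_default(jobs):
    """Jobs that trigger on every pull request and must pass."""
    return [j.name for j in jobs if j.always_run and not j.optional]


def on_demand(jobs):
    """Jobs that only run when requested, e.g. with '/test e2e-aws'."""
    return [j.name for j in jobs if not j.always_run]


if __name__ == "__main__":
    print("required:", required_by_default(JOBS))
    print("on demand:", on_demand(JOBS))
```

Under these assumptions, only the platform-agnostic e2e consumes capacity on every pull request, while the platform-specific jobs cost nothing until someone asks for them.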

Following the pattern from e7bb102 (ci-operator/config/openshift/cluster-version-operator: Generic e2e-gcp -> e2e for 4.4+, 2020-07-09, openshift#10152), this commit is using platform-agnostic names. The role of platform-agnostic tests is discussed in ci-operator/platform-balance.

The choice of a GCP implementation for the now-agnostic tests is because we currently have free GCP capacity, while AWS is maxed out. I've left the benchmark test on AWS, because I'm not sure what that's about. It might be AWS-specific.

Generated by manually changing `ci-operator/config/...`, running:

$ make update

Following the pattern from e7bb102 (ci-operator/config/openshift/cluster-version-operator: Generic e2e-gcp -> e2e for 4.4+, 2020-07-09, openshift#10152), this commit is using platform-agnostic names. The role of platform-agnostic tests is discussed in ci-operator/platform-balance.

The choice of a GCP implementation for the now-agnostic tests is because we currently have free GCP capacity, while AWS is maxed out.

Generated by manually changing `ci-operator/config/...`, running:

$ make update

openshift-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Jul 9, 2020

openshift-ci-robot requested review from crawford and smarterclayton July 9, 2020 17:20

stevekuznetsov approved these changes Jul 9, 2020


openshift-ci-robot assigned stevekuznetsov Jul 9, 2020

openshift-ci-robot added the lgtm label (Indicates that a PR is ready to be merged.) Jul 9, 2020

openshift-merge-robot merged commit be73683 into openshift:master Jul 9, 2020

wking deleted the platform-agnostic-cvo-preflights branch July 9, 2020 17:41

wking mentioned this pull request Jul 10, 2020

ci-operator/config/openshift/router: Generic e2e-aws -> e2e for 4.6+ #10179

Merged

wking mentioned this pull request Jul 10, 2020

ci-operator/config/openshift/cluster-authentication-operator: Generic e2e-aws -> e2e for 4.6+ #10180

Merged

wking mentioned this pull request Jul 21, 2020

ci-operator/config/openshift/cluster-monitoring-operator: Generic e2e-aws -> e2e for 4.6+ #10377

Merged

wking mentioned this pull request Aug 1, 2020

Switch e2e and e2e-upgrade jobs to gcp in-keeping with current norms #10618

Merged

wking mentioned this pull request Aug 27, 2020

ci-operator/config/openshift/cluster-etcd-operator: Generic e2e for 4.6+ #11370

Merged

wking mentioned this pull request Aug 31, 2020

ci-operator/config/openshift/telemeter: Generic e2e for 4.6+ #11459

Closed

wking mentioned this pull request Aug 31, 2020

ci-operator/config/openshift/cluster-bootstrap: Generic e2e for 4.6+ #11460

Closed
