ci-operator/config/openshift/cluster-version-operator: Generic e2e-gcp -> e2e for 4.4+ #10152
…p -> e2e for 4.4+

In 7ee21db (ci-operator/jobs/openshift/cluster-version-operator: Move 4.4 and later presubmits to GCP, 2020-07-02, openshift#10046), I'd mentioned generic names as a way for component teams to say "we don't care which platform, other folks can pick whatever they want to balance platform volume vs. capacity". Scott pushed back based on lack of precedent [1,2]. And we currently do rely on platforms in job-names for monitored success rates. But these are presubmits, and presubmits are noisy (e.g. if you pull-request some buggy code, your presubmits will fail, but the delivered product is not affected by your in-flight bugs). So in the presubmit case we are not constrained by job-name-based health reporting. And we run a lot of presubmit volume, so even if this approach only works in the presubmit case, it gives us a lot of rebalancing flexibility.

Benefits for component teams (the CVO maintainers in the case of this commit) include not having to care about where their jobs get scheduled. And with the job names not changing during rebalances, they don't have to remember to /skip or /refresh or whatever, because Prow would understand that it's the same effective test regardless of the underlying platform.

The benefit to rebalancing admins is that they don't have to ask component teams to opt in at rebalance time. Teams can opt in with this pattern, and rebalancing admins can just search for e2e steps that do not include a platform slug in their 'as' config name.

Generated by manually changing ci-operator/config/... and then running:

  $ make update

[1]: openshift#10046 (comment)
[2]: openshift#10046 (comment)
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: stevekuznetsov, wking
@wking: The following test failed.
Seems unrelated to my change.
@wking: Updated the following 15 configmaps:
Following the pattern from e7bb102 (ci-operator/config/openshift/cluster-version-operator: Generic e2e-gcp -> e2e for 4.4+, 2020-07-09, openshift#10152), the "e2e-aws" -> "e2e" change declares the router presubmits to be platform-agnostic. The AWS -> GCP change takes advantage of the platform-agnosticism to shift CI load from AWS (where we're currently pegging Boskos lease capacity) to GCP (where we have some spare lease capacity).
… e2e-aws -> e2e for 4.6+

Following the pattern from e7bb102 (ci-operator/config/openshift/cluster-version-operator: Generic e2e-gcp -> e2e for 4.4+, 2020-07-09, openshift#10152), the "e2e-aws" -> "e2e" change declares the router presubmits to be platform-agnostic (although David has specifically requested to be kept off Azure based on CI-success stability concerns). The AWS -> GCP change takes advantage of the platform-agnosticism to shift CI load from AWS (where we're currently pegging Boskos lease capacity) to GCP (where we have some spare lease capacity).
…-aws -> e2e for 4.6+

Following the pattern from e7bb102 (ci-operator/config/openshift/cluster-version-operator: Generic e2e-gcp -> e2e for 4.4+, 2020-07-09, openshift#10152), the "e2e-aws" -> "e2e" change declares the monitoring-operator presubmits to be platform-agnostic. I'm leaving e2e-aws-operator alone, because it relies on AWS-specific gp2 storage configuration [1,2]. The AWS -> GCP change takes advantage of the platform-agnosticism to shift CI load from AWS (where we're currently pegging Boskos lease capacity) to GCP (where we have some spare lease capacity).

Generated by manually changing ci-operator/config/... and then running:

  $ make update

[1]: openshift#10377 (comment)
[2]: https://github.com/openshift/cluster-monitoring-operator/blob/e1caabda745caba6e4784095aebd75f561e1244d/test/e2e/alertmanager_test.go#L51-L61
Following the pattern from e7bb102 (ci-operator/config/openshift/cluster-version-operator: Generic e2e-gcp -> e2e for 4.4+, 2020-07-09, openshift#10152), this commit is using platform-agnostic names for the required tests. The role of platform-agnostic tests is discussed in 4212d7d (ci-operator/README: Discuss platform job rebalancing, 2020-07-09, openshift#10166).

The platform-specific jobs are retained, in case folks want to ask for them explicitly with '/test e2e-aws', etc. while performing any platform-specific tuning logic. Making them on-demand reduces our load in platforms where we are near capacity.

With this change:

* We grow a new, platform-agnostic e2e that is always_run=true and optional=false.
* e2e-gcp-upgrade becomes the platform-agnostic e2e-upgrade.
* e2e-aws-operator becomes the platform-agnostic e2e-operator.
* e2e-aws-disruptive becomes the platform-agnostic e2e-disruptive.
* e2e-aws and e2e-gcp become always_run=false and optional=true.
* e2e-azure and e2e-metal-ipi become always_run=false (they were already optional=true).

I've also ordered the configs to place the platform-agnostic stuff first and shunt the platform-specific stuff towards the end.

Generated by manually changing `ci-operator/config/...`, running:

  $ make update

and tuning the configurable [1] always_run and optional.

[1]: https://github.com/openshift/ci-tools/blob/7fb6fd8b3802e47162442c9a5e10807952ba12eb/GENERATOR.md#hand-edited-prow-configuration
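To make the effect of the always_run and optional settings listed in that commit message concrete, here is a minimal Python sketch. It is a simplified model and not Prow's actual implementation: the `Presubmit` class and the helper functions are hypothetical, and the sketch assumes only that always_run=true jobs trigger on every pull request and that optional=false jobs must pass before merge. Only the jobs whose settings the commit message states explicitly are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Presubmit:
    """Hypothetical, simplified view of a presubmit job's trigger settings."""
    name: str
    always_run: bool
    optional: bool


# Settings as stated in the commit message; other jobs are omitted
# because their always_run/optional values are not given there.
PRESUBMITS = [
    Presubmit("e2e", always_run=True, optional=False),
    Presubmit("e2e-aws", always_run=False, optional=True),
    Presubmit("e2e-gcp", always_run=False, optional=True),
    Presubmit("e2e-azure", always_run=False, optional=True),
    Presubmit("e2e-metal-ipi", always_run=False, optional=True),
]


def triggered_by_default(jobs):
    """Jobs that start automatically on every pull request."""
    return [job.name for job in jobs if job.always_run]


def required(jobs):
    """Jobs whose success is needed before merge (assumed meaning of optional=false)."""
    return [job.name for job in jobs if not job.optional]


def on_demand(jobs):
    """Comment commands that request the jobs which do not start automatically."""
    return [f"/test {job.name}" for job in jobs if not job.always_run]


if __name__ == "__main__":
    print("automatic:", triggered_by_default(PRESUBMITS))
    print("required: ", required(PRESUBMITS))
    print("on demand:", on_demand(PRESUBMITS))
```

Under this model only the platform-agnostic e2e consumes cluster capacity by default, while the platform-specific jobs cost nothing until someone asks for them.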
Following the pattern from e7bb102 (ci-operator/config/openshift/cluster-version-operator: Generic e2e-gcp -> e2e for 4.4+, 2020-07-09, openshift#10152), this commit is using platform-agnostic names. The role of platform-agnostic tests is discussed in ci-operator/platform-balance.

The choice of a GCP implementation for the now-agnostic tests is because we currently have free GCP capacity, while AWS is maxed out.

I've left the benchmark test on AWS, because I'm not sure what that's about. It might be AWS-specific.

Generated by manually changing `ci-operator/config/...`, running:

  $ make update
Following the pattern from e7bb102 (ci-operator/config/openshift/cluster-version-operator: Generic e2e-gcp -> e2e for 4.4+, 2020-07-09, openshift#10152), this commit is using platform-agnostic names. The role of platform-agnostic tests is discussed in ci-operator/platform-balance.

The choice of a GCP implementation for the now-agnostic tests is because we currently have free GCP capacity, while AWS is maxed out.

Generated by manually changing `ci-operator/config/...`, running:

  $ make update
In 7ee21db (#10046), I'd mentioned generic names as a way for component teams to say "we don't care which platform, other folks can pick whatever they want to balance platform volume vs. capacity". @sdodson pushed back based on lack of precedent. And we currently do rely on platforms in job-names for monitored success rates. But these are presubmits, and presubmits are noisy (e.g. if you pull-request some buggy code, your presubmits will fail, but the delivered product is not affected by your in-flight bugs). So in the presubmit case we are not constrained by job-name-based health reporting. And we run a lot of presubmit volume, so even if this approach only works in the presubmit case, it gives us a lot of rebalancing flexibility.
Benefits for component teams (the CVO maintainers in the case of this commit) include not having to care about where their jobs get scheduled. And with the job names not changing during rebalances, they don't have to remember to `/skip` or `/refresh` or whatever, because Prow would understand that it's the same effective test regardless of the underlying platform.

The benefit to rebalancing admins is that they don't have to ask component teams to opt in at rebalance time. Teams can opt in with this pattern, and rebalancing admins can just search for e2e steps that do not include a platform slug in their `as` config name.

Generated by manually changing `ci-operator/config/...` and then running:

  $ make update
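The search that rebalancing admins would run, looking for e2e steps whose `as` name carries no platform slug, could be sketched roughly as follows. This is a minimal illustration and not an existing tool: the function names are hypothetical, the slug list contains only the platforms mentioned in this thread (a real list would likely be longer), and the input is assumed to be a plain list of `as` names already extracted from the configs.

```python
import re

# Platform slugs mentioned in this thread; assumed to be incomplete.
PLATFORM_SLUGS = ("aws", "gcp", "azure", "metal-ipi")


def is_platform_agnostic(test_name):
    """Return True if no platform slug appears as a hyphen-delimited part of the name."""
    for slug in PLATFORM_SLUGS:
        # Match only on hyphen boundaries or at the ends of the name,
        # so a slug embedded inside a longer word does not count.
        if re.search(rf"(^|-){re.escape(slug)}(-|$)", test_name):
            return False
    return True


def agnostic_e2e_tests(test_names):
    """Keep the e2e test names that do not pin a platform."""
    return [
        name for name in test_names
        if name.startswith("e2e") and is_platform_agnostic(name)
    ]


if __name__ == "__main__":
    names = [
        "e2e",
        "e2e-upgrade",
        "e2e-aws",
        "e2e-gcp-upgrade",
        "e2e-metal-ipi",
        "e2e-aws-operator",
        "unit",
    ]
    print(agnostic_e2e_tests(names))  # ['e2e', 'e2e-upgrade']
```

Matching on hyphen boundaries rather than plain substrings is a design choice made for this sketch, so that a name is classified by its slug components and not by accidental letter sequences.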