Skip flaky tests #2624

perdasilva · 2022-02-14T09:18:01Z

Signed-off-by: perdasilva <perdasilva@redhat.com>

Description of the change:
This PR updates the e2e test mechanism to skip tests marked with a [FLAKY] tag.
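As a rough illustration of the idea (not the PR's actual implementation), the sketch below assumes the skip works by matching test names against a regular expression, in the way Ginkgo's `-ginkgo.skip` flag does. The `partition` helper and the test names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// flakyTag matches the literal "[FLAKY]" marker in a test name.
// The brackets are escaped because they are regex metacharacters.
var flakyTag = regexp.MustCompile(`\[FLAKY\]`)

// partition splits test names into those that stay on the critical
// path (stable) and those carrying the [FLAKY] marker (flaky).
// It is a hypothetical helper used only for illustration.
func partition(names []string) (stable, flaky []string) {
	for _, n := range names {
		if flakyTag.MatchString(n) {
			flaky = append(flaky, n)
		} else {
			stable = append(stable, n)
		}
	}
	return stable, flaky
}

func main() {
	// Hypothetical test names; real names come from the e2e suite.
	names := []string{
		"Subscription creates an install plan",
		"[FLAKY] Catalog updates on image change",
		"Operator group selects namespaces",
	}
	stable, flaky := partition(names)
	fmt.Println("run: ", stable)
	fmt.Println("skip:", flaky)
}
```

Escaping the brackets matters here: an unescaped `[FLAKY]` is a character class that matches any single one of those letters, so it would skip far more tests than intended.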

Note

With Ginkgo v2.0 we can actually label tests, which might make this system less brittle. I just don't know how much upfront work the upgrade would require. So let's use this as a stopgap solution until we migrate; it should then be easy enough to move the flaky tests over to labels instead.

Motivation for the change:
Merging PRs is a PITA atm because of flakes. This gives us a way to mark flakes and remove them from the critical path while they get fixed.

Reviewer Checklist

Implementation matches the proposed design, or proposal is updated to match implementation
Sufficient unit test coverage
Sufficient end-to-end test coverage
Docs updated or added to /doc
Commit messages sensible and descriptive

openshift-ci · 2022-02-14T09:18:12Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: perdasilva

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [perdasilva]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

perdasilva · 2022-02-14T09:18:14Z

/hold

Signed-off-by: perdasilva <perdasilva@redhat.com>

timflannagan

Any value in adding another workflow that's responsible for running the "[FLAKY]" regex so we don't skip the coverage of these tests? The overall CI signal is fairly watered down right now, so offloading them to their own workflow (instead of skipping them altogether) wouldn't add a ton of value, but at least we wouldn't have to re-enable them, or manually promote a flaky test to a stable one, etc. over time.

That can be done as a follow-up as well.

perdasilva · 2022-02-14T15:51:56Z

Yup - that's something I've had in mind to do as well. I want to find a way to execute these in isolation, one at a time, to further reduce the failure noise while we fix them. I'm just re-running the e2e jobs a few times until I feel the PR is "stable enough", meaning we have accounted for as many flaky tests as possible and each of them has a corresponding ticket.
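One possible way to run the flaky tests in isolation is to build one focus pattern per test and run the suite once per pattern. This is a sketch under assumptions: it relies on a regex-based focus such as Ginkgo's `-ginkgo.focus` flag, and `focusPatterns` and the test names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// focusPatterns returns one regex per test name that carries the
// [FLAKY] marker. Each name is escaped with regexp.QuoteMeta so that
// brackets and other metacharacters match literally.
// It is a hypothetical helper used only for illustration.
func focusPatterns(names []string) []string {
	var patterns []string
	for _, n := range names {
		if strings.Contains(n, "[FLAKY]") {
			patterns = append(patterns, regexp.QuoteMeta(n))
		}
	}
	return patterns
}

func main() {
	// Hypothetical test names; real names come from the e2e suite.
	names := []string{
		"[FLAKY] Catalog updates on image change",
		"Operator group selects namespaces",
		"[FLAKY] Install plan retries on conflict",
	}
	for _, p := range focusPatterns(names) {
		// Each pattern could drive a separate run, for example
		// by passing it as the value of -ginkgo.focus.
		fmt.Printf("-ginkgo.focus=%q\n", p)
	}
}
```

Running each flaky test in its own invocation keeps one failure from hiding the results of the others, at the cost of repeating suite setup for every run.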

Signed-off-by: perdasilva <perdasilva@redhat.com>

perdasilva · 2022-02-14T16:03:46Z

My main worry though is: what is the value of tests that just add confusion? I.e., you don't know whether they passed, failed, or failed because of your change. I'd say it's hard to get away from the disable/fix/enable pattern. If a test is flaky, add [FLAKY] to the name, remove it from the critical path, and create a tracking ticket. When it's fixed, just remove [FLAKY] from the name. That way, the toil can piggyback on the natural PRs.

timflannagan · 2022-02-14T16:08:51Z

Yeah, agreed - my main worry with simply skipping these tests is that there's no tracking vehicle for fixing them right now, outside of creating individual issues and burning those down over time. Having a dedicated workflow that still gives CI signal, without blocking a PR from merging because that flaky workflow isn't a required check, might be a decent short-term balance. I don't have a strong opinion on this approach as long as we commit to tracking these flakes $somewhere so they don't inevitably get swept under the rug.

perdasilva · 2022-02-14T18:00:02Z

Closing and reopening to see if it removes the required flag from the flaky-e2e tests

perdasilva · 2022-02-14T18:03:35Z

re-created as #2625

openshift-ci bot requested review from ankitathomas and ecordell February 14, 2022 09:18

openshift-ci bot added the approved and do-not-merge/hold labels Feb 14, 2022

perdasilva force-pushed the mark_flaky_e2e branch 3 times, most recently from d3f1f1d to 84c6430 Compare February 14, 2022 13:39

Skip flaky tests

f30c8f4

Signed-off-by: perdasilva <perdasilva@redhat.com>

perdasilva force-pushed the mark_flaky_e2e branch from 84c6430 to f30c8f4 Compare February 14, 2022 14:03

timflannagan reviewed Feb 14, 2022

Add flaky e2e test stage

8b3a093

Signed-off-by: perdasilva <perdasilva@redhat.com>

perdasilva force-pushed the mark_flaky_e2e branch from 9738456 to 8b3a093 Compare February 14, 2022 15:59

perdasilva closed this Feb 14, 2022

timflannagan mentioned this pull request Feb 17, 2022

test/e2e: Refactor the Operator e2e tests to clean up testing resources #2518

Closed

5 tasks