Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci also runs unit tests with -race enabled, but conditionally skips flaky tests #2705

Merged
merged 20 commits into from
Mar 15, 2022

Conversation

zmt
Copy link
Contributor

@zmt zmt commented Jan 29, 2022

Pull Request check list

  • Commit conforms to CONTRIBUTING.md
  • Proper tests/regressions included
  • Documentation updated

Affected functionality
Enable running the unit tests with race detector in CI/CD for PRs and releases as an additional github workflow.

Often fixing racy tests can be difficult and time-consuming. Some tests can hit negative interaction with the additional runtime overhead of the race detector, which can be impossible to resolve. To facilitate always running the majority of tests under the race detector:

  • add SkipFlakyTestUnderRaceConditionWithIssueFiled which does the following:
    • matches arg against SPIRE issue URL regexp or fails test
    • skips tests only when SKIP_FLAKY_TESTS_UNDER_RACE_DETECTOR is in the environment
  • add ci-race-test target in Makefile to set SKIP_FLAKY_TESTS_UNDER_RACE_DETECTOR and -race
  • add run_unit_tests_under_race_detector.sh and associated workflow configuration to run ci-race-test

Description of change
Add github workflow to run tests with race detector on PRs and releases. Add utility to skip specific tests during CI/CD with the race detector and associate the skip with a github issue.

Which issue this PR fixes
Fixes #2379.

@azdagron
Copy link
Member

I'm a little hesitant to introduce a mechanism to skip racy tests (or any tests, for that matter):

  1. Skipping tests is something easily done and then forgotten. Skipped tests are worthless tests.
  2. CI/CD will ideally only be running the tests with -race, so skipped tests equate to zero coverage.
  3. With the ability to skip, the pressure to fix the racy test is severely diminished.

I think we'd get better results long term if we direct efforts at fixing the racy tests and enable -race everywhere.

Curious how the other maintainers feel about it though.

@zmt zmt marked this pull request as draft January 31, 2022 19:45
@zmt
Copy link
Contributor Author

zmt commented Jan 31, 2022

  • reproducing flakiness of skipped tests to file detailed issues
  • adding comments referencing filed issues for all tests skipped under race detector
  • adding a comment in the SkipFlakyTest instructing to file issue and reference in comment

@zmt
Copy link
Contributor Author

zmt commented Jan 31, 2022

I'm a little hesitant to introduce a mechanism to skip racy tests (or any tests, for that matter):

  1. Skipping tests is something easily done and then forgotten. Skipped tests are worthless tests.

100% agree in principle.

  1. CI/CD will ideally only be running the tests with -race, so skipped tests equate to zero coverage.

I don't agree with this assertion per se. If we run the full suite without -race and only conditionally skip those that are known to be problematic under -race in the run with that enabled, that's better than waiting until we can resolve difficult timing issues in test objects which may or may not represent actual production runtime problems. The proposal in this PR is to do just that:

  • run full suite: make test
  • run almost full suite (less known issues run under -race): make race-test
  1. With the ability to skip, the pressure to fix the racy test is severely diminished.

If we don't run any unit tests in CI/CD with -race at all, we don't stand a chance of catching newly introduced problems. I am in the process of filing issues for each of the skips in the PR. I was thinking about adding a validator to the skip utility to basically require an argument that looks like a filed issue or fail the test instead to help mitigate this risk, but wasn't sure if that would be too draconian.

I think we'd get better results long term if we direct efforts at fixing the racy tests and enable -race everywhere.

This might not be possible due to the differences between test binaries built with -race and those without.

Curious how the other maintainers feel about it though.

@zmt zmt changed the title ci also runs unit tests with -race enabled, but skips flaky tests ci also runs unit tests with -race enabled, but conditionally skips flaky tests Jan 31, 2022
@zmt zmt force-pushed the issue/2379 branch 2 times, most recently from 90ceaea to 1f66324 Compare January 31, 2022 23:03
@azdagron azdagron self-assigned this Feb 3, 2022
Copy link
Member

@azdagron azdagron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, @zmt. We discussed this in our maintainer sync and the consensus was that this was a fine short-term fix while we get all the racy tests sorted and that we could remove it once that was accomplished.

Just a few comments :)

test/util/race.go Outdated Show resolved Hide resolved
test/util/race.go Outdated Show resolved Hide resolved
.github/workflows/scripts/run_unit_tests.sh Outdated Show resolved Hide resolved
.github/workflows/scripts/run_unit_tests.sh Outdated Show resolved Hide resolved
@evan2645 evan2645 added this to the 1.2.1 milestone Feb 8, 2022
@rturner3
Copy link
Collaborator

Hey @zmt, just checking in since there hasn't been movement on this PR in the last couple weeks. Do you think you'll have time to address the comments in the next week or two? If not, I would request we close the PR for now and reopen when this is ready for review.

@azdagron azdagron modified the milestones: 1.2.1, 1.2.2 Mar 3, 2022
zmt added 5 commits March 10, 2022 22:49
Signed-off-by: Zack Train <ztrain@uber.com>
Signed-off-by: Zack Train <ztrain@uber.com>
Signed-off-by: Zack Train <ztrain@uber.com>
Signed-off-by: Zack Train <ztrain@uber.com>
Signed-off-by: Zack Train <ztrain@uber.com>
Signed-off-by: Zack Train <ztrain@uber.com>
@zmt
Copy link
Contributor Author

zmt commented Mar 10, 2022

Hey @zmt, just checking in since there hasn't been movement on this PR in the last couple weeks. Do you think you'll have time to address the comments in the next week or two? If not, I would request we close the PR for now and reopen when this is ready for review.

I didn't see this until I was already gearing up to start on it again. I thought leaving it in draft status would be OK. I've picked it back up now and hope to make quick work of it. I will be leaving it in draft mode until ready.

Signed-off-by: Zack Train <ztrain@uber.com>
zmt added 3 commits March 11, 2022 02:19
Signed-off-by: Zack Train <ztrain@uber.com>
Signed-off-by: Zack Train <ztrain@uber.com>
Signed-off-by: Zack Train <ztrain@uber.com>
zmt added 3 commits March 11, 2022 03:41
Signed-off-by: Zack Train <ztrain@uber.com>
Signed-off-by: Zack Train <ztrain@uber.com>
Signed-off-by: Zack Train <ztrain@uber.com>
zmt added 2 commits March 11, 2022 07:00
Signed-off-by: Zack Train <ztrain@uber.com>
Signed-off-by: Zack Train <ztrain@uber.com>
.github/workflows/pr_build.yaml Outdated Show resolved Hide resolved
@@ -345,7 +345,7 @@ func TestConfigureWindows(t *testing.T) {
}
}

func TestAidAttestationFailiures(t *testing.T) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little noise, but this test had been flaky when I started and the typo worked against me, so I decided to leave it a little better than I found it.

@zmt zmt marked this pull request as ready for review March 11, 2022 07:27
Comment on lines +55 to +57
msg := "Skip only allowed with associated issue. "
msg += "%q does not appear to be an issue. "
msg += "File an issue and specify it to skip a test under race detector."
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this could use some additional wordsmithing...

Copy link
Member

@azdagron azdagron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great. I like the enforcement that the skipped test be justified by providing an issue we track. That's a nice touch. Just one small comment on the name of the CI job and I think we're off to the races!

.github/workflows/pr_build.yaml Outdated Show resolved Hide resolved
Co-authored-by: Andrew Harding <azdagron@gmail.com>
Signed-off-by: Zack Train <ztrain@uber.com>
@zmt
Copy link
Contributor Author

zmt commented Mar 11, 2022

Apparently we have a flaky test even without the race detector which failed on macos :-(

Copy link
Member

@azdagron azdagron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @zmt!

@azdagron azdagron merged commit 1b972f0 into spiffe:main Mar 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Set -race flag in CI unit test runs
4 participants