
Race Condition in parallel_tests integration: an outdated report may be written #815

Open
PragTob opened this issue Jan 11, 2020 · 5 comments


@PragTob
Collaborator

PragTob commented Jan 11, 2020

Discovered/verified in #814

Basically, it appears we can generate the correct report but then overwrite it with an outdated one. Discovered while running the parallel-tests part of the features introduced in #841 (currently tagged wip to avoid flakies; remove the wip tag if you want to test this out). It doesn't happen reliably: it might happen 3 times in a row or not at all for 20 runs (for me at least).

I gathered some examples here: #814 (comment)

What you can see there is something like this:

Coverage report generated for (2/8) to /home/tobi/github/simplecov/tmp/aruba/project/coverage. 23 / 33 LOC (69.7%) covered.
Coverage report generated for (1/8), (2/8), (3/8) to /home/tobi/github/simplecov/tmp/aruba/project/coverage. 37 / 43 LOC (86.05%) covered.
Coverage report generated for (1/8), (2/8), (3/8), (4/8) to /home/tobi/github/simplecov/tmp/aruba/project/coverage. 42 / 47 LOC (89.36%) covered.
Coverage report generated for (1/8), (2/8) to /home/tobi/github/simplecov/tmp/aruba/project/coverage. 32 / 39 LOC (82.05%) covered.

Notice that the second-to-last line has the right LOC count and the coverage expected by the test, and it also seems to take everything into account (unsure why it says n/8 though; my guess is that I could run 8 in parallel, but parallel_tests only spawned 4 as there are only 4 test files), yet it is seemingly overridden later by an outdated report.

If possible we should of course find and fix that race condition, and then remove the wip tags to always run these tests!
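For illustration only, here is a minimal, self-contained Ruby sketch of the suspected failure mode (this is not SimpleCov's actual code, and every name in it is made up): each process merges whatever results it currently sees and then writes the shared report, so a process holding a stale view can finish its write after the complete one and clobber it.

require "json"
require "tmpdir"

# Hypothetical last-writer-wins illustration: four forked processes each
# merge the results visible to them and write the same report file.
# Whichever write lands last wins, so a stale view can overwrite the
# complete one.
report  = File.join(Dir.tmpdir, "report.json")
results = { "1/4" => 10, "2/4" => 12, "3/4" => 8, "4/4" => 7 }

pids = results.keys.each_with_index.map do |_name, i|
  Process.fork do
    seen   = results.keys.first(i + 1)   # stale view: only some results visible
    merged = seen.sum { |key| results[key] }
    sleep(rand / 10)                     # finishing order is not deterministic
    File.write(report, JSON.dump("commands" => seen, "covered" => merged))
  end
end
pids.each { |pid| Process.wait(pid) }

puts File.read(report)                   # may or may not list all four commands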

PragTob added a commit that referenced this issue Jan 11, 2020
It's WIP because it's flaky at the moment and we don't want
our CI to be flaky. See #815
PragTob added a commit that referenced this issue Jan 12, 2020
It's WIP because it's flaky at the moment and we don't want
our CI to be flaky. See #815
@nickshatilo

+1 here

@benissimo

benissimo commented Feb 19, 2020

Does bumping up merge_timeout mitigate this? It looks like the number of tests involved is small enough that the default timeout of 10 minutes should be more than enough. Just curious whether that's a factor.

I ran into the following case with some long running test suites:

[osm.jruby] Using 'RSpecdevn3' as SimpleCov command name
[osm.jruby] Using 'RSpecdevn1' as SimpleCov command name
[osm.jruby] Using 'RSpecdevn4' as SimpleCov command name
[osm.jruby] Using 'RSpecdevn2' as SimpleCov command name
[osm.jruby] Coverage report generated for RSpecdevn3 to rails/coverage. 44377 / 168731 LOC (26.3%) covered.
[osm.jruby] Coverage report generated for RSpecdevn2, RSpecdevn3 to rails/coverage. 69229 / 156447 LOC (44.25%) covered.
[osm.jruby] Coverage report generated for RSpecdevn1, RSpecdevn2 to rails/coverage. 73551 / 150657 LOC (48.82%) covered.
[osm.jruby] Coverage report generated for RSpecdevn1, RSpecdevn2, RSpecdevn4 to ...rails/coverage. 86913 / 144881 LOC (59.99%) covered.

and the fix, in my case, was to bump up merge_timeout.

That might be a separate issue with similar symptoms. Mentioning it here in case others googling for this fall into the same case.
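For anyone who lands here with that symptom, bumping the timeout is a one-liner in the SimpleCov config; a minimal sketch (the 3600 value is arbitrary, pick something comfortably longer than your slowest parallel group):

# .simplecov (or the top of spec_helper.rb)
require "simplecov"

SimpleCov.start do
  merge_timeout 3600   # seconds; the default is 600 (10 minutes)
end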

@PragTob
Collaborator Author

PragTob commented Feb 19, 2020

@benissimo similar symptoms, separate issue (more user configuration)

In the local test case the tests finish within ~2 seconds, so they're nowhere near being impacted by the merge timeout.

@PragTob
Collaborator Author

PragTob commented Aug 11, 2020

This might be related to us not doing the check correctly, which I ask about/discuss in grosser/parallel_tests#772. Once that is resolved and the changes are implemented, this will be worth re-checking.
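For context, the pattern the parallel_tests README documents for running something exactly once after every process has finished looks roughly like this (a sketch; whether SimpleCov's own exit hook does the equivalent check correctly is what the linked issue is about):

# spec_helper.rb — sketch of the parallel_tests after-suite pattern: only the
# first process waits for the others and then does the final work, so a
# slower process can't overwrite the complete result afterwards.
require "parallel_tests"

RSpec.configure do |config|
  config.after(:suite) do
    if ParallelTests.first_process?
      ParallelTests.wait_for_other_processes_to_finish
      # e.g. collate the final coverage report here
    end
  end
end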

@PragTob
Collaborator Author

PragTob commented Aug 16, 2020

It seems to be better now but not fixed. CI now occasionally fails with:

(::) failed steps (::)

Expected `bundle exec parallel_rspec spec` to succeed but got non-zero exit status and the following output:

2 processes for 4 specs, ~ 2 specs per process
...

Finished in 0.02936 seconds (files took 0.67039 seconds to load)
3 examples, 0 failures

...

Finished in 0.02201 seconds (files took 0.6368 seconds to load)
3 examples, 0 failures

Coverage report generated for (2/2) to /home/runner/work/simplecov/simplecov/tmp/aruba/project/coverage. 28 / 37 LOC (75.68%) covered.

which, funnily enough, has happened on JRuby the 2-3 times I've seen it, and only for that spec. Needs a fix soon-ish, or a wip tag on the cuke, so as not to be annoying/make people think they did something wrong.
