
try "ctest --rerun-failed" for handling flaky tests #2204

Open
derekbruening opened this issue Feb 20, 2017 · 29 comments · Fixed by #6075

Comments

@derekbruening (Contributor)

For Travis #1962 and AppVeyor #2145 we may want to try adding --rerun-failed to ctest.

Will our output parsing handle it?

Is this new? Which version of ctest added it? Can we rely on it being there? I don't remember seeing it when I first started using ctest.

@derekbruening (Contributor Author)

OK, this is not what I thought it was. Xref https://gitlab.kitware.com/cmake/cmake/issues/16314. What we really want is the proposed "--retry-fail-until-pass N" option, which is not yet in the main branch.

@derekbruening (Contributor Author)

Xref #2984.

@derekbruening (Contributor Author)

We're using ctest_test(). Looks like in 3.17 it has support for retrying failed tests:

https://cmake.org/cmake/help/latest/command/ctest_test.html

REPEAT UNTIL_PASS:3 would retry up to 3 times.

This may greatly reduce our problems with flaky tests on some platforms.
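In a `-S` suite script, the call would look roughly like this minimal sketch (the variable name is the standard dashboard-script one, not our actual runsuite.cmake code):

```cmake
# CMake 3.17+: re-run each failed test up to 3 times; a pass on any
# attempt counts as an overall pass for that test.
ctest_test(BUILD "${CTEST_BINARY_DIRECTORY}"
           REPEAT UNTIL_PASS:3
           RETURN_VALUE test_rv)
```

The standalone ctest command line gained the equivalent `--repeat until-pass:3` flag in the same 3.17 release.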

@derekbruening derekbruening self-assigned this May 20, 2023
@derekbruening (Contributor Author)

I will try this out. We're currently requiring cmake 3.7. We should investigate the cmake versions on the GA CI VMs. Looks like the Ubuntu 20.04 image has cmake 3.26: https://github.com/actions/runner-images/blob/main/images/linux/Ubuntu2004-Readme.md. We don't use Ubuntu 18.04 anymore, but it looks like it ships cmake 3.10, so requiring 3.17 could affect some users.

@derekbruening (Contributor Author)

The GA CI VS2019 image also has 3.26 https://github.com/actions/runner-images/blob/main/images/win/Windows2019-Readme.md

We could conditionally enable this feature if 3.17+ is detected, instead of mandating a new min version.
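A conditional check in the suite script could look like this sketch (argument names are illustrative):

```cmake
# Only pass REPEAT when the ctest running this script is 3.17+,
# so older installations keep working without a new minimum version.
set(repeat_args "")
if(NOT CMAKE_VERSION VERSION_LESS "3.17")
  set(repeat_args REPEAT UNTIL_PASS:3)
endif()
ctest_test(BUILD "${CTEST_BINARY_DIRECTORY}" ${repeat_args})
```

In a `ctest -S` script, CMAKE_VERSION reflects the version of the ctest binary executing it, which is what matters for REPEAT support.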

derekbruening added a commit that referenced this issue May 20, 2023
If cmake 3.17+ is in use, enables retrying of failed tests up to 3x, with a pass on any attempt counting as an overall pass.  This avoids flaky tests marking the whole suite red, which is even more problematic with merge queues.

Tested:
I made a test which fails 3/4 of the time:
  --------------------------------------------------
  add_test(bogus bash -c "exit \$((RANDOM % 4))")
  --------------------------------------------------
I made it easy to run just this test (gave it a label; disabled other
builds, etc.) for convenience and then ran:
  --------------------------------------------------
  $ ctest -VV -S ../src/suite/runsuite.cmake,64_only
  --------------------------------------------------
Which resulted in:
  --------------------------------------------------
  test 1
      Start 1: bogus

  1: Test command: /usr/bin/bash "-c" "exit $((RANDOM % 4))"
  1: Working Directory: /usr/local/google/home/bruening/dr/git/build_suite/build_debug-internal-64
  1: Test timeout computed to be: 600
  1/1 Test #1: bogus ............................***Failed    0.00 sec
      Start 1: bogus

  1: Test command: /usr/bin/bash "-c" "exit $((RANDOM % 4))"
  1: Working Directory: /usr/local/google/home/bruening/dr/git/build_suite/build_debug-internal-64
  1: Test timeout computed to be: 600
      Test #1: bogus ............................   Passed    0.00 sec

  100% tests passed, 0 tests failed out of 1
  --------------------------------------------------

Issue: #2204, #5873
Fixes #2204
@derekbruening (Contributor Author)

Re-opening for two purposes:

  1. We still have some tests that fail 3x in a row, e.g. the scattergather assert (tool.drcachesim.scattergather test: `tracer.cpp:394: towrite <= ipc_pipe.get_atomic_write_size() && towrite > 0`, #5329) noted in #5873 (comment). We should track these and take some action on them (fix them, or add them to the auto-ignore list).
  2. While it is nice to have the commits and PR pages look greener, we would like to notice when new tests start failing intermittently. One idea is to add scripting to the post-merge Action to identify failures that later passed and send an email (even if the final result is a green check).

@derekbruening (Contributor Author)

scattergather once again failed 3x in a row for x86-32 with the pipe assert #5329: https://github.com/DynamoRIO/dynamorio/actions/runs/5063129492/jobs/9089390892?pr=6079

@derekbruening (Contributor Author)

scattergather once again failed 3x in a row for x86-32 with the pipe assert #5329 for the merge to master of #6076. So it is failing 3x in a row on a regular basis.

@derekbruening (Contributor Author)

PR #6069's merge to master failed for windows 32-bit https://github.com/DynamoRIO/dynamorio/actions/runs/5070388840/jobs/9105354355, but it's not clear why: the run seems truncated. Did it hit some 45-min time limit? I put a 40-min limit on the merge queue, but this is not the merge queue. The Github limit is 6 hours.

@derekbruening (Contributor Author)

Our hosted aarch64 runner was on cmake 3.16.3. I've upgraded it to 3.26.4.

@derekbruening (Contributor Author)

#6081 failed 3x in a row on win32

@derekbruening (Contributor Author)

tool.drcachesim.threads-with-config-file failed 3x in a row:
https://github.com/DynamoRIO/dynamorio/actions/runs/5075310358/jobs/9116397324?pr=6080
First it hit the type_is_instr assert (#3320), followed by two 90s timeouts: xref #4954, but that was release-build-only and supposedly fixed?

@derekbruening (Contributor Author)

For tool.drcachesim.threads-with-config-file: probably it was the pipe file left behind that caused the 2 subsequent hangs! So 3x in a row doesn't help for online drcachesim, which has no mechanism to clean up the pipe file. Should we try to add some cleanup to the suite, at least, if we can't add it to the drcachesim app, launcher, or client?
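One possible shape for suite-side cleanup, as a sketch only (the pipe location and naming pattern here are assumptions for illustration, not the actual drcachesim pipe path):

```cmake
# Before (re-)running online tests, delete any stale named pipes left
# behind by a crashed run so a retry doesn't hang reading a dead pipe.
file(GLOB stale_pipes "${CTEST_BINARY_DIRECTORY}/*.pipe")
if(stale_pipes)
  file(REMOVE ${stale_pipes})
endif()
```

Note that this wouldn't help between the automatic REPEAT retries of a single test, since ctest re-runs the test immediately without returning control to the script.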

@derekbruening (Contributor Author)

win32 tool.drcacheoff.gencode failed 3x in a row with a timeout each time: https://github.com/DynamoRIO/dynamorio/actions/runs/5217398902/jobs/9417192797

@ksco (Contributor) commented Jun 9, 2023

win32 tool.drcacheoff.gencode and tool.drcacheoff.burst_replace failed due to timeout again: https://github.com/DynamoRIO/dynamorio/actions/runs/5217720624/jobs/9417832232#step:6:19819

@derekbruening (Contributor Author)

These new timeouts are filed as #6131

@ksco (Contributor) commented Jun 9, 2023

I've tried multiple times; these timeouts seem pretty "stable".

@derekbruening (Contributor Author)

> I've tried multiple times; these timeouts seem pretty "stable".

Please move discussion to the dedicated issue #6131

@derekbruening (Contributor Author)

drcachesim.coherence failed 3x in a row for x86-32 on a master merge at https://github.com/DynamoRIO/dynamorio/actions/runs/5258228536/jobs/9502167966: the first failure was the type_is_instr assert (#3320), but the next two were timeouts (150s -- the test has a custom 150s timeout, though the prior passing run took just 4s).

@bete0 (Contributor) commented Jun 15, 2023

release-external-64 failed to build once in vs2019-builds due to Error copying file "/.../dynamorio/build_release-external-64/lib64/release/../drconfiglib.dll" to "/.../dynamorio/build_release-external-64/suite/tests/bin/drconfiglib.dll".: https://github.com/DynamoRIO/dynamorio/actions/runs/5279972848/jobs/9551416845#step:6:1914.

@derekbruening (Contributor Author)

> release-external-64 failed to build once in vs2019-builds due to Error copying file "/.../dynamorio/build_release-external-64/lib64/release/../drconfiglib.dll" to "/.../dynamorio/build_release-external-64/suite/tests/bin/drconfiglib.dll".: https://github.com/DynamoRIO/dynamorio/actions/runs/5279972848/jobs/9551416845#step:6:1914.

That's the build race #5888 which does not get a 3x retry.

@derekbruening (Contributor Author)

max-global hit its 240s timeout 3x in a row:
https://github.com/DynamoRIO/dynamorio/actions/runs/5341125248/jobs/9681644494

272/317 Test #265: code_api|tool.drcacheoff.max-global ..........................***Timeout 241.44 sec

@derekbruening (Contributor Author)

At https://github.com/DynamoRIO/dynamorio/actions/runs/5513552608/jobs/10051833149 we have a first failure with the #3320 assert in a drcachesim online tracing test:

295: Invalid trace entry type thread_exit (23) before a bundle
295: drcachesim: /home/runner/work/dynamorio/dynamorio/clients/drcachesim/reader/reader.cpp:215: virtual bool dynamorio::drmemtrace::reader_t::process_input_entry(): Assertion `type_is_instr(cur_ref_.instr.type) || cur_ref_.instr.type == TRACE_TYPE_INSTR_NO_FETCH' failed.
295/406 Test #295: code_api|tool.drcachesim.threads-with-config-file ................***Failed  Require

The next 2 retries both hit 90s timeouts -- presumably because of a stale pipe file! If we can't find any control point in DR itself to clean up a pipe file, maybe we should add one to the online tests through some wrapper, like a runmulti precmd or postcmd.

@ksco (Contributor) commented Jul 17, 2023

x86-64 code_api|tool.drcacheoff.invariant_checker failed 3 times in a row: https://github.com/DynamoRIO/dynamorio/actions/runs/5573200427/jobs/10180157101?pr=6209#step:7:26203

#6212

@derekbruening (Contributor Author)

We're now seeing the win32 invariant checker failing only on the merge queue with the message "Syscall marker not preceded by timestamp", while the pre-merge runs have all been green. We may disable the merge queue, at least temporarily.
