Multiple test failures when running tests -j12 #432

Open
mgorny opened this issue Dec 28, 2022 · 8 comments

@mgorny
Contributor

mgorny commented Dec 28, 2022

Describe the bug
When I'm running the test suite with ctest -j12 (i.e. 12 parallel jobs), I'm getting 2-3 different test failures in a run. Over a few runs, the following tests failed:

	289 - test_fill_special (Failed)
	291 - test_frame_get_offsets (SEGFAULT)
	706 - test_schunk_frame (Failed)
	707 - test_schunk_header (Failed)
	709 - test_sframe (Failed)
	710 - test_sframe_lazychunk (Failed)

Segfaults are especially concerning.

To Reproduce

mkdir build
cd build
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr -DBUILD_STATIC=OFF -DBUILD_TESTS=yes -DBUILD_BENCHMARKS=OFF -DBUILD_EXAMPLES=OFF -DBUILD_FUZZERS=OFF -DDEACTIVATE_ZLIB=no -DDEACTIVATE_ZSTD=no -DPREFER_EXTERNAL_LZ4=ON -DPREFER_EXTERNAL_ZLIB=ON -DPREFER_EXTERNAL_ZSTD=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo
ninja
ctest -j12

Expected behavior
Tests should pass when run in parallel.

Logs
LastTest.log from the last run: LastTest.log

System information:

  • OS: Gentoo Linux amd64
  • Compiler: gcc 12.2.1
  • Version: 2.6.1

@DimitriPapadopoulos
Contributor

I am able to reproduce segfaults even with a mere ctest, without -j12:

$ ctest
Test project /my/path/c-blosc2/build
[...]
          Start 1736: b2nd_example_serialize
1736/1736 Test #1736: b2nd_example_serialize ....................................   Passed    0.00 sec

99% tests passed, 1 tests failed out of 1736

Label Time Summary:
b2nd    =   0.50 sec*proc (8 tests)

Total Test time (real) =  53.04 sec

The following tests FAILED:
	1703 - test_lz4_bitshuffle_n (SEGFAULT)
Errors while running CTest
Output from these tests are in: /my/path/c-blosc2/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
$ 
$ ctest --rerun-failed --output-on-failure
Test project /my/path/c-blosc2/build
    Start 1703: test_lz4_bitshuffle_n
1/1 Test #1703: test_lz4_bitshuffle_n ............   Passed    0.41 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) =   0.45 sec
$ 

As you can see, in my case, errors seem to differ between ctest runs. Do tests fail consistently for you, or “randomly” as in my case?

  • OS : Ubuntu 22.04
  • Compiler : GCC 11.3.0
  • Version : main branch

@FrancescAlted
Member

Today we fixed something that may have caused this: ca9d7c6

Could you give it another go?

@mgorny
Contributor Author

mgorny commented Feb 16, 2023

I can still reproduce.

@FrancescAlted
Member

Sorry, I was not explicit enough; I meant without parallelism (plain ctest). Fixing ctest -j12 will require more work (although it is not a high priority).

@DimitriPapadopoulos
Contributor

I no longer see segfaults without -j12, although those segfaults were sporadic to begin with.

@bnavigator
Contributor

Still an issue with 2.7.1 and -j$N with N>1

@keszybz
Contributor

keszybz commented May 13, 2023

I'm seeing this too, with c51d050 and v2.9.1. Most of the time there are test failures, but occasionally segfaults. I haven't captured a core dump yet.

The following tests FAILED:
302 - test_copy (Failed)
311 - test_frame_offset (Failed)
726 - test_schunk_header (Failed)
1722 - test_example_frame_offset (Failed)

The following tests FAILED:
302 - test_copy (Failed)
308 - test_fill_special (Failed)
310 - test_frame_get_offsets (Failed)
311 - test_frame_offset (Failed)
1315 - test_example_frame_simple (Failed)

The following tests FAILED:
11 - test_b2nd_copy (Failed)
302 - test_copy (Failed)

At least one test fails in every run (a 100% failure rate), on multiple machines.
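
Because the set of failing tests changes from run to run, tallying failures over several runs may show which tests are affected most often. The following is a minimal sketch, not something taken from the project: the function names `failed_tests`, `tally_failures` and `repeat_ctest` are hypothetical, and the parsing assumes the summary lines look like the ones quoted in this thread (`302 - test_copy (Failed)`).

```shell
#!/bin/sh
# Print the name of every test listed in a ctest failure summary.
# Reads ctest output on stdin; summary lines are assumed to look like:
#     289 - test_fill_special (Failed)
failed_tests() {
    awk '/^[[:space:]]*[0-9]+ - [^[:space:]]+ \(.*\)[[:space:]]*$/ { print $3 }'
}

# Count how often each test name occurs, most frequent first.
tally_failures() {
    sort | uniq -c | sort -k1,1nr -k2
}

# Hypothetical driver: run the suite several times and tally the failures.
# Only defined here, not invoked; call it from the build directory.
repeat_ctest() {
    runs=${1:-5}
    jobs=${2:-12}
    i=0
    while [ "$i" -lt "$runs" ]; do
        ctest -j"$jobs" 2>&1 | failed_tests
        i=$((i + 1))
    done | tally_failures
}
```

Running `repeat_ctest 10 12` in the build directory would run the suite ten times with 12 jobs and print one line per failing test with its count. Recent CMake versions also offer `ctest --repeat until-fail:<n>`, which may be a simpler way to provoke a sporadic failure in a single test.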

@DimitriPapadopoulos
Contributor

Tests could be modified to be run in a debugger. To get GDB to automatically print a backtrace in case of a crash:

gdb --batch --ex run --ex bt --args ./myprogram "$@" > gdb-backtrace.txt 2>&1

The above runs GDB in batch mode (--batch) and tells it to run the program (--ex run) and print a backtrace (--ex bt) if it crashes. The output is redirected to a file called gdb-backtrace.txt.

That said:

  • How can ctest be made to run the tests in the debugger as suggested above?
  • Tests running in the debugger might not crash.
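
One possible answer to the first question is a small wrapper that runs whatever command it is given under GDB. This is only a sketch under stated assumptions: the function name `gdb_wrap` and the `GDB` and `GDB_LOG_DIR` environment variables are hypothetical, and nothing here has been checked against how c-blosc2 registers its tests. It reuses the GDB invocation above and adds `--return-child-result`, so that GDB exits with the program's status and a failing test is still reported as failed.

```shell
#!/bin/sh
# Run a test binary under GDB in batch mode and save GDB's output,
# including any backtrace, to a per-process log file.
# GDB and GDB_LOG_DIR are hypothetical knobs for this sketch:
#   GDB          debugger executable to use (default: gdb)
#   GDB_LOG_DIR  directory for the log files (default: current directory)
gdb_wrap() {
    # $$ keeps logs from parallel test processes apart.
    log="${GDB_LOG_DIR:-.}/gdb-backtrace-$(basename "$1")-$$.txt"
    # --return-child-result makes GDB exit with the program's exit status.
    "${GDB:-gdb}" --batch --return-child-result \
        --ex run --ex bt --args "$@" > "$log" 2>&1
}
```

Saved as a script that ends with `gdb_wrap "$@"`, this could be used as a launcher in front of each test executable. CMake 3.29 and later provide a `CMAKE_TEST_LAUNCHER` variable for that purpose, but it only applies to tests whose `add_test()` command names an executable target, which is an assumption here. Note that the test's own output ends up in the log file, so any test that is checked by its output would need different handling.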
