Parallel CMake out-of-tree build can fail on Ubuntu 16.04 #5223

Closed
gilles-peskine-arm opened this issue Nov 24, 2021 · 2 comments
Labels
bug component-test Test framework and CI scripts

Comments

gilles-peskine-arm commented Nov 24, 2021

There may be a bug such as a missing dependency in our CMake configuration. Specifically, in certain conditions, a data file generated by tests/scripts/generate_psa_tests.py may be absent at the time generate_test_code.py tries to use it, even though it looks like it was generated immediately before. We've only observed this in parallel builds, so it could be a race condition.

To reproduce, run the test_cmake_out_of_source component of all.sh at the commit below, or, without relying on all.sh:

git checkout bfa273e507ae28a883635c979cb1925cb08db773
rm -rf mbedtls_out_of_source_build
mkdir mbedtls_out_of_source_build
cd mbedtls_out_of_source_build
cmake -D CMAKE_BUILD_TYPE:String=Check ${PWD%/*}
make -j2

Log:

…
[ 74%] Built target test_suite_cipher.chacha20
[ 74%] Generating suites/test_suite_psa_crypto_generate_key.generated.data, suites/test_suite_psa_crypto_not_supported.generated.data, suites/test_suite_psa_crypto_storage_format.current.data, suites/test_suite_psa_crypto_storage_format.v0.data
[ 74%] Linking C executable test_suite_cipher.padding
[ 74%] Built target test_suite_cipher.padding
[ 74%] Generating test_suite_psa_crypto_storage_format.v0.c
Traceback (most recent call last):
  File "/var/lib/build/tests/scripts/generate_test_code.py", line 1142, in <module>
    main()
  File "/var/lib/build/tests/scripts/generate_test_code.py", line 1137, in main
    c_file=out_c_file, out_data_file=out_data_file)
  File "/var/lib/build/tests/scripts/generate_test_code.py", line 1055, in generate_code
    raise IOError("ERROR: %s [%s] not found!" % (name, path))
OSError: ERROR: Data file [/var/lib/build/mbedtls_out_of_source_build/tests/suites/test_suite_psa_crypto_storage_format.v0.data] not found!
tests/CMakeFiles/test_suite_psa_crypto_storage_format.v0.dir/build.make:67: recipe for target 'tests/test_suite_psa_crypto_storage_format.v0.c' failed
make[2]: *** [tests/test_suite_psa_crypto_storage_format.v0.c] Error 1
CMakeFiles/Makefile2:5325: recipe for target 'tests/CMakeFiles/test_suite_psa_crypto_storage_format.v0.dir/all' failed
make[1]: *** [tests/CMakeFiles/test_suite_psa_crypto_storage_format.v0.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 74%] Generating test_suite_psa_crypto_not_supported.generated.c
Scanning dependencies of target test_suite_psa_crypto_not_supported.generated
[ 74%] Building C object tests/CMakeFiles/test_suite_psa_crypto_not_supported.generated.dir/test_suite_psa_crypto_not_supported.generated.c.o
[ 75%] Linking C executable test_suite_psa_crypto_not_supported.generated
[ 75%] Built target test_suite_psa_crypto_not_supported.generated
make: *** [all] Error 2
Makefile:138: recipe for target 'all' failed
^^^^test_cmake_out_of_source: build: cmake 'out-of-source' build: make -> 2^^^^
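
The failure signature in this log is the OSError line naming the missing data file. As a minimal sketch, a CI log could be scanned for that line as shown below. The function name and the regular expression are illustrative assumptions, not part of Mbed TLS. Note that the script raises IOError while the traceback prints OSError: in Python 3, IOError is an alias of OSError.

```python
import re

# Matches the message built by generate_test_code.py in the traceback above:
#   "ERROR: %s [%s] not found!" % (name, path), with name == "Data file".
# The pattern is an illustrative assumption based on this one log.
MISSING_DATA_RE = re.compile(
    r"^OSError: ERROR: Data file \[(?P<path>[^\]]+)\] not found!$",
    re.MULTILINE,
)


def missing_data_files(log_text):
    """Return the paths of all data files reported missing in a build log."""
    return [m.group("path") for m in MISSING_DATA_RE.finditer(log_text)]
```

Calling missing_data_files() on the full text of a failed build log returns the list of paths that the generator could not find, which makes it easy to tell this failure apart from unrelated build errors.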

After some local experimentation, I can reproduce the problem reliably. The following conditions are necessary:

  • The pkcs12 test suite needs to be present. Without it, the build ran fine.
  • It has to be an out-of-source cmake build. In-tree builds are fine.
  • It has to be a parallel build. Non-parallel builds are fine.
  • The make step has to run under Ubuntu 16.04, as opposed to Ubuntu 18.04 or 20.04. A 16.04 chroot under 20.04 reproduces the bug. The cmake step can run under 18.04 or 16.04 (under 20.04, it seems to produce makefiles that use features of cmake that Ubuntu 16.04's cmake doesn't support).
  • Using /chroot/xenial/usr/bin/make under Ubuntu 20.04, the build is fine.
  • Using PYTHON=/usr/bin/python3.5 (from the Deadsnakes PPA) under Ubuntu 20.04, the build is fine.
  • After the build fails, if I run either make -j2 again or make clean; make -j2, the second build is fine.
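
One way to check each of these conditions is to repeat the build from a freshly created build directory and count the failures. The sketch below is an assumption about how such a check could be scripted, not the procedure actually used here; fresh_build and count_failures are hypothetical names. It recreates the build directory on every run because rebuilding in an existing directory does not fail.

```python
import shutil
import subprocess
from pathlib import Path


def fresh_build(source_dir, build_dir, jobs=2):
    """Recreate build_dir, run cmake and a parallel make; return make's exit code."""
    source = Path(source_dir).resolve()
    shutil.rmtree(build_dir, ignore_errors=True)
    Path(build_dir).mkdir(parents=True)
    # Same configuration as in the manual reproduction steps.
    subprocess.run(
        ["cmake", "-D", "CMAKE_BUILD_TYPE:String=Check", str(source)],
        cwd=build_dir,
        check=True,
    )
    return subprocess.run(["make", f"-j{jobs}"], cwd=build_dir).returncode


def count_failures(build_once, runs):
    """Call build_once() `runs` times and count the non-zero exit codes."""
    return sum(1 for _ in range(runs) if build_once() != 0)


# Example (not executed here): ten fresh parallel builds of the current tree.
# failures = count_failures(
#     lambda: fresh_build(".", "mbedtls_out_of_source_build"), runs=10)
```

Passing jobs=1, or a different build directory layout, to fresh_build gives a quick comparison between the failing and non-failing configurations.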

Staring at the files generated by CMake (well, mostly tests/CMakeFiles/test_suite_psa_crypto_storage_format.v0.dir/build.make), I don't see anything wrong. I tried observing with loggedfs and saw:

  • Near the beginning of the build, make and cmake check for suites/test_suite_psa_crypto_storage_format.v0.data in the build tree. It doesn't exist. This is as expected.
  • Shortly before the build errors out, a Python script creates suites/test_suite_psa_crypto_storage_format.v0.data in the build tree. This is as expected.
  • After that, there is another check for suites/test_suite_psa_crypto_storage_format.v0.data in the build tree. It exists. This is as expected.

Unfortunately, my log didn't capture any activity at all from generate_test_code.py. I suspect this has to do with some interaction between chroot (I ran the build in an Ubuntu 16.04 chroot under 20.04) and loggedfs. Since at that point we'd decided we'd spent more than enough time investigating, I didn't investigate further.
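
As a cruder alternative to filesystem-level logging, the moment at which the data file appears could be recorded by polling from a separate process while the build runs. This is only a sketch under that assumption; wait_for_file is a hypothetical helper, and polling cannot show which process created or read the file.

```python
import os
import time


def wait_for_file(path, timeout=5.0, interval=0.01):
    """Poll until `path` exists.

    Returns the number of seconds waited, or None if the file did not
    appear within `timeout` seconds.
    """
    start = time.monotonic()
    while True:
        if os.path.exists(path):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```

Started just before make -j2 with the path of the .v0.data file in the build tree, the returned delay can be compared with the timestamp of the "not found" error in the build log.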

gilles-peskine-arm added bug wontfix component-test Test framework and CI scripts labels Nov 24, 2021

gilles-peskine-arm commented

Given the highly specific requirements for the problem to manifest, we do not intend to fix it. In particular, we've only been able to reproduce the problem on an OS that's out of mainstream support. We would reconsider only if the problem happens on a more modern OS.

We still have Ubuntu 16.04 on our CI, and the test_cmake_out_of_source component currently runs on that platform. Our workaround will be to immediately switch to running that component on Ubuntu 18.04 instead.

gilles-peskine-arm commented

The symptoms are very similar to those reported in #5374 (comment), except that that issue was reported on macOS with a recent CMake, whereas the issue here was observed with an old CMake on Linux. It's possible that it's the same race condition which may or may not manifest in practice depending on a lot of environmental factors. #5429 added dependencies which probably fix the race condition underlying both issues. So I think #5429 can be considered to have fixed #5223 as well as #5374.
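
To illustrate why a missing dependency can produce this kind of intermittent failure, here is a toy model. It is not the actual CMake change in #5429, and the graph is a simplification assumed for illustration. It uses graphlib from the Python standard library (3.9 or later) to list the steps a parallel scheduler is allowed to start immediately.

```python
from graphlib import TopologicalSorter

DATA = "suites/test_suite_psa_crypto_storage_format.v0.data"
C_FILE = "test_suite_psa_crypto_storage_format.v0.c"


def initially_ready(graph):
    """Return the steps that may start right away.

    `graph` maps each step to the set of steps it depends on.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    return set(sorter.get_ready())


# Without a dependency edge, both steps may start at once, so the step that
# generates the .c file can run before the .data file exists.
without_edge = {DATA: set(), C_FILE: set()}

# With the edge, only the .data generation may start first.
with_edge = {DATA: set(), C_FILE: {DATA}}
```

In a serial build, the order in which the steps happen to be listed can hide the missing edge, which would be consistent with the failure only showing up under make -j2.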
