
Failure of regression test setup with race condition #1245

Open
deslaughter opened this issue Sep 7, 2022 · 4 comments

@deslaughter
Collaborator

Bug description
Regression tests occasionally fail due to a race condition when copying shared test files to the build directory.

To Reproduce
This failure is difficult to reproduce because it depends on the timing of the tests. The most likely way to reproduce it is to run all of the regression tests manually with CTest (currently around 80 tests) using a large number of parallel jobs, greater than 4 (the --parallel flag).
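
As a rough illustration (not part of the original report), a small helper like the following could rerun the suite in parallel until the race triggers. The build directory, job count, and use of ctest --test-dir (CMake 3.20+) are assumptions for illustration:

```python
# Hypothetical reproduction helper (not an OpenFAST script): rerun the
# regression tests with many parallel jobs until a failure appears.
import subprocess
import sys

BUILD_DIR = "build"   # assumed CMake build directory
JOBS = 12             # more than 4 parallel jobs makes the race more likely

for attempt in range(1, 21):
    result = subprocess.run(
        ["ctest", "--test-dir", BUILD_DIR,
         "--parallel", str(JOBS), "--output-on-failure"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        print(f"Failure reproduced on attempt {attempt}")
        print(result.stdout[-4000:])   # tail of the CTest log
        sys.exit(1)

print("No failure observed; the race did not trigger this time.")
```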

Expected behavior
Regression tests should not fail when copying files.

OpenFAST Version
The failure was more likely to happen before commit 0f8237e was merged but is still possible.

Additional context
The race condition is discussed in PR #1199 and PR #1244, but not resolved. The recommended solution is to move the copying of shared test files from the individual test scripts to a main script which runs before the tests are executed.
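
A minimal sketch of that recommended approach, assuming a single setup step that stages the shared files into the build tree before CTest launches any regression test; the directory names are illustrative, not the actual OpenFAST layout:

```python
# Minimal sketch of the proposed fix: copy the shared input files exactly once,
# before any regression test runs, instead of from each individual test script.
import shutil
from pathlib import Path

def stage_shared_files(source_dir: Path, build_dir: Path) -> None:
    """Copy all shared input files into the build tree exactly once."""
    for src in source_dir.rglob("*"):
        if src.is_file():
            dest = build_dir / src.relative_to(source_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)

if __name__ == "__main__":
    # Hypothetical shared case used by more than one test driver.
    stage_shared_files(
        Path("reg_tests/r-test/glue-codes/openfast/5MW_Land_DLL_WTurb"),
        Path("build/reg_tests/glue-codes/openfast/5MW_Land_DLL_WTurb"),
    )
```

One way to guarantee the ordering within CTest itself is to register the staging step as a fixture (FIXTURES_SETUP on the setup test, FIXTURES_REQUIRED on the regression tests), so it always runs before any dependent test even with --parallel.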

@andrew-platt andrew-platt changed the title Bug report Failure of regression test setup with race condition Sep 7, 2022
@andrew-platt
Collaborator

I can confirm that I still have sporadic failures of regression testing locally with CTest. What I observed is that the 5MW_Land_DLL_WTurb_py case failed immediately when it was the first case CTest ran with the ctest -j12 command (all 81 cases in parallel batches of 12). The case then passes if I rerun it. I'm not sure if this is due to a race condition in the file copying or to a different ordering.

The 5MW_Land_DLL_WTurb_py case uses the input files from 5MW_Land_DLL_WTurb to run. Perhaps these had not been copied yet.

@andrew-platt
Collaborator

See discussion #731 for additional ideas on what we can improve. Fixing this should be done as part of an overall improvement of the automated testing process.

@bjonkman
Contributor

bjonkman commented Sep 7, 2022

Perhaps these were not first copied.

@andrew-platt , I think you are correct. I noticed those files weren't copied in the script to run the non-python HD driver of those test cases, so I added a copy of those files in #1222. (here: https://github.com/OpenFAST/openfast/pull/1222/files#diff-c1483b46f685cead3f87872ced29a4e253120dea6e7761ad19690719f30f966f)

I didn't modify the script for the python driver version, though, so if it happens to run before the glue code tests or the HD Fortran driver, the files won't be there.
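
As a hedged stopgap sketch (not the actual driver-script change): if each test script keeps copying the shared files itself, the copy can at least be made safe against concurrent writers by writing to a temporary name and then renaming, so another test never reads a partially written file:

```python
# Illustrative atomic copy: write to a temp file in the destination directory,
# then replace the destination in a single step.
import os
import shutil
import tempfile
from pathlib import Path

def copy_atomic(src: Path, dest: Path) -> None:
    """Copy src to dest without ever exposing a half-written destination."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent)
    os.close(fd)
    try:
        shutil.copy2(src, tmp_name)
        os.replace(tmp_name, dest)  # single-step replace of the destination
    finally:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
```

This does not fix the ordering problem discussed above (a test can still start before its inputs exist); it only prevents two concurrent copies of the same file from corrupting each other.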

@andrew-platt
Collaborator

@bjonkman, I think you are right. On GitHub Actions we might just be getting lucky that the build for rtest-interfaces takes longer than the build for the regression tests. With a local build and test, all of the builds with make are completed before running ctest.

[Screenshot attached: Screen Shot 2022-09-07 at 12:07:26 PM]
