Fix tests that run ElmerSolver multiple times. #593

Merged: 1 commit, Oct 14, 2024

mmuetzel commented Oct 14, 2024

Use the CMake macro EXECUTE_ELMER_SOLVER, which sets up the necessary environment variables before executing the ElmerSolver binary.

This fixes the test mgdyn_steady_quad_extruded_restart, which is currently failing in CI on Windows MinGW.
The other tests would have failed only with more parallel processes than are available on the free GitHub-hosted runners.

raback commented Oct 14, 2024

Great fix!

raback merged commit 4cc5d87 into ElmerCSC:devel on Oct 14, 2024
6 of 10 checks passed

mmuetzel commented Oct 15, 2024

Thank you for accepting the change.

That still leaves one test (FrictionHeatMasked) that is failing consistently on the Windows MinGW runner. The logs contain the following just before the error:

  ComputeChange: SS (ITER=1) (NRM,RELC): ( 0.93676408E-01  2.0000000     ) :: stresssolver
   DummySolver
   ***********************************
  ComputeChange: SS (ITER=1) (NRM,RELC): (       Infinity            NaN ) :: dummyroutine
  ComputeWeight: All Done
  UpdatePartitionWeight: All Done
  ForceToStress: Starting assembly...
  SetZeroAtPeriodicNodes: All Done
  ForceToStress: Assembly done
  ForceToStress: Set boundaries done
  ERROR:: ComputeChange: Numerical Error: Norm of solution appears to be NaN

I can't reproduce that error locally: the test passes for me on Windows MinGW. Here, NRM and RELC are both 0.0 after the DummySolver (on Ubuntu and on Windows alike). valgrind doesn't report anything suspicious when running that test on Ubuntu.
The strange thing is that the very similar test FrictionHeat passes in the CI.

Do you have an idea what could be causing that test error in the CI? Does the dummy routine do anything different in FrictionHeatMasked compared to FrictionHeat? Does the dummy solver invoke any function from a linear algebra library that might be optimized for a specific processor architecture (like OpenBLAS), or anything else that might be hardware-dependent?
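
As a side note on reading the log, here is a minimal Python sketch of how the NaN in RELC can follow directly from the Infinity in NRM. It assumes the relative change is computed as 2*|norm - prev_norm| / (norm + prev_norm). That formula is an assumption, not taken from the ElmerSolver source, though it is consistent with the RELC of 2.0 reported for the stresssolver line when the previous norm is zero. The function name and the zero-denominator guard are hypothetical.

```python
import math


def relative_change(norm, prev_norm):
    """Hypothetical stand-in for the RELC value printed by ComputeChange.

    Assumes RELC = 2*|norm - prev_norm| / (norm + prev_norm); this is an
    illustration, not the actual ElmerSolver implementation.
    """
    denom = norm + prev_norm
    if denom == 0.0:
        # Assumption: report "no change" when both norms are zero,
        # matching the 0.0 / 0.0 values seen locally.
        return 0.0
    return 2.0 * abs(norm - prev_norm) / denom


# First stresssolver iteration: previous norm is zero, so RELC is 2.0.
first = relative_change(0.93676408e-01, 0.0)

# An infinite norm gives inf / inf, which IEEE 754 defines as NaN.
bad = relative_change(math.inf, 0.0)

print(first, bad)
```

If this reading is right, the NaN is only a symptom, and the question to chase is why the dummyroutine solution norm becomes infinite on the CI runner in the first place.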
