enable FPE traps in CI #611
/azp run |
Azure Pipelines successfully started running 5 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 5 pipeline(s). |
…/quokka into BenWibking/enable-fpe-in-ci
Code looks fine, but there is a design question we should consider. Producing a NaN is always an error, but producing underflows or overflows may not be. I'm thinking in particular of chemistry, where we have a lot of rate coefficients that take the form rate ~ exp(-T_activation / T_gas), where T_activation is the activation temperature for some reaction. At low T_gas this will underflow and give a rate coefficient of exactly zero, but that is the desired behavior. Similarly, for multigroup radiation we may well have groups where the exponential term in the Planck function underflows to zero in low-temperature parts of the simulation domain, and that is again the desired behavior. Thus I'm worried that an FPE trap that triggers on underflow or overflow may be too sensitive. Thoughts on this issue?
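To make the underflow concern concrete, here is a minimal sketch in plain Python. It is not Quokka or AMReX code: the function name `arrhenius_factor` and the temperatures are made up for illustration, and Python's `math.exp` is used only as an analogy, since it silently returns zero on underflow but raises on overflow.

```python
import math


def arrhenius_factor(t_activation, t_gas):
    """Return exp(-T_activation / T_gas), the temperature-dependent part of a rate coefficient."""
    return math.exp(-t_activation / t_gas)


# Hypothetical temperatures in kelvin, chosen only to show the two regimes.
warm = arrhenius_factor(1.0e4, 1.0e3)  # exp(-10): small but representable
cold = arrhenius_factor(1.0e4, 10.0)   # exp(-1000): underflows to exactly 0.0

# Overflow is treated differently from underflow: math.exp raises instead of returning inf.
try:
    math.exp(1000.0)
    overflow_raised = False
except OverflowError:
    overflow_raised = True

print(warm, cold, overflow_raised)
```

In this analogy the cold-gas rate of exactly zero is the physically sensible answer, which is why a trap that fires on underflow would flag correct behavior.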
By design, it should only trap on overflow, not underflow. However, I think the underlying compiler option this turns on is not actually equivalent to the runtime behavior that is enabled with |
|
May need to turn on |
This is currently failing because of a division by zero in matplotlib:
|
FPEs do not seem to work on Linux ARM64. |
…/quokka into BenWibking/enable-fpe-in-ci
/azp run |
Azure Pipelines successfully started running 2 pipeline(s). |
These GPU tests fail:
|
This reverts commit 382978b.
/azp run |
Azure Pipelines successfully started running 2 pipeline(s). |
Last remaining test failures on avatargpu:
It appears to fail when reading the turbulence driving field:
It's caused by an FPE inside the HDF5 library :/ |
/azp run |
Azure Pipelines successfully started running 2 pipeline(s). |
@markkrumholz Can you review this PR? |
Description
Enables signal handling and floating-point exception (FPE) traps for NaNs when running the test suite, so that a test stops and reports an error when a NaN is encountered. (This only works on CPUs.)
In order for this to work, we change the signal handling settings back to AMReX defaults. We also have to suppress FPEs when importing `numpy` (see numpy/numpy#20504).
By default, many compilers perform optimizations that assume that floating-point exceptions are disabled, and will often produce spurious FPEs for vectorized code (however, this does not affect the results in any way). There are compiler-specific options to disable this behavior for Intel and Clang.
Note that `AMReX_FPE=ON` only sets the compiler option `-ftrapv` (or similar). We instead add the runtime options `amrex.fpe_trap_invalid=1`, `amrex.fpe_trap_zero=1`, and `amrex.fpe_trap_overflow=1` to the command-line arguments for each test. These options can also be added to the input file (or command line) to debug individual problems.
Depends on:
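As a rough illustration of the runtime trap options named in the note above, the following Python sketch builds a test command line with the three `amrex.fpe_trap_*` settings appended. The helper `with_fpe_traps`, the executable name `./test_hydro`, and the input path `tests/example.in` are hypothetical placeholders, not names from this PR. The sketch assumes only the usual AMReX convention of an input file followed by `key=value` overrides.

```python
# Runtime options quoted from the PR description.
FPE_TRAP_OPTIONS = [
    "amrex.fpe_trap_invalid=1",
    "amrex.fpe_trap_zero=1",
    "amrex.fpe_trap_overflow=1",
]


def with_fpe_traps(executable, input_file, extra_args=()):
    """Build an argument list that runs one test with FPE traps enabled (hypothetical helper)."""
    return [executable, input_file, *extra_args, *FPE_TRAP_OPTIONS]


# Placeholder executable and input file, for illustration only.
cmd = with_fpe_traps("./test_hydro", "tests/example.in")
print(" ".join(cmd))
```

The same three `key=value` pairs could instead be placed in the input file when debugging a single problem.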
Related issues
Partially resolves #556.
Checklist
Before this pull request can be reviewed, all of these tasks should be completed. Denote completed tasks with an `x` inside the square brackets `[ ]` in the Markdown source below:
`/azp run`.