enable FPE traps in CI #611

BenWibking · 2024-04-17T16:54:09Z

Description

Enables signal handling and floating-point exception (FPE) traps for NANs when running the test suite, so it will stop and report an error when NANs are encountered. (This only works on CPUs.)

In order for this to work, we change the signal handling settings back to AMReX defaults. We also have to suppress FPEs when importing numpy (see numpy/numpy#20504).

By default, many compilers perform optimizations that assume that floating-point exceptions are disabled, and will often produce spurious FPEs for vectorized code (however, this does not affect the results in any way). There are compiler-specific options to disable this behavior for Intel and Clang.

Note that AMReX_FPE=ON only sets the compiler option -ftrapv (or similar). We instead add the runtime options amrex.fpe_trap_invalid=1, amrex.fpe_trap_zero=1, and amrex.fpe_trap_overflow=1 to the command-line arguments for each test. These options can also be added to the input file (or command line) to debug individual problems.

Depends on:

Related issues

Partially resolves #556.

Checklist

Before this pull request can be reviewed, all of these tasks should be completed. Denote completed tasks with an x inside the square brackets [ ] in the Markdown source below:

I have added a description (see above).
I have added a link to any related issues see (see above).
I have read the Contributing Guide.
I have added tests for any new physics that this PR adds to the code.
I have tested this PR on my local computer and all tests pass.
I have manually triggered the GPU tests with the magic comment /azp run.
I have requested a reviewer for this PR.

BenWibking · 2024-04-17T16:54:25Z

/azp run

azure-pipelines · 2024-04-17T16:54:44Z

Azure Pipelines successfully started running 5 pipeline(s).

psharda · 2024-04-29T11:03:48Z

/azp run

azure-pipelines · 2024-04-29T11:04:06Z

Azure Pipelines successfully started running 5 pipeline(s).

…/quokka into BenWibking/enable-fpe-in-ci

markkrumholz

Code looks fine, but there is a design question we should consider. Producing a NaN is always an error, but producing underflows or overflows may not be. I'm thinking in particular of chemistry, where we have a lot of rate coefficients that take the form rate ~ exp(-T_activation / T_gas), where T_activation is the activation temperature for some reaction. At low T_gas this will underflow and give a rate coefficient of exactly zero, but that is the desired behavior. Similarly, for multigroup radiation we may well have groups where the exponential term in the Planck function underflows to zero in low-temperature parts of the simulation domain, and that is again the desired behavior. Thus I'm worried that a FPE that triggers on underflow or overflow may be too sensitive. Thoughts on this issue?

BenWibking · 2024-07-02T13:02:00Z

Code looks fine, but there is a design question we should consider. Producing a NaN is always an error, but producing underflows or overflows may not be. I'm thinking in particular of chemistry, where we have a lot of rate coefficients that take the form rate ~ exp(-T_activation / T_gas), where T_activation is the activation temperature for some reaction. At low T_gas this will underflow and give a rate coefficient of exactly zero, but that is the desired behavior. Similarly, for multigroup radiation we may well have groups where the exponential term in the Planck function underflows to zero in low-temperature parts of the simulation domain, and that is again the desired behavior. Thus I'm worried that a FPE that triggers on underflow or overflow may be too sensitive. Thoughts on this issue?

By design, it should only trap on overflow, not underflow.

However, I think the underlying compiler option this turns on is not actually equivalent to the runtime behavior that is enabled with amrex.fpe_trap_invalid, amrex.fpe_trap_overflow and amrex.fpe_trap_zero (division-by-zero), so I need to re-work this PR and test it explicitly.

BenWibking · 2024-07-02T16:33:48Z

~~The Intel compiler produces spurious FPEs for optimized code, will have to disable for that.~~

Ref: https://stackoverflow.com/questions/52644343/replacing-01-with-02-intel-compiler-option-leads-to-throwing-fpe-at-small/52652657#52652657

BenWibking · 2024-07-02T16:48:17Z

May need to turn on -ffp-exception-behavior=maytrap with Clang compilers (i.e., for Intel and macOS tests): https://clang.llvm.org/docs/UsersManual.html#cmdoption-ffp-exception-behavior

BenWibking · 2024-07-02T18:22:28Z

This is currently failing because of a division by zero in matplotlib:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x1e611800)
  * frame #0: 0x0000000107b0636c _multiarray_umath.cpython-312-darwin.so`longdouble_true_divide + 364
    frame #1: 0x00000001018b1aac Python`binary_op1 + 88
    frame #2: 0x00000001018b1e9c Python`PyNumber_TrueDivide + 32
    frame #3: 0x00000001019c3988 Python`_PyEval_EvalFrameDefault + 53224
    frame #4: 0x00000001019b6730 Python`PyEval_EvalCode + 184
    frame #5: 0x00000001019b2fe4 Python`builtin_exec + 416
    frame #6: 0x000000010191e4b8 Python`cfunction_vectorcall_FASTCALL_KEYWORDS + 92
    frame #7: 0x00000001019c2f20 Python`_PyEval_EvalFrameDefault + 50560
    frame #8: 0x00000001018cf430 Python`object_vacall + 212
    frame #9: 0x00000001018cf310 Python`PyObject_CallMethodObjArgs + 104
    frame #10: 0x00000001019f5fc0 Python`PyImport_ImportModuleLevelObject + 996
    frame #11: 0x00000001019b1940 Python`builtin___import__ + 172
    frame #12: 0x000000010191e4b8 Python`cfunction_vectorcall_FASTCALL_KEYWORDS + 92
    frame #13: 0x00000001019c2f20 Python`_PyEval_EvalFrameDefault + 50560
    frame #14: 0x00000001018cf430 Python`object_vacall + 212
    frame #15: 0x00000001018cf310 Python`PyObject_CallMethodObjArgs + 104
    frame #16: 0x00000001019f5fc0 Python`PyImport_ImportModuleLevelObject + 996
    frame #17: 0x00000001019b1940 Python`builtin___import__ + 172
    frame #18: 0x000000010191e4b8 Python`cfunction_vectorcall_FASTCALL_KEYWORDS + 92
    frame #19: 0x00000001019c2f20 Python`_PyEval_EvalFrameDefault + 50560
    frame #20: 0x00000001018cf430 Python`object_vacall + 212
    frame #21: 0x00000001018cf310 Python`PyObject_CallMethodObjArgs + 104
    frame #22: 0x00000001019f5fc0 Python`PyImport_ImportModuleLevelObject + 996
    frame #23: 0x00000001019b1940 Python`builtin___import__ + 172
    frame #24: 0x000000010191e4b8 Python`cfunction_vectorcall_FASTCALL_KEYWORDS + 92
    frame #25: 0x00000001018ce81c Python`_PyObject_CallFunctionVa + 152
    frame #26: 0x00000001018ce778 Python`PyObject_CallFunction + 60
    frame #27: 0x00000001019f5a00 Python`PyImport_Import + 192
    frame #28: 0x00000001019f58e4 Python`PyImport_ImportModule + 60
    frame #29: 0x000000010000d124 test_advection`_import_array() at __multiarray_api.h:1482:21
    frame #30: 0x000000010000ce2c test_advection`matplotlibcpp::detail::_interpreter::import_numpy(this=0x0000000100510000) at matplotlibcpp.h:126:3
    frame #31: 0x000000010000b9b4 test_advection`matplotlibcpp::detail::_interpreter::_interpreter(this=0x0000000100510000) at matplotlibcpp.h:160:3
    frame #32: 0x000000010000b8d8 test_advection`matplotlibcpp::detail::_interpreter::_interpreter(this=0x0000000100510000) at matplotlibcpp.h:141:2
    frame #33: 0x000000010000a764 test_advection`matplotlibcpp::detail::_interpreter::get() at matplotlibcpp.h:103:23
    frame #34: 0x000000010000a43c test_advection`_object* matplotlibcpp::get_array<double>(v=size=400) at matplotlibcpp.h:320:2
    frame #35: 0x000000010000a1cc test_advection`bool matplotlibcpp::plot<double>(x=size=400, y=size=400, keywords=size=1) at matplotlibcpp.h:375:21
    frame #36: 0x000000010000529c test_advection`matplotlibcpp::plot(x=size=400, y=size=400, keywords=size=1) at matplotlibcpp.h:1870:9
    frame #37: 0x00000001000046d8 test_advection`AdvectionSimulation<SawtoothProblem>::computeReferenceSolution(this=0x000000016fdfdfb8, ref=0x000000016fdfd1b8, dx=0x000000016fdfd170, prob_lo=0x000000016fdfd168, prob_hi=0x000000016fdfd160) at test_advection.cpp:110:3

BenWibking · 2024-07-02T18:46:22Z

FPEs do not seem to work on Linux ARM64.

…/quokka into BenWibking/enable-fpe-in-ci

BenWibking · 2024-07-07T21:28:41Z

/azp run

azure-pipelines · 2024-07-07T21:28:54Z

Azure Pipelines successfully started running 2 pipeline(s).

BenWibking · 2024-07-07T23:03:47Z

These GPU tests fail:

The following tests FAILED:
	 30 - BinaryOrbitCIC (Failed)
	 37 - PopIII (Failed)
	 39 - StarCluster (Failed)

This reverts commit 382978b.

BenWibking · 2024-07-08T00:18:11Z

/azp run

azure-pipelines · 2024-07-08T00:18:24Z

Azure Pipelines successfully started running 2 pipeline(s).

BenWibking · 2024-07-08T02:05:27Z

Last remaining test failures on avatargpu:

The following tests FAILED:
	 37 - PopIII (Failed)
	 39 - StarCluster (Failed)

It appears to fail when reading the turbulence driving field:

CUDA initialized with 1 device.
AMReX (24.06) initialized
Successfully read inputs file ... 
Using derived variables:
	log_density

Creating new distribution map on level: 0
Initializing turbulence data...
data_file: zdrv.hdf5.
Erroneous arithmetic operation
See Backtrace.0 file for details

It's caused by an FPE inside the HDF5 library :/
See HDFGroup/hdf5#3831.

BenWibking · 2024-07-08T18:24:22Z

/azp run

azure-pipelines · 2024-07-08T18:24:37Z

Azure Pipelines successfully started running 2 pipeline(s).

BenWibking · 2024-07-10T22:22:35Z

@markkrumholz Can you review this PR?

enable FPE traps in CI

41a522b

psharda added enhancement New feature or request floating-point issue with floating-point roundoff error CI labels Apr 17, 2024

Merge branch 'development' into BenWibking/enable-fpe-in-ci

7561ee9

BenWibking changed the title ~~enable FPE traps in CI~~ [WIP] enable FPE traps in CI Jun 29, 2024

BenWibking added 3 commits July 1, 2024 20:50

Merge branch 'development' into BenWibking/enable-fpe-in-ci

d4cdeee

Merge branch 'BenWibking/enable-fpe-in-ci' of github.com:quokka-astro…

9fbf6a8

…/quokka into BenWibking/enable-fpe-in-ci

add -DAMReX_FPE=ON to all CPU tests

da23304

BenWibking marked this pull request as ready for review July 2, 2024 00:54

BenWibking changed the title ~~[WIP] enable FPE traps in CI~~ enable FPE traps in CI Jul 2, 2024

BenWibking requested a review from markkrumholz July 2, 2024 01:11

BenWibking enabled auto-merge July 2, 2024 01:13

markkrumholz reviewed Jul 2, 2024

View reviewed changes

BenWibking disabled auto-merge July 2, 2024 13:02

BenWibking marked this pull request as draft July 2, 2024 13:02

enable runtime FPE options for test suite

4175afa

add -ffp-exception-behavior=maytrap

03f359a

BenWibking added 3 commits July 2, 2024 15:13

fix div-by-zero edge cases in EnforceLimits

5a3aead

fix corner case

5d5374a

fix corner case

247dee7

BenWibking added 2 commits July 7, 2024 16:17

Merge branch 'BenWibking/enable-fpe-in-ci' of github.com:quokka-astro…

5c446b8

…/quokka into BenWibking/enable-fpe-in-ci

remove whitespace

d95842d

BenWibking marked this pull request as ready for review July 7, 2024 21:28

dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jul 7, 2024

BenWibking requested a review from psharda July 7, 2024 21:31

fix incorrect number of ghost cells for FlattenShocks

382978b

BenWibking marked this pull request as draft July 7, 2024 23:06

BenWibking added 2 commits July 7, 2024 18:09

Revert "fix incorrect number of ghost cells for FlattenShocks"

1a4805a

This reverts commit 382978b.

fix div-by-zero

4235cf3

Merge branch 'development' into BenWibking/enable-fpe-in-ci

59f3eba

BenWibking marked this pull request as ready for review July 8, 2024 18:30

Merge branch 'development' into BenWibking/enable-fpe-in-ci

3a9a259

BenWibking enabled auto-merge July 10, 2024 22:52

markkrumholz approved these changes Jul 10, 2024

View reviewed changes

dosubot bot added the lgtm This PR has been approved by a maintainer label Jul 10, 2024

BenWibking added this pull request to the merge queue Jul 10, 2024

Merged via the queue into development with commit 14469c0 Jul 11, 2024
18 of 19 checks passed

BenWibking deleted the BenWibking/enable-fpe-in-ci branch July 11, 2024 14:49

This was referenced Jul 22, 2024

RadTube test failed with "Erroneous arithmetic operation"; resolved if set reconstructionOrder_ to 3 #679

Closed

PPL-opacity branch starting from PR #667 #680

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

enable FPE traps in CI #611

enable FPE traps in CI #611

BenWibking commented Apr 17, 2024 •

edited

Loading

BenWibking commented Apr 17, 2024

azure-pipelines bot commented Apr 17, 2024

psharda commented Apr 29, 2024

azure-pipelines bot commented Apr 29, 2024

markkrumholz left a comment

BenWibking commented Jul 2, 2024

BenWibking commented Jul 2, 2024 •

edited

Loading

BenWibking commented Jul 2, 2024 •

edited

Loading

BenWibking commented Jul 2, 2024

BenWibking commented Jul 2, 2024

BenWibking commented Jul 7, 2024

azure-pipelines bot commented Jul 7, 2024

BenWibking commented Jul 7, 2024

BenWibking commented Jul 8, 2024

azure-pipelines bot commented Jul 8, 2024

BenWibking commented Jul 8, 2024 •

edited

Loading

BenWibking commented Jul 8, 2024

azure-pipelines bot commented Jul 8, 2024

BenWibking commented Jul 10, 2024

enable FPE traps in CI #611

enable FPE traps in CI #611

Conversation

BenWibking commented Apr 17, 2024 • edited Loading

Description

Related issues

Checklist

BenWibking commented Apr 17, 2024

azure-pipelines bot commented Apr 17, 2024

psharda commented Apr 29, 2024

azure-pipelines bot commented Apr 29, 2024

markkrumholz left a comment

Choose a reason for hiding this comment

BenWibking commented Jul 2, 2024

BenWibking commented Jul 2, 2024 • edited Loading

BenWibking commented Jul 2, 2024 • edited Loading

BenWibking commented Jul 2, 2024

BenWibking commented Jul 2, 2024

BenWibking commented Jul 7, 2024

azure-pipelines bot commented Jul 7, 2024

BenWibking commented Jul 7, 2024

BenWibking commented Jul 8, 2024

azure-pipelines bot commented Jul 8, 2024

BenWibking commented Jul 8, 2024 • edited Loading

BenWibking commented Jul 8, 2024

azure-pipelines bot commented Jul 8, 2024

BenWibking commented Jul 10, 2024

BenWibking commented Apr 17, 2024 •

edited

Loading

BenWibking commented Jul 2, 2024 •

edited

Loading

BenWibking commented Jul 2, 2024 •

edited

Loading

BenWibking commented Jul 8, 2024 •

edited

Loading