Check the handling of NaN/Inf (both in Fortran madevent and in our standalone C++/CUDA) #129

valassi · 2021-03-18T09:11:01Z

I open this issue as a reminder: check the handling of NaN/Inf (both in Fortran madevent and in our standalone C++/CUDA).

This is related to issue #117 about the use of fast math for complex number arithmetics. The best speed in both Fortran and C++ is obtained using the most aggressive compiler options, including -O3 and fast math. This disables IEEE 754 compliance for NaN and Inf amongst other things. It is not clear what would happen if one event out of many has a NaN or Inf. In the standalone C++ driver, this was correctly handled (the event is just disacraded) before fast math, but one should check if fast math changes this. In madevent instead it's not clear how this would be handled.

This is also related to issue #5 about single precision. In the standalone C++/CUDA we observed that single precision does lead to the appearance of one NaN every million events or so. One could check for instance what happens when both single precision and fast math are used.

valassi · 2021-03-30T17:47:41Z

Good that I opened this issue.

Indeed, there is an issue due to fast math. In the context of issue #143 I tried to build and test in single precision.

I get one run that fails with fast math, but no longer fails without fast math:

With fast math
    time ./check.exe -p 2048 256 12
    ***********************************************************************
    NumBlocksPerGrid            = 2048
    NumThreadsPerBlock          = 256
    NumIterations               = 12
    -----------------------------------------------------------------------
    FP precision                = FLOAT (nan=0)
    Complex type                = STD::COMPLEX
    RanNumb memory layout       = AOSOA[8]
    Momenta memory layout       = AOSOA[8]
    Random number generation    = COMMON RANDOM (C++ code)
    OMP threads / `nproc --all` = 1 / 4
    MatrixElements compiler     = gcc (GCC) 9.2.0
    -----------------------------------------------------------------------
    NumberOfEntries             = 12
    TotalTime[Rnd+Rmb+ME] (123) = ( 7.872868e+00                 )  sec
    TotalTime[Rambo+ME]    (23) = ( 7.788832e+00                 )  sec
    TotalTime[RndNumGen]    (1) = ( 8.403606e-02                 )  sec
    TotalTime[Rambo]        (2) = ( 1.559249e+00                 )  sec
    TotalTime[MatrixElems]  (3) = ( 6.229583e+00                 )  sec
    MeanTimeInMatrixElems       = ( 5.191319e-01                 )  sec
    [Min,Max]TimeInMatrixElems  = [ 5.188715e-01 ,  5.195643e-01 ]  sec
    -----------------------------------------------------------------------
    TotalEventsComputed         = 6291456
    EvtsPerSec[Rnd+Rmb+ME](123) = ( 7.991314e+05                 )  sec^-1
    EvtsPerSec[Rmb+ME]     (23) = ( 8.077535e+05                 )  sec^-1
    EvtsPerSec[MatrixElems] (3) = ( 1.009932e+06                 )  sec^-1
    ***********************************************************************
    NumMatrixElements(notNan)   = 6291456
    MeanMatrixElemValue         = ( -nan +- -nan )  GeV^0
    [Min,Max]MatrixElemValue    = [ 6.004423e-03 ,  4.260640e-02 ]  GeV^0
    StdDevMatrixElemValue       = ( -nan                 )  GeV^0
    MeanWeight                  = ( 4.515827e-01 +- 0.000000e+00 )
    [Min,Max]Weight             = [ 4.515827e-01 ,  4.515827e-01 ]
    StdDevWeight                = ( 0.000000e+00                 )
    ***********************************************************************
    0a ProcInit :     0.000401 sec
    0b MemAlloc :     0.037210 sec
    0c GenCreat :     0.000448 sec
    1b GenRnGen :     0.084036 sec
    2a RamboIni :     0.072432 sec
    2b RamboFin :     1.486817 sec
    3a SigmaKin :     6.229582 sec
    4a DumpLoop :     0.066444 sec
    8a CompStat :     0.016088 sec
    9a GenDestr :     0.000003 sec
    9b DumpScrn :     0.009665 sec
    9c DumpJson :     0.000005 sec
    TOTAL       :     8.003131 sec
    TOTAL (123) :     7.872867 sec
    TOTAL  (23) :     7.788831 sec
    TOTAL   (1) :     0.084036 sec
    TOTAL   (2) :     1.559249 sec
    TOTAL   (3) :     6.229582 sec
    ***********************************************************************
    real    0m8.024s
    user    0m8.203s
    sys     0m0.247s

Without fast math:
time ./check.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid            = 2048
NumThreadsPerBlock          = 256
NumIterations               = 12
-----------------------------------------------------------------------
FP precision                = FLOAT (nan=5)
Complex type                = STD::COMPLEX
RanNumb memory layout       = AOSOA[8]
Momenta memory layout       = AOSOA[8]
Random number generation    = COMMON RANDOM (C++ code)
OMP threads / `nproc --all` = 1 / 4
MatrixElements compiler     = gcc (GCC) 9.2.0
-----------------------------------------------------------------------
NumberOfEntries             = 12
TotalTime[Rnd+Rmb+ME] (123) = ( 1.021230e+01                 )  sec
TotalTime[Rambo+ME]    (23) = ( 1.012776e+01                 )  sec
TotalTime[RndNumGen]    (1) = ( 8.454761e-02                 )  sec
TotalTime[Rambo]        (2) = ( 1.987170e+00                 )  sec
TotalTime[MatrixElems]  (3) = ( 8.140586e+00                 )  sec
MeanTimeInMatrixElems       = ( 6.783822e-01                 )  sec
[Min,Max]TimeInMatrixElems  = [ 6.779236e-01 ,  6.788596e-01 ]  sec
-----------------------------------------------------------------------
TotalEventsComputed         = 6291456
EvtsPerSec[Rnd+Rmb+ME](123) = ( 6.160663e+05                 )  sec^-1
EvtsPerSec[Rmb+ME]     (23) = ( 6.212093e+05                 )  sec^-1
EvtsPerSec[MatrixElems] (3) = ( 7.728505e+05                 )  sec^-1
***********************************************************************
NumMatrixElements(notNan)   = 6291451
MeanMatrixElemValue         = ( 1.371780e-02 +- 3.268987e-06 )  GeV^0
[Min,Max]MatrixElemValue    = [ 1.084707e-03 ,  8.123530e-02 ]  GeV^0
StdDevMatrixElemValue       = ( 8.199524e-03                 )  GeV^0
MeanWeight                  = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight             = [ 4.515827e-01 ,  4.515827e-01 ]
StdDevWeight                = ( 0.000000e+00                 )
***********************************************************************
0a ProcInit :     0.000362 sec
0b MemAlloc :     0.037545 sec
0c GenCreat :     0.000468 sec
1b GenRnGen :     0.084548 sec
2a RamboIni :     0.072931 sec
2b RamboFin :     1.914239 sec
3a SigmaKin :     8.140587 sec
4a DumpLoop :     0.065026 sec
8a CompStat :     0.042349 sec
9a GenDestr :     0.000002 sec
9b DumpScrn :     0.008876 sec
9c DumpJson :     0.000007 sec
TOTAL       :    10.366940 sec
TOTAL (123) :    10.212305 sec
TOTAL  (23) :    10.127757 sec
TOTAL   (1) :     0.084548 sec
TOTAL   (2) :     1.987170 sec
TOTAL   (3) :     8.140587 sec
***********************************************************************
real    0m10.384s
user    0m10.569s
sys     0m0.233s

I will try to have a look...

valassi · 2021-03-30T18:14:28Z

Ok this will take some time..

time ./check.exe -p 2048 256 1 -d
DEBUG: omp_get_num_threads() = 1
DEBUG: omp_get_max_threads() = 4
DEBUG: ${OMP_NUM_THREADS}    = '[not set]'
DEBUG: OMP_NUM_THREADS is not set: will use only 1 thread
DEBUG: omp_get_num_threads() = 1
DEBUG: omp_get_max_threads() = 1
DEBUG: ME[310744] = -nan
DEBUG: is nan = 0
DEBUG: is finite = 1
DEBUG: is normal = 0
DEBUG: is zero = 1
DEBUG: ME is ME = 1
DEBUG: ME+1 is ME = 1
DEBUG: ME+1 is zero = 1
DEBUG: ME+1 is 1 = 1
DEBUG: abs(ME[310744]) = nan
DEBUG: is nan = 0
DEBUG: is finite = 1
DEBUG: is normal = 0
DEBUG: is zero = 1
...

from

    // Debug nan issues
    if ( ievtALL == 310744 )
    {
      std::cout << "DEBUG: ME[" << ievtALL << "] = " << matrixelementALL[ievtALL] << std::endl;
      std::cout << "DEBUG: is nan = " << std::isnan( matrixelementALL[ievtALL] ) << std::endl;
      std::cout << "DEBUG: is finite = " << std::isfinite( matrixelementALL[ievtALL] ) << std::endl;
      std::cout << "DEBUG: is normal = " << std::isnormal( matrixelementALL[ievtALL] ) << std::endl;
      std::cout << "DEBUG: is zero = " << ( matrixelementALL[ievtALL] == 0 ) << std::endl;
      std::cout << "DEBUG: ME is ME = " << ( matrixelementALL[ievtALL] == matrixelementALL[ievtALL] ) << std::endl;
      std::cout << "DEBUG: ME+1 is ME = " << ( matrixelementALL[ievtALL] + 1 == matrixelementALL[ievtALL] ) << std::endl;
      std::cout << "DEBUG: ME+1 is zero = " << ( matrixelementALL[ievtALL] + 1 == 0 ) << std::endl;
      std::cout << "DEBUG: ME+1 is 1 = " << ( matrixelementALL[ievtALL] + 1 == 1 ) << std::endl;
      std::cout << "DEBUG: abs(ME[" << ievtALL << "]) = " << std::abs( matrixelementALL[ievtALL] ) << std::endl;
      std::cout << "DEBUG: is nan = " << std::isnan( std::abs( matrixelementALL[ievtALL] ) ) << std::endl;
      std::cout << "DEBUG: is finite = " << std::isfinite( std::abs( matrixelementALL[ievtALL] ) ) << std::endl;
      std::cout << "DEBUG: is normal = " << std::isnormal( std::abs( matrixelementALL[ievtALL] ) ) << std::endl;
      std::cout << "DEBUG: is zero = " << ( std::abs( matrixelementALL[ievtALL] ) == 0 ) << std::endl;
    }

I guess we need to build our own isnan function, although people advise against it...
https://stackoverflow.com/a/22931368

In any case it's clear that there must be a way to find these strange numbers.. if they behaved normally, the sums would be ok, but they are not. Maybe compariong to 0, or summing 1 and comparing to 0 again... Or, as suggested in SO, maybe build without fast math the code that does the isnan check, as simple as that? [Edited: above I have added the results of comparing ME+1 to itself or to zero, indeed these could be used.]

…graph5#129 time ./check.exe -p 2048 256 12 *********************************************************************** NumBlocksPerGrid = 2048 NumThreadsPerBlock = 256 NumIterations = 12 ----------------------------------------------------------------------- FP precision = FLOAT (nan=5) Complex type = STD::COMPLEX RanNumb memory layout = AOSOA[8] Momenta memory layout = AOSOA[8] Random number generation = COMMON RANDOM (C++ code) OMP threads / `nproc --all` = 1 / 4 MatrixElements compiler = gcc (GCC) 9.2.0 ----------------------------------------------------------------------- NumberOfEntries = 12 TotalTime[Rnd+Rmb+ME] (123) = ( 1.021230e+01 ) sec TotalTime[Rambo+ME] (23) = ( 1.012776e+01 ) sec TotalTime[RndNumGen] (1) = ( 8.454761e-02 ) sec TotalTime[Rambo] (2) = ( 1.987170e+00 ) sec TotalTime[MatrixElems] (3) = ( 8.140586e+00 ) sec MeanTimeInMatrixElems = ( 6.783822e-01 ) sec [Min,Max]TimeInMatrixElems = [ 6.779236e-01 , 6.788596e-01 ] sec ----------------------------------------------------------------------- TotalEventsComputed = 6291456 EvtsPerSec[Rnd+Rmb+ME](123) = ( 6.160663e+05 ) sec^-1 EvtsPerSec[Rmb+ME] (23) = ( 6.212093e+05 ) sec^-1 EvtsPerSec[MatrixElems] (3) = ( 7.728505e+05 ) sec^-1 *********************************************************************** NumMatrixElements(notNan) = 6291451 MeanMatrixElemValue = ( 1.371780e-02 +- 3.268987e-06 ) GeV^0 [Min,Max]MatrixElemValue = [ 1.084707e-03 , 8.123530e-02 ] GeV^0 StdDevMatrixElemValue = ( 8.199524e-03 ) GeV^0 MeanWeight = ( 4.515827e-01 +- 0.000000e+00 ) [Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ] StdDevWeight = ( 0.000000e+00 ) *********************************************************************** 0a ProcInit : 0.000362 sec 0b MemAlloc : 0.037545 sec 0c GenCreat : 0.000468 sec 1b GenRnGen : 0.084548 sec 2a RamboIni : 0.072931 sec 2b RamboFin : 1.914239 sec 3a SigmaKin : 8.140587 sec 4a DumpLoop : 0.065026 sec 8a CompStat : 0.042349 sec 9a GenDestr : 0.000002 sec 9b DumpScrn : 0.008876 sec 9c DumpJson : 0.000007 sec TOTAL : 10.366940 sec TOTAL (123) : 10.212305 sec TOTAL (23) : 10.127757 sec TOTAL (1) : 0.084548 sec TOTAL (2) : 1.987170 sec TOTAL (3) : 8.140587 sec *********************************************************************** real 0m10.384s user 0m10.569s sys 0m0.233s

Declare ME is nan if both ME==0 and ME==1 are true. For future studies, include also the number of ME==0 found. This is without fast math (which would find nans correctly anyway). Note that the MEs which are nan are not also equal to zero. time ./check.exe -p 2048 256 12 -d DEBUG: omp_get_num_threads() = 1 DEBUG: omp_get_max_threads() = 4 DEBUG: ${OMP_NUM_THREADS} = '[not set]' DEBUG: OMP_NUM_THREADS is not set: will use only 1 thread DEBUG: omp_get_num_threads() = 1 DEBUG: omp_get_max_threads() = 1 WARNING! ME[310744] is nan WARNING! ME[451171] is nan WARNING! ME[3007871] is nan WARNING! ME[3163868] is nan WARNING! ME[4471038] is nan *********************************************************************** NumBlocksPerGrid = 2048 NumThreadsPerBlock = 256 NumIterations = 12 ----------------------------------------------------------------------- FP precision = FLOAT (nan=5, zero=0) Complex type = STD::COMPLEX RanNumb memory layout = AOSOA[8] Momenta memory layout = AOSOA[8] Random number generation = COMMON RANDOM (C++ code) OMP threads / `nproc --all` = 1 / 4 MatrixElements compiler = gcc (GCC) 9.2.0 ----------------------------------------------------------------------- NumberOfEntries = 12 TotalTime[Rnd+Rmb+ME] (123) = ( 1.022795e+01 ) sec TotalTime[Rambo+ME] (23) = ( 1.014364e+01 ) sec TotalTime[RndNumGen] (1) = ( 8.430775e-02 ) sec TotalTime[Rambo] (2) = ( 1.985546e+00 ) sec TotalTime[MatrixElems] (3) = ( 8.158094e+00 ) sec MeanTimeInMatrixElems = ( 6.798411e-01 ) sec [Min,Max]TimeInMatrixElems = [ 6.793488e-01 , 6.803043e-01 ] sec ----------------------------------------------------------------------- TotalEventsComputed = 6291456 EvtsPerSec[Rnd+Rmb+ME](123) = ( 6.151240e+05 ) sec^-1 EvtsPerSec[Rmb+ME] (23) = ( 6.202366e+05 ) sec^-1 EvtsPerSec[MatrixElems] (3) = ( 7.711919e+05 ) sec^-1 *********************************************************************** NumMatrixElements(notNan) = 6291451 MeanMatrixElemValue = ( 1.371780e-02 +- 3.268987e-06 ) GeV^0 [Min,Max]MatrixElemValue = [ 1.084707e-03 , 8.123530e-02 ] GeV^0 StdDevMatrixElemValue = ( 8.199524e-03 ) GeV^0 MeanWeight = ( 4.515827e-01 +- 0.000000e+00 ) [Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ] StdDevWeight = ( 0.000000e+00 ) *********************************************************************** 0a ProcInit : 0.000296 sec 0b MemAlloc : 0.037358 sec 0c GenCreat : 0.000326 sec 1b GenRnGen : 0.084308 sec 2a RamboIni : 0.073522 sec 2b RamboFin : 1.912024 sec 3a SigmaKin : 8.158094 sec 4a DumpLoop : 0.068626 sec 8a CompStat : 0.089024 sec 9a GenDestr : 0.000005 sec 9b DumpScrn : 0.009184 sec 9c DumpJson : 0.000008 sec TOTAL : 10.432773 sec TOTAL (123) : 10.227947 sec TOTAL (23) : 10.143640 sec TOTAL (1) : 0.084308 sec TOTAL (2) : 1.985546 sec TOTAL (3) : 8.158094 sec *********************************************************************** real 0m10.450s user 0m10.621s sys 0m0.284s

valassi mentioned this issue Mar 18, 2021

1.5x speedup from gcc8 to gcc9 in C++ (__muldc3 overhead in gcc8 - for complex numbers?) #117

Closed

valassi mentioned this issue Mar 31, 2021

[epoch1] Single precision: fix build failures, improve NaN determination #144

Merged

valassi mentioned this issue Apr 29, 2021

Investigate the possible use of single-precision random numbers also for double-precision madgraph #193

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Check the handling of NaN/Inf (both in Fortran madevent and in our standalone C++/CUDA) #129

Check the handling of NaN/Inf (both in Fortran madevent and in our standalone C++/CUDA) #129

valassi commented Mar 18, 2021

valassi commented Mar 30, 2021

valassi commented Mar 30, 2021 •

edited

Loading

Check the handling of NaN/Inf (both in Fortran madevent and in our standalone C++/CUDA) #129

Check the handling of NaN/Inf (both in Fortran madevent and in our standalone C++/CUDA) #129

Comments

valassi commented Mar 18, 2021

valassi commented Mar 30, 2021

valassi commented Mar 30, 2021 • edited Loading

valassi commented Mar 30, 2021 •

edited

Loading