Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check the handling of NaN/Inf (both in Fortran madevent and in our standalone C++/CUDA) #129

Open
valassi opened this issue Mar 18, 2021 · 2 comments

Comments

@valassi
Copy link
Member

valassi commented Mar 18, 2021

I open this issue as a reminder: check the handling of NaN/Inf (both in Fortran madevent and in our standalone C++/CUDA).

This is related to issue #117 about the use of fast math for complex number arithmetics. The best speed in both Fortran and C++ is obtained using the most aggressive compiler options, including -O3 and fast math. This disables IEEE 754 compliance for NaN and Inf amongst other things. It is not clear what would happen if one event out of many has a NaN or Inf. In the standalone C++ driver, this was correctly handled (the event is just disacraded) before fast math, but one should check if fast math changes this. In madevent instead it's not clear how this would be handled.

This is also related to issue #5 about single precision. In the standalone C++/CUDA we observed that single precision does lead to the appearance of one NaN every million events or so. One could check for instance what happens when both single precision and fast math are used.

@valassi
Copy link
Member Author

valassi commented Mar 30, 2021

Good that I opened this issue.

Indeed, there is an issue due to fast math. In the context of issue #143 I tried to build and test in single precision.

I get one run that fails with fast math, but no longer fails without fast math:

With fast math
    time ./check.exe -p 2048 256 12
    ***********************************************************************
    NumBlocksPerGrid            = 2048
    NumThreadsPerBlock          = 256
    NumIterations               = 12
    -----------------------------------------------------------------------
    FP precision                = FLOAT (nan=0)
    Complex type                = STD::COMPLEX
    RanNumb memory layout       = AOSOA[8]
    Momenta memory layout       = AOSOA[8]
    Random number generation    = COMMON RANDOM (C++ code)
    OMP threads / `nproc --all` = 1 / 4
    MatrixElements compiler     = gcc (GCC) 9.2.0
    -----------------------------------------------------------------------
    NumberOfEntries             = 12
    TotalTime[Rnd+Rmb+ME] (123) = ( 7.872868e+00                 )  sec
    TotalTime[Rambo+ME]    (23) = ( 7.788832e+00                 )  sec
    TotalTime[RndNumGen]    (1) = ( 8.403606e-02                 )  sec
    TotalTime[Rambo]        (2) = ( 1.559249e+00                 )  sec
    TotalTime[MatrixElems]  (3) = ( 6.229583e+00                 )  sec
    MeanTimeInMatrixElems       = ( 5.191319e-01                 )  sec
    [Min,Max]TimeInMatrixElems  = [ 5.188715e-01 ,  5.195643e-01 ]  sec
    -----------------------------------------------------------------------
    TotalEventsComputed         = 6291456
    EvtsPerSec[Rnd+Rmb+ME](123) = ( 7.991314e+05                 )  sec^-1
    EvtsPerSec[Rmb+ME]     (23) = ( 8.077535e+05                 )  sec^-1
    EvtsPerSec[MatrixElems] (3) = ( 1.009932e+06                 )  sec^-1
    ***********************************************************************
    NumMatrixElements(notNan)   = 6291456
    MeanMatrixElemValue         = ( -nan +- -nan )  GeV^0
    [Min,Max]MatrixElemValue    = [ 6.004423e-03 ,  4.260640e-02 ]  GeV^0
    StdDevMatrixElemValue       = ( -nan                 )  GeV^0
    MeanWeight                  = ( 4.515827e-01 +- 0.000000e+00 )
    [Min,Max]Weight             = [ 4.515827e-01 ,  4.515827e-01 ]
    StdDevWeight                = ( 0.000000e+00                 )
    ***********************************************************************
    0a ProcInit :     0.000401 sec
    0b MemAlloc :     0.037210 sec
    0c GenCreat :     0.000448 sec
    1b GenRnGen :     0.084036 sec
    2a RamboIni :     0.072432 sec
    2b RamboFin :     1.486817 sec
    3a SigmaKin :     6.229582 sec
    4a DumpLoop :     0.066444 sec
    8a CompStat :     0.016088 sec
    9a GenDestr :     0.000003 sec
    9b DumpScrn :     0.009665 sec
    9c DumpJson :     0.000005 sec
    TOTAL       :     8.003131 sec
    TOTAL (123) :     7.872867 sec
    TOTAL  (23) :     7.788831 sec
    TOTAL   (1) :     0.084036 sec
    TOTAL   (2) :     1.559249 sec
    TOTAL   (3) :     6.229582 sec
    ***********************************************************************
    real    0m8.024s
    user    0m8.203s
    sys     0m0.247s

Without fast math:
time ./check.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid            = 2048
NumThreadsPerBlock          = 256
NumIterations               = 12
-----------------------------------------------------------------------
FP precision                = FLOAT (nan=5)
Complex type                = STD::COMPLEX
RanNumb memory layout       = AOSOA[8]
Momenta memory layout       = AOSOA[8]
Random number generation    = COMMON RANDOM (C++ code)
OMP threads / `nproc --all` = 1 / 4
MatrixElements compiler     = gcc (GCC) 9.2.0
-----------------------------------------------------------------------
NumberOfEntries             = 12
TotalTime[Rnd+Rmb+ME] (123) = ( 1.021230e+01                 )  sec
TotalTime[Rambo+ME]    (23) = ( 1.012776e+01                 )  sec
TotalTime[RndNumGen]    (1) = ( 8.454761e-02                 )  sec
TotalTime[Rambo]        (2) = ( 1.987170e+00                 )  sec
TotalTime[MatrixElems]  (3) = ( 8.140586e+00                 )  sec
MeanTimeInMatrixElems       = ( 6.783822e-01                 )  sec
[Min,Max]TimeInMatrixElems  = [ 6.779236e-01 ,  6.788596e-01 ]  sec
-----------------------------------------------------------------------
TotalEventsComputed         = 6291456
EvtsPerSec[Rnd+Rmb+ME](123) = ( 6.160663e+05                 )  sec^-1
EvtsPerSec[Rmb+ME]     (23) = ( 6.212093e+05                 )  sec^-1
EvtsPerSec[MatrixElems] (3) = ( 7.728505e+05                 )  sec^-1
***********************************************************************
NumMatrixElements(notNan)   = 6291451
MeanMatrixElemValue         = ( 1.371780e-02 +- 3.268987e-06 )  GeV^0
[Min,Max]MatrixElemValue    = [ 1.084707e-03 ,  8.123530e-02 ]  GeV^0
StdDevMatrixElemValue       = ( 8.199524e-03                 )  GeV^0
MeanWeight                  = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight             = [ 4.515827e-01 ,  4.515827e-01 ]
StdDevWeight                = ( 0.000000e+00                 )
***********************************************************************
0a ProcInit :     0.000362 sec
0b MemAlloc :     0.037545 sec
0c GenCreat :     0.000468 sec
1b GenRnGen :     0.084548 sec
2a RamboIni :     0.072931 sec
2b RamboFin :     1.914239 sec
3a SigmaKin :     8.140587 sec
4a DumpLoop :     0.065026 sec
8a CompStat :     0.042349 sec
9a GenDestr :     0.000002 sec
9b DumpScrn :     0.008876 sec
9c DumpJson :     0.000007 sec
TOTAL       :    10.366940 sec
TOTAL (123) :    10.212305 sec
TOTAL  (23) :    10.127757 sec
TOTAL   (1) :     0.084548 sec
TOTAL   (2) :     1.987170 sec
TOTAL   (3) :     8.140587 sec
***********************************************************************
real    0m10.384s
user    0m10.569s
sys     0m0.233s

I will try to have a look...

@valassi
Copy link
Member Author

valassi commented Mar 30, 2021

Ok this will take some time..

time ./check.exe -p 2048 256 1 -d
DEBUG: omp_get_num_threads() = 1
DEBUG: omp_get_max_threads() = 4
DEBUG: ${OMP_NUM_THREADS}    = '[not set]'
DEBUG: OMP_NUM_THREADS is not set: will use only 1 thread
DEBUG: omp_get_num_threads() = 1
DEBUG: omp_get_max_threads() = 1
DEBUG: ME[310744] = -nan
DEBUG: is nan = 0
DEBUG: is finite = 1
DEBUG: is normal = 0
DEBUG: is zero = 1
DEBUG: ME is ME = 1
DEBUG: ME+1 is ME = 1
DEBUG: ME+1 is zero = 1
DEBUG: ME+1 is 1 = 1
DEBUG: abs(ME[310744]) = nan
DEBUG: is nan = 0
DEBUG: is finite = 1
DEBUG: is normal = 0
DEBUG: is zero = 1
...

from

    // Debug nan issues
    if ( ievtALL == 310744 )
    {
      std::cout << "DEBUG: ME[" << ievtALL << "] = " << matrixelementALL[ievtALL] << std::endl;
      std::cout << "DEBUG: is nan = " << std::isnan( matrixelementALL[ievtALL] ) << std::endl;
      std::cout << "DEBUG: is finite = " << std::isfinite( matrixelementALL[ievtALL] ) << std::endl;
      std::cout << "DEBUG: is normal = " << std::isnormal( matrixelementALL[ievtALL] ) << std::endl;
      std::cout << "DEBUG: is zero = " << ( matrixelementALL[ievtALL] == 0 ) << std::endl;
      std::cout << "DEBUG: ME is ME = " << ( matrixelementALL[ievtALL] == matrixelementALL[ievtALL] ) << std::endl;
      std::cout << "DEBUG: ME+1 is ME = " << ( matrixelementALL[ievtALL] + 1 == matrixelementALL[ievtALL] ) << std::endl;
      std::cout << "DEBUG: ME+1 is zero = " << ( matrixelementALL[ievtALL] + 1 == 0 ) << std::endl;
      std::cout << "DEBUG: ME+1 is 1 = " << ( matrixelementALL[ievtALL] + 1 == 1 ) << std::endl;
      std::cout << "DEBUG: abs(ME[" << ievtALL << "]) = " << std::abs( matrixelementALL[ievtALL] ) << std::endl;
      std::cout << "DEBUG: is nan = " << std::isnan( std::abs( matrixelementALL[ievtALL] ) ) << std::endl;
      std::cout << "DEBUG: is finite = " << std::isfinite( std::abs( matrixelementALL[ievtALL] ) ) << std::endl;
      std::cout << "DEBUG: is normal = " << std::isnormal( std::abs( matrixelementALL[ievtALL] ) ) << std::endl;
      std::cout << "DEBUG: is zero = " << ( std::abs( matrixelementALL[ievtALL] ) == 0 ) << std::endl;
    }

I guess we need to build our own isnan function, although people advise against it...
https://stackoverflow.com/a/22931368

In any case it's clear that there must be a way to find these strange numbers.. if they behaved normally, the sums would be ok, but they are not. Maybe compariong to 0, or summing 1 and comparing to 0 again... Or, as suggested in SO, maybe build without fast math the code that does the isnan check, as simple as that? [Edited: above I have added the results of comparing ME+1 to itself or to zero, indeed these could be used.]

valassi added a commit to valassi/madgraph4gpu that referenced this issue Mar 31, 2021
…graph5#129

time ./check.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid            = 2048
NumThreadsPerBlock          = 256
NumIterations               = 12
-----------------------------------------------------------------------
FP precision                = FLOAT (nan=5)
Complex type                = STD::COMPLEX
RanNumb memory layout       = AOSOA[8]
Momenta memory layout       = AOSOA[8]
Random number generation    = COMMON RANDOM (C++ code)
OMP threads / `nproc --all` = 1 / 4
MatrixElements compiler     = gcc (GCC) 9.2.0
-----------------------------------------------------------------------
NumberOfEntries             = 12
TotalTime[Rnd+Rmb+ME] (123) = ( 1.021230e+01                 )  sec
TotalTime[Rambo+ME]    (23) = ( 1.012776e+01                 )  sec
TotalTime[RndNumGen]    (1) = ( 8.454761e-02                 )  sec
TotalTime[Rambo]        (2) = ( 1.987170e+00                 )  sec
TotalTime[MatrixElems]  (3) = ( 8.140586e+00                 )  sec
MeanTimeInMatrixElems       = ( 6.783822e-01                 )  sec
[Min,Max]TimeInMatrixElems  = [ 6.779236e-01 ,  6.788596e-01 ]  sec
-----------------------------------------------------------------------
TotalEventsComputed         = 6291456
EvtsPerSec[Rnd+Rmb+ME](123) = ( 6.160663e+05                 )  sec^-1
EvtsPerSec[Rmb+ME]     (23) = ( 6.212093e+05                 )  sec^-1
EvtsPerSec[MatrixElems] (3) = ( 7.728505e+05                 )  sec^-1
***********************************************************************
NumMatrixElements(notNan)   = 6291451
MeanMatrixElemValue         = ( 1.371780e-02 +- 3.268987e-06 )  GeV^0
[Min,Max]MatrixElemValue    = [ 1.084707e-03 ,  8.123530e-02 ]  GeV^0
StdDevMatrixElemValue       = ( 8.199524e-03                 )  GeV^0
MeanWeight                  = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight             = [ 4.515827e-01 ,  4.515827e-01 ]
StdDevWeight                = ( 0.000000e+00                 )
***********************************************************************
0a ProcInit :     0.000362 sec
0b MemAlloc :     0.037545 sec
0c GenCreat :     0.000468 sec
1b GenRnGen :     0.084548 sec
2a RamboIni :     0.072931 sec
2b RamboFin :     1.914239 sec
3a SigmaKin :     8.140587 sec
4a DumpLoop :     0.065026 sec
8a CompStat :     0.042349 sec
9a GenDestr :     0.000002 sec
9b DumpScrn :     0.008876 sec
9c DumpJson :     0.000007 sec
TOTAL       :    10.366940 sec
TOTAL (123) :    10.212305 sec
TOTAL  (23) :    10.127757 sec
TOTAL   (1) :     0.084548 sec
TOTAL   (2) :     1.987170 sec
TOTAL   (3) :     8.140587 sec
***********************************************************************
real    0m10.384s
user    0m10.569s
sys     0m0.233s
valassi added a commit to valassi/madgraph4gpu that referenced this issue Mar 31, 2021
Declare ME is nan if both ME==0 and ME==1 are true.
For future studies, include also the number of ME==0 found.

This is without fast math (which would find nans correctly anyway).
Note that the MEs which are nan are not also equal to zero.

time ./check.exe -p 2048 256 12 -d
DEBUG: omp_get_num_threads() = 1
DEBUG: omp_get_max_threads() = 4
DEBUG: ${OMP_NUM_THREADS}    = '[not set]'
DEBUG: OMP_NUM_THREADS is not set: will use only 1 thread
DEBUG: omp_get_num_threads() = 1
DEBUG: omp_get_max_threads() = 1
WARNING! ME[310744] is nan
WARNING! ME[451171] is nan
WARNING! ME[3007871] is nan
WARNING! ME[3163868] is nan
WARNING! ME[4471038] is nan
***********************************************************************
NumBlocksPerGrid            = 2048
NumThreadsPerBlock          = 256
NumIterations               = 12
-----------------------------------------------------------------------
FP precision                = FLOAT (nan=5, zero=0)
Complex type                = STD::COMPLEX
RanNumb memory layout       = AOSOA[8]
Momenta memory layout       = AOSOA[8]
Random number generation    = COMMON RANDOM (C++ code)
OMP threads / `nproc --all` = 1 / 4
MatrixElements compiler     = gcc (GCC) 9.2.0
-----------------------------------------------------------------------
NumberOfEntries             = 12
TotalTime[Rnd+Rmb+ME] (123) = ( 1.022795e+01                 )  sec
TotalTime[Rambo+ME]    (23) = ( 1.014364e+01                 )  sec
TotalTime[RndNumGen]    (1) = ( 8.430775e-02                 )  sec
TotalTime[Rambo]        (2) = ( 1.985546e+00                 )  sec
TotalTime[MatrixElems]  (3) = ( 8.158094e+00                 )  sec
MeanTimeInMatrixElems       = ( 6.798411e-01                 )  sec
[Min,Max]TimeInMatrixElems  = [ 6.793488e-01 ,  6.803043e-01 ]  sec
-----------------------------------------------------------------------
TotalEventsComputed         = 6291456
EvtsPerSec[Rnd+Rmb+ME](123) = ( 6.151240e+05                 )  sec^-1
EvtsPerSec[Rmb+ME]     (23) = ( 6.202366e+05                 )  sec^-1
EvtsPerSec[MatrixElems] (3) = ( 7.711919e+05                 )  sec^-1
***********************************************************************
NumMatrixElements(notNan)   = 6291451
MeanMatrixElemValue         = ( 1.371780e-02 +- 3.268987e-06 )  GeV^0
[Min,Max]MatrixElemValue    = [ 1.084707e-03 ,  8.123530e-02 ]  GeV^0
StdDevMatrixElemValue       = ( 8.199524e-03                 )  GeV^0
MeanWeight                  = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight             = [ 4.515827e-01 ,  4.515827e-01 ]
StdDevWeight                = ( 0.000000e+00                 )
***********************************************************************
0a ProcInit :     0.000296 sec
0b MemAlloc :     0.037358 sec
0c GenCreat :     0.000326 sec
1b GenRnGen :     0.084308 sec
2a RamboIni :     0.073522 sec
2b RamboFin :     1.912024 sec
3a SigmaKin :     8.158094 sec
4a DumpLoop :     0.068626 sec
8a CompStat :     0.089024 sec
9a GenDestr :     0.000005 sec
9b DumpScrn :     0.009184 sec
9c DumpJson :     0.000008 sec
TOTAL       :    10.432773 sec
TOTAL (123) :    10.227947 sec
TOTAL  (23) :    10.143640 sec
TOTAL   (1) :     0.084308 sec
TOTAL   (2) :     1.985546 sec
TOTAL   (3) :     8.158094 sec
***********************************************************************
real    0m10.450s
user    0m10.621s
sys     0m0.284s
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant