-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check the handling of NaN/Inf (both in Fortran madevent and in our standalone C++/CUDA) #129
Comments
Good that I opened this issue. Indeed, there is an issue due to fast math. In the context of issue #143 I tried to build and test in single precision. I get one run that fails with fast math, but no longer fails without fast math:
I will try to have a look... |
Ok this will take some time..
from
I guess we need to build our own isnan function, although people advise against it... In any case it's clear that there must be a way to find these strange numbers.. if they behaved normally, the sums would be ok, but they are not. Maybe compariong to 0, or summing 1 and comparing to 0 again... Or, as suggested in SO, maybe build without fast math the code that does the isnan check, as simple as that? [Edited: above I have added the results of comparing ME+1 to itself or to zero, indeed these could be used.] |
…graph5#129 time ./check.exe -p 2048 256 12 *********************************************************************** NumBlocksPerGrid = 2048 NumThreadsPerBlock = 256 NumIterations = 12 ----------------------------------------------------------------------- FP precision = FLOAT (nan=5) Complex type = STD::COMPLEX RanNumb memory layout = AOSOA[8] Momenta memory layout = AOSOA[8] Random number generation = COMMON RANDOM (C++ code) OMP threads / `nproc --all` = 1 / 4 MatrixElements compiler = gcc (GCC) 9.2.0 ----------------------------------------------------------------------- NumberOfEntries = 12 TotalTime[Rnd+Rmb+ME] (123) = ( 1.021230e+01 ) sec TotalTime[Rambo+ME] (23) = ( 1.012776e+01 ) sec TotalTime[RndNumGen] (1) = ( 8.454761e-02 ) sec TotalTime[Rambo] (2) = ( 1.987170e+00 ) sec TotalTime[MatrixElems] (3) = ( 8.140586e+00 ) sec MeanTimeInMatrixElems = ( 6.783822e-01 ) sec [Min,Max]TimeInMatrixElems = [ 6.779236e-01 , 6.788596e-01 ] sec ----------------------------------------------------------------------- TotalEventsComputed = 6291456 EvtsPerSec[Rnd+Rmb+ME](123) = ( 6.160663e+05 ) sec^-1 EvtsPerSec[Rmb+ME] (23) = ( 6.212093e+05 ) sec^-1 EvtsPerSec[MatrixElems] (3) = ( 7.728505e+05 ) sec^-1 *********************************************************************** NumMatrixElements(notNan) = 6291451 MeanMatrixElemValue = ( 1.371780e-02 +- 3.268987e-06 ) GeV^0 [Min,Max]MatrixElemValue = [ 1.084707e-03 , 8.123530e-02 ] GeV^0 StdDevMatrixElemValue = ( 8.199524e-03 ) GeV^0 MeanWeight = ( 4.515827e-01 +- 0.000000e+00 ) [Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ] StdDevWeight = ( 0.000000e+00 ) *********************************************************************** 0a ProcInit : 0.000362 sec 0b MemAlloc : 0.037545 sec 0c GenCreat : 0.000468 sec 1b GenRnGen : 0.084548 sec 2a RamboIni : 0.072931 sec 2b RamboFin : 1.914239 sec 3a SigmaKin : 8.140587 sec 4a DumpLoop : 0.065026 sec 8a CompStat : 0.042349 sec 9a GenDestr : 0.000002 sec 9b DumpScrn : 0.008876 sec 9c DumpJson : 0.000007 sec TOTAL : 10.366940 sec TOTAL (123) : 10.212305 sec TOTAL (23) : 10.127757 sec TOTAL (1) : 0.084548 sec TOTAL (2) : 1.987170 sec TOTAL (3) : 8.140587 sec *********************************************************************** real 0m10.384s user 0m10.569s sys 0m0.233s
Declare ME is nan if both ME==0 and ME==1 are true. For future studies, include also the number of ME==0 found. This is without fast math (which would find nans correctly anyway). Note that the MEs which are nan are not also equal to zero. time ./check.exe -p 2048 256 12 -d DEBUG: omp_get_num_threads() = 1 DEBUG: omp_get_max_threads() = 4 DEBUG: ${OMP_NUM_THREADS} = '[not set]' DEBUG: OMP_NUM_THREADS is not set: will use only 1 thread DEBUG: omp_get_num_threads() = 1 DEBUG: omp_get_max_threads() = 1 WARNING! ME[310744] is nan WARNING! ME[451171] is nan WARNING! ME[3007871] is nan WARNING! ME[3163868] is nan WARNING! ME[4471038] is nan *********************************************************************** NumBlocksPerGrid = 2048 NumThreadsPerBlock = 256 NumIterations = 12 ----------------------------------------------------------------------- FP precision = FLOAT (nan=5, zero=0) Complex type = STD::COMPLEX RanNumb memory layout = AOSOA[8] Momenta memory layout = AOSOA[8] Random number generation = COMMON RANDOM (C++ code) OMP threads / `nproc --all` = 1 / 4 MatrixElements compiler = gcc (GCC) 9.2.0 ----------------------------------------------------------------------- NumberOfEntries = 12 TotalTime[Rnd+Rmb+ME] (123) = ( 1.022795e+01 ) sec TotalTime[Rambo+ME] (23) = ( 1.014364e+01 ) sec TotalTime[RndNumGen] (1) = ( 8.430775e-02 ) sec TotalTime[Rambo] (2) = ( 1.985546e+00 ) sec TotalTime[MatrixElems] (3) = ( 8.158094e+00 ) sec MeanTimeInMatrixElems = ( 6.798411e-01 ) sec [Min,Max]TimeInMatrixElems = [ 6.793488e-01 , 6.803043e-01 ] sec ----------------------------------------------------------------------- TotalEventsComputed = 6291456 EvtsPerSec[Rnd+Rmb+ME](123) = ( 6.151240e+05 ) sec^-1 EvtsPerSec[Rmb+ME] (23) = ( 6.202366e+05 ) sec^-1 EvtsPerSec[MatrixElems] (3) = ( 7.711919e+05 ) sec^-1 *********************************************************************** NumMatrixElements(notNan) = 6291451 MeanMatrixElemValue = ( 1.371780e-02 +- 3.268987e-06 ) GeV^0 [Min,Max]MatrixElemValue = [ 1.084707e-03 , 8.123530e-02 ] GeV^0 StdDevMatrixElemValue = ( 8.199524e-03 ) GeV^0 MeanWeight = ( 4.515827e-01 +- 0.000000e+00 ) [Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ] StdDevWeight = ( 0.000000e+00 ) *********************************************************************** 0a ProcInit : 0.000296 sec 0b MemAlloc : 0.037358 sec 0c GenCreat : 0.000326 sec 1b GenRnGen : 0.084308 sec 2a RamboIni : 0.073522 sec 2b RamboFin : 1.912024 sec 3a SigmaKin : 8.158094 sec 4a DumpLoop : 0.068626 sec 8a CompStat : 0.089024 sec 9a GenDestr : 0.000005 sec 9b DumpScrn : 0.009184 sec 9c DumpJson : 0.000008 sec TOTAL : 10.432773 sec TOTAL (123) : 10.227947 sec TOTAL (23) : 10.143640 sec TOTAL (1) : 0.084308 sec TOTAL (2) : 1.985546 sec TOTAL (3) : 8.158094 sec *********************************************************************** real 0m10.450s user 0m10.621s sys 0m0.284s
I open this issue as a reminder: check the handling of NaN/Inf (both in Fortran madevent and in our standalone C++/CUDA).
This is related to issue #117 about the use of fast math for complex number arithmetics. The best speed in both Fortran and C++ is obtained using the most aggressive compiler options, including -O3 and fast math. This disables IEEE 754 compliance for NaN and Inf amongst other things. It is not clear what would happen if one event out of many has a NaN or Inf. In the standalone C++ driver, this was correctly handled (the event is just disacraded) before fast math, but one should check if fast math changes this. In madevent instead it's not clear how this would be handled.
This is also related to issue #5 about single precision. In the standalone C++/CUDA we observed that single precision does lead to the appearance of one NaN every million events or so. One could check for instance what happens when both single precision and fast math are used.
The text was updated successfully, but these errors were encountered: