Merge epoch2 and epoch1 - second part (still without CPPProcess) #149
…value in implementation)
… - copy it to ep2

Epoch2 before fastmath:
time ./check.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND (C++ code)
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123)= ( 9.806006e+00 ) sec
TotalTime[Rambo+ME] (23)= ( 9.456839e+00 ) sec
TotalTime[RndNumGen] (1)= ( 3.491671e-01 ) sec
TotalTime[Rambo] (2)= ( 2.018251e+00 ) sec
TotalTime[MatrixElems] (3)= ( 7.438588e+00 ) sec
MeanTimeInMatrixElems = ( 6.198823e-01 ) sec
[Min,Max]TimeInMatrixElems = [ 6.183559e-01 , 6.259246e-01 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123)= ( 6.415921e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 6.652811e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 8.457864e+05 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071581e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.200854e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
0a ProcInit : 0.000397 sec
0b MemAlloc : 0.000043 sec
0c GenCreat : 0.000955 sec
1a GenSeed : 0.000031 sec
1b GenRnGen : 0.349136 sec
2a RamboIni : 0.138318 sec
2b RamboFin : 1.879934 sec
3a SigmaKin : 7.438588 sec
4a DumpLoop : 0.087978 sec
8a CompStat : 0.045155 sec
9a GenDestr : 0.000113 sec
9b DumpScrn : 0.000223 sec
9c DumpJson : 0.000001 sec
TOTAL : 9.940873 sec
TOTAL (123) : 9.806006 sec
TOTAL (23) : 9.456840 sec
TOTAL (1) : 0.349167 sec
TOTAL (2) : 2.018251 sec
TOTAL (3) : 7.438588 sec
***********************************************************************
real 0m9.971s
user 0m9.812s
sys 0m0.157s

Epoch2 after fastmath: NOT FASTER (?!)
time ./check.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND (C++ code)
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123)= ( 9.747692e+00 ) sec
TotalTime[Rambo+ME] (23)= ( 9.397507e+00 ) sec
TotalTime[RndNumGen] (1)= ( 3.501850e-01 ) sec
TotalTime[Rambo] (2)= ( 1.976519e+00 ) sec
TotalTime[MatrixElems] (3)= ( 7.420988e+00 ) sec
MeanTimeInMatrixElems = ( 6.184157e-01 ) sec
[Min,Max]TimeInMatrixElems = [ 6.178201e-01 , 6.216142e-01 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123)= ( 6.454303e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 6.694814e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 8.477922e+05 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071581e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.200854e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
0a ProcInit : 0.000400 sec
0b MemAlloc : 0.000043 sec
0c GenCreat : 0.001004 sec
1a GenSeed : 0.000032 sec
1b GenRnGen : 0.350153 sec
2a RamboIni : 0.140705 sec
2b RamboFin : 1.835814 sec
3a SigmaKin : 7.420989 sec
4a DumpLoop : 0.083478 sec
8a CompStat : 0.045091 sec
9a GenDestr : 0.000119 sec
9b DumpScrn : 0.000269 sec
9c DumpJson : 0.000001 sec
TOTAL : 9.878097 sec
TOTAL (123) : 9.747692 sec
TOTAL (23) : 9.397507 sec
TOTAL (1) : 0.350185 sec
TOTAL (2) : 1.976519 sec
TOTAL (3) : 7.420989 sec
***********************************************************************
real 0m9.908s
user 0m9.769s
sys 0m0.138s
…osmetics and copy ep1 to ep2

What ep1 had which is now added also to ep2: OMP, fastmath, Wextra, clang patch, host info

Using fastmath also here, the speed does increase in epoch2 (note that HelAmps is compiled here via an include, so it makes sense)

Epoch2:
time ./check.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND (C++ code)
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123)= ( 8.066252e+00 ) sec
TotalTime[Rambo+ME] (23)= ( 7.716077e+00 ) sec
TotalTime[RndNumGen] (1)= ( 3.501755e-01 ) sec
TotalTime[Rambo] (2)= ( 1.981157e+00 ) sec
TotalTime[MatrixElems] (3)= ( 5.734920e+00 ) sec
MeanTimeInMatrixElems = ( 4.779100e-01 ) sec
[Min,Max]TimeInMatrixElems = [ 4.771928e-01 , 4.813840e-01 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123)= ( 7.799726e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 8.153698e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 1.097043e+06 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071581e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.200854e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
0a ProcInit : 0.000383 sec
0b MemAlloc : 0.000041 sec
0c GenCreat : 0.001009 sec
1a GenSeed : 0.000049 sec
1b GenRnGen : 0.350127 sec
2a RamboIni : 0.137961 sec
2b RamboFin : 1.843195 sec
3a SigmaKin : 5.734920 sec
4a DumpLoop : 0.085327 sec
8a CompStat : 0.027027 sec
9a GenDestr : 0.000147 sec
9b DumpScrn : 0.000251 sec
9c DumpJson : 0.000001 sec
TOTAL : 8.180439 sec
TOTAL (123) : 8.066252 sec
TOTAL (23) : 7.716077 sec
TOTAL (1) : 0.350176 sec
TOTAL (2) : 1.981157 sec
TOTAL (3) : 5.734920 sec
***********************************************************************
real 0m8.211s
user 0m8.072s
sys 0m0.137s

Note that epoch1 is always a bit faster...

Epoch1:
time ./check.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND (C++ code)
OMP threads / `nproc --all` = 1 / 4
MatrixElements compiler = gcc (GCC) 9.2.0
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123) = ( 7.710680e+00 ) sec
TotalTime[Rambo+ME] (23) = ( 7.382994e+00 ) sec
TotalTime[RndNumGen] (1) = ( 3.276863e-01 ) sec
TotalTime[Rambo] (2) = ( 1.939835e+00 ) sec
TotalTime[MatrixElems] (3) = ( 5.443159e+00 ) sec
MeanTimeInMatrixElems = ( 4.535966e-01 ) sec
[Min,Max]TimeInMatrixElems = [ 4.533969e-01 , 4.538179e-01 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123) = ( 8.159405e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23) = ( 8.521551e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3) = ( 1.155846e+06 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.200854e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
0a ProcInit : 0.000411 sec
0b MemAlloc : 0.074275 sec
0c GenCreat : 0.000958 sec
1a GenSeed : 0.000023 sec
1b GenRnGen : 0.327663 sec
2a RamboIni : 0.100796 sec
2b RamboFin : 1.839039 sec
3a SigmaKin : 5.443159 sec
4a DumpLoop : 0.082644 sec
8a CompStat : 0.027072 sec
9a GenDestr : 0.000104 sec
9b DumpScrn : 0.013933 sec
9c DumpJson : 0.000006 sec
TOTAL : 7.910083 sec
TOTAL (123) : 7.710680 sec
TOTAL (23) : 7.382994 sec
TOTAL (1) : 0.327686 sec
TOTAL (2) : 1.939835 sec
TOTAL (3) : 5.443159 sec
***********************************************************************
real 0m7.939s
user 0m7.790s
sys 0m0.147s

Conversely, epoch2 is 10% faster than epoch1 in CUDA???

Epoch2:
time ./gcheck.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = THRUST::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Wavefunction GPU memory = LOCAL
Random number generation = CURAND DEVICE (CUDA code)
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123)= ( 1.042367e-01 ) sec
TotalTime[Rambo+ME] (23)= ( 9.679775e-02 ) sec
TotalTime[RndNumGen] (1)= ( 7.438907e-03 ) sec
TotalTime[Rambo] (2)= ( 8.743204e-02 ) sec
TotalTime[MatrixElems] (3)= ( 9.365707e-03 ) sec
MeanTimeInMatrixElems = ( 7.804756e-04 ) sec
[Min,Max]TimeInMatrixElems = [ 7.767680e-04 , 7.837020e-04 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123)= ( 6.035742e+07 ) sec^-1
EvtsPerSec[Rmb+ME] (23)= ( 6.499589e+07 ) sec^-1
EvtsPerSec[MatrixElems] (3)= ( 6.717545e+08 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071581e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.200854e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
00 CudaFree : 2.037707 sec
0a ProcInit : 0.000523 sec
0b MemAlloc : 0.035856 sec
0c GenCreat : 0.009784 sec
0d SGoodHel : 0.001597 sec
1a GenSeed : 0.000021 sec
1b GenRnGen : 0.007418 sec
2a RamboIni : 0.000088 sec
2b RamboFin : 0.000045 sec
2c CpDTHwgt : 0.007396 sec
2d CpDTHmom : 0.079903 sec
3a SigmaKin : 0.000087 sec
3b CpDTHmes : 0.009279 sec
4a DumpLoop : 0.087360 sec
8a CompStat : 0.044967 sec
9a GenDestr : 0.000068 sec
9b DumpScrn : 0.000254 sec
9c DumpJson : 0.000002 sec
TOTAL : 2.322353 sec
TOTAL (123) : 0.104237 sec
TOTAL (23) : 0.096798 sec
TOTAL (1) : 0.007439 sec
TOTAL (2) : 0.087432 sec
TOTAL (3) : 0.009366 sec
***********************************************************************
real 0m2.630s
user 0m0.426s
sys 0m0.781s

Epoch1:
time ./gcheck.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = THRUST::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Wavefunction GPU memory = LOCAL
Random number generation = CURAND DEVICE (CUDA code)
MatrixElements compiler = nvcc 11.0.221
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123) = ( 1.056586e-01 ) sec
TotalTime[Rambo+ME] (23) = ( 9.805914e-02 ) sec
TotalTime[RndNumGen] (1) = ( 7.599440e-03 ) sec
TotalTime[Rambo] (2) = ( 8.761816e-02 ) sec
TotalTime[MatrixElems] (3) = ( 1.044098e-02 ) sec
MeanTimeInMatrixElems = ( 8.700821e-04 ) sec
[Min,Max]TimeInMatrixElems = [ 8.588060e-04 , 8.841980e-04 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123) = ( 5.954515e+07 ) sec^-1
EvtsPerSec[Rmb+ME] (23) = ( 6.415981e+07 ) sec^-1
EvtsPerSec[MatrixElems] (3) = ( 6.025730e+08 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071582e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.200854e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
00 CudaFree : 1.039487 sec
0a ProcInit : 0.000524 sec
0b MemAlloc : 0.035999 sec
0c GenCreat : 0.011516 sec
0d SGoodHel : 0.001738 sec
1a GenSeed : 0.000021 sec
1b GenRnGen : 0.007579 sec
2a RamboIni : 0.000098 sec
2b RamboFin : 0.000061 sec
2c CpDTHwgt : 0.007369 sec
2d CpDTHmom : 0.080091 sec
3a SigmaKin : 0.000084 sec
3b CpDTHmes : 0.010357 sec
4a DumpLoop : 0.087430 sec
8a CompStat : 0.045176 sec
9a GenDestr : 0.000067 sec
9b DumpScrn : 0.000222 sec
9c DumpJson : 0.000002 sec
TOTAL : 1.327819 sec
TOTAL (123) : 0.105659 sec
TOTAL (23) : 0.098059 sec
TOTAL (1) : 0.007599 sec
TOTAL (2) : 0.087618 sec
TOTAL (3) : 0.010441 sec
***********************************************************************
real 0m1.636s
user 0m0.523s
sys 0m0.867s
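To put numbers on the "a bit faster" and "10% faster" remarks, the relative gain can be computed directly from the `EvtsPerSec[MatrixElems]` values in the logs. The helper below is only an illustrative sketch: `relativeGainPercent` is a hypothetical name and is not part of check.cc.

```cpp
#include <cassert>

// Hypothetical helper (not part of check.cc): relative throughput gain of
// `candidate` over `reference`, in percent. Both arguments are throughputs
// in events per second and must be strictly positive.
double relativeGainPercent( double candidate, double reference )
{
  assert( candidate > 0. && reference > 0. );
  return ( candidate / reference - 1. ) * 100.;
}
```

With the throughputs quoted above, `relativeGainPercent( 1.155846e+06, 1.097043e+06 )` gives roughly 5.4 (epoch1 over epoch2 in C++), while `relativeGainPercent( 6.717545e+08, 6.025730e+08 )` gives roughly 11.5 (epoch2 over epoch1 in CUDA).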
…smetics) to ep2

Minimal changes in epoch1:
- remove unused headers in epoch1
- remove two empty lines in the code doing the performance dump

Port to epoch2 many changes from epoch1:
- add omp.h in epoch2
- use the ep1 printout about '-d' also in epoch2
- use the ep1 printout about OMP_NUM_THREADS also in epoch2
- export OMP_NUM_THREADS=1 if not set also in epoch2
- initialize T() in hstMakeUnique also in epoch2
- comment out unused stdwtim also in epoch2
- add one space per line in the performance dump also in epoch2
- add OMP info in the performance dump also in epoch2
- add gcc compiler info in the performance dump also in epoch2
- return 0 at the end of main also in epoch2
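The "initialize T() in hstMakeUnique" item refers to value-initializing host buffers. The actual helper is not shown in this PR page, so the following is only a sketch under assumptions: the name `hstMakeUniqueSketch` and its signature are hypothetical, and the real `hstMakeUnique` may use a different allocator or deleter.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical stand-in for hstMakeUnique (the real helper may differ):
// allocate an array of n elements on the host and value-initialize it.
template<typename T>
std::unique_ptr<T[]> hstMakeUniqueSketch( std::size_t n )
{
  // The trailing "()" value-initializes every element, i.e. arithmetic
  // types start at zero instead of holding indeterminate values.
  return std::unique_ptr<T[]>( new T[n]() );
}
```

Without the trailing `()`, a buffer of `double` would hold indeterminate values until it is first written.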
…smetics) to ep2

Minimal changes in epoch1:
- remove unused headers in epoch1
- remove two empty lines in the code doing the performance dump

Port to epoch2 many changes from epoch1:
- add omp.h in epoch2
- use the ep1 printout about '-d' also in epoch2
- use the ep1 printout about OMP_NUM_THREADS also in epoch2
- export OMP_NUM_THREADS=1 if not set also in epoch2
- initialize T() in hstMakeUnique also in epoch2
- comment out unused stdwtim also in epoch2
- add one space per line in the performance dump also in epoch2
- add OMP info in the performance dump also in epoch2
- [commented out] add gcc compiler info in the performance dump also in epoch2
- return 0 at the end of main also in epoch2

No change in performance in epoch2: c++ 1.09E6, cuda 6.71E8
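The "OMP_NUM_THREADS=1 if not set" behaviour can be sketched as a small function that reads the environment and falls back to a single thread. This is an assumption about the logic, not the code in check.cc; `ompThreadsOrDefault` is a hypothetical name.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not the code in check.cc): number of OpenMP threads
// to request. Returns the value of OMP_NUM_THREADS if it is set to a
// positive integer, and 1 otherwise (unset, empty, or malformed).
int ompThreadsOrDefault()
{
  const char* env = std::getenv( "OMP_NUM_THREADS" );
  if( env == nullptr ) return 1; // variable not set: default to one thread
  const std::string s( env );
  // Accept only plain decimal digits
  if( s.empty() || s.find_first_not_of( "0123456789" ) != std::string::npos ) return 1;
  const long n = std::strtol( s.c_str(), nullptr, 10 );
  return ( n > 0 && n < 100000 ) ? static_cast<int>( n ) : 1;
}
```

When building with `-fopenmp`, the result could be passed to `omp_set_num_threads` (declared in omp.h) before the first parallel region.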
…rs as in epoch1

Indeed, check.cc was not compiling in SINGLE mode otherwise:

Makefile:44: CUDA_HOME is not set or is invalid. Export CUDA_HOME to compile with cuda
/cvmfs/sft.cern.ch/lcg/releases/gcc/9.2.0-afc57/x86_64-centos7/bin/g++ -O3 -std=c++11 -I. -I../../src -I../../../../../tools -Wall -Wshadow -Wextra -fopenmp -DMGONGPU_COMMONRAND_ONHOST -ffast-math -c check.cc -o check.o
check.cc: In function ‘int main(int, char**)’:
check.cc:312:81: error: conversion from ‘vector<float>’ to non-scalar type ‘vector<double>’ requested
312 | std::vector<double> commonRnd = commonRandomPromises[iiter].get_future().get();
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
make: *** [check.o] Error 1

Note (issue madgraph5#143) that neither epoch2 nor epoch1 builds in single precision, anyway...
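The error above arises because `std::vector<float>` has no implicit conversion to `std::vector<double>`. A minimal sketch of the fix is to use the same floating-point alias on both sides of the promise/future pair. The alias name `fptype` comes from this PR; the macro `SKETCH_FPTYPE_FLOAT` and the function `getCommonRnd` are hypothetical and only serve the illustration.

```cpp
#include <future>
#include <vector>

// Hypothetical precision switch: the macro name is an assumption made for
// this sketch, not necessarily the one used in the repository.
#ifdef SKETCH_FPTYPE_FLOAT
typedef float fptype;
#else
typedef double fptype;
#endif

// Hypothetical helper: fetch one iteration's common random numbers.
// Because the promise and the receiving vector both use fptype, the
// assignment compiles in single and in double precision alike.
std::vector<fptype> getCommonRnd( std::promise<std::vector<fptype>>& promise )
{
  std::vector<fptype> commonRnd = promise.get_future().get();
  return commonRnd;
}
```

Declaring the receiving variable as `std::vector<double>` instead would reproduce the compile error whenever `fptype` is `float`. On older toolchains, code using `std::promise` may need `-pthread` at link time (the `-fopenmp` flag in the command line above typically implies it).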
…piler

This also requires adding Process::getCompiler to ep2 CPPProcess.cc/h.

Now check.cc is identical in both epoch2 and epoch1 (and runTest.cc is almost identical, except for the test name).

Will now include PR madgraph5#144 for single precision in epoch1, and will copy check.cc again (and runTest.cc with some changes).

Epoch2 baseline remains epoch2 C++ 1.10e6, cuda 6.6e8

time ./check.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Random number generation = CURAND (C++ code)
OMP threads / `nproc --all` = 1 / 4
MatrixElements compiler = gcc (GCC) 9.2.0
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123) = ( 7.968041e+00 ) sec
TotalTime[Rambo+ME] (23) = ( 7.643061e+00 ) sec
TotalTime[RndNumGen] (1) = ( 3.249804e-01 ) sec
TotalTime[Rambo] (2) = ( 1.928639e+00 ) sec
TotalTime[MatrixElems] (3) = ( 5.714422e+00 ) sec
MeanTimeInMatrixElems = ( 4.762018e-01 ) sec
[Min,Max]TimeInMatrixElems = [ 4.760149e-01 , 4.765775e-01 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123) = ( 7.895863e+05 ) sec^-1
EvtsPerSec[Rmb+ME] (23) = ( 8.231592e+05 ) sec^-1
EvtsPerSec[MatrixElems] (3) = ( 1.100979e+06 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071581e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.200854e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************

time ./gcheck.exe -p 2048 256 12
***********************************************************************
NumBlocksPerGrid = 2048
NumThreadsPerBlock = 256
NumIterations = 12
-----------------------------------------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = THRUST::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Wavefunction GPU memory = LOCAL
Random number generation = CURAND DEVICE (CUDA code)
MatrixElements compiler = nvcc 11.0.221
-----------------------------------------------------------------------
NumberOfEntries = 12
TotalTime[Rnd+Rmb+ME] (123) = ( 1.044441e-01 ) sec
TotalTime[Rambo+ME] (23) = ( 9.709213e-02 ) sec
TotalTime[RndNumGen] (1) = ( 7.351930e-03 ) sec
TotalTime[Rambo] (2) = ( 8.758798e-02 ) sec
TotalTime[MatrixElems] (3) = ( 9.504147e-03 ) sec
MeanTimeInMatrixElems = ( 7.920122e-04 ) sec
[Min,Max]TimeInMatrixElems = [ 7.825940e-04 , 8.001750e-04 ] sec
-----------------------------------------------------------------------
TotalEventsComputed = 6291456
EvtsPerSec[Rnd+Rmb+ME](123) = ( 6.023757e+07 ) sec^-1
EvtsPerSec[Rmb+ME] (23) = ( 6.479882e+07 ) sec^-1
EvtsPerSec[MatrixElems] (3) = ( 6.619696e+08 ) sec^-1
***********************************************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue = ( 1.372152e-02 +- 3.269516e-06 ) GeV^0
[Min,Max]MatrixElemValue = [ 6.071581e-03 , 3.374925e-02 ] GeV^0
StdDevMatrixElemValue = ( 8.200854e-03 ) GeV^0
MeanWeight = ( 4.515827e-01 +- 0.000000e+00 )
[Min,Max]Weight = [ 4.515827e-01 , 4.515827e-01 ]
StdDevWeight = ( 0.000000e+00 )
***********************************************************************
…adgraph5#144 Note that now ep2 and ep1 runTest.cc are identical except for the test name EP2/EP1
…ferent names in epoch_process_id.h
I have decided to split this further into two PRs. I have done everything except CPPProcess, which is the most complex part (and I actually even see minor performance differences). I will split that out into a third PR. Recap about issue #139
More in detail about this PR #149 below. In src:
In SubProcesses and below:
Note: at this stage, epoch1 is slightly faster than epoch2 in C++, while the reverse is true in CUDA.
First batch of changes Minimal changes in epoch1:
Port to epoch2 many changes from epoch1:
7bis) runTest.cc A large batch of additional changes (mainly in PR #144) came from fixing epoch2 check.cc to use fptype for random numbers, as in epoch1. This triggered many additional checks on single precision, which are included in PR #144 together with a better treatment of NaNs. This is everything included as of this PR (which follows some earlier ones). The remaining work will be about CPPProcess.
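One reason NaN handling needs care here: with GCC, `-ffast-math` enables `-ffinite-math-only`, which lets the compiler assume that NaNs never occur, so checks such as `std::isnan` may be optimized away. The sketch below shows a bit-level test that does not depend on floating-point comparisons. It is an illustration under assumptions, not the treatment implemented in PR #144; `isNanBits` and `countNotNan` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

static_assert( std::numeric_limits<double>::is_iec559, "IEEE-754 doubles assumed" );

// Hypothetical helper: NaN test on the raw bit pattern of a double.
// A NaN has all exponent bits set and a non-zero fraction.
inline bool isNanBits( double x )
{
  std::uint64_t bits;
  std::memcpy( &bits, &x, sizeof( bits ) ); // well-defined type punning
  const std::uint64_t expMask = 0x7ff0000000000000ULL;
  const std::uint64_t fracMask = 0x000fffffffffffffULL;
  return ( bits & expMask ) == expMask && ( bits & fracMask ) != 0;
}

// Hypothetical helper: count the entries that are not NaN, in the spirit
// of the "NumMatrixElements(notNan)" line of the performance dump.
std::size_t countNotNan( const std::vector<double>& values )
{
  std::size_t n = 0;
  for( double v : values )
    if( !isNanBits( v ) ) ++n;
  return n;
}
```

Infinities have a zero fraction and are therefore counted as non-NaN by this sketch.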
Self-merging.
This is the PR to complete issue #139.
I am keeping it as WIP for now: it is about 80% done but still needs a few tweaks, which are quite important because they are performance-relevant.