forked from madgraph5/madgraph4gpu
Commit
IMPROVE GLOBAL so that it can use many more blocks.
Much slower than LOCAL or SHARED.

NB 1. Numbers are now on a 1GPU/4CPU system, throughputs are 10% higher
NB 2. There is a functional bug also for LOCAL (mean ME decreases with #iterations)

time ./gcheck.exe -p 16384 32 12
***************************************
NumIterations             = 12
NumThreadsPerBlock        = 32
NumBlocksPerGrid          = 16384
---------------------------------------
FP precision              = DOUBLE (nan=0)
Complex type              = THRUST::COMPLEX
Momenta memory layout     = AOSOA[32]
Wavefunction GPU memory   = GLOBAL
Curand generation         = DEVICE (CUDA code)
---------------------------------------
NumberOfEntries           = 12
TotalTimeInWaveFuncs      = 3.505193e-02 sec
MeanTimeInWaveFuncs       = 2.920994e-03 sec
StdDevTimeInWaveFuncs     = 6.692049e-05 sec
MinTimeInWaveFuncs        = 2.893571e-03 sec
MaxTimeInWaveFuncs        = 2.903941e-03 sec
---------------------------------------
NumMatrixElementsComputed = 6291456
MatrixElementsPerSec      = 1.794896e+08 sec^-1
***************************************
NumMatrixElements(notNan) = 6291456
MeanMatrixElemValue       = 1.393760e-02 GeV^0
StdErrMatrixElemValue     = 3.035624e-06 GeV^0
StdDevMatrixElemValue     = 7.614188e-03 GeV^0
MinMatrixElemValue        = 1.807353e-11 GeV^0
MaxMatrixElemValue        = 3.374925e-02 GeV^0
***************************************
00 CudaFree : 0.160871 sec
0a ProcInit : 0.000533 sec
0b MemAlloc : 0.078608 sec
0c GenCreat : 0.015293 sec
1a GenSeed  : 0.000021 sec
1b GenRnGen : 0.007990 sec
2a RamboIni : 0.000112 sec
2b RamboFin : 0.000056 sec
2c CpDTHwgt : 0.007330 sec
2d CpDTHmom : 0.074818 sec
3a SigmaKin : 0.000111 sec
3b CpDTHmes : 0.034941 sec
4a DumpLoop : 0.022739 sec
9a DumpAll  : 0.046494 sec
9b GenDestr : 0.000237 sec
9c MemFree  : 0.022495 sec
9d CudReset : 0.041520 sec
TOTAL       : 0.514170 sec
TOTAL(n-2)  : 0.311779 sec
***************************************

real    0m0.530s
user    0m0.206s
sys     0m0.313s
Showing 4 changed files with 41 additions and 75 deletions.