forked from madgraph5/madgraph4gpu
Implement typedefs for single-precision.
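For readers unfamiliar with the pattern, a precision typedef usually looks something like the sketch below. The names `fptype` and `USE_SINGLE_PRECISION`, and the helper `sumOfSquares`, are hypothetical and chosen only for illustration; the commit's actual identifiers are not shown on this page.

```cpp
#include <cstddef>

// Hypothetical compile-time switch (an assumption, not taken from the commit):
// build with -DUSE_SINGLE_PRECISION to select float instead of double.
#ifdef USE_SINGLE_PRECISION
typedef float fptype;  // single precision
#else
typedef double fptype; // double precision (default)
#endif

// Illustrative routine written only in terms of fptype,
// so its precision follows the typedef above.
inline fptype sumOfSquares(const fptype* v, std::size_t n)
{
  fptype acc = 0;
  for (std::size_t i = 0; i < n; ++i)
    acc += v[i] * v[i];
  return acc;
}
```

Code written against a single alias like this can be rebuilt in either precision without touching the numerical routines, which is presumably the motivation for the change.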
Some values trigger a nan, however: e.g. see the output below, going from 2 to 3 iterations.

time ./gcheck.exe -p 2048 256 2
*************************************
NumIterations = 2
NumThreadsPerBlock = 256
NumBlocksPerGrid = 2048
-------------------------------------
Momenta memory layout = AOSOA[32]
Wavefunction GPU memory = LOCAL
Curand generation = DEVICE (CUDA code)
-------------------------------------
NumberOfEntries = 2
TotalTimeInWaveFuncs = 8.336140e-04 sec
MeanTimeInWaveFuncs = 4.168070e-04 sec
StdDevTimeInWaveFuncs = 4.385687e-05 sec
MinTimeInWaveFuncs = 3.729500e-04 sec
MaxTimeInWaveFuncs = 3.729500e-04 sec
-------------------------------------
ProcessID: = 10883
NProcesses = 1
NumMatrixElements = 1048576
MatrixElementsPerSec = 1.257868e+09 sec^-1
*************************************
NumMatrixElements = 1048576
MeanMatrixElemValue = 1.369856e-02 GeV^0
StdErrMatrixElemValue = 8.025736e-06 GeV^0
StdDevMatrixElemValue = 8.218354e-03 GeV^0
MinMatrixElemValue = 2.904703e-03 GeV^0
MaxMatrixElemValue = 3.983529e-02 GeV^0
*************************************
00 CudaFree : 0.142726 sec
0a ProcInit : 0.000587 sec
0b MemAlloc : 0.022513 sec
0c GenCreat : 0.014579 sec
1a GenSeed : 0.000006 sec
1b GenRnGen : 0.001332 sec
2a RamboIni : 0.000034 sec
2b RamboFin : 0.000011 sec
2c CpDTHwgt : 0.000652 sec
2d CpDTHmom : 0.005778 sec
3a SigmaKin : 0.000021 sec
3b CpDTHmes : 0.001626 sec
4a DumpLoop : 0.003264 sec
9a DumpAll : 0.004115 sec
9b GenDestr : 0.000191 sec
9c MemFree : 0.008790 sec
9d CudReset : 0.040436 sec
TOTAL : 0.246659 sec
*************************************
real 0m0.257s
user 0m0.066s
sys 0m0.189s

time ./gcheck.exe -p 2048 256 3
*************************************
NumIterations = 3
NumThreadsPerBlock = 256
NumBlocksPerGrid = 2048
-------------------------------------
Momenta memory layout = AOSOA[32]
Wavefunction GPU memory = LOCAL
Curand generation = DEVICE (CUDA code)
-------------------------------------
NumberOfEntries = 3
TotalTimeInWaveFuncs = 1.181985e-03 sec
MeanTimeInWaveFuncs = 3.939950e-04 sec
StdDevTimeInWaveFuncs = 3.520531e-05 sec
MinTimeInWaveFuncs = 3.679280e-04 sec
MaxTimeInWaveFuncs = 3.679280e-04 sec
-------------------------------------
ProcessID: = 10878
NProcesses = 1
NumMatrixElements = 1572864
MatrixElementsPerSec = 1.330697e+09 sec^-1
*************************************
NumMatrixElements = 1572864
MeanMatrixElemValue = nan GeV^0
StdErrMatrixElemValue = nan GeV^0
StdDevMatrixElemValue = nan GeV^0
MinMatrixElemValue = 2.904703e-03 GeV^0
MaxMatrixElemValue = 4.248643e-02 GeV^0
*************************************
00 CudaFree : 0.152579 sec
0a ProcInit : 0.000604 sec
0b MemAlloc : 0.024280 sec
0c GenCreat : 0.014731 sec
1a GenSeed : 0.000008 sec
1b GenRnGen : 0.001941 sec
2a RamboIni : 0.000041 sec
2b RamboFin : 0.000014 sec
2c CpDTHwgt : 0.000985 sec
2d CpDTHmom : 0.008649 sec
3a SigmaKin : 0.000027 sec
3b CpDTHmes : 0.002310 sec
4a DumpLoop : 0.004946 sec
9a DumpAll : 0.006056 sec
9b GenDestr : 0.000193 sec
9c MemFree : 0.008837 sec
9d CudReset : 0.040970 sec
TOTAL : 0.267171 sec
*************************************
real 0m0.277s
user 0m0.082s
sys 0m0.182s
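In the 3-iteration run the mean, standard error and standard deviation are nan while the minimum and maximum stay finite. That pattern is what one would expect if at least one matrix element is NaN and the extrema are tracked with `<` / `>` comparisons, because every ordered comparison with NaN is false whereas any sum containing NaN is NaN. How `gcheck.exe` actually accumulates its statistics is not shown here, so this is an assumption. A minimal sketch of a NaN-aware summary that could help locate the offending events follows; `MeStats` and `summarize` are hypothetical names, not part of the code base.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical summary record (illustration only).
struct MeStats
{
  double mean;      // mean over the non-NaN values, NaN if there are none
  float min;        // smallest non-NaN value
  float max;        // largest non-NaN value
  std::size_t nNan; // number of NaN entries that were skipped
};

// Accumulate statistics over non-NaN entries and count the NaNs separately,
// so that a single bad matrix element does not poison the mean.
inline MeStats summarize(const float* me, std::size_t n)
{
  double sum = 0;
  float lo = std::numeric_limits<float>::infinity();
  float hi = -std::numeric_limits<float>::infinity();
  std::size_t nNan = 0, nOk = 0;
  for (std::size_t i = 0; i < n; ++i)
  {
    if (std::isnan(me[i])) { ++nNan; continue; }
    sum += me[i]; // accumulate in double even for float inputs
    if (me[i] < lo) lo = me[i];
    if (me[i] > hi) hi = me[i];
    ++nOk;
  }
  const double mean = nOk ? sum / nOk : std::numeric_limits<double>::quiet_NaN();
  return { mean, lo, hi, nNan };
}
```

Reporting `nNan` alongside the other statistics would show whether the problem is one isolated event or a systematic single-precision issue.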
Showing 6 changed files with 174 additions and 158 deletions.