
Use cuRand instead of CPU random numbers #4

Closed
roiser opened this issue Aug 12, 2020 · 2 comments
Labels
enhancement (A feature we want to develop), upstream (Ready to be included in the MG5 code generator)

Comments

@roiser
Member

roiser commented Aug 12, 2020

First implementation by Peter. Final version done by AV in eemumu_AV/master.

This also includes the rambo part. Peter used the curand device API; AV used the host API (simpler code). AV used the fastest (not the “best”) generator.

@roiser roiser added the enhancement (A feature we want to develop) and upstream (Ready to be included in the MG5 code generator) labels on Aug 12, 2020
@valassi
Member

valassi commented Aug 13, 2020

Integrated here: roiser@9c10b20

Note that I am using the host API, unlike Peter, who was using the device API. This is useful because we can then use the curand library also in C++ on a CPU with no GPU attached.

With the host API on a GPU, the generation can be done either on the CPU host or on the GPU device. The default is now that it runs on the device: this is simply a header #ifdef switch, roiser@847d158

About the choice of generator, see https://github.com/roiser/madgraph4gpu/blob/56a5b3af5a4df29a6e7020d2f3a4bc182fb5bb29/examples/gpu/eemumu_AV/src/rambo2toNm0.cc#L271

    // [NB Timings are for host generation of 32*256*1 events: rn(0) is 0.0012s]
    const curandRngType_t type = CURAND_RNG_PSEUDO_MTGP32;          // 0.0021s (FOR FAST TESTS)
    //const curandRngType_t type = CURAND_RNG_PSEUDO_XORWOW;        // 1.13s
    //const curandRngType_t type = CURAND_RNG_PSEUDO_MRG32K3A;      // 10.5s (better but slower)
    //const curandRngType_t type = CURAND_RNG_PSEUDO_MT19937;       // 43s
    //const curandRngType_t type = CURAND_RNG_PSEUDO_PHILOX4_32_10; // segfaults

The timings are the results of some tests I did while developing. The generator recommended by Lorenzo is MRG32K3A, but it is much slower; I therefore used MTGP32. On the other hand, we should check that this does not cause issues (e.g. is issue #20 related to the choice of random generator?).
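
For reference, below is a minimal sketch of what the two host-API code paths look like (the macro name RANDGEN_ON_DEVICE, the seed value and the buffer handling are illustrative assumptions here, not the actual switch used in the repository code):

    // Minimal sketch of the curand host API with host or device generation.
    // NB: RANDGEN_ON_DEVICE, the seed and the buffer handling are illustrative
    // assumptions, not the actual switch used in the repository code.
    #include <curand.h>
    #include <cuda_runtime.h>
    #include <cstdio>
    #include <vector>

    int main()
    {
      const size_t nrand = 32 * 256 * 1; // number of random doubles (as in the timing tests above)
      std::vector<double> hstbuf( nrand );
      curandGenerator_t gen;
    #ifdef RANDGEN_ON_DEVICE
      // Host API, generation on the GPU device: the output buffer must be device memory
      curandCreateGenerator( &gen, CURAND_RNG_PSEUDO_MTGP32 );
      curandSetPseudoRandomGeneratorSeed( gen, 20200812ULL );
      double* devbuf = nullptr;
      cudaMalloc( (void**)&devbuf, nrand * sizeof(double) );
      curandGenerateUniformDouble( gen, devbuf, nrand );
      cudaMemcpy( hstbuf.data(), devbuf, nrand * sizeof(double), cudaMemcpyDeviceToHost );
      cudaFree( devbuf );
    #else
      // Host API, generation on the CPU host: this also works with no GPU attached
      curandCreateGeneratorHost( &gen, CURAND_RNG_PSEUDO_MTGP32 );
      curandSetPseudoRandomGeneratorSeed( gen, 20200812ULL );
      curandGenerateUniformDouble( gen, hstbuf.data(), nrand );
    #endif
      curandDestroyGenerator( gen );
      printf( "first random number: %f\n", hstbuf[0] );
      return 0;
    }

In both branches the same curandSetPseudoRandomGeneratorSeed and curandGenerateUniformDouble calls are used; only the generator constructor and the location of the output buffer change, which is what makes a single #ifdef switch sufficient.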

@roiser roiser modified the milestone: epoch2 Nov 26, 2020
valassi added a commit to valassi/madgraph4gpu that referenced this issue Apr 23, 2021
…builds.

The build fails on clang10 at compilation time:

clang++: /build/gcc/build/contrib/clang-10.0.0/src/clang/10.0.0/tools/clang/lib/CodeGen/CGExpr.cpp:596: clang::CodeGen::RValue clang::CodeGen::CodeGenFunction::EmitReferenceBindingToExpr(const clang::Expr*): Assertion `LV.isSimple()' failed.
Stack dump:
0.      Program arguments: /cvmfs/sft.cern.ch/lcg/releases/clang/10.0.0-62e61/x86_64-centos7/bin/clang++ -O3 -std=c++17 -I. -I../../src -I../../../../../tools -DUSE_NVTX -Wall -Wshadow -Wextra -fopenmp -ffast-math -march=skylake-avx512 -mprefer-vector-width=256 -I/usr/local/cuda-11.0/include/ -c CPPProcess.cc -o CPPProcess.o
1.      <eof> parser at end of file
2.      Per-file LLVM IR generation
3.      ../../src/mgOnGpuVectors.h:59:16: Generating code for declaration 'mgOnGpu::cxtype_v::operator[]'
 #0 0x0000000001af5f9a llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/cvmfs/sft.cern.ch/lcg/releases/clang/10.0.0-62e61/x86_64-centos7/bin/clang+++0x1af5f9a)
 #1 0x0000000001af3d54 llvm::sys::RunSignalHandlers() (/cvmfs/sft.cern.ch/lcg/releases/clang/10.0.0-62e61/x86_64-centos7/bin/clang+++0x1af3d54)
 #2 0x0000000001af3fa9 llvm::sys::CleanupOnSignal(unsigned long) (/cvmfs/sft.cern.ch/lcg/releases/clang/10.0.0-62e61/x86_64-centos7/bin/clang+++0x1af3fa9)
 #3 0x0000000001a6ed08 CrashRecoverySignalHandler(int) (/cvmfs/sft.cern.ch/lcg/releases/clang/10.0.0-62e61/x86_64-centos7/bin/clang+++0x1a6ed08)
 #4 0x00007fd31c178630 __restore_rt (/lib64/libpthread.so.0+0xf630)
 #5 0x00007fd31ac8c3d7 raise (/lib64/libc.so.6+0x363d7)
 #6 0x00007fd31ac8dac8 abort (/lib64/libc.so.6+0x37ac8)
 #7 0x00007fd31ac851a6 __assert_fail_base (/lib64/libc.so.6+0x2f1a6)
 #8 0x00007fd31ac85252 (/lib64/libc.so.6+0x2f252)
 #9 0x000000000203a042 clang::CodeGen::CodeGenFunction::EmitReferenceBindingToExpr(clang::Expr const*) (/cvmfs/sft.cern.ch/lcg/releases/clang/10.0.0-62e61/x86_64-centos7/bin/clang+++0x203a042)
valassi added a commit to valassi/madgraph4gpu that referenced this issue Apr 23, 2021
-------------------------------------------------------------------------
Process                     = EPOCH1_EEMUMU_CPP
FP precision                = DOUBLE (NaN/abnormal=0, zero=0 )
Internal loops fptype_sv    = VECTOR[1] ('none': scalar, no SIMD)
MatrixElements compiler     = clang 11.0.0
EvtsPerSec[MatrixElems] (3) = ( 1.263547e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.372113e-02 +- 3.270608e-06 )  GeV^0
TOTAL       :     7.168746 sec
real    0m7.176s
=Symbols in CPPProcess.o= (~sse4: 1241) (avx2:    0) (512y:    0) (512z:    0)
-------------------------------------------------------------------------
Process                     = EPOCH2_EEMUMU_CPP
FP precision                = DOUBLE (NaN/abnormal=0, zero=0 )
MatrixElements compiler     = clang 11.0.0
EvtsPerSec[MatrixElems] (3) = ( 1.218104e+06                 )  sec^-1
MeanMatrixElemValue         = ( 1.372113e-02 +- 3.270608e-06 )  GeV^0
TOTAL       :     7.455322 sec
real    0m7.463s
=Symbols in CPPProcess.o= (~sse4: 1165) (avx2:    0) (512y:    0) (512z:    0)
-------------------------------------------------------------------------

The build with vectors also still fails on clang11, in the same place:

clang++: /build/dkonst/CONTRIB/build/contrib/clang-11.0.0/src/clang/11.0.0/clang/lib/CodeGen/CGExpr.cpp:613: clang::CodeGen::RValue clang::CodeGen::CodeGenFunction::EmitReferenceBindingToExpr(const clang::Expr*): Assertion `LV.isSimple()' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /cvmfs/sft.cern.ch/lcg/releases/clang/11.0.0-77a9f/x86_64-centos7/bin/clang++ -O3 -std=c++17 -I. -I../../src -I../../../../../tools -Wall -Wshadow -Wextra -DMGONGPU_COMMONRAND_ONHOST -ffast-math -march=skylake-avx512 -mprefer-vector-width=256 -c CPPProcess.cc -o CPPProcess.o
1.      <eof> parser at end of file
2.      Per-file LLVM IR generation
3.      ../../src/mgOnGpuVectors.h:59:16: Generating code for declaration 'mgOnGpu::cxtype_v::operator[]'
 #0 0x0000000001ce208a llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/cvmfs/sft.cern.ch/lcg/releases/clang/11.0.0-77a9f/x86_64-centos7/bin/clang+++0x1ce208a)
 #1 0x0000000001cdfe94 llvm::sys::RunSignalHandlers() (/cvmfs/sft.cern.ch/lcg/releases/clang/11.0.0-77a9f/x86_64-centos7/bin/clang+++0x1cdfe94)
 #2 0x0000000001c52d98 CrashRecoverySignalHandler(int) (/cvmfs/sft.cern.ch/lcg/releases/clang/11.0.0-77a9f/x86_64-centos7/bin/clang+++0x1c52d98)
 #3 0x00007f1836000630 __restore_rt (/lib64/libpthread.so.0+0xf630)
 #4 0x00007f18350f13d7 raise (/lib64/libc.so.6+0x363d7)
 #5 0x00007f18350f2ac8 abort (/lib64/libc.so.6+0x37ac8)
@valassi
Member

valassi commented Oct 21, 2021

This is clearly done. Curand is our default, and the C++ common random numbers are our backup.

I am closing this.

@valassi valassi closed this as completed Oct 21, 2021
valassi added a commit to valassi/madgraph4gpu that referenced this issue Feb 23, 2022
…ns is different for fcheck

> ./fcheck.exe  2048 64 10
 GPUBLOCKS=          2048
 GPUTHREADS=           64
 NITERATIONS=          10
WARNING! Instantiate host Bridge (nevt=131072)
INFO: The application is built for skylake-avx512 (AVX512VL) and the host supports it
WARNING! Instantiate host Sampler (nevt=131072)
Iteration #1
Iteration #2
Iteration #3
Iteration #4
Iteration #5
Iteration #6
Iteration #7
Iteration #8
Iteration #9
WARNING! flagging abnormal ME for ievt=111162
Iteration #10
 Average Matrix Element:   1.3716954486179133E-002
 Abnormal MEs:           1

> ./check.exe -p  2048 64 10 | grep FLOAT
FP precision                = FLOAT (NaN/abnormal=2, zero=0)

I imagine that this is because the momenta in Fortran get translated from float to double and back to float, while in C++ they stay in float?
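
As a purely illustrative sketch of the effect I have in mind (not the actual Fortran/C++ bridge code): the float-to-double-to-float round trip of the stored values themselves is exact, but any arithmetic evaluated at the intermediate double precision and then truncated can differ in the last bits from the same arithmetic evaluated entirely in float, which may be enough to tip a borderline ME into the abnormal category in one implementation and not the other.

    // Illustrative sketch only (not the actual bridge code): the same expression
    // evaluated entirely in float vs evaluated in double and then truncated back
    // to float can give results that differ in the last bits.
    #include <cstdio>

    int main()
    {
      const float p1 = 1234.5678f, p2 = 1234.5671f;
      const float viaFloat  = p1 * p1 - p2 * p2;                            // all arithmetic in float
      const float viaDouble = (float)( (double)p1 * p1 - (double)p2 * p2 ); // arithmetic in double, then truncated
      printf( "via float : %a\n", viaFloat );
      printf( "via double: %a\n", viaDouble );
      printf( "identical : %s\n", viaFloat == viaDouble ? "yes" : "no" );
      return 0;
    }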
valassi added a commit to valassi/madgraph4gpu that referenced this issue May 20, 2022
…failing

patching file Source/dsample.f
Hunk #3 FAILED at 181.
Hunk #4 succeeded at 197 (offset 2 lines).
Hunk #5 FAILED at 211.
Hunk #6 succeeded at 893 (offset 3 lines).
2 out of 6 hunks FAILED -- saving rejects to file Source/dsample.f.rej
patching file SubProcesses/addmothers.f
patching file SubProcesses/cuts.f
patching file SubProcesses/makefile
Hunk #3 FAILED at 61.
Hunk #4 succeeded at 94 (offset 6 lines).
Hunk #5 succeeded at 122 (offset 6 lines).
1 out of 5 hunks FAILED -- saving rejects to file SubProcesses/makefile.rej
patching file SubProcesses/reweight.f
Hunk #1 FAILED at 1782.
Hunk #2 succeeded at 1827 (offset 27 lines).
Hunk #3 succeeded at 1841 (offset 27 lines).
Hunk #4 succeeded at 1963 (offset 27 lines).
1 out of 4 hunks FAILED -- saving rejects to file SubProcesses/reweight.f.rej
patching file auto_dsig.f
Hunk #6 FAILED at 301.
Hunk #10 succeeded at 773 with fuzz 2 (offset 4 lines).
Hunk #11 succeeded at 912 (offset 16 lines).
Hunk #12 succeeded at 958 (offset 16 lines).
Hunk #13 succeeded at 971 (offset 16 lines).
Hunk #14 succeeded at 987 (offset 16 lines).
Hunk #15 succeeded at 1006 (offset 16 lines).
Hunk #16 succeeded at 1019 (offset 16 lines).
1 out of 16 hunks FAILED -- saving rejects to file auto_dsig.f.rej
patching file driver.f
patching file matrix1.f
patching file auto_dsig1.f
Hunk #2 succeeded at 220 (offset 7 lines).
Hunk #3 succeeded at 290 (offset 7 lines).
Hunk #4 succeeded at 453 (offset 8 lines).
Hunk #5 succeeded at 464 (offset 8 lines).
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 14, 2022
./cmadevent_cudacpp < /tmp/avalassi/input_ggtt_cpp | grep DEBUG_ | sort | uniq -c
  16416  DEBUG_SMATRIX1 #1
 262656  DEBUG_SMATRIX1 #1a
      1  DEBUG_SMATRIX1 #2
  16416  DEBUG_SMATRIX1 #4
     25  DEBUG_SMATRIX1 #4a
      1  DEBUG_SMATRIX1 #4b
  16416  DEBUG_SMATRIX1 #7
  16416  DEBUG_SMATRIX1 #8
jtchilders pushed a commit to jtchilders/madgraph4gpu that referenced this issue Nov 15, 2022
valassi pushed a commit to valassi/madgraph4gpu that referenced this issue Jul 13, 2023
Fix for including cuda in test compilation when compiling in HIP
valassi added a commit to valassi/madgraph4gpu that referenced this issue May 17, 2024
…#845 in log_gqttq_mad_f_inl0_hrd0.txt, the rest as expected

STARTED  AT Thu May 16 01:24:16 AM CEST 2024
(SM tests)
ENDED(1) AT Thu May 16 05:58:45 AM CEST 2024 [Status=0]
(BSM tests)
ENDED(1) AT Thu May 16 06:07:42 AM CEST 2024 [Status=0]

24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt
18 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_d_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_m_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_d_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_m_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_d_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_m_inl0_hrd0.txt

The new issue #845 is the following:
+Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
+
+Backtrace for this error:
+#0  0x7f2a1a623860 in ???
+#1  0x7f2a1a622a05 in ???
+#2  0x7f2a1a254def in ???
+#3  0x7f2a1ae20acc in ???
+#4  0x7f2a1acc4575 in ???
+#5  0x7f2a1ae1d4c9 in ???
+#6  0x7f2a1ae2570d in ???
+#7  0x7f2a1ae2afa1 in ???
+#8  0x43008b in ???
+#9  0x431c10 in ???
+#10  0x432d47 in ???
+#11  0x433b1e in ???
+#12  0x44a921 in ???
+#13  0x42ebbf in ???
+#14  0x40371e in ???
+#15  0x7f2a1a23feaf in ???
+#16  0x7f2a1a23ff5f in ???
+#17  0x403844 in ???
+#18  0xffffffffffffffff in ???
+./madX.sh: line 379: 3004240 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
+ERROR! ' ./build.512z_f_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x10_cudacpp > /tmp/avalassi/output_gqttq_x10_cudacpp' failed