[june24] regenerate all processes, force WARP_SIZE=32 and NB_WARP=512 i.e. VECSIZE_MEMMAX=16384 (see #887 and #885)
valassi committed Jul 5, 2024
1 parent 2aa94e1 commit bede049
Showing 46 changed files with 244 additions and 244 deletions.
16 changes: 8 additions & 8 deletions epochX/cudacpp/ee_mumu.mad/CODEGEN_mad_ee_mumu_log.txt
@@ -62,7 +62,7 @@ generate e+ e- > mu+ mu-
No model currently active, so we import the Standard Model
INFO: load particles
INFO: load vertices
- DEBUG: model prefixing takes 0.0056269168853759766
+ DEBUG: model prefixing takes 0.005285739898681641
INFO: Restrict model sm with file models/sm/restrict_default.dat .
DEBUG: Simplifying conditional expressions 
DEBUG: remove interactions: u s w+ at order: QED=1 
@@ -177,7 +177,7 @@ INFO: Generating Helas calls for process: e+ e- > mu+ mu- WEIGHTED<=4 @1
INFO: Processing color information for process: e+ e- > mu+ mu- @1
INFO: Creating files in directory P1_epem_mupmum
DEBUG: kwargs[prefix] = 0 [model_handling.py at line 1151] 
- DEBUG: process_exporter_cpp =  <PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_OneProcessExporter object at 0x7f77222998b0> [export_v4.py at line 6304]
+ DEBUG: process_exporter_cpp =  <PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_OneProcessExporter object at 0x7f6d8fdd98b0> [export_v4.py at line 6304]
INFO: Creating files in directory .
FileWriter <class 'PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_CPPWriter'> for ././CPPProcess.h
FileWriter <class 'PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_CPPWriter'> for ././CPPProcess.cc
@@ -203,19 +203,19 @@ INFO: Finding symmetric diagrams for subprocess group epem_mupmum
DEBUG: split[i] =  {true};}; [model_handling.py at line 1589] 
DEBUG: split[i] =  {true}, // iconfigC=1, diag=2 [model_handling.py at line 1594] 
Generated helas calls for 1 subprocesses (2 diagrams) in 0.004 s
- Wrote files for 8 helas calls in 0.114 s
+ Wrote files for 8 helas calls in 0.111 s
ALOHA: aloha starts to compute helicity amplitudes
ALOHA: aloha creates FFV1 routines
ALOHA: aloha creates FFV2 routines
ALOHA: aloha creates FFV4 routines
- ALOHA: aloha creates 3 routines in 0.203 s
+ ALOHA: aloha creates 3 routines in 0.201 s
DEBUG: Entering PLUGIN_ProcessExporter.convert_model (create the model) [output.py at line 208] 
ALOHA: aloha starts to compute helicity amplitudes
ALOHA: aloha creates FFV1 routines
ALOHA: aloha creates FFV2 routines
ALOHA: aloha creates FFV4 routines
ALOHA: aloha creates FFV2_4 routines
- ALOHA: aloha creates 7 routines in 0.267 s
+ ALOHA: aloha creates 7 routines in 0.258 s
<class 'aloha.create_aloha.AbstractRoutine'> FFV1
<class 'aloha.create_aloha.AbstractRoutine'> FFV1
<class 'aloha.create_aloha.AbstractRoutine'> FFV2
@@ -260,9 +260,9 @@ Type "launch" to generate events from this process, or see
Run "open index.html" to see more information about this process.
quit

- real 0m2.104s
- user 0m1.833s
- sys 0m0.259s
+ real 0m2.117s
+ user 0m1.772s
+ sys 0m0.265s
Code generation completed in 2 seconds
************************************************************
* *
4 changes: 2 additions & 2 deletions epochX/cudacpp/ee_mumu.mad/Cards/run_card.dat
@@ -100,8 +100,8 @@
-O3 -ffast-math -fbounds-check = global_flag ! build flags for all Fortran code (for a fair comparison to cudacpp; default is -O)
--fast-math = aloha_flag ! fortran optimization flag for aloha function. Suggestions: '-ffast-math'
-O3 = matrix_flag ! fortran optimization flag for matrix.f function. Suggestions: '-O3'
- 16384 = vector_size ! size of fortran arrays allocated in the multi-event API for SIMD/GPU (VECSIZE_MEMMAX)
- 1 = nb_warp ! total number of warp/frontwave
+ 32 = vector_size ! size of fortran arrays allocated in the multi-event API for SIMD/GPU (VECSIZE_MEMMAX)
+ 512 = nb_warp ! total number of warp/frontwave

#*********************************************************************
# Customization (custom cuts/scale/bias/...) *
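
Note on the run_card.dat change above: the two edited entries use the card's `value = name ! comment` layout. The following is a minimal Python sketch of reading them back and recomputing the total array size. The helper `read_card_ints` is hypothetical (it is not part of MG5aMC), and the reading that `vector_size` times `nb_warp` gives VECSIZE_MEMMAX is an assumption inferred from the values in this commit, not taken from the MG5aMC documentation.

```python
import re

# Hypothetical helper, not part of MG5aMC: matches run_card-style lines of
# the form "value = name ! optional comment" where value has no spaces.
CARD_LINE = re.compile(r"^\s*(\S+)\s*=\s*(\w+)\s*(?:!.*)?$")


def read_card_ints(text, names):
    """Return {name: int(value)} for the requested integer-valued entries."""
    values = {}
    for line in text.splitlines():
        match = CARD_LINE.match(line)
        if match and match.group(2) in names:
            values[match.group(2)] = int(match.group(1))
    return values


# The two new lines from this commit, copied from the hunk above.
NEW_CARD = """\
32 = vector_size ! size of fortran arrays allocated in the multi-event API for SIMD/GPU (VECSIZE_MEMMAX)
512 = nb_warp ! total number of warp/frontwave
"""

card = read_card_ints(NEW_CARD, {"vector_size", "nb_warp"})
# Assumption: the total number of events held in memory is the product.
total_events = card["vector_size"] * card["nb_warp"]
print(card, total_events)
```

With the new values the product is 32 × 512 = 16384, which matches the VECSIZE_MEMMAX quoted in the commit title.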
4 changes: 2 additions & 2 deletions epochX/cudacpp/ee_mumu.mad/Source/vector.inc
@@ -28,8 +28,8 @@ C BECAUSE IT DOES NOT GO THROUGH THE CPP PREPROCESSOR
C (see https://github.com/madgraph5/madgraph4gpu/issues/458).
C
INTEGER WARP_SIZE
- PARAMETER (WARP_SIZE=16384)
+ PARAMETER (WARP_SIZE=32)
INTEGER NB_WARP
- PARAMETER (NB_WARP=1)
+ PARAMETER (NB_WARP=512)
INTEGER VECSIZE_MEMMAX
PARAMETER (VECSIZE_MEMMAX=16384)
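
Note on the vector.inc change above: VECSIZE_MEMMAX stays at 16384 (it is a context line in the hunk), while WARP_SIZE and NB_WARP change from 16384 and 1 to 32 and 512. A short Python sketch, under the assumption that the intended invariant is WARP_SIZE × NB_WARP = VECSIZE_MEMMAX, checks that both the old and the new parameter sets satisfy it. The function `read_parameters` is a hypothetical helper written for this note, not code from the repository.

```python
import re

# Matches simple integer Fortran statements such as "PARAMETER (NB_WARP=512)".
PARAM = re.compile(r"PARAMETER\s*\(\s*(\w+)\s*=\s*(\d+)\s*\)", re.IGNORECASE)


def read_parameters(fortran_text):
    """Hypothetical helper: return {NAME: int} for integer PARAMETERs."""
    return {name.upper(): int(value) for name, value in PARAM.findall(fortran_text)}


# Old and new values of the three parameters, copied from the hunk above.
OLD_INC = """
      PARAMETER (WARP_SIZE=16384)
      PARAMETER (NB_WARP=1)
      PARAMETER (VECSIZE_MEMMAX=16384)
"""
NEW_INC = """
      PARAMETER (WARP_SIZE=32)
      PARAMETER (NB_WARP=512)
      PARAMETER (VECSIZE_MEMMAX=16384)
"""


def is_consistent(params):
    # Assumed invariant: total array size is warp size times number of warps.
    return params["WARP_SIZE"] * params["NB_WARP"] == params["VECSIZE_MEMMAX"]


old_params = read_parameters(OLD_INC)
new_params = read_parameters(NEW_INC)
print(is_consistent(old_params), is_consistent(new_params))
```

Under that assumption the commit only changes how the 16384 events are split into warps, not the total array size.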
12 changes: 6 additions & 6 deletions epochX/cudacpp/ee_mumu.sa/CODEGEN_cudacpp_ee_mumu_log.txt
@@ -62,7 +62,7 @@ generate e+ e- > mu+ mu-
No model currently active, so we import the Standard Model
INFO: load particles
INFO: load vertices
- DEBUG: model prefixing takes 0.00566411018371582
+ DEBUG: model prefixing takes 0.005351543426513672
INFO: Restrict model sm with file models/sm/restrict_default.dat .
DEBUG: Simplifying conditional expressions 
DEBUG: remove interactions: u s w+ at order: QED=1 
@@ -154,7 +154,7 @@ INFO: Checking for minimal orders which gives processes.
INFO: Please specify coupling orders to bypass this step.
INFO: Trying process: e+ e- > mu+ mu- WEIGHTED<=4 @1
INFO: Process has 2 diagrams
- 1 processes with 2 diagrams generated in 0.004 s
+ 1 processes with 2 diagrams generated in 0.005 s
Total: 1 processes with 2 diagrams
output standalone_cudacpp ../TMPOUT/CODEGEN_cudacpp_ee_mumu
Load PLUGIN.CUDACPP_OUTPUT
@@ -184,7 +184,7 @@ ALOHA: aloha creates FFV1 routines
ALOHA: aloha creates FFV2 routines
ALOHA: aloha creates FFV4 routines
ALOHA: aloha creates FFV2_4 routines
- ALOHA: aloha creates 4 routines in 0.272 s
+ ALOHA: aloha creates 4 routines in 0.266 s
<class 'aloha.create_aloha.AbstractRoutine'> FFV1
<class 'aloha.create_aloha.AbstractRoutine'> FFV1
<class 'aloha.create_aloha.AbstractRoutine'> FFV2
@@ -203,7 +203,7 @@ INFO: Created files Parameters_sm.h and Parameters_sm.cc in directory
INFO: /data/avalassi/GPU2023/madgraph4gpuX/MG5aMC/TMPOUT/CODEGEN_cudacpp_ee_mumu/src/. and /data/avalassi/GPU2023/madgraph4gpuX/MG5aMC/TMPOUT/CODEGEN_cudacpp_ee_mumu/src/.
quit

- real 0m0.660s
- user 0m0.599s
- sys 0m0.056s
+ real 0m0.706s
+ user 0m0.584s
+ sys 0m0.058s
Code generation completed in 0 seconds
16 changes: 8 additions & 8 deletions epochX/cudacpp/gg_tt.mad/CODEGEN_mad_gg_tt_log.txt
@@ -62,7 +62,7 @@ generate g g > t t~
No model currently active, so we import the Standard Model
INFO: load particles
INFO: load vertices
- DEBUG: model prefixing takes 0.005669832229614258
+ DEBUG: model prefixing takes 0.005337715148925781
INFO: Restrict model sm with file models/sm/restrict_default.dat .
DEBUG: Simplifying conditional expressions 
DEBUG: remove interactions: u s w+ at order: QED=1 
@@ -178,7 +178,7 @@ INFO: Generating Helas calls for process: g g > t t~ WEIGHTED<=2 @1
INFO: Processing color information for process: g g > t t~ @1
INFO: Creating files in directory P1_gg_ttx
DEBUG: kwargs[prefix] = 0 [model_handling.py at line 1151] 
- DEBUG: process_exporter_cpp =  <PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_OneProcessExporter object at 0x7ff248791f10> [export_v4.py at line 6304]
+ DEBUG: process_exporter_cpp =  <PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_OneProcessExporter object at 0x7ff466020f10> [export_v4.py at line 6304]
INFO: Creating files in directory .
FileWriter <class 'PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_CPPWriter'> for ././CPPProcess.h
FileWriter <class 'PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_CPPWriter'> for ././CPPProcess.cc
@@ -206,16 +206,16 @@ INFO: Finding symmetric diagrams for subprocess group gg_ttx
DEBUG: split[i] =  {false, true};}; [model_handling.py at line 1589] 
DEBUG: split[i] =  {false, true}, // iconfigC=2, diag=3 [model_handling.py at line 1594] 
Generated helas calls for 1 subprocesses (3 diagrams) in 0.006 s
- Wrote files for 10 helas calls in 0.125 s
+ Wrote files for 10 helas calls in 0.121 s
ALOHA: aloha starts to compute helicity amplitudes
ALOHA: aloha creates VVV1 set of routines with options: P0
ALOHA: aloha creates FFV1 routines
- ALOHA: aloha creates 2 routines in 0.152 s
+ ALOHA: aloha creates 2 routines in 0.143 s
DEBUG: Entering PLUGIN_ProcessExporter.convert_model (create the model) [output.py at line 208] 
ALOHA: aloha starts to compute helicity amplitudes
ALOHA: aloha creates VVV1 set of routines with options: P0
ALOHA: aloha creates FFV1 routines
- ALOHA: aloha creates 4 routines in 0.137 s
+ ALOHA: aloha creates 4 routines in 0.132 s
<class 'aloha.create_aloha.AbstractRoutine'> VVV1
<class 'aloha.create_aloha.AbstractRoutine'> FFV1
<class 'aloha.create_aloha.AbstractRoutine'> FFV1
@@ -256,9 +256,9 @@ Type "launch" to generate events from this process, or see
Run "open index.html" to see more information about this process.
quit

- real 0m1.956s
- user 0m1.680s
- sys 0m0.264s
+ real 0m1.904s
+ user 0m1.640s
+ sys 0m0.262s
Code generation completed in 2 seconds
************************************************************
* *
4 changes: 2 additions & 2 deletions epochX/cudacpp/gg_tt.mad/Cards/run_card.dat
@@ -95,8 +95,8 @@
-O3 -ffast-math -fbounds-check = global_flag ! build flags for all Fortran code (for a fair comparison to cudacpp; default is -O)
--fast-math = aloha_flag ! fortran optimization flag for aloha function. Suggestions: '-ffast-math'
-O3 = matrix_flag ! fortran optimization flag for matrix.f function. Suggestions: '-O3'
- 16384 = vector_size ! size of fortran arrays allocated in the multi-event API for SIMD/GPU (VECSIZE_MEMMAX)
- 1 = nb_warp ! total number of warp/frontwave
+ 32 = vector_size ! size of fortran arrays allocated in the multi-event API for SIMD/GPU (VECSIZE_MEMMAX)
+ 512 = nb_warp ! total number of warp/frontwave

#*********************************************************************
# Customization (custom cuts/scale/bias/...) *
4 changes: 2 additions & 2 deletions epochX/cudacpp/gg_tt.mad/Source/vector.inc
@@ -28,8 +28,8 @@ C BECAUSE IT DOES NOT GO THROUGH THE CPP PREPROCESSOR
C (see https://github.com/madgraph5/madgraph4gpu/issues/458).
C
INTEGER WARP_SIZE
- PARAMETER (WARP_SIZE=16384)
+ PARAMETER (WARP_SIZE=32)
INTEGER NB_WARP
- PARAMETER (NB_WARP=1)
+ PARAMETER (NB_WARP=512)
INTEGER VECSIZE_MEMMAX
PARAMETER (VECSIZE_MEMMAX=16384)
10 changes: 5 additions & 5 deletions epochX/cudacpp/gg_tt.sa/CODEGEN_cudacpp_gg_tt_log.txt
@@ -62,7 +62,7 @@ generate g g > t t~
No model currently active, so we import the Standard Model
INFO: load particles
INFO: load vertices
- DEBUG: model prefixing takes 0.005598783493041992
+ DEBUG: model prefixing takes 0.0054721832275390625
INFO: Restrict model sm with file models/sm/restrict_default.dat .
DEBUG: Simplifying conditional expressions 
DEBUG: remove interactions: u s w+ at order: QED=1 
@@ -183,7 +183,7 @@ Generated helas calls for 1 subprocesses (3 diagrams) in 0.006 s
ALOHA: aloha starts to compute helicity amplitudes
ALOHA: aloha creates VVV1 set of routines with options: P0
ALOHA: aloha creates FFV1 routines
- ALOHA: aloha creates 2 routines in 0.147 s
+ ALOHA: aloha creates 2 routines in 0.142 s
<class 'aloha.create_aloha.AbstractRoutine'> VVV1
<class 'aloha.create_aloha.AbstractRoutine'> FFV1
<class 'aloha.create_aloha.AbstractRoutine'> FFV1
@@ -198,7 +198,7 @@ INFO: Created files Parameters_sm.h and Parameters_sm.cc in directory
INFO: /data/avalassi/GPU2023/madgraph4gpuX/MG5aMC/TMPOUT/CODEGEN_cudacpp_gg_tt/src/. and /data/avalassi/GPU2023/madgraph4gpuX/MG5aMC/TMPOUT/CODEGEN_cudacpp_gg_tt/src/.
quit

- real 0m0.584s
- user 0m0.496s
- sys 0m0.040s
+ real 0m0.549s
+ user 0m0.470s
+ sys 0m0.056s
Code generation completed in 0 seconds
22 changes: 11 additions & 11 deletions epochX/cudacpp/gg_tt01g.mad/CODEGEN_mad_gg_tt01g_log.txt
@@ -62,7 +62,7 @@ generate g g > t t~
No model currently active, so we import the Standard Model
INFO: load particles
INFO: load vertices
- DEBUG: model prefixing takes 0.005761623382568359
+ DEBUG: model prefixing takes 0.00555872917175293
INFO: Restrict model sm with file models/sm/restrict_default.dat .
DEBUG: Simplifying conditional expressions 
DEBUG: remove interactions: u s w+ at order: QED=1 
@@ -188,7 +188,7 @@ INFO: Generating Helas calls for process: g g > t t~ WEIGHTED<=2 @1
INFO: Processing color information for process: g g > t t~ @1
INFO: Creating files in directory P2_gg_ttxg
DEBUG: kwargs[prefix] = 0 [model_handling.py at line 1151] 
- DEBUG: process_exporter_cpp =  <PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_OneProcessExporter object at 0x7f2fac586280> [export_v4.py at line 6304]
+ DEBUG: process_exporter_cpp =  <PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_OneProcessExporter object at 0x7efe37f18280> [export_v4.py at line 6304]
INFO: Creating files in directory .
FileWriter <class 'PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_CPPWriter'> for ././CPPProcess.h
FileWriter <class 'PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_CPPWriter'> for ././CPPProcess.cc
@@ -253,7 +253,7 @@ INFO: Finding symmetric diagrams for subprocess group gg_ttxg
DEBUG: split[i] =  {true, true, false, true, false, true}, // iconfigC=14, diag=15 [model_handling.py at line 1594] 
INFO: Creating files in directory P1_gg_ttx
DEBUG: kwargs[prefix] = 0 [model_handling.py at line 1151] 
- DEBUG: process_exporter_cpp =  <PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_OneProcessExporter object at 0x7f2fac581790> [export_v4.py at line 6304]
+ DEBUG: process_exporter_cpp =  <PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_OneProcessExporter object at 0x7efe37f13790> [export_v4.py at line 6304]
INFO: Creating files in directory .
FileWriter <class 'PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_CPPWriter'> for ././CPPProcess.h
FileWriter <class 'PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_CPPWriter'> for ././CPPProcess.cc
@@ -280,23 +280,23 @@ INFO: Finding symmetric diagrams for subprocess group gg_ttx
DEBUG: split[i] =  {false,true};}; [model_handling.py at line 1587] 
DEBUG: split[i] =  {false, true};}; [model_handling.py at line 1589] 
DEBUG: split[i] =  {false, true}, // iconfigC=2, diag=3 [model_handling.py at line 1594] 
- Generated helas calls for 2 subprocesses (19 diagrams) in 0.044 s
- Wrote files for 46 helas calls in 0.393 s
+ Generated helas calls for 2 subprocesses (19 diagrams) in 0.042 s
+ Wrote files for 46 helas calls in 0.377 s
ALOHA: aloha starts to compute helicity amplitudes
ALOHA: aloha creates VVV1 routines
ALOHA: aloha creates FFV1 routines
ALOHA: aloha creates VVVV1 set of routines with options: P0
ALOHA: aloha creates VVVV3 set of routines with options: P0
ALOHA: aloha creates VVVV4 set of routines with options: P0
- ALOHA: aloha creates 5 routines in 0.329 s
+ ALOHA: aloha creates 5 routines in 0.320 s
DEBUG: Entering PLUGIN_ProcessExporter.convert_model (create the model) [output.py at line 208] 
ALOHA: aloha starts to compute helicity amplitudes
ALOHA: aloha creates VVV1 routines
ALOHA: aloha creates FFV1 routines
ALOHA: aloha creates VVVV1 set of routines with options: P0
ALOHA: aloha creates VVVV3 set of routines with options: P0
ALOHA: aloha creates VVVV4 set of routines with options: P0
- ALOHA: aloha creates 10 routines in 0.314 s
+ ALOHA: aloha creates 10 routines in 0.306 s
<class 'aloha.create_aloha.AbstractRoutine'> VVV1
<class 'aloha.create_aloha.AbstractRoutine'> VVV1
<class 'aloha.create_aloha.AbstractRoutine'> FFV1
@@ -351,10 +351,10 @@ Type "launch" to generate events from this process, or see
Run "open index.html" to see more information about this process.
quit

- real 0m3.014s
- user 0m2.509s
- sys 0m0.302s
- Code generation completed in 3 seconds
+ real 0m2.742s
+ user 0m2.421s
+ sys 0m0.313s
+ Code generation completed in 2 seconds
************************************************************
* *
* W E L C O M E to *
4 changes: 2 additions & 2 deletions epochX/cudacpp/gg_tt01g.mad/Cards/run_card.dat
@@ -113,8 +113,8 @@
-O3 -ffast-math -fbounds-check = global_flag ! build flags for all Fortran code (for a fair comparison to cudacpp; default is -O)
--fast-math = aloha_flag ! fortran optimization flag for aloha function. Suggestions: '-ffast-math'
-O3 = matrix_flag ! fortran optimization flag for matrix.f function. Suggestions: '-O3'
- 16384 = vector_size ! size of fortran arrays allocated in the multi-event API for SIMD/GPU (VECSIZE_MEMMAX)
- 1 = nb_warp ! total number of warp/frontwave
+ 32 = vector_size ! size of fortran arrays allocated in the multi-event API for SIMD/GPU (VECSIZE_MEMMAX)
+ 512 = nb_warp ! total number of warp/frontwave

#*********************************************************************
# Customization (custom cuts/scale/bias/...) *
4 changes: 2 additions & 2 deletions epochX/cudacpp/gg_tt01g.mad/Source/vector.inc
@@ -28,8 +28,8 @@ C BECAUSE IT DOES NOT GO THROUGH THE CPP PREPROCESSOR
C (see https://github.com/madgraph5/madgraph4gpu/issues/458).
C
INTEGER WARP_SIZE
- PARAMETER (WARP_SIZE=16384)
+ PARAMETER (WARP_SIZE=32)
INTEGER NB_WARP
- PARAMETER (NB_WARP=1)
+ PARAMETER (NB_WARP=512)
INTEGER VECSIZE_MEMMAX
PARAMETER (VECSIZE_MEMMAX=16384)