Add an example of a calculation with nprocesses>1 #272
Maybe pp to tt? David suggested that it may be useful to try this anyway for other reasons.
Adding pptt is definitely needed - Olivier's bug #337 probably comes from this.
In PR #396 for #343, I am removing mirror processes in uutt, as done by @oliviermattelaer in branch 311. This means that all pptt subprocesses now have nprocesses==1. Actually, I have even added a static assert for nprocesses==1. To go beyond this, we should add an example with nprocesses>1. Or, actually: was nprocesses>1 ONLY necessary for "mirror processes", and can it now be completely removed if these have been removed?... (question for @oliviermattelaer )
Is this also related to the difference between MAXFLOW and NCOLOR in the fortran? oliviermattelaer/mg5amc_test#21
So, this is in the latest gpucpp branch. (Compared to my original email, this is now done in a "decent" way, meaning that identical matrix elements are correctly handled.)
Thanks Olivier! I updated to your latest upstream code in PR #619. This is related to the discussion we had at the meeting earlier this week: https://indico.cern.ch/event/1263518/ I could close this as fixed by PR #619, but I will keep it open for now; maybe I will clean up the code, remove a lot of nprocesses, and/or add a few cleaner comments mentioning this #272 and PR #619.
…l ok with no change This completes the first "susy" patch: now susy_gg_tt can be generated correctly (but it does not build). In practice, the main (only?) issue it addresses is madgraph5#622. Further patches (susy2 and possibly more) will attempt to fix these builds. NB: at this stage, CODEGEN is still using the upstream mg5amcnlo without "split_nonidentical_grouping" (PR madgraph5#619 and madgraph5#272).
NB this has two P1 subdirectories gu_ttxu/gux_ttxux! (but nprocesses=1 in each, see madgraph5#272). The code builds and runs successfully in each of the two subdirectories.
NB this has a single P1 subdirectory gq_ttxq! (but nprocesses=1 in each, see madgraph5#272). The Fortran code has DSIG1 and also DSIG2; clearly the DSIG2 code is not correctly interfaced to our cudacpp. The code fails to build:

```text
...
ccache g++ -O3 -std=c++17 -I. -I../../src -I../../../../../tools -DUSE_NVTX -Wall -Wshadow -Wextra -ffast-math -fopenmp -march=skylake-avx512 -mprefer-vector-width=256 -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -I/usr/local/cuda-12.0/include/ -fPIC -c CPPProcess.cc -o CPPProcess.o
In file included from CPPProcess.cc:25:
coloramps.h:18:3: error: too many initializers for ‘const bool [5][4]’
   18 | };
      | ^
...
ccache /usr/local/cuda-12.0/bin/nvcc -O3 -lineinfo -I. -I../../src -I../../../../../tools -I/usr/local/cuda-12.0/include/ -DUSE_NVTX -gencode arch=compute_70,code=compute_70 -gencode arch=compute_70,code=sm_70 -use_fast_math -std=c++17 -ccbin /usr/lib64/ccache/g++ -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -Xcompiler -fPIC -c gCPPProcess.cu -o gCPPProcess.o
coloramps.h(13): error: too many initializer values
```

NB this is still using the old upstream mg5amcnlo before Olivier's split_nonidentical_grouping (see madgraph5#619). I have previewed that after merging that, there will be two separate P1 subdirectories also in .mad (as in .sa).
This is cmd="define q = u c d s u~ c~ d~ s~; generate g q > t t~ l- l+ q". This currently generates successfully, but has some formatting issues in codegen. Note: this has several DSIG1, DSIG2, DSIG3, DSIG4 in the Fortran for .mad, but still nprocesses=1 in the CUDACPP. I had initially tested this for madgraph5#272, but in the end gq_ttq is enough (I will keep it in codegen anyway). Note: this has four separate directories in the .sa. The build fails, e.g. in the first one:

```text
[avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttllq.sa/SubProcesses/P1_Sigma_sm_gu_ttxemepu> make -j
ccache g++ -O3 -std=c++17 -I. -I../../src -I../../../../../tools -DUSE_NVTX -Wall -Wshadow -Wextra -ffast-math -fopenmp -march=skylake-avx512 -mprefer-vector-width=256 -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -I/usr/local/cuda-12.0/include/ -fPIC -c CPPProcess.cc -o CPPProcess.o
CPPProcess.cc: In function ‘void mg5amcCpu::calculate_wavefunctions(int, const fptype*, const fptype*, mgOnGpu::fptype*, fptype_sv*, int)’:
CPPProcess.cc:286:85: error: wrong type argument to unary minus
  286 | FFV2_5_0<W_ACCESS, A_ACCESS, CI_ACCESS>( w_fp[10], w_fp[6], w_fp[11], -COUPs[3], COUPs[5], &_fp[0] );
      |                                                                       ~~~~~~~^
```

(I initially thought this only happened in SUSY processes, but actually it also happens in this SM process...)
In the end, I have done a lot more analysis of this issue. I did not want to simply merge MR #619 and close this one: I wanted to make sure that before the MR we had a problem with nprocesses>1, and after the MR we did not. This took several iterations and extra issues opened along the way.
Anyway, my understanding now is the following (I will discuss with Olivier):
So, in a way, this #272 (nprocesses>1) and #534 (more than one P1) are related, because the same issues can be moved from the former to the latter. In practice now:
Also, it seems that some matrix elements are missing (the symmetric ones) in the SUSY case; this needs to be fixed.
…neProcessExporter (0 in standalone, 1,2... in madevent+cudacpp). This will later be forced to be always equal to 0 or 1 (there must be a single DSIG1!). This is essentially what the 'nprocesses=1' assumption of madgraph5#272 is about. But it seems that nprocesses already is always 1 in generated code (will keep that too).
…proc and codegen_procid (for sanity checks only madgraph5#272 and madgraph5#619)
…oc and codegen_procid (for sanity checks only madgraph5#272 and madgraph5#619)
…me process.nprocesses == 1" (madgraph5#272 and madgraph5#343)
…1_gux_ttxux to P1_gu_ttxu The gqttq tests fail anyway and will need to be fixed (madgraph5#630). However, this completes the addition of gq_ttq as a new process to the repo. In particular, it includes proof that Olivier's "split_nonidentical_grouping" madgraph5#619 fixes the gqttq builds. It also includes a lot of cleanup for "nprocesses" (madgraph5#272 and madgraph5#343). Revert "[gqttq] retry the tmad gqttq test with the P1_gu_ttxu directory - the test continues to fail (madgraph5#630)" This reverts commit 2dea1f7. Revert "[gqttq] temporarely use P1_gu_ttxu instead of P1_gux_ttxux for gqttq tmad tests" This reverts commit ea23a9a.
Thanks Olivier... can you elaborate please? Which process? Should we expect build/runtime failures or just wrong results? I am about to close this issue; we can open a new one (unless this is covered by the other issues already open for SUSY).
I am finally closing this. There have been a lot of changes (and I am not 100% sure I understand all the implications), but essentially cudacpp is now configured so that
I have added some improved comments about the issues above in PR #635, which I have just merged.

One last word: having removed mirror processes and the possibility to have DSIG2, cudacpp now relies on having several P1 subdirectories. In some cases (actually not the pp_tt or gq_ttq processes above), there are problems because some files should be different in the various P1 and they are not: see for instance pp to tttt in WIP PR #534, where the value of nwf should be different. [NB I have tested that this issue happens in pp_tttt but not in pp_tt]

For the summary of the summary, about the "example processes":
FINALLY closing this.
…matting, and especially the build will fail. Codebase includes merging commit a6731bd (Olivier, Wed Aug 23 13:23:12 2023 +0200). This uses Olivier's 'fix_mirror' branch for PR madgraph5#754, in particular:

```text
a6731bd Olivier Mattelaer Wed Aug 23 13:23:12 2023 +0200 Merge branch 'fix_mirror'
2556cdd Olivier Mattelaer Wed Aug 23 09:27:38 2023 +0200 avoid that mirroring is reset by the plugin
```

These lines fail the build (as well as clang formatting):

```text
[NOT OK] Check formatting in: pp_tt012j.mad/SubProcesses/P0_uux_ttx/CPPProcess.cc
786c786
< constexpr int helcolDenominators[1] = { 36,36 }; // assume nprocesses == 1 (madgraph5#272 and madgraph5#343)
---
> constexpr int helcolDenominators[1] = { 36, 36 }; // assume nprocesses == 1 (madgraph5#272 and madgraph5#343)
```

The same happens in each P subdirectory. Build errors:

```text
ccache /usr/local/cuda-12.0/bin/nvcc -Xcompiler -O3 -lineinfo -I. -I../../src -I/usr/local/cuda-12.0/include/ -DUSE_NVTX -gencode arch=compute_70,code=compute_70 -gencode arch=compute_70,code=sm_70 -use_fast_math -std=c++17 -ccbin /usr/lib64/ccache/g++ -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -Xcompiler -fPIC -c gCPPProcess.cu -o gCPPProcess.o
gCPPProcess.cu(779): error: static assertion failed with "Assume nprocesses == 1"
gCPPProcess.cu(786): error: too many initializer values
2 errors detected in the compilation of "gCPPProcess.cu".
```
Just a reminder. In the epochX work #244, I kept several of the comments I had all over the code: "FIXME this assumes nprocesses=1". I am not sure what the code should look like, or how it should behave, with nprocesses>1.
Eventually we need to add a test of one simple such example. This may even be a very simple process, as long as it has nprocesses>1.
@oliviermattelaer maybe this is for you to suggest one ;-)
Low priority anyway (Fortran integration is more important)
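For reference, a candidate test along these lines would be the pp to tt process suggested in the comments above. A sketch of the MG5aMC commands (standard MG5 process-definition syntax; the output mode and directory name are only illustrative assumptions):

```text
define p = g u c d s u~ c~ d~ s~
generate p p > t t~
output standalone_cudacpp pp_tt.sa
```

Whether this actually yields nprocesses>1 (rather than several P1 subdirectories each with nprocesses=1) depends on how the generated subprocesses are grouped, which is exactly the question discussed in this issue.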