Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an example of a calculation with nprocesses>1 #272

Closed
valassi opened this issue Oct 21, 2021 · 10 comments
Closed

Add an example of a calculation with nprocesses>1 #272

valassi opened this issue Oct 21, 2021 · 10 comments

Comments

@valassi
Copy link
Member

valassi commented Oct 21, 2021

Just a reminder. In the epochX work #244, I kept several of the comments I had all over the code "FIXME this assumes nprocesses=1". I am not sure what the code should look like or how it should behave with nprocesses>1.

Eventually we need to add a test of one simple such example. This may even be a very simple process, as long as it has nprocesses>1.

@oliviermattelaer maybe this is for you to suggest one ;-)

Low priority anyway (Fortran integration is more important)

@valassi
Copy link
Member Author

valassi commented Oct 21, 2021

Maybe pp to tt? David suggested that it may be useful to try this anyway for other reasons.

@valassi valassi changed the title Add an example of a calculation with nprocesses>1 Add an example of a calculation with nprocesses>1 (eg pptt) Jan 20, 2022
@valassi
Copy link
Member Author

valassi commented Jan 20, 2022

Adding pptt is definitely needed - Olivier's bug #337 probably comes from this

@valassi
Copy link
Member Author

valassi commented Mar 8, 2022

In PR #396 for #343, I am removing mirror processes in uutt as done by @oliviermattelaer in branch 311. This means that now pptt subprocesses all have nprocesses==1.

Actually I have even added a static assert for nprocesses==1. To go beyond this, we should add an example with nprocesses>1.

Os, actually: was nprocesses>1 ONLY necessary for "mirror processes", and can this now be completely removed if these have been removed?... (question for @oliviermattelaer )

@valassi
Copy link
Member Author

valassi commented Jun 15, 2022

Is this also related to the difference between MAXFLOW and NCOLOR in the fortran? oliviermattelaer/mg5amc_test#21

@oliviermattelaer
Copy link
Member

So in the latest gpucpp branch.
All P directories would have exactly one matrix-element function.
This matrix-element can still be related to multiple initial/final state but all of them will have the exact same matrix-element.

(Compare to my original email, this is now done in a "decent" way meaning that identical matrix-element are correctly handle)

@valassi
Copy link
Member Author

valassi commented Mar 31, 2023

Thanks Olivier! I updated to your latest upstream code in PR #619.

This is related to the discussion we had at the meeting earlier this wek https://indico.cern.ch/event/1263518/
You showed that for instance this process changes as follows
image

I could close this as fixed by PR #619. But I will keep it open for now, maybe I will clean up the code, remove a lot of nprocesses and/or add a few cleaner comments mentioning this #272 and PR #619

valassi added a commit to valassi/madgraph4gpu that referenced this issue Apr 3, 2023
…l ok with no change

This completes the first "susy" patch: now susy_gg_tt can be generated correctly (but it does not build).
In practice, the main (only?) issue it addresses is madgraph5#622

Further patches (susy2 and possibly more) will attempt to fix these builds.

NB: At this stage, CODEGEN is still using the upstream mg5amcnlo without "split_nonidentical_grouping" (PR madgraph5#619 and madgraph5#272)
valassi added a commit to valassi/madgraph4gpu that referenced this issue Apr 3, 2023
NB this has two P1 subdirectories gu_ttxu/gux_ttxux! (but nprocesses=1 in each see madgraph5#272)
The code builds and runs successfully in each of the two subdirectories
valassi added a commit to valassi/madgraph4gpu that referenced this issue Apr 3, 2023
NB this has a single P1 subdirectory gq_ttxq! (but nprocesses=1 in each see madgraph5#272)

The Fortran code has DSIG1 and also DSIG2.
Clearly the DSIG2 code is not correctly interfaced to our cudacpp

The code fails to build:
...
ccache g++  -O3  -std=c++17 -I. -I../../src -I../../../../../tools -DUSE_NVTX -Wall -Wshadow -Wextra -ffast-math  -fopenmp -march=skylake-avx512 -mprefer-vector-width=256  -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -I/usr/local/cuda-12.0/include/ -fPIC -c CPPProcess.cc -o CPPProcess.o
In file included from CPPProcess.cc:25:
coloramps.h:18:3: error: too many initializers for ‘const bool [5][4]’
   18 |   };
      |   ^
...
ccache /usr/local/cuda-12.0/bin/nvcc  -O3  -lineinfo -I. -I../../src -I../../../../../tools -I/usr/local/cuda-12.0/include/ -DUSE_NVTX -gencode arch=compute_70,code=compute_70 -gencode arch=compute_70,code=sm_70 -use_fast_math -std=c++17  -ccbin /usr/lib64/ccache/g++ -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -Xcompiler -fPIC -c gCPPProcess.cu -o gCPPProcess.o
coloramps.h(13): error: too many initializer values

NB this is still using the old upstream mg5amcnlo before Olivier's split_nonidentical_grouping (see madgraph5#619)
I have previewed that after merging that, there will be two separate P1 subdirectories also in .mad (as in .sa)
valassi added a commit to valassi/madgraph4gpu that referenced this issue Apr 3, 2023
This is
  cmd="define q = u c d s u~ c~ d~ s~; generate g q > t t~ l- l+ q"
This currently generates successfully, but has some formatting issues in codegen

Note: this has several DSIG1, DSIG2, DSIG3, DSIG4 in the Fortran for .mad, but still nprocesses=1 in the CUDACPP
I had initially tested this for madgraph5#272, but in the end gq_ttq is enough (I will keep it in codegen anyway)

Note: this has four separate directories in the .sa. The build fails, eg in the first one
[avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttllq.sa/SubProcesses/P1_Sigma_sm_gu_ttxemepu> make -j
ccache g++  -O3  -std=c++17 -I. -I../../src -I../../../../../tools -DUSE_NVTX -Wall -Wshadow -Wextra -ffast-math  -fopenmp -march=skylake-avx512 -mprefer-vector-width=256  -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -I/usr/local/cuda-12.0/include/ -fPIC -c CPPProcess.cc -o CPPProcess.o
CPPProcess.cc: In function ‘void mg5amcCpu::calculate_wavefunctions(int, const fptype*, const fptype*, mgOnGpu::fptype*, fptype_sv*, int)’:
CPPProcess.cc:286:85: error: wrong type argument to unary minus
  286 |       FFV2_5_0<W_ACCESS, A_ACCESS, CI_ACCESS>( w_fp[10], w_fp[6], w_fp[11], -COUPs[3], COUPs[5], &amp_fp[0] );
      |                                                                              ~~~~~~~^

(I initially thought this only hapened in SUSY processes, but actually it also happens in this SM process...)
@valassi
Copy link
Member Author

valassi commented Apr 3, 2023

In the end I have done a lot more analysis about this issue. I did not want to simply merge MR #619 and close this one. I wanted to make sure that before the MR we had a problem with nprocesses>1, and after the MR we did not.

This took several iterations and extra issues opened on the way

Anyway, my understanding now is the following (I will discuss with Olivier)

  • the issue we discuss here comes from things like gq_ttq where the initial and final q can be both u or a ubar
  • within the fortran, the "nprocesses>1" means that there are actually DSIG1 and DSIG2 in a single P1 directory... correspondingly, one could have expected nprocesses=2 in cudacpp inside a single P1 directory
  • however, one workaorund is to split this into two separate P1 directories: this is Olivier's split_nonidentical_grouping" and Upgrade to Olivier's latest gpucpp branch upstream ("split_nonidentical_grouping") #619: I have checked that in the end this gives indeed two P1 subdirectories, with only one DSIG1 in each
  • note well, Olivier's patch uses two P1 directpies ONLY if cudacpp is an exporter: so not in madevent with Fortran MEs... note also, the standalone cudacpp was ALREADY doing two separate directories... so the issue was really only madevent+cudacpp
  • note that, by going to several P1 directories, we will hit even more the issues in Add pp_tttt as an example of a process with many P1 subdirectories which need different nwf values #534

So in a way this #272 (nprocesses>1) and the #534 (more than one P1) are related, because the sam eissues can be moved from the former to the latter

In practice now

@oliviermattelaer
Copy link
Member

also it seems that some matrix element are missing (the symmetric ones) in the susy case, this need to be fix.

valassi added a commit to valassi/madgraph4gpu that referenced this issue Apr 5, 2023
NB this has two P1 subdirectories gu_ttxu/gux_ttxux! (but nprocesses=1 in each see madgraph5#272)
The code builds and runs successfully in each of the two subdirectories
valassi added a commit to valassi/madgraph4gpu that referenced this issue Apr 5, 2023
NB this has a single P1 subdirectory gq_ttxq! (but nprocesses=1 in each see madgraph5#272)

The Fortran code has DSIG1 and also DSIG2.
Clearly the DSIG2 code is not correctly interfaced to our cudacpp

The code fails to build:
...
ccache g++  -O3  -std=c++17 -I. -I../../src -I../../../../../tools -DUSE_NVTX -Wall -Wshadow -Wextra -ffast-math  -fopenmp -march=skylake-avx512 -mprefer-vector-width=256  -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -I/usr/local/cuda-12.0/include/ -fPIC -c CPPProcess.cc -o CPPProcess.o
In file included from CPPProcess.cc:25:
coloramps.h:18:3: error: too many initializers for ‘const bool [5][4]’
   18 |   };
      |   ^
...
ccache /usr/local/cuda-12.0/bin/nvcc  -O3  -lineinfo -I. -I../../src -I../../../../../tools -I/usr/local/cuda-12.0/include/ -DUSE_NVTX -gencode arch=compute_70,code=compute_70 -gencode arch=compute_70,code=sm_70 -use_fast_math -std=c++17  -ccbin /usr/lib64/ccache/g++ -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -Xcompiler -fPIC -c gCPPProcess.cu -o gCPPProcess.o
coloramps.h(13): error: too many initializer values

NB this is still using the old upstream mg5amcnlo before Olivier's split_nonidentical_grouping (see madgraph5#619)
I have previewed that after merging that, there will be two separate P1 subdirectories also in .mad (as in .sa)
valassi added a commit to valassi/madgraph4gpu that referenced this issue Apr 5, 2023
This is
  cmd="define q = u c d s u~ c~ d~ s~; generate g q > t t~ l- l+ q"
This currently generates successfully, but has some formatting issues in codegen

Note: this has several DSIG1, DSIG2, DSIG3, DSIG4 in the Fortran for .mad, but still nprocesses=1 in the CUDACPP
I had initially tested this for madgraph5#272, but in the end gq_ttq is enough (I will keep it in codegen anyway)

Note: this has four separate directories in the .sa. The build fails, eg in the first one
[avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gq_ttllq.sa/SubProcesses/P1_Sigma_sm_gu_ttxemepu> make -j
ccache g++  -O3  -std=c++17 -I. -I../../src -I../../../../../tools -DUSE_NVTX -Wall -Wshadow -Wextra -ffast-math  -fopenmp -march=skylake-avx512 -mprefer-vector-width=256  -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -I/usr/local/cuda-12.0/include/ -fPIC -c CPPProcess.cc -o CPPProcess.o
CPPProcess.cc: In function ‘void mg5amcCpu::calculate_wavefunctions(int, const fptype*, const fptype*, mgOnGpu::fptype*, fptype_sv*, int)’:
CPPProcess.cc:286:85: error: wrong type argument to unary minus
  286 |       FFV2_5_0<W_ACCESS, A_ACCESS, CI_ACCESS>( w_fp[10], w_fp[6], w_fp[11], -COUPs[3], COUPs[5], &amp_fp[0] );
      |                                                                              ~~~~~~~^

(I initially thought this only hapened in SUSY processes, but actually it also happens in this SM process...)
valassi added a commit to valassi/madgraph4gpu that referenced this issue Apr 5, 2023
…neProcessExporter (0 in standalone, 1,2... in madevent+cudacpp)

This will later be forced to be always equal to 0 or 1 (there must be a single DSIG1!)

This is essentially what the 'nprocesses=1' assumption of madgraph5#272 is about.
But it seems that nprocesses already is always 1 in generated code (will keep that too).
valassi added a commit to valassi/madgraph4gpu that referenced this issue Apr 5, 2023
valassi added a commit to valassi/madgraph4gpu that referenced this issue Apr 5, 2023
valassi added a commit to valassi/madgraph4gpu that referenced this issue Apr 5, 2023
valassi added a commit to valassi/madgraph4gpu that referenced this issue Apr 5, 2023
valassi added a commit to valassi/madgraph4gpu that referenced this issue Apr 5, 2023
…1_gux_ttxux to P1_gu_ttxu

The gqttq tests fail anyway and will need to be fixed (madgraph5#630).
However, this completes the addition of gq_ttq as a new process to the repo.
In particular it includes proof that Olivier's "split_nonidentical_grouping" madgraph5#619 fixes the gqttq builds.
It also includes a lot of cleanup for "nprocesses" (madgraph5#272 and madgraph5#343)

Revert "[gqttq] retry the tmad gqttq test with the P1_gu_ttxu directory - the test continues to fail (madgraph5#630)"
This reverts commit 2dea1f7.

Revert "[gqttq] temporarely use P1_gu_ttxu instead of P1_gux_ttxux for gqttq tmad tests"
This reverts commit ea23a9a.
@valassi
Copy link
Member Author

valassi commented Apr 5, 2023

also it seems that some matrix element are missing (the symmetric ones) in the susy case, this need to be fix.

thanks Olivier... can you elaborate please? which process? expect build/runtime failures or just wrong results? I am about to close this issue, we can open a new one (unless this is covered by the other issues already open for SUSY)

@valassi
Copy link
Member Author

valassi commented Apr 7, 2023

I am finally closing this. There have been a lot of changes (and I am not 100% sure I understand all the implications), but essentially cudacpp is now configured so that

I have added some improved comments about the issues above in PR #635, which I have just merged.

One last word: having removed mirror processes and the possibility to have DSIG2, now cudacpp relies on having several P1 subdirectories. In some cases (actually not the pp_tt or gq_ttq processes above), there are problems because some files should be different in the various P1 and they are not: see for instance pp to tttt in WIP PR #534, where the value of nwf should be different. [NB I have tested that this issue happens in pp_tttt but not in pp_tt]

For the summary of the summary, about the "example processes"

FINALLY closing this.

cc: @roiser @oliviermattelaer @zeniheisser

@valassi valassi closed this as completed Apr 7, 2023
valassi added a commit to mg5amcnlo/mg5amcnlo_cudacpp that referenced this issue Aug 16, 2023
…neProcessExporter (0 in standalone, 1,2... in madevent+cudacpp)

This will later be forced to be always equal to 0 or 1 (there must be a single DSIG1!)

This is essentially what the 'nprocesses=1' assumption of madgraph5/madgraph4gpu#272 is about.
But it seems that nprocesses already is always 1 in generated code (will keep that too).
valassi added a commit to mg5amcnlo/mg5amcnlo_cudacpp that referenced this issue Aug 16, 2023
valassi added a commit to mg5amcnlo/mg5amcnlo_cudacpp that referenced this issue Aug 16, 2023
valassi added a commit to mg5amcnlo/mg5amcnlo_cudacpp that referenced this issue Aug 16, 2023
valassi added a commit to valassi/madgraph4gpu that referenced this issue Oct 28, 2023
…matting, and especially the build will fail.

Codebase includes merging commit a6731bd (Olivier Wed Aug 23 13:23:12 2023 +0200)
This uses Olivier's 'fix_mirror' branch for PR madgraph5#754

In particular
  a6731bd       Olivier Mattelaer       Wed Aug 23 13:23:12 2023 +0200  Merge branch 'fix_mirror'
  2556cdd       Olivier Mattelaer       Wed Aug 23 09:27:38 2023 +0200  avoid that mirroring is reset by the plugin

These lines fail the build (as well as clang formatting)
[NOT OK] Check formatting in: pp_tt012j.mad/SubProcesses/P0_uux_ttx/CPPProcess.cc
786c786
<     constexpr int helcolDenominators[1] = { 36,36 }; // assume nprocesses == 1 (madgraph5#272 and madgraph5#343)
---
>     constexpr int helcolDenominators[1] = { 36, 36 }; // assume nprocesses == 1 (madgraph5#272 and madgraph5#343)
The same happens in each P subdirectory.

Build errors:
  ccache /usr/local/cuda-12.0/bin/nvcc   -Xcompiler -O3 -lineinfo -I. -I../../src -I/usr/local/cuda-12.0/include/ -DUSE_NVTX -gencode arch=compute_70,code=compute_70 -gencode arch=compute_70,code=sm_70 -use_fast_math -std=c++17  -ccbin /usr/lib64/ccache/g++ -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -Xcompiler -fPIC -c gCPPProcess.cu -o gCPPProcess.o
  gCPPProcess.cu(779): error: static assertion failed with "Assume nprocesses == 1"
  gCPPProcess.cu(786): error: too many initializer values
  2 errors detected in the compilation of "gCPPProcess.cu".
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants