Check if the second helicity filtering in fortran for mirror processes can be removed #941
The fact that Ntry/goodhel have a mirror index is ... as old as the matrix_madevent_group_v4.inc file (this is even the original reason for that file). Now I have added a comparison test that crashes at the fortran stage if this is not the case (in LTS), and it crashed for
REPORT for P1_qq_ttxqq :
Adding only those helicities to the fortran code does not change the cross-section at all (so bit-to-bit compatibility, and really a 0 difference helicity-versus-helicity). So the issue is occurring due to uu>uu, which has some helicities that contribute only for that process. But the good news is that for "gpu grouping" (so what we have now), this is indeed symmetric.
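As a sketch of the kind of comparison test being described (assuming the conventional GOODHEL(IHEL,IMIRROR) and NCOMB names of the madevent matrix*.f files; illustrative only, not the actual mg5amcnlo code):

C     Illustrative sketch: crash at the fortran stage if the
C     good-helicity lists of the two mirrors differ.
      SUBROUTINE CHECK_MIRROR_GOODHEL(GOODHEL, NCOMB)
      IMPLICIT NONE
      INTEGER NCOMB, IHEL
      LOGICAL GOODHEL(NCOMB,2)
      DO IHEL = 1, NCOMB
        IF (GOODHEL(IHEL,1).NEQV.GOODHEL(IHEL,2)) THEN
          WRITE(*,*) 'ERROR: mirror helicity lists differ for', IHEL
          STOP 1
        ENDIF
      ENDDO
      END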
Hi @oliviermattelaer thanks for looking at this :-) I was wondering this morning: does Madgraph include the option of polarised beams? One physics case where I could imagine different helicity lists is that of polarised beams. Typically that would be e+ against e-, which is not symmetric anyway; but in principle, if you have something like e- against e- and the left and right beams have different polarisations, then the lists of helicities are different. Far fetched, but maybe something similar is the reason why this was introduced at the time? (Reminder from yesterday's meeting: you said that the computation of helicities in fortran separately for each mirror was introduced by JohanA in one of the first bug fix releases, 1.1.0 or similar, "to fix a bug", so maybe there was a use case. PS: oops, sorry, I see you mentioned it in the post above, giving details, thanks!) Anyway, in the meantime, as discussed, I will look at introducing a sanity check that if two lists of helicities are computed in fortran, then they are the same.
So the commit message of Johan is:
Hi Olivier, thanks again for looking at this :-) I implemented my sanity checks, and I see no issues in pp->ttjj (well, actually in pp_tt012j). Now, there is one important difference: in the cudacpp generated code, remember that we split the processes (maybe this is what you mean by gpu grouping?). In particular, pp -> tt 0,1,2 jets has
In particular:
So, bottom line: remember that in cudacpp generated code (including the generated fortran) we do NOT mix uu_XXXX and uux_XXXX together, so if these two have separate helicities, this is handled, because they are separate directories. Changing this would require a much more general handling of nprocesses>2, which so far we do not use. Rephrasing: I think that adding this sanity check should be enough, and in cudacpp we can safely keep two helicity computations in fortran and only one in cudacpp. Do you agree?
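As an illustration of this splitting (directory names are hypothetical, patterned on a pp_tt012j.mad output; the real names may differ):

    SubProcesses/P2_uu_ttxuu/     i.e. u u  > t t~ u u  (its own directory, its own helicity list)
    SubProcesses/P2_uux_ttxuux/   i.e. u u~ > t t~ u u~ (a separate directory, its own helicity list)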
Yes, this is what I meant by GPU grouping. Now, this makes me realize that there are steps that I do not like in the way those helicities are handled, and I did create a new branch (https://github.com/mg5amcnlo/mg5amcnlo/tree/goodhel) to implement some changes and to allow experimenting more on this without being affected by the issue spotted for tt~. I do agree that the above tt~ case is not relevant for this plugin. Now I still need to investigate this, since
The obvious side effect of this fortran refactoring is that your patching method will likely break, since the base file will be different.
Hi Olivier, thanks. Yes, good to investigate puzzles :-) Just one question: are you trying to completely remove the second helicity calculation, or to understand why it is needed and keep it there? I mean, I'd like to understand whether we eventually need to implement two helicity calculations in cudacpp.
… sanity check that the helicities from IMIROR=1,2 match (madgraph5#872 madgraph5#935 madgraph5#941)
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/makefile gg_tt.mad/Source/dsample.f gg_tt.mad/Source/genps.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/bin/internal/banner.py gg_tt.mad/bin/internal/gen_ximprove.py gg_tt.mad/bin/internal/madevent_interface.py >> CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
First step is to understand if/when it is needed.
And if the answer is "it is not needed" -> then we can remove it.
Cheers,
Olivier
With the new goodhel branch, performing additional tests (first in the SM):
Running the test with the sm-ckm model (which CMS often uses), the crash was on the first process that I tested:
So I'm going to investigate that process now:
So, given that the limit for a helicity to be "accepted" is LIMHEL=1e-8, and that here we are (after 12 events) hitting the first event at only 4 times that limit... and that many other channels (like G1) do not include helicity 18 for process 63 (for either imirror), I think it is safe to say that this is a threshold effect. So yes, non-identical lists are occurring here and are "fine", so a simple check of identical lists on the fortran side will just lead to an unnecessary crash (and the result will be equivalent, up to single precision accuracy). To avoid such threshold effects, I have updated the test to (now quite heavy):
With this factor of "100", the above process still crashes for 4 G directories out of 183.
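The updated test itself is not quoted in this thread; purely as a hypothetical sketch of the factor-of-100 idea (HELFRAC, a per-helicity fractional contribution, is an invented name for illustration):

C     Hypothetical sketch, not the actual mg5amcnlo test: only flag
C     a mismatch between the two mirror lists if the disputed
C     helicity contributes more than 100*LIMHEL, so that pure
C     threshold effects around LIMHEL do not trigger a crash.
      LOGICAL FUNCTION HELMISMATCH(GOODHEL, HELFRAC, NCOMB, LIMHEL)
      IMPLICIT NONE
      INTEGER NCOMB, IHEL
      LOGICAL GOODHEL(NCOMB,2)
      DOUBLE PRECISION HELFRAC(NCOMB,2), LIMHEL
      HELMISMATCH = .FALSE.
      DO IHEL = 1, NCOMB
        IF (GOODHEL(IHEL,1).NEQV.GOODHEL(IHEL,2)) THEN
          IF (MAX(HELFRAC(IHEL,1),HELFRAC(IHEL,2)).GT.100D0*LIMHEL)
     &       HELMISMATCH = .TRUE.
        ENDIF
      ENDDO
      END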
helicity 18: -1 -1 1 0 -1 -1
cross-section of helicity 18+12: 0.6 (contribution from process 60 only: 5.718838519992133e-12)
I will ignore the crash at the refine stage, since this might be an issue with how the grid is read/... So the conclusions for this crashing test are
Continuing the checks with more processes (MSSM):
Hi Olivier, thanks, nice finds :-) One point: in the fortran generated through cudacpp, we have used LIMHEL=0 for more than two years, see #419. So there is no sensitivity to threshold effects. We had discussed this extensively at the time. For the release, I would keep this as it is (LIMHEL=0) and add the check that the two helicity lists are the same, to ensure that the cudacpp and fortran tests give bit-by-bit the same results. From a software testing point of view, this is essential. Later on, this can be rediscussed. One can even think of computing helicities in fortran and passing them to cudacpp. That said, I still kind of favour having LIMHEL=0 in production, full stop. I wonder how much performance we gain, and in which specific processes, by using LIMHEL>0. But definitely I do not have the full picture here.
For the sm-ckm model, using limhel=0 can give a performance hit of a factor of 4 or more...
So this is not negligible.
Obviously, I do not want to have a "normal" fortran and a "cudacpp" fortran.
You can set limhel to 0 for tests if you want, but in production this should be the same fortran code/design.
Cheers,
Olivier
So I guess that we can close this issue with the conclusion: yes, it can be removed.
Thanks a lot Olivier :-) In the meantime, while waiting for 360 etc., can we then merge #935, which adds the second reset cumulative call? Of course this will then need to be removed in 360 (I can do that at that moment, no problem), but at least in the meantime we have some internally consistent code where fortran and cudacpp are expected to give the same xsec bit by bit. I prefer to add a fix now that we need to remove later, rather than do nothing and have tests that keep failing. Especially because I want to test the CMS xsec issues and I need that bit-by-bit equality.
Note, LIMHEL=0 is now in cudacpp production (NOT just in the madgraph4gpu repo). I am ok to remove that and move it ONLY to the madgraph4gpu repo, but can we at least add a runcard option that sets limhel=0? (Or maybe it exists already?) I think that would be very useful for debugging.
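If limhel is (or becomes) exposed as a run_card parameter, the entry would presumably follow the usual run_card syntax, something like the line below; whether current MG5aMC versions already expose it is an assumption on my part:

    0.0 = limhel ! threshold for helicity filtering (0.0 = no filtering, keep all helicities)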
This is the result of ./tlau/lauX.sh -nomakeclean -ALL pp_dy012j.mad, which crashed, followed by manually moving to -g builds:
Revert "[cmsdy] various changes in pp_dy012j while debugging madgraph5#941" This reverts commit 14335be.
Hi @oliviermattelaer,
as discussed in #935 (comment)
This is a reminder to check if the second helicity filtering in fortran for mirror processes can be removed.
For the moment, my proposal to solve #872 via PR #935 is to
If, as you suggest, we eventually find that helicities can be computed only once in fortran, then
I assign this to you... Low priority, in my opinion
Thanks
Andrea