first systematic 'launch'-like tests (and move to the latest select_color upstream) #709

valassi · 2023-06-16T05:31:20Z

This is a WIP MR with some first systematic 'launch' tests (./bin/generate_events actually) using the lauX.sh script in #683.

There are several issues (icolamp bugs) and a lot to be analysed and tuned - I will open separate issues for those.

FIRST
for dir in gg_tt.mad gg_ttg.mad gg_ttgg.mad gg_ttggg.mad; do
  ./tlau/lauX.sh -CUDA ${dir}
  ./tlau/lauX.sh -FORTRAN ${dir}
  ./tlau/lauX.sh -CPP ${dir}
done
THEN
for d in *.mad/LAUX*; do d1=${d/gg_tt/ggtt}; mv $d tlau/logs_${d1/.mad\/LAUX/}; done
git add tlau/logs_ggtt*
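The renaming step above relies on bash pattern substitution. As a minimal sketch (the function name `laux_log_dir` is hypothetical and not part of lauX.sh), the mapping from a `<proc>.mad/LAUX_<backend>` results directory to its `tlau/logs_*` destination can be checked in isolation, without moving anything:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): print the tlau log directory
# that the 'mv' loop above would use for a given <proc>.mad/LAUX_<backend> path.
laux_log_dir() {
  local d=$1
  local d1 out
  d1=${d/gg_tt/ggtt}        # gg_ttgg.mad/LAUX_CUDA -> ggttgg.mad/LAUX_CUDA
  out=${d1/.mad\/LAUX/}     # drop the '.mad/LAUX' infix -> ggttgg_CUDA
  echo "tlau/logs_${out}"
}

# Dry run over a few sample paths: only prints the planned moves
for d in gg_tt.mad/LAUX_CUDA gg_ttgg.mad/LAUX_FORTRAN gg_ttggg.mad/LAUX_CPP; do
  echo "mv $d $(laux_log_dir "$d")"
done
```

Printing the planned `mv` commands first is a cheap way to confirm that the resulting names match the `tlau/logs_ggtt*` pattern used by the final `git add`.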

STARTED AT Fri Jun 16 22:42:39 CEST 2023
./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean
ENDED(1) AT Fri Jun 16 23:07:07 CEST 2023 [Status=0]
./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean
ENDED(2) AT Fri Jun 16 23:15:57 CEST 2023 [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean
ENDED(3) AT Fri Jun 16 23:25:01 CEST 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Fri Jun 16 23:28:02 CEST 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Fri Jun 16 23:31:00 CEST 2023 [Status=0]
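The STARTED and ENDED timestamps above give the duration of each step by subtraction. A minimal sketch, assuming all timestamps fall on the same day (true for this tput run, but not for the longer runs in this PR that cross midnight); the helper names `hms_to_sec` and `elapsed_sec` are illustrative only and do not exist in the repository:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert HH:MM:SS to seconds since midnight.
# The 10# prefix forces base 10, so that "08" and "09" are not read as octal.
hms_to_sec() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Hypothetical helper: seconds elapsed between two same-day HH:MM:SS timestamps.
elapsed_sec() {
  echo $(( $(hms_to_sec "$2") - $(hms_to_sec "$1") ))
}

# Timestamps copied from the log above: STARTED AT, then ENDED(1) to ENDED(5)
prev=22:42:39
step=1
for t in 23:07:07 23:15:57 23:25:01 23:28:02 23:31:00; do
  echo "step ${step}: $(elapsed_sec "$prev" "$t") s"
  prev=$t
  step=$((step + 1))
done
```

A per-step breakdown like this makes it easier to see which of the five invocations dominates the total time.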

…ected
This is the price to pay for fixing instead madgraph5#710 using OLivier's select_color change
STARTED AT Fri Jun 16 23:34:03 CEST 2023
ENDED AT Sat Jun 17 03:40:59 CEST 2023
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt
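In a listing like the one above, the logs that stand out are those whose leading count differs from the rest. A minimal sketch for picking them out, under two assumptions that the source does not state: that the leading number is a per-log count of passed checks, and that 24 is the value for a fully passing log. The function name `flag_logs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "<count> <logfile>" lines on stdin and print
# those whose count differs from the expected value given as $1.
# ASSUMPTION: the expected value (24 here) is inferred from the listing above.
flag_logs() {
  local want=$1
  awk -v want="$want" '$1 != want { print $1, $2 }'
}

# Three lines copied from the listing above
flag_logs 24 <<'EOF'
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt
EOF
```

On this sample the filter prints the ggttgg and gqttq lines and drops the eemumu one; the ggttgg entries are the tmad failures attributed to madgraph5#655 in this PR.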

…raph5#710 with Olivier's select_color change
The price to pay is the tmad failures in ggttgg madgraph5#655
Add to the git repo the two ggttggg FORTRAN logs that were previously failing
The duration of these tests needs some tuning, the ggttggg take too long madgraph5#711
ls -ltr tlau/logs_ggtt*/*txt
-rw-r--r--. 1 avalassi zg 3590 Jun 17 03:41 tlau/logs_ggtt_CUDA/output.txt
-rw-r--r--. 1 avalassi zg 3588 Jun 17 03:41 tlau/logs_ggtt_FORTRAN/output.txt
-rw-r--r--. 1 avalassi zg 3580 Jun 17 03:42 tlau/logs_ggtt_CPP/output.txt
-rw-r--r--. 1 avalassi zg 3462 Jun 17 03:42 tlau/logs_ggttg_CUDA/output.txt
-rw-r--r--. 1 avalassi zg 3571 Jun 17 03:43 tlau/logs_ggttg_FORTRAN/output.txt
-rw-r--r--. 1 avalassi zg 3515 Jun 17 03:44 tlau/logs_ggttg_CPP/output.txt
-rw-r--r--. 1 avalassi zg 4106 Jun 17 03:46 tlau/logs_ggttgg_CUDA/output.txt
-rw-r--r--. 1 avalassi zg 4425 Jun 17 04:00 tlau/logs_ggttgg_FORTRAN/output.txt
-rw-r--r--. 1 avalassi zg 4349 Jun 17 04:05 tlau/logs_ggttgg_CPP/output.txt
-rw-r--r--. 1 avalassi zg 6766 Jun 17 04:50 tlau/logs_ggttggg_CUDA/output.txt
-rw-r--r--. 1 avalassi zg 7069 Jun 17 20:45 tlau/logs_ggttggg_FORTRAN/output.txt
-rw-r--r--. 1 avalassi zg 6967 Jun 18 01:29 tlau/logs_ggttggg_CPP/output.txt

valassi · 2023-06-18T09:00:05Z

In the latest commits I have rerun the tests after fixing #710 (using @oliviermattelaer's select_color patch, which however reintroduces #655, which will need to be fixed).

I will now work on tuning the time it takes (the tests are very long) #711

This skips refine but fails generation
INFO: Storing parton level results
No event detected. No cleaning performed! This should allow to run:
cd Subprocesses; ../bin/internal/combine_events
to have your events if those one are missing.

This fails with
INFO: finish refine
Survey return zero cross section.
Typical reasons are the following:
1) A massive s-channel particle has a width set to zero.
2) The pdf are zero for at least one of the initial state particles or you are using maxjetflavor=4 for initial state b:s.
3) The cuts are too strong.
Please check/correct your param_card and/or your run_card.
Zero result detected: See https://cp3.irmp.ucl.ac.be/projects/madgraph/wiki/FAQ-General-14

…itscrd80 - will revert
Some observations:
- for ggtt and ggttg, with the same numbers of events, results vary slightly to itscrd90 (ok... different compiler)
- for ggttgg, with 100 events instead of 1000, the overall time spent is essentially the same also in FORTRAN
- for ggttggg, with 10 events instead of 100, the overall time is also essentially the same also in FORTRAN
I will therefore move back to 100 events at least.
In any case I need a way to make these tests last less time, this is totally unmanageable now for development.
I am moving back by reverting to itscrd90 results. And I am keeping the lhe events file.
Eventually (in a following MR) I will move to itscrd80 results, and possibly remove event files.

valassi · 2023-06-19T12:42:39Z

I will self merge this.

As suggested by @oliviermattelaer I have now permanently included his select_color change in the upstream gpucpp. This fixes #710 - but it reintroduces #655. I have accordingly updated the CODEGEN settings.

I have also made some progress on launch-like tests for #683. These still need some debugging and tuning, which is tracked in #711.

I have also added a first README_CODEGEN.txt for users. This is a temporary solution, while waiting for a full integration of cudacpp upstream.

valassi added 5 commits June 15, 2023 07:22

[nobm] improve lauX.sh script, add support for a generic process madg…

04dd58b

…raph5#683

[launch] in lauX.sh, add 'make cleanall' (madgraph5#683)

a064ea0

[launch] in lauX.sh, keep results in LAUX_<BCKEND> madgraph5#683

9f209db

[launch] in lauX.sh, add start and end date to outputs

29d9e59

valassi self-assigned this Jun 16, 2023

valassi marked this pull request as draft June 16, 2023 05:31

This was referenced Jun 16, 2023

Index of array 'icolamp' above upper bound in FORTRAN generate_events tests #710

Closed

Analyse, tune and debug 'launch' tests (automatic comparison scripts; use fewer events; etc...) #711

Open

valassi added 15 commits June 16, 2023 17:54

[launch] in tlau logs, remove all results.pkl

14148e8

[lunch] in tlau, add results.pkl to .gitignore

80c9f49

[launch] in tlau logs, remove all lhe.gz and add all .lhe files

7668e16

[launch] improve the lauX.sh script madgraph5#683

c186da4

[launch] rerun tlau gg_tt CUDA test

258c4e3

[launch] add tlau/allTees.sh script madgraph5#683

8727e9c

[launch] update CODEGEN to use Olivier's latest upstream - and add ba…

bd618c0

…ck his select_color change This will introduce again madgraph5#655 but should fix madgraph5#710

[launch] in codegen, manually fix patch.common to reflect the change …

0a2a631

…in SELECT_COLOR

[launch] regenerate 7 processes mad

95f2ab8

[launch] regenerate 7 processes sa - no change, as expected

98444ad

[launch] in lauX.sh, set a lower number of events in ggttgg and ggttggg

6930cc0

[launch] rerun ggttgg tlau - no longer crashes madgraph5#710, however…

fb1033d

… it still takes 12 minutes... This is because the part that takes time is survey (nevents fixed?), not refine...

[launch] improve lauX.sh to ensure restoring initial values

be4a4f4

[launch] in lauX.sh go back to 10000 unweighted events...

7021bc6

What costs time is the refine step... how do I change accuracy? (Is it req_acc in thr runcard?) For the tests, alternatively, consider something like this? NLO only? ./bin/generate_events --only-generation --nocompile

[launch] fix tlau/alltees

6cd117a

valassi mentioned this pull request Jun 17, 2023

Issues with runcard includes: stability, dependencies #687

Open

valassi added 5 commits June 18, 2023 10:06

[launch] go back to tuning the number of unweighted event generation m…

1bb49f7

…adgraph5#711 Revert "[launch] in lauX.sh go back to 10000 unweighted events..." This reverts commit 7021bc6. I realised that also unweighted event generation does take a very long time in these tests

[launch] tune lauX.sh script madgraph5#683 and madgraph5#711 - shorte…

fd67f47

…r survey and refine (8192 events, 1 iteration) Note: cmd.opts['accuracy'] comes from cmd._survey_options where cmd is in madevent_interface.py

valassi added 17 commits June 18, 2023 15:17

[launch] in lauX.sh, further reduce ggttggg to 100 unweighted events

c1762ac

[launch] rerun tlau alltees with fewer events on itscrd90 (in paralle…

6e78896

…l to other tests on itscrd80)

[launch] in lauX.sh add a separator and debug printouts for easier un…

fb81cff

…derstanding of stdout

[launch] in lauX.sh further decrease unweighted events to 10 in ggttg…

d5508ec

…gg as this should also affect refine?

[launch] temporary tests on gg_ttgg.mad/bin/internal/gen_ximprove.py …

2eb26e6

…to understand "refine"

[launch] Revert previous temporary tests on gg_ttgg.mad/bin/internal/…

e025186

…gen_ximprove.py Revert "[launch] TEMPORARY tests on gg_ttgg.mad/bin/internal/gen_ximprove.py to understand "refine"" This reverts commit 9d544d0fb241dabb1df4bf813abb3a319a3b2c7c.

[launch] Revert the previous change

263e0da

Revert "[launch] in ggttgg.mad, comment out all to_refine.append(C)" This reverts commit a1150138961f701eda26f957b01347d2edac4df9.

[launch] in ggttgg.mad, select only channels with two-character Gn na…

5ce3278

…mes to reduce the load...

[launch] Revert the previous change

55f235d

Revert "[launch] in ggttgg.mad, select only channels with two-character Gn names to reduce the load..." This reverts commit e1138741d4305fa99b4993c66a2afab385e9dc7c.

[launch] Revert the previous change

5fa6d1c

Revert "[launch] in ggttgg.mad, select only 3 channels at most in read_results" This reverts commit e5061ac9078a87e750832d81f648f542985dde7e.

[launch] go back to the previous results on itscrd90 for tlau (10000,…

9ac6c9d

… 10000, 1000, 100) Revert "[launch] rerun (INTERRUPTED in the last ggttggg CPP) tlau alltees on itscrd80 - will revert" This reverts commit 151df29.

[launch] in tlau/lauX.sh, use 10000, 10000, 1000, 100 events for now …

b1715fa

…as in the logs - will tune later on

[launch] in generateAndCompare.sh, ensure that <proc> is not an absol…

fc55e94

…ute path

[launch] ** COMPLETE LAUNCH ** Add a basic README_CODEGEN.txt for users

d89aea8

valassi changed the title ~~WIP: first systematic 'launch'-like tests~~ first systematic 'launch'-like tests Jun 19, 2023

valassi marked this pull request as ready for review June 19, 2023 12:38

valassi linked an issue Jun 19, 2023 that may be closed by this pull request

Index of array 'icolamp' above upper bound in FORTRAN generate_events tests #710

Closed

valassi changed the title ~~first systematic 'launch'-like tests~~ first systematic 'launch'-like tests (and move to the latest select_color upstream) Jun 19, 2023

valassi merged commit 79d8c06 into madgraph5:master Jun 19, 2023

This was referenced Jun 19, 2023

Out-of-bounds memory access in random color choice #611

Closed

Problems in ggttgg lhe files after the latest upstream changes (replacing channel by iconfig in select_color) #655

Closed
