Profile (and speed up) the madevent scalar overhead to parallel ME calculations #495

valassi · 2022-06-21T05:45:36Z

In PR #494 a first summary table of madevent+cudacpp results is being assembled.

For a complex process like ggttggg, speeding up the cpp ME calculation by a factor of 4 yields almost a factor 4 overall speedup, because the scalar part still takes only a small fraction of the time. In CUDA, however, the scalar madevent part becomes the limiting factor: most of the CUDA speedup is lost because the overall workflow time is dominated by the madevent overhead.
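This is the usual Amdahl's law behaviour. A minimal sketch of the arithmetic follows; the function names and the 98% parallel fraction are purely illustrative assumptions, not measurements from the #494 table.

```python
# Amdahl's law sketch: overall speedup when only the ME part is accelerated.
# NOTE: the fractions and speedups below are hypothetical placeholders,
# not measured madevent+cudacpp numbers.

def overall_speedup(parallel_fraction: float, me_speedup: float) -> float:
    """Overall speedup if a fraction of the runtime (the ME calculation)
    is sped up by me_speedup while the scalar rest is unchanged."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if me_speedup <= 0.0:
        raise ValueError("me_speedup must be positive")
    scalar_fraction = 1.0 - parallel_fraction
    return 1.0 / (scalar_fraction + parallel_fraction / me_speedup)


def max_speedup(parallel_fraction: float) -> float:
    """Upper bound on the overall speedup, reached for an infinitely fast ME part."""
    scalar_fraction = 1.0 - parallel_fraction
    return float("inf") if scalar_fraction == 0.0 else 1.0 / scalar_fraction


if __name__ == "__main__":
    p = 0.98  # assumed fraction of the scalar-only runtime spent in MEs
    for s in (4.0, 1000.0):  # assumed "SIMD-like" and "GPU-like" ME speedups
        print(f"ME speedup {s:7.1f} -> overall {overall_speedup(p, s):6.2f} "
              f"(bound {max_speedup(p):.1f})")
```

With these assumed inputs, a factor 4 on the MEs gives an overall speedup of about 3.8, while a factor 1000 stays below the bound of 50 set by the scalar 2%. That bound only moves if the scalar part itself is reduced.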

We should profile this (e.g. with flamegraphs, as Stefan suggested) and reduce it. Parts of this overhead may be related to the random choice of color #402 and helicity #403, so we should reassess when those are done. But the time may be spent elsewhere, e.g. in phase space sampling (in which case one option would be to move parts of this to the GPU, as we do with rambo in the standalone part?)...

valassi mentioned this issue Jul 31, 2022

Prototype multi-threaded MadEvent with shared GPU offload of MEs? #500

Open
