In PR #494 a first summary table of madevent+cudacpp results is being assembled.
For a complex process like ggttggg, speeding up the C++ matrix element calculation by a factor of 4 gives almost a factor of 4 overall speedup, because the scalar madevent part takes only a small fraction of the total time. In CUDA, however, the scalar madevent part becomes the limiting factor: most of the CUDA speedup is lost because the overall workflow time is dominated by the madevent overhead.
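The argument above is essentially Amdahl's law. A minimal sketch of the arithmetic follows; the function name `overall_speedup` and the scalar fraction and speedup factors are hypothetical illustrations, not measurements from PR #494.

```python
def overall_speedup(scalar_fraction: float, me_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the matrix-element part is accelerated.

    scalar_fraction: fraction of the original runtime spent in the scalar
                     madevent part (not accelerated).
    me_speedup:      speedup factor applied to the remaining (matrix-element) part.
    """
    accelerated_fraction = 1.0 - scalar_fraction
    return 1.0 / (scalar_fraction + accelerated_fraction / me_speedup)


# Hypothetical numbers for illustration only (NOT taken from the PR #494 table):
# assume the scalar part is 1% of the original runtime.
scalar_fraction = 0.01
for me_speedup in (4, 100, 1000):
    print(f"ME speedup x{me_speedup}: overall x{overall_speedup(scalar_fraction, me_speedup):.1f}")
```

With these assumed inputs, a modest factor of 4 is passed through almost in full, while larger factors saturate: the overall gain can never exceed `1 / scalar_fraction`, which is why reducing the scalar overhead matters much more for CUDA than for C++.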
We should profile this (e.g. with flamegraphs, as Stefan suggested) and reduce it. Parts of this overhead may be related to the random choice of color #402 and helicity #403, so we should reassess when those are done. But the time may be spent elsewhere, e.g. in phase space sampling (in which case one option would be to move parts of this to the GPU, as we do with rambo in the standalone part?)...