
Require one event at a time #1233

Open
sethrj opened this issue May 13, 2024 · 8 comments
Labels
physics Particles, processes, and stepping algorithms

Comments

@sethrj
Member

sethrj commented May 13, 2024

Currently track IDs are constructed within an event based on an atomic counter. Even though the results (track slots, etc.) are reproducible across runs, the individual track IDs are not, so you can't dig into a particular history using an a priori track ID. We could simplify reproducibility by requiring all tracks in flight to be from the same event (i.e., the usual way we integrate into Geant4).

This also would eliminate the need for max_events (#1058). We could store the single event ID on the "core state" object since we don't need access to it on GPU. This would also reduce the memory requirements by sizeof(int) * (initializers.size() + state.size()).

@sethrj sethrj added the physics Particles, processes, and stepping algorithms label May 13, 2024
@sethrj sethrj changed the title Make track IDs reproducible across runs Require one event at a time to make track IDs reproducible across runs Jun 27, 2024
@sethrj sethrj changed the title Require one event at a time to make track IDs reproducible across runs Require one event at a time Jun 27, 2024
@sethrj
Member Author

sethrj commented Jul 9, 2024

From @amandalund :

with our standalone app I’ve typically seen better performance transporting multiple events at a time (e.g. about ~2x faster for the cms2018+field+msc-vecgeom-gpu regression problem with merge_events=true ). Our regression suite merges events for celer-sim on the gpu, right? I don’t think I’ve run those problems in single-event mode in a while, but it would be good to compare. I’d still probably prefer to keep that capability.

My follow up:

fair enough! I think it'd also be good to run with a few more events (to reduce the load imbalance from each core having only a single event) and make sure that the merge_events=false case has optimal track slots

@amandalund
Contributor

amandalund commented Sep 15, 2024

Here's where the speedup running one event at a time stands with all the latest performance improvements (but not partitioning by charge when running with merge_events off, since it doesn't help there):
rel-throughput-merge-events

Still a bit slower when not combining events, but I haven't tried increasing the total number of events or optimizing the number of track slots (which may also be problem dependent).

EDIT: Here's the same plot but with 4x the number of track slots when merge_events is off:
rel-throughput-merge-events-4xts

and the same as above but partitioning by charge with merge_events off as well:
rel-throughput-merge-events-partition-4xts

@sethrj
Member Author

sethrj commented Sep 16, 2024

@amandalund Why doesn't partitioning by charge help? 😕 (I don't think it helps GPU+G4 since we don't yet pass that option through right?) And is this after #1405? And so this is just the regression problems with and without "merge event"?

@amandalund
Contributor

amandalund commented Sep 16, 2024

Right, toggling merge_events and after #1405. And yeah, we don't pass the option to celer-g4 so shouldn't expect any difference there. Here's the speedup partitioning charge with merge_events off (this is with 16 CPU cores and one A100):
rel-throughput-merge-events-partition

It helps a bit in some cases, but more often seems to hurt... I'm not really sure why that is yet.

EDIT: Same plot as above but with 4x the number of track slots:
rel-throughput-nomerge-partition-4xts

@amandalund
Contributor

Ok @sethrj good news: I played around with different numbers of track slots, and increasing this number does improve performance. I added plots above using 4x as many track slots.

With more slots the partitioning does help, and continues to help relatively more as the state size grows (though overall performance gets worse past a certain point), so it could be that with a larger state we get less mixing for longer.

@sethrj
Member Author

sethrj commented Sep 16, 2024

@amandalund To maintain the memory limits (especially with CMS and on V100) I have the runner script reduce the number of track slots when merge_events is off on GPU. Can you verify from the output what are the reported track slot count and number of primaries per thread?

EDIT: and on an unrelated note, do you mind setting transparent=False for the PNGs that you upload to github? They render like this with a dark background 😅
Screenshot 2024-09-16 at 15 17 46

@amandalund
Contributor

2^18 track slots and 1300 primaries per thread (I know for celer-sim we do that division in the Runner class).

@sethrj
Member Author

sethrj commented Sep 16, 2024

Oh ok, that's great; I was afraid that was with the 2^20 number.
