Optimize track vector data layout for particle types #1322

amandalund · 2024-07-14T23:40:58Z

I started sketching out a possible implementation of #1312 (still a WIP). This partitions the last num_new_tracks initializers by charged/neutral and initializes neutral and charged tracks at opposite ends of the track vector.
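As a rough illustration of the idea (not the code in this PR), the sketch below partitions the last `num_new` initializers with `std::stable_partition` and then fills vacant slots from opposite ends. The `Initializer` struct, `empty_slot`, and `initialize_partitioned` are hypothetical names, and putting neutrals at the low end is an arbitrary choice made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a track initializer: only the charge matters here.
struct Initializer
{
    int id;
    bool charged;
};

constexpr int empty_slot = -1;

// Partition the last `num_new` initializers so that neutral ones come first,
// then assign them to vacant track slots: neutrals take the lowest vacancies
// and charged take the highest. `vacancies` lists empty slot indices in
// ascending order; `slots` stores the ID of the track occupying each slot.
void initialize_partitioned(std::vector<Initializer>& initializers,
                            std::size_t num_new,
                            std::vector<std::size_t> const& vacancies,
                            std::vector<int>& slots)
{
    assert(num_new <= initializers.size());
    assert(num_new <= vacancies.size());

    auto first = initializers.end() - static_cast<std::ptrdiff_t>(num_new);
    auto mid = std::stable_partition(
        first, initializers.end(), [](Initializer const& init) {
            return !init.charged;
        });
    auto num_neutral = static_cast<std::size_t>(mid - first);

    for (std::size_t i = 0; i < num_new; ++i)
    {
        // Neutrals count up from the front of the vacancy list, charged
        // count down from the back, so the two ranges never overlap
        std::size_t v = (i < num_neutral)
                            ? vacancies[i]
                            : vacancies[vacancies.size() - 1 - (i - num_neutral)];
        assert(v < slots.size());
        slots[v] = first[i].id;
    }
    // The consumed initializers are removed from the end of the list
    initializers.erase(first, initializers.end());
}
```

Calling this with six empty slots and four new initializers leaves the two middle slots empty, which is the "opposite ends" layout described above.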

For now I've disabled the immediate initialization of a secondary in the parent's track slot (since that results in quite a bit more mixing) and the copying of the parent's geometry state (allowing this would require partitioning the parent IDs as well, which we could do). FWIW I didn't see a ton of speedup from those geometry optimizations anyway (maybe a few percent at most, in cms2018).

test/celeritas/track/TrackSort.test.cc

sethrj · 2024-07-26T16:28:45Z

I might test this today... also the test segfaults are caught by clang asan:

[ RUN      ] PartitionDataTest.init_primaries_host
=================================================================
==2683==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7f277607cc58 at pc 0x00000038c3b0 bp 0x7ffc76ad7e70 sp 0x7ffc76ad7e68
READ of size 8 at 0x7f277607cc58 thread T0

amandalund · 2024-08-05T22:50:39Z

@sethrj @esseivaju I think this is ready for a look and some testing/profiling now. I haven't done much myself yet, but in some quick initial comparisons saw ~25% improvement on testem3 and ~5% on cms2018 with celer-sim. With merge_events off I saw only a 15% improvement with testem3 and a performance degradation with cms; I haven't looked at the performance of any of the Geant4-integrated apps yet.

sethrj · 2024-08-06T12:30:58Z

Seems to be barely any bump in performance for Frontier/AMD. @esseivaju if you run the v0.5.0-dev.partitioned branch I'll generate the equivalent plot for perlmutter for the whole run of problems.

esseivaju · 2024-08-07T02:51:27Z

I'll try to run this tomorrow

sethrj · 2024-08-12T19:33:21Z

Copying the output from the slack channel. We're obviously not yet copying the track sort parameter into the GPU setup for Celeritas. It's very odd that the CPU "sort" performance dips in a couple of cases.

sethrj

Superb work @amandalund ! We really need to publish some of this 😅

src/celeritas/track/InitializeTracksAction.cc

src/celeritas/track/detail/ProcessSecondariesExecutor.hh

sethrj · 2024-08-16T18:11:01Z

@amandalund I think this is good to go aside from the minor comments and the conflict 😄 I know it's probably been lost in the pile of pull requests 😅

amandalund · 2024-08-16T18:17:49Z

Thanks @sethrj! I'd modified this to try to reuse the parent's geo state when possible for improved performance, but got sidetracked when I added an assertion that the parent's position match the initializer position and several unit tests started failing. Should be able to wrap this up pretty quickly now that those fixes are in.

amandalund · 2024-08-17T04:48:35Z

Alright, I've updated this so we're now partitioning an array of indices (rather than track initializers) which we can also use to access the parent track slot IDs to copy the geometry state. We'll still have to reinitialize the geometry more often than we would when not sorting (we're not doing the in-place initialization, so we can't reuse the state if the track was killed but produced secondaries), but hopefully it should still help some. @esseivaju if you wouldn't mind rerunning the regression problems with the latest changes and @sethrj updating the performance plot again I'd appreciate it!
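A minimal sketch of partitioning indices instead of the initializers themselves, assuming plain host-side vectors: the data arrays stay in place and the same index array can look up both the initializer and its parent slot. `Initializer`, `partition_indices`, and `parent_of` are hypothetical names, and `std::iota` stands in for whatever sequence fill the real code uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a track initializer.
struct Initializer
{
    bool charged;
};

// Return the indices [0, n) reordered so that neutral initializers precede
// charged ones. The initializers themselves are not moved.
std::vector<std::size_t>
partition_indices(std::vector<Initializer> const& inits)
{
    std::vector<std::size_t> indices(inits.size());
    std::iota(indices.begin(), indices.end(), std::size_t{0});
    std::stable_partition(
        indices.begin(), indices.end(), [&inits](std::size_t i) {
            return !inits[i].charged;
        });
    return indices;
}

// Look up the parent track slot of the k-th initializer in partitioned
// order. `parents[i]` is the parent slot of initializer `i`.
int parent_of(std::vector<int> const& parents,
              std::vector<std::size_t> const& indices,
              std::size_t k)
{
    assert(k < indices.size());
    assert(indices[k] < parents.size());
    return parents[indices[k]];
}
```

Because only the small index array is shuffled, any per-initializer array (such as the parent IDs used to copy the geometry state) can be read through the same indirection without being partitioned separately.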

sethrj · 2024-08-17T19:44:56Z

Rerunning frontier now...

src/celeritas/random/distribution/RejectionSampler.hh

sethrj · 2024-08-19T16:40:21Z

Frontier's seeing a solid 10% speedup with this 😄 nice work!

sethrj · 2024-08-19T21:34:16Z

Perlmutter's also seeing similar speedups for testem3. The Run 2 geometry minus MSC shows that same anomaly...

sethrj · 2024-08-28T20:38:09Z

@esseivaju did you get a chance to re-run develop for the CMS/run2/no-msc case? It would be good to get this in but I'd like to understand the performance regression first.

amandalund · 2024-08-28T20:49:38Z

I also ran it actually (on an A100 + AMD EPYC 7532):

sethrj · 2024-08-28T20:51:11Z

Weird...

amandalund · 2024-08-28T20:57:47Z

It's possible the worse performance for cms+msc could still be the penalty of having to reinitialize the geometry state more often than on develop...

sethrj · 2024-08-29T12:07:21Z

That makes a lot of sense, good thought @amandalund . The cost of a step iteration without MSC or field is relatively low compared to with MSC and field, and the cost of initialization with vecgeom/CMS is much higher than testem3. We could potentially do a better job of serializing the geometry state for fast reconstruction by storing an additional "unique volume ID" (aka VecGeom navindex, or expanded ORANGE volume ID) on the track and defining a new initializer that reconstructs the logical geometry state more efficiently than searching based on the point.
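To make the proposal concrete, here is a toy sketch (purely hypothetical, not VecGeom or ORANGE code) of an initializer that carries an optional cached volume ID. When the ID is present, the volume is taken from the cache; otherwise the code falls back to searching by position. `ToyGeometry`, `GeoInitializer`, and `initialize_volume` are invented for illustration, and the 1D geometry is far simpler than a real navigation state.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy 1D "geometry": volume i spans [bounds[i], bounds[i + 1]).
struct ToyGeometry
{
    std::vector<double> bounds;
    mutable std::size_t num_searches = 0;  // counts slow-path lookups

    // Slow path: locate the volume containing a point by linear search.
    std::size_t find_volume(double x) const
    {
        ++num_searches;
        for (std::size_t i = 0; i + 1 < bounds.size(); ++i)
        {
            if (x >= bounds[i] && x < bounds[i + 1])
            {
                return i;
            }
        }
        assert(false && "point outside geometry");
        return bounds.size();
    }
};

// Hypothetical initializer with an optional cached "unique volume ID".
struct GeoInitializer
{
    double pos;
    std::optional<std::size_t> volume;
};

// Use the cached volume when available, otherwise search by position.
std::size_t
initialize_volume(ToyGeometry const& geo, GeoInitializer const& init)
{
    if (init.volume)
    {
        assert(*init.volume + 1 < geo.bounds.size());
        return *init.volume;
    }
    return geo.find_volume(init.pos);
}
```

In this toy the saving is one linear search per track; whether a real geometry could rebuild its full logical state from such an ID is exactly the open question raised in the comment.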

amandalund · 2024-08-29T13:15:28Z

Yeah it definitely seems worthwhile to explore ways of speeding up that initialization.

amandalund · 2024-08-31T23:47:56Z

Sorry @sethrj I only just realized that M means !MSC 😅. Maybe we should invert that?

sethrj · 2024-09-01T02:03:35Z

There's supposed to be a tilde on top; matplotlib didn't render it correctly. Since the results aren't correct without msc I figured it was better to start denoting "not included" rather than "accurate"...

amandalund · 2024-09-01T23:09:07Z

It does seem better to mark that some problems are missing key physics than the other way around... though on the other hand $F$ for field and $M$ for msc is much more obvious when glancing at a plot than $F$ for field and $\tilde{M}$ for no msc.

sethrj · 2024-09-03T12:47:28Z

I'm cool with changing this. Maybe better (if we want to go crazy) would be to replace the "no MSC" mode with full Coulomb scattering? I have no idea if that'll increase our run times by 1000x though...

amandalund added the physics and performance labels Jul 14, 2024

amandalund requested review from sethrj and esseivaju July 14, 2024 23:40

amandalund force-pushed the track-data-layout branch from da7ad89 to c8bc84a Compare July 14, 2024 23:45

sethrj reviewed Jul 25, 2024

test/celeritas/track/TrackSort.test.cc (resolved)

amandalund force-pushed the track-data-layout branch from c8bc84a to 0157ec2 Compare July 27, 2024 00:21

amandalund added 2 commits August 2, 2024 20:35

Partition track data by whether the track is charged or neutral

9000c8c

Disable copying of parent's geometry state during track inttialization

830e579

amandalund force-pushed the track-data-layout branch from 0157ec2 to c4bf10f Compare August 2, 2024 20:37

amandalund added 2 commits August 2, 2024 22:04

Clean up

942085c

Avoid blocking D2H copy

2eb8ebd

amandalund force-pushed the track-data-layout branch from c4bf10f to 2eb8ebd Compare August 3, 2024 17:44

Revert to partition

0744378

amandalund marked this pull request as ready for review August 5, 2024 22:49

sethrj approved these changes Aug 13, 2024

src/celeritas/track/InitializeTracksAction.cc (outdated, resolved)

src/celeritas/track/detail/ProcessSecondariesExecutor.hh (resolved)

amandalund added 6 commits August 16, 2024 20:29

Copy parent's geo state when possible

421bb2d

Merge remote-tracking branch 'upstream/develop' into track-data-layout

284f1db

Use fill_sequence() and fix up

ff8ed95

Address review feedback

79af2f4

Clear parent IDs at each step when partitioning

75ef23f

Try to simplify a bit

ae4537e

sethrj reviewed Aug 17, 2024

src/celeritas/random/distribution/RejectionSampler.hh (outdated, resolved)

Merge branch 'develop' into track-data-layout

9ad807d

sethrj enabled auto-merge (squash) August 29, 2024 11:45

sethrj merged commit 3f6f041 into celeritas-project:develop Aug 29, 2024
29 checks passed

amandalund deleted the track-data-layout branch August 29, 2024 13:15

sethrj mentioned this pull request Sep 25, 2024

Add slot particle diagnostic #1426

Merged

3 tasks

amandalund mentioned this pull request Sep 28, 2024

Add track_order option to celer-g4 and default to partitioning by charge on GPU #1433

Merged

sethrj mentioned this pull request Sep 30, 2024

Rename track slot enumerations #1434

Merged

sethrj added the enhancement label Sep 30, 2024
