Use separate along-step kernel for neutral particles for 25% performance boost #745

sethrj · 2023-05-03T17:29:03Z

This results in a ~25% speedup for GPU tracks on CMS+msc+field. Sorting the tracks by along-step action gives a slight (5-10% per kernel) speedup for the geometry kernels, but it adds a substantial (~7% of overall execution) cost to the pre-step kernel plus a penalty (~5% overall) for the sort itself. Sorting by both along-step action and post-step action increases the overall time further.
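As a rough illustration of the sorting experiment described above, the sketch below groups track slots by their along-step action so that each along-step kernel could be launched over one contiguous range. It is a host-side C++ sketch under stated assumptions, not the Celeritas implementation: the `AlongStep` enum, `ActionRange`, `sort_by_along_step`, and `find_range` are hypothetical names introduced only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical along-step action tags (assumption: not the project's real types).
enum class AlongStep : int
{
    neutral = 0,
    charged_field = 1
};

// Half-open range [begin, end) into the sorted index array.
struct ActionRange
{
    std::size_t begin;
    std::size_t end;
};

// Return track-slot indices reordered so that neutral tracks come first.
// The relative order inside each group is preserved.
std::vector<std::size_t> sort_by_along_step(std::vector<AlongStep> const& action)
{
    std::vector<std::size_t> idx(action.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_partition(idx.begin(), idx.end(), [&action](std::size_t i) {
        return action[i] == AlongStep::neutral;
    });
    return idx;
}

// Find the contiguous range of sorted indices whose tracks use `which`.
ActionRange find_range(std::vector<std::size_t> const& sorted,
                       std::vector<AlongStep> const& action,
                       AlongStep which)
{
    assert(sorted.size() == action.size());
    auto matches = [&action, which](std::size_t i) { return action[i] == which; };
    auto first = std::find_if(sorted.begin(), sorted.end(), matches);
    auto last = std::find_if_not(first, sorted.end(), matches);
    return {static_cast<std::size_t>(first - sorted.begin()),
            static_cast<std::size_t>(last - sorted.begin())};
}
```

In this sketch a neutral kernel would iterate over `find_range(sorted, action, AlongStep::neutral)` and a charged kernel over the other range. The sort is extra work on every step, which is in line with the sort penalty reported above.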

NOTE: preliminary testing suggests this is terrible for AMD hardware by default: for TestEm3 the along-step-field time does not change, and the along-step-neutral time is simply added on top of it. Perhaps because AMD doesn't have separate hardware counters?

Interestingly geo propagation limit seems to be fixed? Results are now consistent with ORANGE (!)

esseivaju · 2023-05-10T18:41:27Z

Other than the diagnostic test failing, everything looks good to me.

amandalund

Nice speedup! Looks good to me too once the test is fixed. Interestingly, for cms2018+msc+field this decreased the mean number of steps per gamma track by ~10%.

sethrj · 2023-05-11T02:50:09Z

@amandalund Yeah, that might be related to the change in results for the diagnostic: I'm not sure how, but the

               "geo-propagation-limit e+",
               "geo-propagation-limit e-",

actions are no longer showing up. Maybe it's a lingering issue with positrons being biased to die near geometry boundaries?

esseivaju · 2023-05-11T03:01:42Z

> Perhaps because AMD doesn't have separate hardware counters?

I suppose you're referring to independent thread scheduling, introduced with Volta, where each thread has its own PC and SP. If my understanding is correct, execution is still SIMT even then, so lockstep execution still happens; divergent code paths can simply be interleaved because each thread has its own state. I suppose it still helps when threads are stalled on a memory transfer or an instruction fetch (the two most common causes of stalling we have), since the other code path can execute in the meantime? I'd still expect AMD to be faster since it has less work to do.
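To make the lockstep argument concrete, here is a small host-side C++ sketch that counts how many warps would contain both charged and neutral tracks, and so would have to run both code paths in a single combined kernel. It is a simplified model under stated assumptions: `count_divergent_warps` is a hypothetical helper, tracks are assumed to map to threads in index order, and the warp size is a parameter rather than a hardware query.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Count warps that hold a mix of charged and neutral tracks.
// Assumption: track i is handled by thread i, and consecutive groups of
// `warp_size` threads form one warp.
std::size_t count_divergent_warps(std::vector<bool> const& is_charged,
                                  std::size_t warp_size)
{
    assert(warp_size > 0);
    std::size_t divergent = 0;
    for (std::size_t start = 0; start < is_charged.size(); start += warp_size)
    {
        std::size_t stop = std::min(start + warp_size, is_charged.size());
        bool any_charged = false;
        bool any_neutral = false;
        for (std::size_t i = start; i < stop; ++i)
        {
            if (is_charged[i])
            {
                any_charged = true;
            }
            else
            {
                any_neutral = true;
            }
        }
        // A mixed warp must step through both branches of the kernel.
        if (any_charged && any_neutral)
        {
            ++divergent;
        }
    }
    return divergent;
}
```

With fully interleaved particle types every warp is mixed, while grouping tracks by type leaves at most one mixed warp at the boundary between the groups. The model says nothing about stall hiding or scheduling, so it cannot settle the AMD question by itself.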

amandalund · 2023-05-11T03:19:27Z

Interesting, could be... I do see the geo-propagation-limit actions when I increase the number of steps in the tests, but I'm not sure if it's statistical or something else causing the change.

sethrj added the enhancement (New feature or request) and core (Software engineering infrastructure) labels May 3, 2023

sethrj added this to the v0.3.0 milestone May 4, 2023

sethrj force-pushed the along-step-branch branch 2 times, most recently from 8ea237b to bd372de May 8, 2023 14:01

sethrj added 4 commits May 10, 2023 08:39

Implement separate neutral and charged along-step actions

930fa9f

Add new action order and track policy

757ad36

Add along-step sorting and refactor track sorting

2d6500b

Update diagnostic tests

dc65df4

Interestingly geo propagation limit seems to be fixed? Results are now consistent with ORANGE (!)

sethrj force-pushed the along-step-branch branch from abdce93 to dc65df4 May 10, 2023 12:39

sethrj requested review from esseivaju and amandalund May 10, 2023 12:40

sethrj marked this pull request as ready for review May 10, 2023 12:40

amandalund approved these changes May 10, 2023

fixup! Update diagnostic tests

b5f35ae

sethrj closed this May 11, 2023

sethrj reopened this May 11, 2023

Merge remote-tracking branch 'upstream/develop' into along-step-branch

a2c8b5c

esseivaju approved these changes May 11, 2023

sethrj merged commit 735b233 into celeritas-project:develop May 11, 2023

sethrj deleted the along-step-branch branch May 11, 2023 15:25

sethrj mentioned this pull request May 11, 2023

Generalize along-step kernels #538

Open

2 tasks

sethrj added the performance (Changes for performance optimization) label and removed the core (Software engineering infrastructure) label Nov 14, 2023
