Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use separate along-step kernel for neutral particles for 25% performance boost #745

Merged
merged 6 commits into from
May 11, 2023

Conversation

sethrj
Copy link
Member

@sethrj sethrj commented May 3, 2023

This results in a ~25% speedup for GPU tracks on CMS+msc+field. Sorting the along-step tracks results in a slight (5-10% per kernel) performance increase for the geometry kernels but a substantial (~7% overall execution) increase in the pre-step kernel plus a penalty (~5% overall) for the sort. Sorting by both action and post-step action further increases the overall time.

NOTE: preliminary testing suggests this is terrible for AMD hardware by default: the along-step-field time does not change for TestEm3, and the along-step-neutral is simply added on top of it. Perhaps because AMD doesn't have separate hardware counters?

@sethrj sethrj added enhancement New feature or request core Software engineering infrastructure labels May 3, 2023
@sethrj sethrj added this to the v0.3.0 milestone May 4, 2023
@sethrj sethrj force-pushed the along-step-branch branch 2 times, most recently from 8ea237b to bd372de Compare May 8, 2023 14:01
@sethrj sethrj force-pushed the along-step-branch branch from abdce93 to dc65df4 Compare May 10, 2023 12:39
@sethrj sethrj requested review from esseivaju and amandalund May 10, 2023 12:40
@sethrj sethrj marked this pull request as ready for review May 10, 2023 12:40
@esseivaju
Copy link
Contributor

Other than the diagnostic test failing, everything looks good to me

Copy link
Contributor

@amandalund amandalund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice speedup! Looks good to me too once the test is fixed. Interestingly, for cms2018+msc+field this decreased the mean number of steps per gamma track by ~10%.

@sethrj
Copy link
Member Author

sethrj commented May 11, 2023

@amandalund Yeah, that might be related to the change in results for the diagnostic: I'm not sure how, but the

               "geo-propagation-limit e+",
               "geo-propagation-limit e-",

actions are no longer showing up. Maybe it's a lingering issue with positrons being biased to die near geometry boundaries?

@esseivaju
Copy link
Contributor

Perhaps because AMD doesn't have separate hardware counters?

I suppose you're referring to Independent thread scheduling from Volta where each thread has its own PC and SP. If my understanding is correct, even then, the execution is still SIMT, so locksteps still happen, we can just interleave divergent code paths since each thread has its own stats. I suppose it still helps if threads are stalled due to memory transfer or fetching instruction (the two most common causes of stalling we have) then we can execute the other code path? I'd still expect AMD to be faster since it has less work to do.

@amandalund
Copy link
Contributor

interesting, could be... I do see the geo-propagation-limit actions when I increase the number of steps in the tests, but not sure if it's statistical or something else causing the change.

@sethrj sethrj closed this May 11, 2023
@sethrj sethrj reopened this May 11, 2023
@sethrj sethrj merged commit 735b233 into celeritas-project:develop May 11, 2023
@sethrj sethrj deleted the along-step-branch branch May 11, 2023 15:25
@sethrj sethrj mentioned this pull request May 11, 2023
2 tasks
@sethrj sethrj added performance Changes for performance optimization and removed core Software engineering infrastructure labels Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance Changes for performance optimization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants