-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cell EDM Rewrite, main branch (2024.09.18.) #712
Cell EDM Rewrite, main branch (2024.09.18.) #712
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm generally on board, but in its current state this makes the code much less readable and generic; I'd strongly recommend looking into a proxy object for this as it would make the code more readable and require much fewer code line changes.
See, e.g. https://github.com/acts-project/acts/blob/main/Examples/Framework/include/ActsExamples/EventData/Measurement.hpp and https://github.com/acts-project/traccc/blob/main/core/include/traccc/utils/array_wrapper.hpp for inspiration.
core/include/traccc/clusterization/impl/measurement_creation.ipp
Outdated
Show resolved
Hide resolved
core/include/traccc/clusterization/impl/measurement_creation.ipp
Outdated
Show resolved
Hide resolved
As mentioned, please increase the minimum vecmem version to 1.8.0. |
Hmm. are you going to go through this for every EDM class? |
This will also have lots of conflictions with #692 😿 |
The good news is that when the proxy objects are implemented, most of the code will remain unchanged. |
Some further developments are indeed underway... |
9641666
to
34966b3
Compare
All of you, hold onto your hats. 😄 If/once we settle on acts-project/vecmem#296, these are the types of updates that we will need to do to switch from the current AoS to a new SoA EDM: At the same time I looked at profiles of the throughput application a little as well. This was very educational. As it turns out, the small slowdown is not due to the kernels. It seems to be due to the code spending a little more time on memory copies. 🤔 That's not great news, as apparently the |
34966b3
to
348ee3b
Compare
348ee3b
to
c8b3f86
Compare
out.push_back(v2); | ||
} | ||
} else if (tid == 0) { | ||
out[atomicAdd(out_size, 1u)] = projection(in.at(tid)); | ||
out.push_back(projection(in.at(tid))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephenswat take note that I changed this code a little. Since you were literally implementing a resizable 1D vector by hand previously, I decided to rather just use the vecmem types here that were designed for exactly this use case. 🤔
ca8dd63
to
adaecb6
Compare
aabcb8e
to
1999c5b
Compare
The good news is that once the code finally starts working on all platforms with all compilers, this very latest version is finally delivering on the performance front. 😄
Though I am a little bit afraid of this possibly being artificial. Since the previous result was on |
ccb8eb3
to
df40fb2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's getting there. 👍
core/include/traccc/clusterization/impl/measurement_creation.ipp
Outdated
Show resolved
Hide resolved
core/include/traccc/clusterization/impl/measurement_creation.ipp
Outdated
Show resolved
Hide resolved
device/common/include/traccc/clusterization/device/impl/aggregate_cluster.ipp
Outdated
Show resolved
Hide resolved
device/common/include/traccc/clusterization/device/impl/reduce_problem_cell.ipp
Show resolved
Hide resolved
Huhh... 🤔 What's your take on these errors @stephenswat? |
SonarCloud actually makes a really valid point here about constraining universal references; I'd suggest we go ahead and implement them. |
As long as you have a concrete idea of how to go about it, I'm happy to let you propose the improvement. 😉 |
df40fb2
to
b65600e
Compare
Okay, I guess we need to get vecmem 1.10.0 and then we can put this in, right? |
Updated the algorithms in traccc::core to use the new types.
Updated the remaining clients that were still using these headers by mistake.
Allowing the code to revert to a very similar setup that it had with the old AoS cell collection.
Also introduced the usage of detector description proxy objects in a few places. Updated the sanity functions once again, to bring them back to a setup closer to how they are at the moment in the main branch. Also synchornized the CUDA and SYCL implementations of the sanity functions a little.
Introduced some additional functions into edm::silicon_cell to make it a bit easier to handle such objects/containers. Fixed a long-standing mistake in traccc::device::ccl_core, which was brought to light with these latest changes.
The aggregate_cluster code uses the same cell over and over again, so using a proxy formalism makes sense here after all.
After switching to vecmem-1.10.0, to make this possible.
b65600e
to
bf7ac0d
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think this can go in now.
This is the next monster PR... Exchanging
traccc::cell_collection_types
andtraccc::cluster_container_types
with SoA versions.To jump right to the chase: It doesn't bring any performance improvement. 😦 This EDM change of course only affects clusterization. Which is already one of the fastest things that we run. Still if anything, I see an O(1%) performance drop during the TML$\mu$ =200 throughput measurements with this update applied. 🤔
On my RTX3080 I get the following with the current
main
branch:While this PR produces the following:
(There is variation on these numbers, but the "new" code is always just a little slower. 😦)
About the code:
traccc::edm::silicon_cell_collection
andtraccc::edm::silicon_cluster_collection
as the name of these containers. But I'm not too fond of these names either. So I'm open to suggestions.traccc::cell
closely.traccc::edm::silicon_cluster_collection
type, only used in the host code, is now a jagged vector of cell indices. As such,traccc::host::measurement_creation_algorithm
had to change its interface slightly.We'll have to do some profiling, but I suspect that the small performance drop comes from the fact that the PR's code always reads the cell data from global memory, whenever it needs it. Just loading some of the info into local registers in a couple of places will hopefully take us back to the previous performance. I just didn't want to complicate the code even further in this PR. 🤔
This PR also closes #691.