SoA Spacepoints, main branch (2025.02.19.) #878
Conversation
Force-pushed from 6ca9b8f to fa5af0a.
Detailed performance tests to come on Monday. But with the modified
But don't read too much into it; I've seen fairly varied results. (The GPU is also doing 2x 4K graphics during all of this... 😛) At least now they're in the same ballpark... 🤔
Force-pushed from fa5af0a to 8ca8904.
I assume this means the global coordinates are stored as an array of structs rather than a struct of arrays?
Have a look! I pushed the changes before posting the comment. Yes, I switched back to using ...
Worse than that, the changes don't even show up as I imagined. I might have messed up with the push after all... 🤔 In the end the new ...

Edit: Never mind. I was even looking at the wrong file myself... 🤦 The code in the PR should be the latest version that I have, after all.
I think the remainder is about 0.01 Hz, right? I think that's an acceptable change; it doesn't warrant too much investigation as far as I'm concerned.
I might as well spoil the surprise. 😛 I kept the variance variables from ... The ones that have always been hardcoded to zero... Now they are stored as actual variables in the SoA, with their values always set to zero. 🤔 This of course introduces some additional I/O. So we'll have to see how much we actually want these variances in the seed finding. At least for the moment. 🤔
Force-pushed from 8ca8904 to ce3b24f.
So... let's ignore the build failure for a moment... The current code of the PR, on the same A5000 card as before, behaves like this:
Note, though, that there is a fair amount of variance in these numbers. 🤔 Re-running the same test on the current state of the main branch, I would get ~5% variances even on the runtime of ...
As discussed at our meeting this morning, the fact that the code now properly keeps track of the (currently always zero) variances of the spacepoints slows things down. But we absolutely have to do this; the code was artificially faster previously. 🤔 I'll still do some compute profiles, and fix the compilation issue(s). But generally I think we should go forward like this. 🤔
The NSight Compute profile didn't reveal much. 🤔 I believe we still have performance to be gained overall, but that should not be done in this PR. It's already doing more than it really should...
Since I couldn't resist, I tried what would happen if I tweaked ...

This can be directly compared with the numbers in #878 (comment). So it will be possible to improve things. 😄 But that will be better placed in a separate, much smaller PR, once this beast gets in. 😉
Force-pushed from ce3b24f to b1825e3.
Force-pushed from b1825e3 to 35c2a16.
Force-pushed from f95b308 to 3252eac.
Choosing a local rebase on this branch was not the right call... 😦 Git clearly got very confused about how to transplant my 13 commits on top of the current main branch. It's not the end of the world, since the commits would've been squashed in the end anyway, but still...
It looks good to me, but let's wait for @stephenswat's approval and further comments, as the PR is huge.
Force-pushed from 3252eac to cec16cf.
Looks okay in general, but there are so many unrelated changes in this PR... I would suggest we work a bit on pull request hygiene because this single 5000-line delta PR is practically unreviewable; it contains what should have been five or six independent PRs... 😕
      /// Spacepoint container for the test seeds
-     const spacepoint_collection_types::const_view m_spacepoints;
+     const edm::spacepoint_collection::const_view m_spacepoints;

      /// The reference object
This comment is now out of date.
        return (is_same_scalar(obj.x(), m_ref.x(), m_unc) &&
                is_same_scalar(obj.y(), m_ref.y(), m_unc) &&
                is_same_scalar(obj.z(), m_ref.z(), m_unc));
    }

    private:
    /// The reference object
Outdated.
I'm only looking at these "outdated" comments now. What do you mean here?
"Reference" here is not in the C++ sense, but rather that "this is the reference that we compare our test object against".
@@ -34,6 +34,7 @@ class nseed_performance_writer {
     void initialize();
     void finalize();

+    /*
Why did you comment out this code?
I may have forgotten about a "temporarily disabled" piece of code. 😦 Let me check...
This was a good catch!
That code is not used anywhere in practice, but I have now tried to update it to be compatible with the new EDM anyway.
It's very likely to still have issues, but I was not going to write client code for this class just to test it.
    const auto spB =
        spacepoints.at(sp_device.bin(spB_loc.bin_idx)[spB_loc.sp_idx]);
    const auto spT =
        spacepoints.at(sp_device.bin(spT_loc.bin_idx)[spT_loc.sp_idx]);
Suggested change:

-   const auto spB =
-       spacepoints.at(sp_device.bin(spB_loc.bin_idx)[spB_loc.sp_idx]);
-   const auto spT =
-       spacepoints.at(sp_device.bin(spT_loc.bin_idx)[spT_loc.sp_idx]);
+   const auto & spB =
+       spacepoints.at(sp_device.bin(spB_loc.bin_idx)[spB_loc.sp_idx]);
+   const auto & spT =
+       spacepoints.at(sp_device.bin(spT_loc.bin_idx)[spT_loc.sp_idx]);
-   const internal_spacepoint<spacepoint> middle_sp =
-       sp_grid.bin(middle_sp_counter.m_spM.bin_idx)
-           .at(middle_sp_counter.m_spM.sp_idx);
+   const auto middle_sp =
I'll stop making more comments on this but let's capture the proxy spacepoints by const reference instead of by value.
As I wrote before, that would result in some properly dangerous code.

I do know that the compiler would actually allow that to be written. We've had such an occurrence in the ATLAS offline code, where one function all of a sudden started returning its thing by value. And clients continued working for a long time. Until one of the clients tried to pass along such a reference to a different scope. Then all of a sudden all hell broke loose. And it took quite some debugging to figure out what was going wrong. (As the code breaking was quite a bit removed from the code causing the issue.)

I also still debate whether we should use auto so liberally in these places. 🤔 There are proper named types that we could use here. They are just so very long... 🤔 So I'm not sure what's best for readability and maintainability...
> Until one of the clients tried to pass along such a reference to a different scope. Then all of a sudden all hell broke loose. And it took quite some debugging to figure out what was going wrong. (As the code breaking was quite a bit removed from the code causing the issue.)
Can you elaborate on how passing on a lifetime-extended temporary by reference is dangerous?
My understanding is that the lifetime of the returned object is as one would expect. It lasts "the scope that it was received in".
Let's take the following dummy code for instance:
// Dummy code: 'ref' is left without an initialiser for brevity
// (std::reference_wrapper is not default-constructible in real code).
std::reference_wrapper<const Foo> ref;
{
    // The returned temporary's lifetime is extended only to the lifetime
    // of 'f', i.e. to the end of this block.
    const Foo& f = functionReturningValue();
    ref = f;
}
// Dangling access: the temporary bound to 'f' has already been destroyed.
ref.get().doSomething();
The thing we encountered in the offline code was something along these lines.
I feel pretty strongly that using references in these places would very easily lead to coding mistakes later down the road. And clearly... adding & will do absolutely nothing to the performance of the generated code. So what good would it do? It would only obfuscate the code more in my mind.
By this logic your current proxy objects are also broken:
my_edm_container cont;
std::reference_wrapper<my_edm_container::proxy_type> ref;
{
    // 'p' is a local proxy object; it is destroyed at the end of this block.
    my_edm_container::proxy_type p = cont[0];
    ref = p;
}
// Dangling access through 'ref', in exactly the same way.
ref.get().x();
What I am trying to say is: if your user wants to shoot themselves in the foot this much, then there is no way to prevent them from doing so (without switching to Rust).

At the same time, keeping these variables as references makes the code more flexible! If the design of the container changes, a reference will be able to keep up without invisible performance degradation; that is not the case if they are values.
Plus, accepting them as references provides a semantic hint to the user that these types are basically just fancy reference packs, and that they cannot be changed without side-effects.
In your example the code literally makes a reference to a temporary. Of course it will break then. 😕 In my example the writer of the code could reasonably believe that they are holding onto a reference that has a lifetime beyond the current scope.
I see your argument about changing future containers, but then let me go ahead and replace all the auto-s with concrete types. As I'm coming to think that that would result in the most readable code going forward...
I went for removing/replacing all usages of auto for the new types. This way, if/when the definition of edm::spacepoint_collection::const_device::const_proxy_type changes, the code here should change correctly as well. 🤔
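For illustration, a hedged sketch of what that change looks like, reusing the access pattern quoted earlier in this thread (the surrounding variable names are taken from that snippet and may not match the final code exactly):

// Sketch only: spelling out the proxy type instead of relying on 'auto'.
const edm::spacepoint_collection::const_device::const_proxy_type spB =
    spacepoints.at(sp_device.bin(spB_loc.bin_idx)[spB_loc.sp_idx]);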
    const std::size_t sp_index = result.size();
    result.resize(sp_index + 1u);
    auto sp = result.at(sp_index);
    traccc::details::fill_pixel_spacepoint(sp, det, meas);
    sp.measurement_index() = meas_index;
I thought we had a push_back function now, no?
🤔 I modified this code before I had implemented push_back(...), and then didn't bother switching it back.

Mainly because I didn't think that using vecmem::edm::device::push_back(...) would actually make this code easier to understand / maintain. Passing along a proxy to the helper function was easier than passing a templated container. (Since the helper needs to work with both the host and the device container.)
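To illustrate that last point, a hypothetical sketch (not the real fill_pixel_spacepoint signature): a helper that takes the proxy itself works unchanged for proxies coming from either the host or the device container.

// Hypothetical helper, for illustration only: the proxy type is a template
// parameter, so proxies from both the host and the device spacepoint
// collections can be passed in by value.
template <typename spacepoint_proxy_t>
void fill_spacepoint_sketch(spacepoint_proxy_t sp, unsigned int meas_index) {
    // Proxy objects expose the SoA variables through accessor functions.
    sp.measurement_index() = meas_index;
}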
    }
    ++meas_index;
Why do we need this and not just std::distance?
std::distance? You mean to switch from this Frankenstein for-loop to an iterator-based one?

Yeah, I could be on board with that. Adding the extra integer seemed like the smaller intervention in the code. But I agree, the for-loop is pretty ugly now.
Have a look!
I fear though that I only made the code (even) uglier. 😦
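For reference, a minimal sketch (hypothetical loop and variable names, not the actual PR code) of the iterator-based variant, deriving the index with std::distance instead of a hand-maintained counter:

// Requires <iterator> for std::distance; 'measurements' is illustrative.
for (auto it = measurements.begin(); it != measurements.end(); ++it) {
    // Index of the current measurement, recovered from the iterator position.
    const unsigned int meas_index =
        static_cast<unsigned int>(std::distance(measurements.begin(), it));
    // ... create the spacepoint for *it, tagging it with meas_index ...
}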
    /// Internal implementation struct
    struct impl;
    /// Pointer to the internal implementation
    std::unique_ptr<impl> m_impl;
Is there a reason this needs to be PIMPL? 🤔
Because I wanted to move all the separate classes that we use in the implementation out of the public interface of the library. They are all implementation details that are only used by the host code.
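For context, a generic sketch of the pattern (class and file names are illustrative, not the actual traccc code): only a forward declaration appears in the header, so the helper types never show up in the library's public interface.

// seeding_algorithm.hpp (illustrative)
#include <memory>

class seeding_algorithm {
public:
    seeding_algorithm();
    ~seeding_algorithm();  // defined in the .cpp, where 'impl' is complete

private:
    /// Internal implementation struct
    struct impl;
    /// Pointer to the internal implementation
    std::unique_ptr<impl> m_impl;
};

// seeding_algorithm.cpp (illustrative)
struct seeding_algorithm::impl {
    // The binning / seed finding helper tools would live here, hidden
    // from the public headers.
};

seeding_algorithm::seeding_algorithm() : m_impl{std::make_unique<impl>()} {}
seeding_algorithm::~seeding_algorithm() = default;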
By the way, I think we should strongly consider testing out the new ...
Force-pushed from cec16cf to 6f90e80.
Force-pushed from 6f90e80 to 38b793f.
We can go ahead and put this in.
Force-pushed from 38b793f to b10bfdc.
May God have mercy on our souls... 😛
There is no need for it to receive the size of the measurement collection separately. It already has a view of the collection with that information.
Also, make at least some of the code use the new vecmem::edm::host::push_back(...) functionality.
…thm. In sync with all the other spacepoint_binning classes.
Note though that the code is not actually used anywhere at the moment. I only ensured that it would compile.
Force-pushed from b10bfdc to 529b49c.
With all its problems, let me open this monster PR, so that all of you could have a first look.

As I've been advertising for a while now, I was working on making traccc::spacepoint (and traccc::seed) have an SoA memory layout, using the same vecmem::edm code that I pushed in for the cells as well. To jump right to the conclusion, the code is currently a little slower with the SoA layout. 😭 Even though I would've dreamt of some cheap performance gains with all this effort... But let's come back to this further down.
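As a reminder of what the layout change means in practice, here is a generic illustration (simplified, hypothetical types, not the actual traccc / vecmem EDM): in the AoS form each spacepoint is one struct, while in the SoA form each variable gets its own array, and element i of every array describes spacepoint i.

#include <vector>

// Array-of-structs: the previous style of layout, one struct per spacepoint.
struct spacepoint_aos {
    float x, y, z;
    unsigned int measurement_index;
};
using spacepoint_collection_aos = std::vector<spacepoint_aos>;

// Struct-of-arrays: the kind of layout this PR introduces (simplified),
// one array per variable.
struct spacepoint_collection_soa {
    std::vector<float> x, y, z;
    std::vector<unsigned int> measurement_index;
};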
Some of the main points in this PR:

- Introduced traccc::edm::spacepoint<T>, traccc::edm::spacepoint_collection, traccc::edm::seed<T> and traccc::edm::seed_collection;
  - The spacepoints no longer hold a traccc::measurement member, but rather just an integer with the index of the measurement that the spacepoint was made out of.
- Removed traccc::internal_spacepoint completely, and modified the spacepoint grid type to: traccc::details::spacepoint_grid_types::host, traccc::details::spacepoint_grid_types::buffer, etc.
  - Introduced the traccc::details namespace for it, since regular users should not really need to interact with such objects directly.
  - The grid now stores unsigned int elements, with the indices of the spacepoints that belong to a given bin in the grid.
- Demoted the traccc::<foo>::spacepoint_binning and traccc::<foo>::seed_finding classes from "being algorithms", to just being some helper tools that the main traccc::<foo>::seeding_algorithm classes would use.
  - Moved them into traccc::<foo>::details:: namespaces, to show that these would not be the main interfaces to use in client code. (I considered completely hiding these classes into the src/ directories, but in the end decided against that.)
- Updated traccc::performance to be able to compare SoA containers to each other.
  - Added the traccc::soa_comparator class, which uses the same machinery as the existing traccc::collection_comparator. (I tried to extend the existing class for a bit to make it compatible with both AoS and SoA types, but it was just not practical to do.)

Coming back to the performance... On our trusty old A5000, at $\mu = 140$, we go from:

To:

The spacepoint grid creation speeds up a little, but everything else slows down. 😦 We discussed a little about this with @stephenswat already; I'll try some tricks still. But it seems pretty clear that just blindly switching to an SoA layout is not a silver bullet, as the memory throughput of the affected kernels clearly went down with the current state of this PR. 🤔