SoA Spacepoints, main branch (2025.02.19.) #878

Merged
merged 8 commits into from
Feb 27, 2025

Conversation

krasznaa
Member

With all its problems, let me open this monster PR, so that all of you could have a first look.

As I've been advertising for a while now, I was working on making traccc::spacepoint (and traccc::seed) have an SoA memory layout. Using the same vecmem::edm code that I pushed in for the cells as well.

To jump right to the conclusion, the code is currently a little slower with the SoA layout. 😭 Even though I would've dreamt of some cheap performance gains with all this effort... But let's come back to this further down.

Some of the main points in this PR:

  • Introduced traccc::edm::spacepoint<T>, traccc::edm::spacepoint_collection, traccc::edm::seed<T> and traccc::edm::seed_collection;
    • Spacepoints no longer have a traccc::measurement member, but rather just an integer with the index of the measurement that the spacepoint was made out of.
    • This could later on be extended so that the spacepoint would point at two strip measurements. In case we end up writing such a thing.
  • Removed traccc::internal_spacepoint completely, and modified the spacepoint grid type to:
    • Follow the same naming scheme that we use for all our containers. I.e. traccc::details::spacepoint_grid_types::host, traccc::details::spacepoint_grid_types::buffer, etc.
    • I used the traccc::details namespace for it, since regular users should not really need to interact with such objects directly.
    • The grid now uses unsigned int elements, with the indices of the spacepoints that belong to a given bin in the grid.
  • I demoted all of the traccc::<foo>::spacepoint_binning and traccc::<foo>::seed_finding classes from "being algorithms", to just being some helper tools that the main traccc::<foo>::seeding_algorithm classes would use.
    • I also moved all of them into traccc::<foo>::details:: namespaces, to show that these would not be the main interfaces to use in client code. (I considered completely hiding these classes into the src/ directories, but in the end decided against that.)
  • Updated all of the I/O code to the new classes;
  • Updated traccc::performance to be able to compare SoA containers to each other.
    • This is done through a new traccc::soa_comparator class, which uses the same machinery as the existing traccc::collection_comparator. (I tried to extend the existing class for a bit to make it compatible with both AoS and SoA types, but it was just not practical to do.)
  • Updated all of the tests and examples to use the new EDM types.
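
To make the layout change in the list above concrete, here is a minimal sketch of the two ideas: one array per spacepoint variable, with the measurement referenced by index, and grid bins that hold spacepoint indices rather than spacepoint objects. All names (`spacepoint_soa_sketch`, `spacepoint_grid_sketch`, `sum_z_in_bin`) are hypothetical stand-ins; this is not the vecmem::edm or traccc API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an SoA spacepoint collection: one array per
// variable instead of one array of structs. Not the vecmem::edm API.
struct spacepoint_soa_sketch {
    std::vector<float> x, y, z;
    // Index into a separate measurement collection, instead of a copy of
    // the measurement itself.
    std::vector<unsigned int> measurement_index;

    std::size_t size() const { return x.size(); }

    void push_back(float px, float py, float pz, unsigned int meas) {
        x.push_back(px);
        y.push_back(py);
        z.push_back(pz);
        measurement_index.push_back(meas);
    }
};

// Hypothetical grid: each bin stores only the indices of its spacepoints.
using spacepoint_grid_sketch = std::vector<std::vector<unsigned int>>;

// Sum the z coordinates of all spacepoints in one bin, resolving each grid
// entry through the collection. Only the z array is touched.
float sum_z_in_bin(const spacepoint_soa_sketch& sps,
                   const spacepoint_grid_sketch& grid, std::size_t bin) {
    float sum = 0.f;
    for (unsigned int sp_idx : grid.at(bin)) {
        sum += sps.z.at(sp_idx);
    }
    return sum;
}
```

A loop like `sum_z_in_bin` reads a single contiguous array, which is the access pattern an SoA layout is meant to favour; whether that pays off in the real kernels is exactly what the profiles below examine.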

Coming back to the performance... On our trusty old A5000, at $\mu$=140, we go from:

 Time (%)  Total Time (ns)  Instances    Avg (ns)       Med (ns)      Min (ns)     Max (ns)    StdDev (ns)                                                   Name
 --------  ---------------  ---------  -------------  -------------  -----------  -----------  ------------  ----------------------------------------------------------------------------------------------------
...
      1.5      470,490,231        110    4,277,183.9    3,780,795.5    1,370,243   14,717,057   2,201,624.2  traccc::cuda::kernels::count_triplets(traccc::seedfinder_config, detray::const_grid2_view<detray::g…
...
      0.7      230,992,739        110    2,099,934.0    1,630,692.5      527,521   10,839,928   1,571,123.8  traccc::cuda::kernels::find_doublets(traccc::seedfinder_config, detray::const_grid2_view<detray::gr…
      0.5      171,477,070        110    1,558,882.5    1,158,307.5      345,345   19,859,050   2,248,613.7  traccc::cuda::kernels::find_triplets(traccc::seedfinder_config, traccc::seedfilter_config, detray::…
...
      0.3       89,031,135        110      809,374.0      602,642.0      214,465    4,017,866     655,702.0  traccc::cuda::kernels::count_doublets(traccc::seedfinder_config, detray::const_grid2_view<detray::g…
...
      0.1       34,684,545        110      315,314.0      224,912.5       93,952    1,779,783     312,483.6  traccc::cuda::kernels::select_seeds(traccc::seedfilter_config, vecmem::data::vector_view<const trac…
...
      0.1       16,780,777        110      152,552.5      115,008.5       21,888      829,538     145,592.3  traccc::cuda::kernels::estimate_track_params(vecmem::data::vector_view<const traccc::spacepoint>, v…
...
      0.0       12,371,711        110      112,470.1       59,296.5       19,552      528,097     109,092.5  traccc::cuda::kernels::update_triplet_weights(traccc::seedfilter_config, detray::const_grid2_view<d…
      0.0        8,864,118        110       80,582.9       45,296.0       22,720      368,385      69,339.7  void traccc::cuda::kernels::form_spacepoints<detray::detector<detray::default_metadata<algebra::plu…
 ...
      0.0        4,239,799        110       38,543.6       29,376.0       19,456      213,505      26,191.3  traccc::cuda::kernels::populate_grid(traccc::seedfinder_config, vecmem::data::vector_view<const tra…
      0.0        3,109,098        110       28,264.5       23,200.0       14,975      106,528      14,777.9  traccc::cuda::kernels::count_grid_capacities(traccc::seedfinder_config, detray::axis2::circular<std…
...
      0.0        2,480,457        110       22,549.6       11,600.0        7,360      113,793      18,672.3  traccc::cuda::kernels::reduce_triplet_counts(vecmem::data::vector_view<const traccc::device::double…
...

To:

 Time (%)  Total Time (ns)  Instances    Avg (ns)       Med (ns)      Min (ns)     Max (ns)    StdDev (ns)                                                   Name
 --------  ---------------  ---------  -------------  -------------  -----------  -----------  ------------  ----------------------------------------------------------------------------------------------------
...
      1.8      601,845,107        110    5,471,319.2    4,951,645.5    1,737,715   17,843,903   2,667,566.6  traccc::cuda::kernels::count_triplets(traccc::seedfinder_config, vecmem::edm::view<vecmem::edm::sch…
      1.1      374,323,769        110    3,402,943.4    2,881,532.5    1,434,646   19,223,364   2,234,771.4  traccc::cuda::kernels::find_doublets(traccc::seedfinder_config, vecmem::edm::view<vecmem::edm::sche…
      0.8      272,225,133        110    2,474,773.9    1,712,515.5      589,308   23,448,033   3,162,311.5  traccc::cuda::kernels::find_triplets(traccc::seedfinder_config, traccc::seedfilter_config, vecmem::…
...
      0.6      204,124,161        110    1,855,674.2    1,549,077.0      489,917    5,415,800   1,074,744.7  traccc::cuda::kernels::count_doublets(traccc::seedfinder_config, vecmem::edm::view<vecmem::edm::sch…
...
      0.1       45,660,333        110      415,093.9      262,094.0      128,127    2,542,445     458,869.7  traccc::cuda::kernels::select_seeds(traccc::seedfilter_config, vecmem::edm::view<vecmem::edm::schem…
...
      0.1       18,027,751        110      163,888.6      156,639.0       22,016      730,811     125,213.4  traccc::cuda::kernels::estimate_track_params(vecmem::data::vector_view<const traccc::measurement>, …
...
      0.1       17,326,347        110      157,512.2       60,160.0       26,496    3,417,801     354,702.6  traccc::cuda::kernels::update_triplet_weights(traccc::seedfilter_config, vecmem::edm::view<vecmem::…
...
      0.0        5,860,878        110       53,280.7       26,735.5       10,848      216,286      51,295.1  void traccc::cuda::kernels::form_spacepoints<detray::detector<detray::default_metadata<algebra::plu…
...
      0.0        2,855,926        110       25,963.0       20,448.0       13,920      135,007      16,604.9  traccc::cuda::kernels::populate_grid(traccc::seedfinder_config, vecmem::edm::view<vecmem::edm::sche…
      0.0        2,389,644        110       21,724.0       19,103.5       13,600       62,591       9,177.2  traccc::cuda::kernels::count_grid_capacities(traccc::seedfinder_config, detray::axis2::circular<std…
...

The spacepoint grid creation speeds up a little, but everything else slows down. 😦 We discussed a little about this with @stephenswat already, I'll try some tricks still. But it seems to make it pretty clear that just blindly switching to an SoA layout is not a silver bullet. As the memory throughput of the affected kernels clearly went down with the current state of this PR. 🤔
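
To put numbers on the statement above, the total kernel times from the two tables can be turned into after/before ratios. The sketch below only does that arithmetic; the struct and function names are made up for illustration, and the values are the "Total Time (ns)" columns quoted above.

```cpp
#include <cstdio>

// Total kernel time before (main) and after (this PR), in nanoseconds,
// copied from the two nsys tables above.
struct kernel_time {
    const char* name;
    double before_ns;
    double after_ns;
};

constexpr kernel_time kernels[] = {
    {"count_triplets", 470490231., 601845107.},
    {"find_doublets", 230992739., 374323769.},
    {"find_triplets", 171477070., 272225133.},
    {"count_doublets", 89031135., 204124161.},
    {"populate_grid", 4239799., 2855926.},
    {"count_grid_capacities", 3109098., 2389644.},
};

// Ratio > 1 means the kernel got slower, < 1 means it got faster.
constexpr double ratio(const kernel_time& k) {
    return k.after_ns / k.before_ns;
}

void print_ratios() {
    for (const kernel_time& k : kernels) {
        std::printf("%-24s %.2fx\n", k.name, ratio(k));
    }
}
```

With these inputs the seeding kernels come out between roughly 1.3x and 2.3x of their previous total time, while the two grid kernels land at roughly 0.7x to 0.8x.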

@krasznaa krasznaa added cuda Changes related to CUDA sycl Changes related to SYCL cpu Changes related to CPU code edm Changes to the data model kokkos Changes related to Kokkos alpaka Changes related to Alpaka labels Feb 19, 2025
@krasznaa
Member Author

Detailed performance tests to come on Monday. But with the modified traccc::edm::spacepoint<T> class I now get the following.

  • With the current main branch I get the following on my home PC:
./build/bin/traccc_throughput_mt_cuda --input-directory /data/hdd-4tb/acts_data/odd-20240509/geant4_ttbar_mu140/ --input-events 100 --cpu-threads 2 --deterministic

Running Multi-threaded CUDA GPU throughput tests
...
Using CUDA device: NVIDIA GeForce RTX 3080 [id: 0, bus: 1, device: 0]
Using CUDA device: NVIDIA GeForce RTX 3080 [id: 0, bus: 1, device: 0]
Using CUDA device: NVIDIA GeForce RTX 3080 [id: 0, bus: 1, device: 0]
Warm-up processing [==================================================] 100% [00m:00s]
Event processing   [==================================================] 100% [00m:00s]
Reconstructed track parameters: 4766973
Time totals:
                  File reading  3325 ms
            Warm-up processing  2421 ms
              Event processing  21890 ms
Throughput:
            Warm-up processing  242.162 ms/event, 4.12946 events/s
              Event processing  218.906 ms/event, 4.56816 events/s
  • And with this PR's code I see:
./build/cuda-native-fp32/bin/traccc_throughput_mt_cuda --input-directory /data/hdd-4tb/acts_data/odd-20240509/geant4_ttbar_mu140/ --input-events 100 --cpu-threads 2 --deterministic

Running Multi-threaded CUDA GPU throughput tests
...
Using CUDA device: NVIDIA GeForce RTX 3080 [id: 0, bus: 1, device: 0]
Using CUDA device: NVIDIA GeForce RTX 3080 [id: 0, bus: 1, device: 0]
Using CUDA device: NVIDIA GeForce RTX 3080 [id: 0, bus: 1, device: 0]
Warm-up processing [==================================================] 100% [00m:00s]
Event processing   [==================================================] 100% [00m:00s]
Reconstructed track parameters: 4767233
Time totals:
                  File reading  3280 ms
            Warm-up processing  2426 ms
              Event processing  21936 ms
Throughput:
            Warm-up processing  242.633 ms/event, 4.12145 events/s
              Event processing  219.37 ms/event, 4.55851 events/s

But don't read too much into it, I've seen fairly varied results. (The GPU is also doing 2x 4K graphics during all of this... 😛) At least now they're in the same ballpark... 🤔

@krasznaa krasznaa force-pushed the SpacepointSoA-main-20250212 branch from fa5af0a to 8ca8904 Compare February 21, 2025 21:20
@stephenswat
Member

But with the modified traccc::edm::spacepoint<T> class I now get the following.

I assume this means the global coordinates are stored as an array of structs rather than a struct of arrays?

@krasznaa
Member Author

I assume this means the global coordinates are stored as an array of structs rather than a struct of arrays?

? Have a look! I pushed the changes before posting the comment.

Yes, I switched back to using traccc::point3 as-is. I also believe I understand where the remaining performance difference comes from. We can discuss it in a little bit...

@stephenswat
Member

? Have a look! I pushed the changes before posting the comment.

image

😛

@krasznaa
Member Author

krasznaa commented Feb 24, 2025

Worse than that, the changes don't even show up as I imagined. I might have messed up with the push after all... 🤔

In the end the new spacepoint_collection.hpp file should be the important part. Everything else is just there to support that new thing.

Edit: Never mind. I was looking at the wrong file myself even... 🤦 The code in the PR should be the latest version that I have, after all.

@stephenswat
Member

I also believe I understand where the remaining performance difference comes from.

I think the remainder is about 0.01 Hz, right? I think that's an acceptable change, doesn't warrant too much investigation as far as I'm concerned.
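
The "about 0.01 Hz" figure can be checked directly from the two RTX 3080 throughput logs quoted earlier (4.56816 events/s on main, 4.55851 events/s with the PR). A small sketch of that arithmetic, with made-up constant and function names:

```cpp
// Event-processing throughput from the two logs above, in events/s.
constexpr double baseline_events_per_s = 4.56816;  // current main branch
constexpr double pr_events_per_s = 4.55851;        // this PR

// Absolute throughput drop in Hz (events/s).
constexpr double absolute_drop_hz() {
    return baseline_events_per_s - pr_events_per_s;
}

// The same drop as a fraction of the baseline throughput.
constexpr double relative_drop() {
    return absolute_drop_hz() / baseline_events_per_s;
}
```

This gives a drop of a little under 0.01 Hz, or about 0.2% of the baseline, which is small next to the run-to-run variation mentioned in the thread.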

@krasznaa
Member Author

I might as well spoil the surprise. 😛

I kept the variance variables from internal_spacepoint.

https://github.com/acts-project/traccc/blob/main/core/include/traccc/edm/internal_spacepoint.hpp#L80-L84

The ones that have always been hardcoded to zero... Now they are stored as actual variables in the SoA. With their values always set to zero. 🤔 This of course would introduce some additional I/O.

So we'll have to see how much we actually want these variances in the seed finding. At least for the moment. 🤔

@krasznaa krasznaa force-pushed the SpacepointSoA-main-20250212 branch from 8ca8904 to ce3b24f Compare February 24, 2025 09:22
@krasznaa
Member Author

So... let's ignore the build failure for a moment...

The current code of the PR, on the same A5000 card as before, behaves like this:

nsys profile --stats=true -o seeding_new2.profile ./build-new/bin/traccc_throughput_mt_cuda --input-directory /data/Acts/odd-simulations-20240509/geant4_ttbar_mu140/ --input-events 100 --cpu-threads 2 --deterministic 2>&1 | tee seeding_new2.profile.log
...
 Time (%)  Total Time (ns)  Instances    Avg (ns)       Med (ns)      Min (ns)    Max (ns)    StdDev (ns)                                                   Name                                                
 --------  ---------------  ---------  -------------  -------------  ----------  -----------  ------------  ----------------------------------------------------------------------------------------------------
...
      1.8      541,780,618        110    4,925,278.3    4,801,199.5   1,934,854   11,273,987   1,975,679.1  traccc::cuda::kernels::count_triplets(traccc::seedfinder_config, vecmem::edm::view<vecmem::edm::sch…
...
      0.8      227,799,499        110    2,070,904.5    1,894,950.5     912,321    5,418,155     847,702.1  traccc::cuda::kernels::find_doublets(traccc::seedfinder_config, vecmem::edm::view<vecmem::edm::sche…
      0.6      168,275,466        110    1,529,777.0    1,386,307.0     538,785   12,902,792   1,197,117.2  traccc::cuda::kernels::find_triplets(traccc::seedfinder_config, traccc::seedfilter_config, vecmem::…
      0.5      162,648,091        110    1,478,619.0    1,493,715.5     316,609    3,315,369     543,757.1  traccc::cuda::kernels::count_doublets(traccc::seedfinder_config, vecmem::edm::view<vecmem::edm::sch…
...
      0.1       22,085,158        110      200,774.2      195,952.0     114,208      559,553      56,230.2  traccc::cuda::kernels::select_seeds(traccc::seedfilter_config, vecmem::edm::view<vecmem::edm::schem…
...
      0.0        9,887,572        110       89,887.0       43,184.0      24,640      310,785      67,368.5  traccc::cuda::kernels::estimate_track_params(vecmem::data::vector_view<const traccc::measurement>, …
...
      0.0        4,828,110        110       43,891.9       11,055.5       8,384      147,872      47,505.2  traccc::cuda::kernels::reduce_triplet_counts(vecmem::data::vector_view<const traccc::device::double…
      0.0        4,640,174        110       42,183.4       39,520.0      24,032      155,073      15,700.1  traccc::cuda::kernels::update_triplet_weights(traccc::seedfilter_config, vecmem::edm::view<vecmem::…
...
      0.0        3,019,749        110       27,452.3       15,776.0       9,216       93,760      23,093.6  void traccc::cuda::kernels::form_spacepoints<detray::detector<detray::default_metadata<algebra::plu…
...
      0.0        2,190,561        110       19,914.2       18,799.5      13,120       49,440       5,830.7  traccc::cuda::kernels::populate_grid(traccc::seedfinder_config, vecmem::edm::view<vecmem::edm::sche…
      0.0        2,043,941        110       18,581.3       17,472.0      12,577       57,984       6,019.7  traccc::cuda::kernels::count_grid_capacities(traccc::seedfinder_config, detray::axis2::circular<std…
...

Note though that there is a fair amount of variance on these numbers. 🤔 Re-running the same test on the current state of the main branch I would get ~5% variances even on the runtime of traccc::cuda::kernels::ccl_kernel. Which is absolutely not affected by this PR. 😕 One of the slower runs with the current main branch looks like this for instance:

nsys profile --stats=true ./build-current/bin/traccc_throughput_mt_cuda --input-directory /data/Acts/odd-simulations-20240509/geant4_ttbar_mu140/ --input-events 100 --cpu-threads 2 --deterministic
...
 Time (%)  Total Time (ns)  Instances    Avg (ns)       Med (ns)      Min (ns)     Max (ns)    StdDev (ns)                                                   Name                                                
 --------  ---------------  ---------  -------------  -------------  -----------  -----------  ------------  ----------------------------------------------------------------------------------------------------
...
      1.8      516,405,282        110    4,694,593.5    4,480,770.0    1,578,508   13,249,062   2,161,830.1  traccc::cuda::kernels::count_triplets(traccc::seedfinder_config, detray::const_grid2_view<detray::g…
...
      0.6      174,097,132        110    1,582,701.2    1,160,745.5      580,709    5,464,202     992,292.8  traccc::cuda::kernels::find_doublets(traccc::seedfinder_config, detray::const_grid2_view<detray::gr…
      0.4      125,428,299        110    1,140,257.3    1,098,185.0      370,114    2,470,451     387,936.4  traccc::cuda::kernels::find_triplets(traccc::seedfinder_config, traccc::seedfilter_config, detray::…
...
      0.3       86,367,906        110      785,162.8      701,397.5      198,882    2,067,408     376,686.2  traccc::cuda::kernels::count_doublets(traccc::seedfinder_config, detray::const_grid2_view<detray::g…
...
      0.1       19,817,147        110      180,155.9      177,713.0      104,000      369,699      45,532.0  traccc::cuda::kernels::select_seeds(traccc::seedfilter_config, vecmem::data::vector_view<const trac…
...      
      0.0        9,832,528        110       89,386.6       42,881.0       20,928      585,636     125,096.4  traccc::cuda::kernels::update_triplet_weights(traccc::seedfilter_config, detray::const_grid2_view<d…
      0.0        9,513,894        110       86,489.9       36,496.0       21,568      306,818      73,636.3  traccc::cuda::kernels::estimate_track_params(vecmem::data::vector_view<const traccc::spacepoint>, v…
...
      0.0        5,305,766        110       48,234.2       37,824.5       21,281      288,706      34,678.8  void traccc::cuda::kernels::form_spacepoints<detray::detector<detray::default_metadata<algebra::plu…
...
      0.0        3,374,773        110       30,679.8       26,944.0       17,024      263,362      23,959.4  traccc::cuda::kernels::populate_grid(traccc::seedfinder_config, vecmem::data::vector_view<const tra…
...
      0.0        2,786,839        110       25,334.9       10,480.0        8,160      141,857      29,897.2  traccc::cuda::kernels::reduce_triplet_counts(vecmem::data::vector_view<const traccc::device::double…
      0.0        2,451,603        110       22,287.3       21,520.5       16,032       49,761       5,758.1  traccc::cuda::kernels::count_grid_capacities(traccc::seedfinder_config, detray::axis2::circular<std…
...

As discussed at our meeting this morning, the code now properly keeps track of the (currently always zero) variances of the spacepoints, and that slows things down. But we absolutely have to do this. The code was artificially faster previously. 🤔

I'll still do some compute profiles, and fix the compilation issue(s). But generally I think we should go forward like this. 🤔

@krasznaa krasznaa marked this pull request as ready for review February 24, 2025 10:52
@krasznaa
Member Author

The NSight Compute profile didn't reveal much. 🤔 I believe we still have performance to be gained overall, but that should not be done in this PR. It's already doing more than it really should...

@krasznaa
Member Author

Since I couldn't resist, I tried what would happen if I tweaked doublet_finding_helper::transform_coordinates a bit. With just some trivial modifications, I now get:

./traccc/out/build/cuda-native-fp32/bin/traccc_throughput_mt_cuda --input-directory /data/hdd-4tb/acts_data/odd-20240509/geant4_ttbar_mu140/ --input-events 100 --cpu-threads 2 --deterministic
...
Using CUDA device: NVIDIA GeForce RTX 3080 [id: 0, bus: 1, device: 0]
Using CUDA device: NVIDIA GeForce RTX 3080 [id: 0, bus: 1, device: 0]
Using CUDA device: NVIDIA GeForce RTX 3080 [id: 0, bus: 1, device: 0]
Warm-up processing [==================================================] 100% [00m:00s]
Event processing   [==================================================] 100% [00m:00s]
Reconstructed track parameters: 4767399
Time totals:
                  File reading  28576 ms
            Warm-up processing  2388 ms
              Event processing  21230 ms
Throughput:
            Warm-up processing  238.884 ms/event, 4.18614 events/s
              Event processing  212.308 ms/event, 4.71013 events/s

This can be directly compared with the numbers in #878 (comment). So it will be possible to improve things. 😄

But that will be better to put into a separate, much smaller PR. Once this beast gets in. 😉

@krasznaa krasznaa force-pushed the SpacepointSoA-main-20250212 branch from ce3b24f to b1825e3 Compare February 24, 2025 20:46
@krasznaa krasznaa force-pushed the SpacepointSoA-main-20250212 branch from b1825e3 to 35c2a16 Compare February 25, 2025 12:14
@krasznaa krasznaa force-pushed the SpacepointSoA-main-20250212 branch 3 times, most recently from f95b308 to 3252eac Compare February 25, 2025 18:32
@krasznaa
Member Author

Choosing a local rebase on this branch was not the right call... 😦 Git clearly got very confused about how to transplant my 13 commits on top of the current main branch. First time that I'm running into such an issue. 🤔

It's not the end of the world since the commits would've been squashed in the end anyway, but still...

Contributor

@beomki-yeo beomki-yeo left a comment

It looks good to me but let's wait for @stephenswat's approval and further comments as the PR is huge

@krasznaa krasznaa force-pushed the SpacepointSoA-main-20250212 branch from 3252eac to cec16cf Compare February 26, 2025 10:57
Member

@stephenswat stephenswat left a comment

Looks okay in general, but there are so many unrelated changes in this PR... I would suggest we work a bit on pull request hygiene because this single 5000-line delta PR is practically unreviewable; it contains what should have been five or six independent PRs... 😕

/// Spacepoint container for the test seeds
- const spacepoint_collection_types::const_view m_spacepoints;
+ const edm::spacepoint_collection::const_view m_spacepoints;

/// The reference object
Member

This comment is now out of date.

return (is_same_scalar(obj.x(), m_ref.x(), m_unc) &&
is_same_scalar(obj.y(), m_ref.y(), m_unc) &&
is_same_scalar(obj.z(), m_ref.z(), m_unc));
}

private:
/// The reference object
Member

Outdated.

Member Author

I'm only looking at these "outdated" comments now. What do you mean here?

"Reference" here is not in the C++ sense, but rather that "this is the reference that we compare our test object against".

@@ -34,6 +34,7 @@ class nseed_performance_writer {
void initialize();
void finalize();

/*
Member

Why did you comment out this code?

Member Author

I may have forgotten about a "temporarily disabled" piece of code. 😦 Let me check...

Member Author

This was a good catch!

That code is not used anywhere in practice, but I now still tried to update it to be compatible with the new EDM.

It's very likely to still have issues, but I was not going to write client code for this class just to test it.

Comment on lines 109 to 112
const auto spB =
spacepoints.at(sp_device.bin(spB_loc.bin_idx)[spB_loc.sp_idx]);
const auto spT =
spacepoints.at(sp_device.bin(spT_loc.bin_idx)[spT_loc.sp_idx]);
Member

Suggested change
- const auto spB =
- spacepoints.at(sp_device.bin(spB_loc.bin_idx)[spB_loc.sp_idx]);
- const auto spT =
- spacepoints.at(sp_device.bin(spT_loc.bin_idx)[spT_loc.sp_idx]);
+ const auto & spB =
+ spacepoints.at(sp_device.bin(spB_loc.bin_idx)[spB_loc.sp_idx]);
+ const auto & spT =
+ spacepoints.at(sp_device.bin(spT_loc.bin_idx)[spT_loc.sp_idx]);

- const internal_spacepoint<spacepoint> middle_sp =
- sp_grid.bin(middle_sp_counter.m_spM.bin_idx)
- .at(middle_sp_counter.m_spM.sp_idx);
+ const auto middle_sp =
Member

I'll stop making more comments on this but let's capture the proxy spacepoints by const reference instead of by value.

Member Author

As I wrote before, that would result in some properly dangerous code.

Since I do know that the compiler would actually allow that to be written. We've had such an occurrence in the ATLAS offline code, where one function all of a sudden started returning its thing by value. And clients continued working for a long time. Until one of the clients tried to pass along such a reference to a different scope. Then all of a sudden all hell broke loose. And it took quite some debugging to figure out what was going wrong. (As the code breaking was quite a bit removed from the code causing the issue.)

I also still debate whether we should use auto so liberally in these places. 🤔 There are proper named types that we could use here. They are just so very long... 🤔 So I'm not sure what's the best for readability and maintainability...

Member

@stephenswat stephenswat Feb 26, 2025

Until one of the clients tried to pass along such a reference to a different scope. Then all of a sudden all hell broke loose. And it took quite some debugging to figure out what was going wrong. (As the code breaking was quite a bit removed from the code causing the issue.)

Can you elaborate on how passing on a lifetime-extended temporary by reference is dangerous?

Member Author

My understanding is that the lifetime of the returned object is as one would expect. It lasts "the scope that it was received in".

Let's take the following dummy code for instance:

std::reference_wrapper<const Foo> ref;
{
   const Foo& f = functionReturningValue();
   ref = f;
}
ref.get().doSomething();

The thing we encountered in the offline code was something along these lines.

I feel pretty strongly that using references in these places would very easily lead to coding mistakes later down the road. And clearly... adding & will do absolutely nothing to the performance of the generated code. So what good would it do? It would only obfuscate the code more in my mind.
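
For reference, a compilable sketch of the lifetime rule under discussion. `std::reference_wrapper` has no default constructor, so rather than holding on to the reference past the scope (which would be undefined behaviour to use), this version counts live objects to show when the temporary dies. `Foo`, its counter, `observation` and `observe_lifetime` are made up for illustration.

```cpp
// Hypothetical type that counts how many instances are currently alive.
struct Foo {
    static int alive;
    Foo() { ++alive; }
    Foo(const Foo&) { ++alive; }
    ~Foo() { --alive; }
};
int Foo::alive = 0;

// Returns by value, like the function in the anecdote above.
Foo functionReturningValue() { return Foo{}; }

// Number of live Foo objects seen inside and after the inner scope.
struct observation {
    int inside;
    int after;
};

observation observe_lifetime() {
    observation obs{};
    {
        // Binding the returned temporary to a const reference extends its
        // lifetime to the end of this block, and no further.
        const Foo& f = functionReturningValue();
        (void)f;
        obs.inside = Foo::alive;
    }
    // The temporary has been destroyed here; any reference or pointer to
    // it that escaped the block would now dangle.
    obs.after = Foo::alive;
    return obs;
}
```

The object is alive inside the block and gone right after it, which matches the "lasts the scope that it was received in" description.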

Member

std::reference_wrapper<const Foo> ref;
{
   const Foo& f = functionReturningValue();
   ref = f;
}
ref.get().doSomething();

By this logic your current proxy objects are also broken:

my_edm_container cont;
std::reference_wrapper<my_edm_container::proxy_type> ref;
{
   my_edm_container::proxy_type p = cont[0];
   ref = p;
}
ref.get().x();

What I am trying to say is, if your user wants to shoot themselves in the foot this much then there is no way to prevent them from doing so (without switching to Rust).

At the same time, keeping these variables as references makes the code more flexible! If the design of the container changes, a reference will be able to keep up without invisible performance degradation; not the case if they are values.

Member

Plus, accepting them as references provides a semantic hint to the user that these types are basically just fancy reference packs, and that they cannot be changed without side-effects.
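
The flexibility argument can be illustrated with a small sketch: the same `const auto&` line works whether the container hands out a reference to a stored element or a proxy by value. Both container types and `radius_squared` are hypothetical and only loosely modelled on the AoS and SoA designs discussed in this PR.

```cpp
#include <cstddef>
#include <vector>

struct point {
    float x, y, z;
};

// Hypothetical AoS container: at() returns a reference to a stored element.
struct aos_container {
    std::vector<point> data;
    const point& at(std::size_t i) const { return data.at(i); }
};

// Hypothetical SoA container: at() returns a small proxy by value.
struct soa_container {
    std::vector<float> xs, ys, zs;
    struct const_proxy {
        const float& x;
        const float& y;
        const float& z;
    };
    const_proxy at(std::size_t i) const {
        return {xs.at(i), ys.at(i), zs.at(i)};
    }
};

template <typename Container>
float radius_squared(const Container& c, std::size_t i) {
    // For aos_container this binds directly to the stored element (no copy).
    // For soa_container it extends the lifetime of the temporary proxy.
    const auto& sp = c.at(i);
    return sp.x * sp.x + sp.y * sp.y;
}
```

With `const auto sp = c.at(i);` instead, the AoS case would silently copy the whole `point`, which is the "invisible performance degradation" mentioned above. The counter-argument about readers misjudging the lifetime still applies to the proxy case.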

Member Author

In your example the code literally makes a reference to a temporary. Of course it will break then. 😕 In my example the writer of the code could reasonably believe that they are holding onto a reference that has a lifetime beyond the current scope.

I see your argument about changing future containers, but then let me go ahead and replace all the auto-s with concrete types. As I'm coming to think that that would result in the most readable code going forward...

Member Author

I went for removing/replacing all usages of auto for the new types. This way if/when the definition of edm::spacepoint_collection::const_device::const_proxy_type changes, the code here should change correctly as well. 🤔
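
As a rough illustration of spelling out the proxy type instead of using auto, the sketch below uses a hypothetical `toy_collection` with a nested `const_proxy_type`. These names are assumptions made for the example and do not reproduce the traccc types.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical toy collection (illustration only).
struct toy_collection {
    std::vector<float> radius;

    struct const_proxy_type {
        const float& radius;
    };

    const_proxy_type at(std::size_t i) const { return {radius.at(i)}; }
};

// at() hands back a proxy object by value, not a reference to an element.
static_assert(
    !std::is_reference<
        decltype(std::declval<const toy_collection&>().at(0))>::value,
    "at() is expected to return a proxy by value");

inline float first_radius(const toy_collection& c) {
    // The type is named explicitly, so a reader sees that this is a proxy,
    // and the declaration follows whatever const_proxy_type is defined as.
    const toy_collection::const_proxy_type sp = c.at(0);
    return sp.radius;
}
```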

Comment on lines 47 to 51
const std::size_t sp_index = result.size();
result.resize(sp_index + 1u);
auto sp = result.at(sp_index);
traccc::details::fill_pixel_spacepoint(sp, det, meas);
sp.measurement_index() = meas_index;
Member

I thought we had a push_back function now, no?

Member Author

🤔 I modified this code before I had implemented push_back(...). And then didn't bother switching it back.

Mainly because I didn't think that using vecmem::edm::device::push_back(...) would actually make this code easier to understand / maintain. Passing along a proxy to the helper function was easier than passing a templated container. (Since the helper needs to work with both the host and the device container.)
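
The resize-then-fill pattern described above can be sketched roughly as follows. Everything here is a hypothetical stand-in (`toy_measurement`, `toy_spacepoint_collection`, `fill_toy_spacepoint`, `append_spacepoint`); it is not the traccc or vecmem code, only an illustration of why a helper that takes a proxy does not need to know which container type produced it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for a measurement and an SoA spacepoint collection.
struct toy_measurement {
    float local0, local1;
};

struct toy_spacepoint_collection {
    std::vector<float> x, y;
    std::vector<unsigned int> measurement_index;

    struct proxy {
        float& x;
        float& y;
        unsigned int& measurement_index;
    };

    std::size_t size() const { return x.size(); }

    void resize(std::size_t n) {
        x.resize(n);
        y.resize(n);
        measurement_index.resize(n);
    }

    proxy at(std::size_t i) {
        return {x.at(i), y.at(i), measurement_index.at(i)};
    }
};

// The helper is templated on the proxy type only, so the same code could
// serve any container whose proxy exposes x and y.
template <typename proxy_t>
void fill_toy_spacepoint(proxy_t sp, const toy_measurement& meas) {
    sp.x = meas.local0;
    sp.y = meas.local1;
}

inline void append_spacepoint(toy_spacepoint_collection& result,
                              const toy_measurement& meas,
                              unsigned int meas_index) {
    const std::size_t sp_index = result.size();
    result.resize(sp_index + 1u);
    // The proxy must be created after the resize, since resizing may
    // reallocate the columns and invalidate earlier references.
    toy_spacepoint_collection::proxy sp = result.at(sp_index);
    fill_toy_spacepoint(sp, meas);
    sp.measurement_index = meas_index;
}
```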

}
++meas_index;
Member

Why do we need this and not just std::distance?

Member Author

std::distance? You mean to switch from this Frankenstein for-loop to an iterator based one?

Yeah, I could be on board with that. Adding the extra integer seemed like the smaller intervention in the code. But I agree, the for-loop is pretty ugly now.

Member Author

Have a look!

I fear though that I only made the code (even) uglier. 😦
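
For reference, the two loop styles discussed in this thread can be compared in a small standalone sketch. It uses a plain `std::vector<int>` as a stand-in for the measurement collection, and the function names are made up for the example; it is not the code from the PR.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Variant 1: range-based for loop with a separately maintained counter.
inline std::vector<std::size_t> indices_with_counter(
    const std::vector<int>& meas) {
    std::vector<std::size_t> out;
    std::size_t meas_index = 0;
    for (const int m : meas) {
        if (m >= 0) {
            out.push_back(meas_index);
        }
        // Has to be reached on every iteration, including skipped elements.
        ++meas_index;
    }
    return out;
}

// Variant 2: iterator-based loop, with the index derived via std::distance.
inline std::vector<std::size_t> indices_with_distance(
    const std::vector<int>& meas) {
    std::vector<std::size_t> out;
    for (auto it = meas.begin(); it != meas.end(); ++it) {
        if (*it < 0) {
            continue;
        }
        out.push_back(
            static_cast<std::size_t>(std::distance(meas.begin(), it)));
    }
    return out;
}
```

The second variant cannot get out of sync when an early `continue` is added, at the price of a more verbose loop header.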

Comment on lines +58 to +61
/// Internal implementation struct
struct impl;
/// Pointer to the internal implementation
std::unique_ptr<impl> m_impl;
Member

Is there a reason this needs to be PIMPL? 🤔

Member Author

Because I wanted to move all the separate classes that we use in the implementation out of the public interface of the library, as they are all implementation details that are only used by the host code.
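
The PIMPL arrangement described here follows a common shape, sketched below in a single file. `toy_algorithm` and the contents of its `impl` are hypothetical; only the `struct impl;` / `std::unique_ptr<impl> m_impl;` pair is taken from the snippet under review.

```cpp
#include <memory>
#include <vector>

// --- What would live in the public header ---
class toy_algorithm {
public:
    toy_algorithm();
    // Declared here, defined where impl is a complete type.
    ~toy_algorithm();

    int run(int input) const;

private:
    /// Internal implementation struct
    struct impl;
    /// Pointer to the internal implementation
    std::unique_ptr<impl> m_impl;
};

// --- What would live in the source file ---
struct toy_algorithm::impl {
    // Host-only helper state that never appears in the public header.
    std::vector<int> scratch{1, 2, 3};
};

toy_algorithm::toy_algorithm() : m_impl(std::make_unique<impl>()) {}

toy_algorithm::~toy_algorithm() = default;

int toy_algorithm::run(int input) const {
    int sum = input;
    for (const int v : m_impl->scratch) {
        sum += v;
    }
    return sum;
}
```

Clients that include only the header part never see the helper types, so those can change without touching the public interface.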

@stephenswat
Member

By the way, I think we should strongly consider testing out the new [[lifetimebound]] attribute on these proxies! Not for this PR though.
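
As a sketch of what that could look like: to the best of my knowledge the attribute is currently available in vendor-specific form (for example `[[clang::lifetimebound]]`) rather than as a standard attribute, so the example guards it behind a macro. `TOY_LIFETIMEBOUND`, `toy_proxy`, `make_proxy` and `demo_lifetimebound` are hypothetical names for illustration.

```cpp
#include <cstddef>
#include <string>

// Use the vendor attribute where the compiler reports support for it,
// and expand to nothing elsewhere.
#if defined(__has_cpp_attribute)
#if __has_cpp_attribute(clang::lifetimebound)
#define TOY_LIFETIMEBOUND [[clang::lifetimebound]]
#endif
#endif
#ifndef TOY_LIFETIMEBOUND
#define TOY_LIFETIMEBOUND
#endif

// Hypothetical proxy that refers to data owned by someone else.
struct toy_proxy {
    const std::string& name;
};

// The annotation states that the returned proxy may refer to `s`.
inline toy_proxy make_proxy(const std::string& s TOY_LIFETIMEBOUND) {
    return {s};
}

inline std::size_t demo_lifetimebound() {
    const std::string owner = "spacepoint";
    // Fine: `owner` outlives the proxy.
    const toy_proxy ok = make_proxy(owner);
    // A compiler that implements the attribute may warn about a line like
    // the following, since the temporary dies at the end of the statement:
    //   toy_proxy bad = make_proxy(std::string("temporary"));
    return ok.name.size();
}
```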

@krasznaa krasznaa force-pushed the SpacepointSoA-main-20250212 branch from cec16cf to 6f90e80 Compare February 26, 2025 15:42
@krasznaa krasznaa force-pushed the SpacepointSoA-main-20250212 branch from 6f90e80 to 38b793f Compare February 26, 2025 21:13
Member

@stephenswat left a comment

We can go ahead and put this in.

@krasznaa krasznaa force-pushed the SpacepointSoA-main-20250212 branch from 38b793f to b10bfdc Compare February 27, 2025 16:25
@krasznaa
Member Author

> We can go ahead and put this in.

May God have mercy on our souls... 😛

@krasznaa krasznaa enabled auto-merge (squash) February 27, 2025 16:26
There is no need for it to receive the size of the measurement
collection separately. It already has a view of the collection
with that information.
Also, make at least some of the code use the new
vecmem::edm::host::push_back(...) functionality.
…thm.

In sync with all the other spacepoint_binning classes.
Note though that the code is not actually used anywhere at the moment.
I only ensured that it would compile.
@krasznaa krasznaa force-pushed the SpacepointSoA-main-20250212 branch from b10bfdc to 529b49c Compare February 27, 2025 17:42
@krasznaa krasznaa merged commit 2abfd24 into acts-project:main Feb 27, 2025
29 checks passed
@krasznaa krasznaa deleted the SpacepointSoA-main-20250212 branch February 27, 2025 19:36
Labels
alpaka (Changes related to Alpaka), cpu (Changes related to CPU code), cuda (Changes related to CUDA), edm (Changes to the data model), kokkos (Changes related to Kokkos), sycl (Changes related to SYCL)
3 participants