Add compatibility with CUDA 12.8 #919
Conversation
This commit makes traccc compatible with CUDA 12.8 through three changes:
1. It removes the standalone translation units containing templated kernels, instead wrapping the kernels in simple C++ wrapper functions, which gives them more robust linkage.
2. It sets the default CUDA architecture to CC 7.5, although bugs in the build systems of dependencies currently override this flag.
3. It adds a note to the troubleshooting section about an incompatibility between traccc and CUDA 12.8 in debug mode.
Was CKF the only algorithm incompatible with CUDA 12.8? For example,
With the way that things are done now, wouldn't it be easier to:
- add one .hpp file per kernel, listing all the overloaded versions of the functions, without any templating;
- add one .cu file per kernel, containing the templated __global__ function and the overloaded C++ functions that launch that kernel appropriately?

I.e. similar to what I've done in traccc::core and traccc::sycl? I think some of the templating here is not really helping us anymore.
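Sketched out, the suggested per-kernel split might look roughly like this. Both file contents are combined into one listing, and all type names and the launch configuration are illustrative, not the actual traccc API:

```cuda
// apply_interaction.hpp — lists only plain, non-template overloads.
#pragma once

namespace traccc::cuda::kernels {
void apply_interaction(const default_detector_view& det);
void apply_interaction(const telescope_detector_view& det);
}

// apply_interaction.cu — the template never leaves this translation unit.
namespace traccc::cuda::kernels {

template <typename detector_t>
__global__ void apply_interaction_impl(detector_t det) {
    // kernel body
}

void apply_interaction(const default_detector_view& det) {
    // launch configuration is illustrative
    apply_interaction_impl<<<1u, 256u>>>(det);
}

void apply_interaction(const telescope_detector_view& det) {
    apply_interaction_impl<<<1u, 256u>>>(det);
}

}
```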
@@ -24,7 +24,7 @@ if( "${CMAKE_CUDA_COMPILER_ID}" MATCHES "NVIDIA" )
 endif()

 # Set the CUDA architecture to build code for.
-set( CMAKE_CUDA_ARCHITECTURES "52" CACHE STRING
+set( CMAKE_CUDA_ARCHITECTURES "75" CACHE STRING
Yes. Unfortunately the include order in the project makes this indeed ineffective. 😦
In a separate PR we should change the setup a little, moving such global settings simply to the main CMakeLists.txt file, so that they would be guaranteed to precede whatever values some of the externals are setting.
Note that I've recently been setting CMAKE_CUDA_ARCHITECTURES in CMakeUserPresets.json, for instance to "native" in my local builds. (Just to give ideas.)
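Such a user preset might look roughly like this; the preset name and the "default" base preset it inherits from are assumptions about the local setup, not part of the traccc repository:

```json
{
  "version": 6,
  "configurePresets": [
    {
      "name": "cuda-native",
      "inherits": "default",
      "cacheVariables": {
        "CMAKE_CUDA_ARCHITECTURES": "native"
      }
    }
  ]
}
```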
I'd say the superior option would really be to have the dependent projects not set any global CMake variables!
@@ -17,12 +17,20 @@
 namespace traccc::cuda::kernels {

 template <typename detector_t>
-__global__ void apply_interaction(
+__global__ void _apply_interaction(
I'm not super happy about this naming. 🤔
My first idea would've been to give some new name to the C++ functions, and keep the name of the kernels the same. But another option could be to call the actual kernel something like traccc::cuda::kernels::impl::apply_interaction. 🤔
I'm just not a fan of functions with underscore prefixes. 😦
Sure, we can do this in a little impl namespace. 👍
-        <<<nBlocks, nThreads, 0, stream>>>(
-            m_cfg, {det_view, n_in_params, in_params_buffer,
-                    param_liveness_buffer});
+    kernels::apply_interaction<std::decay_t<detector_type>>(
Would it not be possible to come up with a design in which the called (templated) function could deduce its template parameters? 🤔 Unfortunately none of the provided parameters are detector_type directly. But maybe some formalism could still be found for such a setup.
It would just make the API of the function less error prone.
Should be doable after #921.
-__global__ void apply_interaction(
-    const finding_config cfg,
-    device::apply_interaction_payload<detector_t> payload);
+void apply_interaction(const dim3& grid_size, const dim3& block_size,
These are now C++ headers. Let's rename them to .hpp! Or do they still in turn include some .cuh files of their own? 😕
Sure, we can do that. They do need dim3 from the CUDA runtime, but we can include that from a .hpp as well.
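For reference, dim3 comes from the CUDA runtime headers, which are plain C++ and can be included from a .hpp compiled by a host compiler, as long as the toolkit headers are on the include path. A renamed header might look roughly like this; the include of the finding_config header is elided, and the trailing parameters are assumptions based on the diff above:

```cpp
// apply_interaction.hpp — compilable by a plain host compiler.
#pragma once

#include <cuda_runtime_api.h>  // provides dim3 without requiring nvcc
// (finding_config include elided in this sketch)

namespace traccc::cuda::kernels {

// Non-template wrapper declaration; the templated __global__ kernel it
// launches stays hidden in the corresponding .cu file.
void apply_interaction(const dim3& grid_size, const dim3& block_size,
                       const finding_config cfg /*, payload parameters */);

}
```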