Add compatibility with CUDA 12.8 #919
Conversation
This commit makes traccc compatible with CUDA 12.8 through three changes:
1. It removes the standalone translation units containing templated kernels, instead wrapping the kernels in simple C++ wrapper functions, which gives them more robust linkage.
2. It sets the default CUDA architecture to CC 7.5, although bugs in the build systems of dependencies currently override this flag.
3. It adds a note to the troubleshooting section about an incompatibility between traccc and CUDA 12.8 in debug mode.
Was CKF the only algorithm incompatible with CUDA 12.8? For example,
With the way that things are done now, wouldn't it be easier to:
- add one .hpp file per kernel, listing all the overloaded versions of the functions, without any templating;
- add one .cu file per kernel, containing the templated __global__ function and the overloaded C++ functions that launch that kernel appropriately?

I.e. similar to what I've done in traccc::core and traccc::sycl? I think some of the templating here is not really helping us anymore.
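Sketched out, the suggested per-kernel split might look roughly like this. Both file contents are combined into one listing, and all type names and the launch configuration are illustrative, not the actual traccc API:

```cuda
// apply_interaction.hpp — lists only plain, non-template overloads.
#pragma once

namespace traccc::cuda::kernels {
void apply_interaction(const default_detector_view& det);
void apply_interaction(const telescope_detector_view& det);
}

// apply_interaction.cu — the template never leaves this translation unit.
namespace traccc::cuda::kernels {

template <typename detector_t>
__global__ void apply_interaction_impl(detector_t det) {
    // kernel body
}

void apply_interaction(const default_detector_view& det) {
    // launch configuration is illustrative
    apply_interaction_impl<<<1u, 256u>>>(det);
}

void apply_interaction(const telescope_detector_view& det) {
    apply_interaction_impl<<<1u, 256u>>>(det);
}

}
```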
@@ -24,7 +24,7 @@ if( "${CMAKE_CUDA_COMPILER_ID}" MATCHES "NVIDIA" )
 endif()

 # Set the CUDA architecture to build code for.
-set( CMAKE_CUDA_ARCHITECTURES "52" CACHE STRING
+set( CMAKE_CUDA_ARCHITECTURES "75" CACHE STRING
Yes. Unfortunately the include order in the project makes this indeed ineffective. 😦
In a separate PR we should change the setup a little, moving such global settings simply to the main CMakeLists.txt file, so that they would be guaranteed to precede whatever values some of the externals are setting.
Note that I've recently been setting CMAKE_CUDA_ARCHITECTURES in CMakeUserPresets.json, for instance to "native" in my local builds. (Just to give ideas.)
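Such a user preset might look roughly like this; the preset name and the "default" base preset it inherits from are assumptions about the local setup, not part of the traccc repository:

```json
{
  "version": 6,
  "configurePresets": [
    {
      "name": "cuda-native",
      "inherits": "default",
      "cacheVariables": {
        "CMAKE_CUDA_ARCHITECTURES": "native"
      }
    }
  ]
}
```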
I'd say the superior option would really be to have the dependent projects not set any global CMake variables!
@@ -17,12 +17,20 @@
 namespace traccc::cuda::kernels {

 template <typename detector_t>
-__global__ void apply_interaction(
+__global__ void _apply_interaction(
I'm not super happy about this naming. 🤔
My first idea would've been to give some new name to the C++ functions, and keep the name of the kernels the same. But another option could be to call the actual kernel something like traccc::cuda::kernels::impl::apply_interaction. 🤔
I'm just not a fan of functions with underscore prefixes. 😦
Sure, we can do this in a little impl namespace. 👍
-        <<<nBlocks, nThreads, 0, stream>>>(
-            m_cfg, {det_view, n_in_params, in_params_buffer,
-                    param_liveness_buffer});
+    kernels::apply_interaction<std::decay_t<detector_type>>(
Would it not be possible to come up with a design in which the called (templated) function could deduce its template parameters? 🤔 Unfortunately none of the provided parameters are detector_type directly. But maybe some formalism could still be found for such a setup.
It would just make the API of the function less error prone.
Should be doable after #921.
-__global__ void apply_interaction(
-    const finding_config cfg,
-    device::apply_interaction_payload<detector_t> payload);
+void apply_interaction(const dim3& grid_size, const dim3& block_size,
These are now C++ headers. Let's rename them to .hpp! Or do they still in turn include some .cuh files of their own? 😕
Sure, we can do that. They do need dim3 from the CUDA runtime, but we can include that from a .hpp as well.
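For reference, dim3 comes from the CUDA runtime headers, which are plain C++ and can be included from a .hpp compiled by a host compiler, as long as the toolkit headers are on the include path. A renamed header might look roughly like this; the include of the finding_config header is elided, and the trailing parameters are assumptions based on the diff above:

```cpp
// apply_interaction.hpp — compilable by a plain host compiler.
#pragma once

#include <cuda_runtime_api.h>  // provides dim3 without requiring nvcc
// (finding_config include elided in this sketch)

namespace traccc::cuda::kernels {

// Non-template wrapper declaration; the templated __global__ kernel it
// launches stays hidden in the corresponding .cu file.
void apply_interaction(const dim3& grid_size, const dim3& block_size,
                       const finding_config cfg /*, payload parameters */);

}
```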