Adapting code to cuda::kernel::wrap API changes #486

Closed · fvanmaele opened this issue Apr 3, 2023 · 2 comments

fvanmaele commented Apr 3, 2023

I have some old code that uses the old API of cuda::kernel::wrap, which was changed in commit bc53844. The call looks as follows:

    return cuda::kernel::wrap(
        cuda::device::current::detail_::get_id(),
        kernel::reduce<M_ic,
                       block_dim_ic,
                       num_partitions_per_block_ic,
                       num_rhs_batch_ic,
                       pivoting,
                       int,
                       T,
                       TR,
                       typename make_scalar_type<T>::type>);

kernel::reduce above is a void function. See: https://mp-force.ziti.uni-heidelberg.de/fvanmaele/tridigpu/-/blob/master/include/tridigpu/reduction.h#L58-80 and https://mp-force.ziti.uni-heidelberg.de/fvanmaele/tridigpu/-/blob/master/include/tridigpu/reduction.h#L105

How can I adapt this to the new interface?
Unfortunately, I am not familiar with this project, and I could not find a migration guide for the API either. Changing the call to cuda::kernel::get was unsuccessful.

eyalroz commented Apr 3, 2023

tl;dr: cuda::kernel::get(my_device, my_plain_vanilla_kernel_function) is known to work; see the execution control example.
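
For illustration, a minimal sketch of that usage follows - the kernel itself, the way the device is obtained, and the header name are assumptions here; only the get() call mirrors the signature quoted below:

    // Assuming the library's umbrella header (e.g. <cuda/api.hpp>) is included
    // and this file is compiled as CUDA (.cu).
    __global__ void my_plain_vanilla_kernel_function(int* data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) { data[i] += 1; }
    }

    void wrap_it()
    {
        auto my_device = cuda::device::current::get();
        // get() wraps the __global__ function, taking care of the device's
        // primary context internally.
        auto my_kernel = cuda::kernel::get(my_device, my_plain_vanilla_kernel_function);
        (void) my_kernel;
    }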


The effect of that big change on the wrap() function is that it now requires a CUDA context. People who are used to the runtime API don't know about contexts - and that's ok - but wrap() is the lowest-level non-detail_ API call I offer, and it must really "not do anything" on its own, so it needs the context passed in explicitly. Under the hood, the CUDA runtime API uses something called the "primary context" on each device.

So, your options are:

  1. Getting the handle of the primary context on the device. This is not entirely trivial, since the primary context is reference-counted; but auto my_pcontext = my_device.primary_context(), keeping that object alive, and using my_pcontext.handle() does the trick.
  2. Using the convenience function template I offer to more runtime-API-oriented people, cuda::kernel::get(const device_t& device, KernelFunctionPtr function_ptr), where you only pass a device and a kernel function. That takes care of the context magic for you. Note: it takes a device, not a device id! Both options are sketched below.
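
Concretely, for the call in the original post, option 2 would look roughly like this (the template arguments are copied from above; how you obtain the device_t is up to you, and the closing comment about wrap() is only a sketch, not the exact signature):

    // Option 2: pass a device_t and let the library deal with the (primary) context.
    auto device = cuda::device::current::get();  // or any other way of getting a device_t
    return cuda::kernel::get(
        device,   // note: a device_t, not a device id
        kernel::reduce<M_ic,
                       block_dim_ic,
                       num_partitions_per_block_ic,
                       num_rhs_batch_ic,
                       pivoting,
                       int,
                       T,
                       TR,
                       typename make_scalar_type<T>::type>);

    // Option 1, sketched: hold on to the primary context yourself. It is
    // reference-counted, so my_pcontext must outlive any use of the wrapped
    // kernel; my_pcontext.handle() is then what you pass where wrap() expects
    // a context handle (the exact wrap() parameter list is not reproduced here).
    auto my_pcontext = device.primary_context();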

eyalroz self-assigned this Apr 3, 2023

eyalroz commented Apr 7, 2023

Assuming the question is answered to @fvanmaele's satisfaction... please comment again if that's not the case.

eyalroz closed this as completed Apr 7, 2023