This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Emit diagnostic when attempting to use extended device lambda in contexts that require querying the return type #1688

Closed
jrhemstad opened this issue May 12, 2022 · 5 comments · Fixed by #1760
Assignees
Labels
backend: CUDA Related to the CUDA backend compiler: nvcc Specific to the NVCC compiler. good first issue Good for newcomers. P1: should have Necessary, but not critical. release: breaking change Include in "Breaking Changes" section of release notes. type: enhancement New feature or request.
Milestone

Comments

@jrhemstad
Collaborator

jrhemstad commented May 12, 2022

A very common pitfall for Thrust users is passing a __device__ lambda to Thrust algorithms or iterators, which then fails in silent or obscure ways.

This is frequently due to the limitation that you cannot reliably query the return type of an extended lambda in host code. Specifically:

  1. As described above, the CUDA compiler replaces a __device__ extended lambda defined in a host function with a placeholder type defined in namespace scope. This placeholder type does not define a operator() function equivalent to the original lambda declaration. An attempt to determine the return type or parameter types of the operator() function may therefore work incorrectly in host code, as the code processed by the host compiler will be semantically different than the input code processed by the CUDA compiler. However, it is OK to introspect the return type or parameter types of the operator() function within device code. Note that this restriction does not apply to __host__ __device__ extended lambdas.

This means that any code in Thrust that relies on std::invoke_result_t or std::result_of or similar in host code will fail with a __device__ lambda, e.g.,

using result_type = std::invoke_result_t<TransformOp, input_type>;

This will frequently fail silently or fail to compile with incredibly obscure compile errors. This is such a nuisance that I avoid __device__ lambdas altogether with Thrust.
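For contrast, here is a minimal sketch of the named-function-object alternative, whose return type can be queried in host code. The names `square_op`, `input_type` and the `DEMO_HD` macro are hypothetical and not part of Thrust. `DEMO_HD` expands to `__host__ __device__` under nvcc and to nothing otherwise, so the snippet should also compile with a plain C++17 host compiler.

```cpp
#include <type_traits>

// Hypothetical helper macro: execution-space annotations only exist under nvcc.
#if defined(__CUDACC__)
#define DEMO_HD __host__ __device__
#else
#define DEMO_HD
#endif

// Hypothetical named function object standing in for a __device__ lambda.
// Its operator() is an ordinary member function, so the host compiler
// sees the real signature and can deduce the return type.
struct square_op {
  DEMO_HD double operator()(int x) const {
    return static_cast<double>(x) * x;
  }
};

using input_type  = int;
using result_type = std::invoke_result_t<square_op, input_type>;

// The query resolves to the declared return type in host code.
static_assert(std::is_same_v<result_type, double>,
              "return type of square_op should be double");
```

Unlike an extended __device__ lambda, the functor is not replaced by a placeholder type, so the host-side query and the device-side behavior agree.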

However, I recently learned that nvcc provides intrinsic type traits to query if a type is an extended lambda.

The compiler provides type traits to detect closure types for extended lambdas at compile time:

__nv_is_extended_device_lambda_closure_type(type): If 'type' is the closure class created for an extended __device__ lambda, then the trait is true, otherwise it is false.

__nv_is_extended_host_device_lambda_closure_type(type): If 'type' is the closure class created for an extended __host__ __device__ lambda, then the trait is true, otherwise it is false.

This means Thrust can emit a useful diagnostic when a user attempts to use a __device__ lambda in a situation where that would be problematic (e.g., std::invoke_result_t).

One easy way to do this in Thrust would be to introduce a consistent wrapper for invoke_result_t/result_of that simply static_asserts that the callable is not an extended __device__ lambda.

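namespace thrust::detail {
   // Inherit from the trait class (not the _t alias, which may name a non-class type).
   template <typename F, typename... Args>
   struct invoke_result : std::invoke_result<F, Args...> {
       static_assert(!__nv_is_extended_device_lambda_closure_type(F), "Attempt to use an extended __device__ lambda in a context that requires querying its return type. Use a named function object or a __host__ __device__ lambda instead.");
   };

   template <typename F, typename... Args>
   using invoke_result_t = typename invoke_result<F, Args...>::type;
}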

There are likely many other useful ways Thrust could use these traits. Deducing (or at least verifying) execution space from the callable comes to mind.
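As a rough illustration of the "verifying execution space" idea, the two intrinsics could be folded into one compile-time classification. This is only a sketch: `lambda_kind` and `classify_callable` are hypothetical names, and the fallback branch assumes that without `__CUDACC_EXTENDED_LAMBDA__` nothing can be an extended lambda.

```cpp
// Hypothetical classification of a callable type.
enum class lambda_kind { not_extended, device, host_device };

template <typename F>
constexpr lambda_kind classify_callable() {
#if defined(__CUDACC_EXTENDED_LAMBDA__)
  // nvcc intrinsic type traits; only referenced when extended lambdas are enabled.
  if (__nv_is_extended_device_lambda_closure_type(F)) {
    return lambda_kind::device;
  }
  if (__nv_is_extended_host_device_lambda_closure_type(F)) {
    return lambda_kind::host_device;
  }
#endif
  // Named functors, function pointers and ordinary lambdas all end up here.
  return lambda_kind::not_extended;
}

// A non-lambda type is never classified as an extended lambda.
static_assert(classify_callable<int>() == lambda_kind::not_extended,
              "int is not an extended lambda");
```

Note that this only distinguishes extended lambdas. A named function object always reports `not_extended`, so the traits alone cannot deduce the execution space of an arbitrary callable.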

@jrhemstad jrhemstad added P1: should have Necessary, but not critical. type: enhancement New feature or request. good first issue Good for newcomers. compiler: nvcc Specific to the NVCC compiler. backend: CUDA Related to the CUDA backend labels May 12, 2022
@gevtushenko
Collaborator

We are relying on the thrust/cub version of invoke_result_t.

> This means that any code in Thrust that relies on std::invoke_result_t or std::result_of or similar will fail with a __device__ lambda, e.g.

According to the documentation, it's not any code, it's host code:

> it is OK to introspect the return type or parameter types of the operator() function within device code.

I'm using invoke_result_t to introspect the return type within device code. Should the code be something like the following one instead?

static_assert(__builtin_is_device_code() || !__nv_is_extended_device_lambda_closure_type(F), "Attempt to use an extended __device__ lambda in a context that requires querying its return type. Use a named function object or a __host__ __device__ lambda instead.");

Alternatively, we might provide thrust::invoke_device_result_t without this assert.
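A minimal sketch of that alternative, assuming a checked alias for host-side queries and an unchecked alias for device-only queries. Everything in namespace `sketch` is hypothetical, including the name `invoke_device_result_t` and the `SKETCH_IS_DEVICE_LAMBDA` macro, which falls back to `false` when extended lambdas are not enabled so the snippet should also compile with a plain C++17 host compiler.

```cpp
#include <type_traits>

// Hypothetical macro: use the nvcc intrinsic only when extended lambdas are enabled.
#if defined(__CUDACC_EXTENDED_LAMBDA__)
#define SKETCH_IS_DEVICE_LAMBDA(T) __nv_is_extended_device_lambda_closure_type(T)
#else
#define SKETCH_IS_DEVICE_LAMBDA(T) false
#endif

namespace sketch {

// Checked variant: rejects extended __device__ lambdas with a readable message.
template <typename F, typename... Args>
struct invoke_result : std::invoke_result<F, Args...> {
  static_assert(!SKETCH_IS_DEVICE_LAMBDA(F),
                "Attempt to use an extended __device__ lambda in a context "
                "that requires querying its return type.");
};

template <typename F, typename... Args>
using invoke_result_t = typename invoke_result<F, Args...>::type;

// Unchecked variant: intended for queries made only within device code,
// where introspecting the lambda's operator() is documented as OK.
template <typename F, typename... Args>
using invoke_device_result_t = std::invoke_result_t<F, Args...>;

// Hypothetical functor used to exercise both aliases.
struct negate_op {
  int operator()(int x) const { return -x; }
};

}  // namespace sketch

static_assert(std::is_same_v<sketch::invoke_result_t<sketch::negate_op, int>, int>,
              "checked alias should yield int");
static_assert(std::is_same_v<sketch::invoke_device_result_t<sketch::negate_op, int>, int>,
              "unchecked alias should yield int");
```

The trade-off is that callers must pick the right alias themselves; nothing in this sketch verifies that the unchecked one is really used only from device code.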

@alliepiper
Collaborator

Related to #779.

@jrhemstad
Collaborator Author

> I'm using invoke_result_t to introspect the return type within device code. Should the code be something like the following one instead?

That's a great point.

This looks to work the way we want:

template <typename F, typename... Args>
struct invoke_result : std::invoke_result<F, Args...> {
#if defined(__NVCC__) && defined(__CUDACC_EXTENDED_LAMBDA__) && !defined(__CUDA_ARCH__)
    static_assert(!__nv_is_extended_device_lambda_closure_type(F),
                  "Attempt to use an extended __device__ lambda in a context "
                  "that requires querying its return type. Use a named "
                  "function object or a __host__ __device__ lambda instead.");
#endif
};

though those guards may be overkill compared to just using !defined(__CUDA_ARCH__). nvc++ doesn't have "extended" lambdas, so we don't have to worry about that.

https://godbolt.org/z/j6qYT9d8E

@alliepiper
Collaborator

@senior-zero @jrhemstad Is there more to do for this? Can we close it?

@gevtushenko
Collaborator

I can see std::invoke_result_t usage in Thrust; I'll probably have to change these places to use cuda::std::invoke_result_t instead.


3 participants