
[BUG] build failed on cuda 12 runtime #1136

Closed
pxLi opened this issue May 15, 2023 · 3 comments · Fixed by #1143
Labels: bug (Something isn't working), build

pxLi (Collaborator) commented on May 15, 2023

Describe the bug
The JNI build fails in recent nightly runs on the CUDA 12 runtime (building on CUDA 11 works fine).

[2023-05-13T03:32:07.425Z] [INFO]      [exec] -- The C compiler identification is GNU 11.2.1
[2023-05-13T03:32:07.425Z] [INFO]      [exec] -- The CXX compiler identification is GNU 11.2.1
[2023-05-13T03:32:08.364Z] [INFO]      [exec] -- The CUDA compiler identification is NVIDIA 12.0.140
...
[2023-05-13T03:32:10.136Z] [INFO]      [exec] -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.0.140") 
...

[2023-05-13T04:17:38.652Z] [INFO]      [exec] /usr/local/cuda/include/thrust/iterator/transform_iterator.h:189:7:   required from 'class thrust::transform_iterator<__nv_dl_wrapper_t<__nv_dl_tag<std::pair<std::unique_ptr<cudf::column>, int> (*)(thrust::constant_iterator<long unsigned int, thrust::use_default, thrust::use_default>, thrust::constant_iterator<long unsigned int, thrust::use_default, thrust::use_default>, rmm::cuda_stream_view, rmm::mr::device_memory_resource*), cudf::detail::make_offsets_child_column<thrust::constant_iterator<long unsigned int, thrust::use_default, thrust::use_default> >, 1>, thrust::constant_iterator<long unsigned int, thrust::use_default, thrust::use_default>, int>, thrust::counting_iterator<int>, thrust::use_default, thrust::use_default>'
[2023-05-13T04:17:38.652Z] [INFO]      [exec] /tmp/tmpxft_0000492c_00000000-6_zorder.compute_90.cudafe1.stub.c:49:1088:   required from here
[2023-05-13T04:17:38.652Z] [INFO]      [exec] /usr/local/cuda/include/cuda/std/detail/libcxx/include/type_traits:4405:16: error: static assertion failed: Attempt to use an extended __device__ lambda in a context that requires querying its return type in host code. Use a named function object, a __host__ __device__ lambda, or cuda::proclaim_return_type instead.
[2023-05-13T04:17:38.652Z] [INFO]      [exec]  4405 |   static_assert(!__nv_is_extended_device_lambda_closure_type(_Fp),
[2023-05-13T04:17:38.652Z] [INFO]      [exec]       |               ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[2023-05-13T04:17:38.652Z] [INFO]      [exec] /usr/local/cuda/include/cuda/std/detail/libcxx/include/type_traits:4405:16: note: '!(bool)__nv_extended_device_lambda_trait_helper<__nv_dl_wrapper_t<__nv_dl_tag<std::pair<std::unique_ptr<cudf::column>, int> (*)(thrust::constant_iterator<long unsigned int, thrust::use_default, thrust::use_default>, thrust::constant_iterator<long unsigned int, thrust::use_default, thrust::use_default>, rmm::cuda_stream_view, rmm::mr::device_memory_resource*), cudf::detail::make_offsets_child_column<thrust::constant_iterator<long unsigned int, thrust::use_default, thrust::use_default> >, 1>, thrust::constant_iterator<long unsigned int, thrust::use_default, thrust::use_default>, int> >::value' evaluates to false
[2023-05-13T04:17:38.653Z] [INFO]      [exec] gmake[2]: *** [CMakeFiles/spark_rapids_jni.dir/build.make:244: CMakeFiles/spark_rapids_jni.dir/src/zorder.cu.o] Error 1
[2023-05-13T04:17:56.728Z] [INFO]      [exec] [ 62%] Linking CUDA shared library libcufaultinj.so
[2023-05-13T04:17:56.990Z] [INFO]      [exec] [ 62%] Built target cufaultinj
[2023-05-13T04:18:11.862Z] [INFO]      [exec] gmake[1]: *** [CMakeFiles/Makefile2:170: CMakeFiles/spark_rapids_jni.dir/all] Error 2
[2023-05-13T04:18:11.862Z] [INFO]      [exec] gmake: *** [Makefile:146: all] Error 2

Steps/Code to reproduce bug
Run the Maven build of the JNI in a CUDA 12 runtime.

Expected behavior
The build passes.

pxLi added the bug (Something isn't working), build, and ? - Needs Triage labels on May 15, 2023
bdice (Contributor) commented on May 15, 2023

@ttnghia asked me a question related to this offline, and linked to this issue. I decided to respond here so it's publicly visible.

CUDA 12 ships with a version of thrust (2.0.0 or newer, updated in this PR) that requires the use of the libcudacxx function cuda::proclaim_return_type for device lambdas if the return type of the lambda needs to be known on the host. Some reasons might include thrust needing to know the amount of (shared?) memory to allocate for a kernel. This is needed because of a bug in the CUDA compiler, where the host cannot always correctly deduce the return type of a device lambda. Incorrect type deduction leads to inscrutable errors (I can't remember if the errors happen at compile time or run time) or perhaps undefined behavior. @jrhemstad would be able to better describe the original motivation for raising this, and perhaps fix anything I've said incorrectly.

In RAPIDS / libcudf, we are currently pinned to an older libcudacxx that doesn't have this requirement. We do need to migrate to the newer libcudacxx for official CUDA 12 support -- older versions of libcudacxx like those in RAPIDS right now don't officially support CUDA 12, in my understanding (but they happen to work, it seems). I am working on adding CUDA 12 support in these PRs, and most of the necessary changes revolve around cuda::proclaim_return_type.

bdice (Contributor) commented on May 15, 2023

The short story is: figure out the return type of your device lambdas, and wrap them in cuda::proclaim_return_type<T>([] __device__ (...){ ... });.

e.g. this one:

[col = *output_dv_ptr,
num_columns,
data_type_size,
input = *input_dv] __device__ (cudf::size_type ret_idx) {

should look like:

thrust::for_each_n(
    rmm::exec_policy(stream),
    thrust::make_counting_iterator<cudf::size_type>(0),
    output_size,
    cuda::proclaim_return_type<void>(
      [col = *output_dv_ptr, 
       num_columns,
       data_type_size,
       input = *input_dv] __device__ (cudf::size_type ret_idx) {
       ...
     }));

But to do this, you may need to use a fairly new thrust/libcudacxx version to build with both CUDA 11 and CUDA 12.
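For reference, a minimal self-contained sketch of the same pattern (a hypothetical example, not the project's actual zorder kernel; it assumes a recent libcudacxx in which cuda::proclaim_return_type is provided by the <cuda/functional> header):

```cuda
#include <cuda/functional>
#include <thrust/device_vector.h>
#include <thrust/transform.h>

int main() {
  thrust::device_vector<int> in(4, 21);
  thrust::device_vector<int> out(4);
  // Wrapping the extended __device__ lambda lets host-side code
  // (here, thrust::transform's dispatch machinery) query its return
  // type safely instead of tripping the static_assert above.
  thrust::transform(in.begin(), in.end(), out.begin(),
                    cuda::proclaim_return_type<int>(
                        [] __device__(int x) { return 2 * x; }));
  return 0;
}
```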

bdice (Contributor) commented on May 16, 2023

I discussed with @ttnghia. I propose the following fix:

  1. Fetch CCCL libraries (Thrust/libcudacxx) from rapids-cmake. This means you'd get CCCL versions that are newer than those in CUDA 11.8, but older than those in CUDA 12. You'd basically be hiding the problem for a short period of time, by using a version of Thrust that is slightly older than the one shipped in CUDA 12 (but doesn't check device lambdas for safety). This would align with what's currently happening in RAPIDS until we can upgrade to a newer version of CCCL.
  2. Then, when rapids-cmake updates to CCCL 2.1.0 (sometime in the near-ish future, perhaps mid-cycle for 23.08), you'll see this problem occur again in both CUDA 11 and CUDA 12 builds. At this point, you'll have to actually fix the problem by wrapping device lambdas in cuda::proclaim_return_type. This is the same set of changes I'm making in libcudf right now: rapidsai/cudf#13222 ("Test updates of CCCL (thrust, cub, libcudacxx) to 2.1.0").

To fetch CCCL libraries from source, you'd want to edit the CMakeLists.txt file:
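The actual snippet was not captured in this thread. As a purely hypothetical sketch (module paths and function arguments come from rapids-cmake's CPM modules and may differ by release; check the rapids-cmake documentation for the version in use), the edit would look something like:

```cmake
# Hypothetical sketch only -- exact include paths and arguments depend
# on the rapids-cmake release; consult its documentation.
include(${rapids-cmake-dir}/cpm/thrust.cmake)
include(${rapids-cmake-dir}/cpm/libcudacxx.cmake)

# Fetch Thrust under a project namespace (avoiding the CUDA Toolkit's
# bundled copy) and fetch libcudacxx alongside it.
rapids_cpm_thrust(NAMESPACE spark_rapids_jni)
rapids_cpm_libcudacxx()
```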
