Program with device code in multiple translation units fails on CUDA #4156

Closed
sergey-semenov opened this issue Jul 21, 2021 · 9 comments · Fixed by #4107
Labels
bug Something isn't working cuda CUDA back-end

Comments

@sergey-semenov
Contributor

sergey-semenov commented Jul 21, 2021

Describe the bug
A simple program with device code in multiple translation units fails at runtime with CUDA_ERROR_INVALID_IMAGE as of #3735.

To Reproduce
h.hpp:

#include <CL/sycl.hpp>

void submit_kernelB();

b.cpp:

#include "h.hpp"

class KernelNameB;

void submit_kernelB() {
  sycl::queue q;
  q.submit([&](sycl::handler &cgh) { cgh.single_task<KernelNameB>([]() {}); });
}

main.cpp:

#include "h.hpp"
#include <CL/sycl.hpp>

class KernelNameA;
void submit_kernelA() {
  sycl::queue q;
  q.submit([&](sycl::handler &cgh) { cgh.single_task<KernelNameA>([]() {}); });
}

int main() { submit_kernelA(); }

Build and run:

clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice main.cpp b.cpp
./a.out

This reproducer fails with CUDA_ERROR_INVALID_IMAGE. Note that as of #3735 compiling it produces two device images, but only one with #3735 reverted. The error disappears once the number of device images in the application is reduced to one, whether by moving submit_kernelB into the same translation unit as submit_kernelA, by using -fsycl-device-code-split=off, or by reverting #3735.
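
For reference, the -fsycl-device-code-split=off workaround amounts to rebuilding the same two sources with that one flag added. This is a sketch only, assuming the same toolchain, target triple, and file names as in the reproducer:

```shell
# Same build as the reproducer, with device code splitting turned off
# so that both translation units end up in a single device image.
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice \
    -fsycl-device-code-split=off main.cpp b.cpp
./a.out
```

With splitting off, the application should contain only one device image, which is the condition under which the error was observed to disappear.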

Environment:

  • OS: Linux
  • Target device and vendor: CUDA, Titan RTX.
  • DPC++ version: e9d308e
  • Dependencies version: CUDA 10.1
@sergey-semenov sergey-semenov added the bug Something isn't working label Jul 21, 2021
@bader bader added the cuda CUDA back-end label Jul 21, 2021
@bader
Contributor

bader commented Jul 21, 2021

@sergey-semenov, I think this issue is already fixed by 351af24. Could you check with a newer version of the compiler?

@bader
Contributor

bader commented Jul 21, 2021

#4088 and #4079 seem to be about the same problem.

@sergey-semenov
Contributor Author

Ah, I specified the guilty commit rather than the version of the compiler I reproduced this on. This problem is still reproducible on e9d308e, so it seems 351af24 didn't address this.

@bader
Contributor

bader commented Jul 21, 2021

Okay, thanks for the update.
Maybe #4107 will help this case.

@Michoumichmich
Contributor

Michoumichmich commented Jul 21, 2021

@sergey-semenov, can you try that? Reverting only the driver part is enough to make CUDA work again (temporarily).

@sergey-semenov
Contributor Author

@Michoumichmich Reverting the driver part of f7ce532 didn't help; the reproducer still fails as before.

@bader
Contributor

bader commented Jul 26, 2021

> A simple program with device code in multiple translation units fails in runtime with CUDA_ERROR_INVALID_IMAGE as of #3735

@steffenlarsen, could you take a look, please? It looks like #3735 introduced a significant functional regression.

@steffenlarsen
Contributor

#4107 introduces a better solution for the driver changes in #3735. @sergey-semenov would you be able to check if that solves this issue?

@sergey-semenov
Contributor Author

This is indeed resolved by #4107

4 participants