Program with device code in multiple translation units fails on CUDA #4156
Comments
@sergey-semenov, I think this issue is already fixed by 351af24. Could you check with a newer version of the compiler?
Okay, thanks for the update.
@sergey-semenov can you try that? Reverting only the driver is enough to make CUDA work again (temporarily).
@Michoumichmich Reverting the driver part of f7ce532 didn't help; the reproducer is still failing as before.
@steffenlarsen, could you take a look, please? It looks like #3735 introduced a significant functional regression.
#4107 introduces a better solution for the driver changes in #3735. @sergey-semenov would you be able to check if that solves this issue?
This is indeed resolved by #4107.
Describe the bug
A simple program with device code in multiple translation units fails at runtime with CUDA_ERROR_INVALID_IMAGE as of #3735.
To Reproduce
h.hpp:
b.cpp:
main.cpp:
This reproducer fails with CUDA_ERROR_INVALID_IMAGE. Note that compiling it results in 2 device images as of #3735, but in only one with #3735 reverted. The error disappears once the number of device images in the application is reduced to 1 (either by moving submit_kernelB to the same translation unit as submit_kernelA, by using -fsycl-device-code-split=off, or by reverting #3735).
Environment: