
Avoid CL_BUILD_PROGRAM_FAILURE under DPC++ for devices not supporting doubles #2968


Closed
ekondis opened this issue Dec 29, 2020 · 4 comments


@ekondis

ekondis commented Dec 29, 2020

I'm developing a SYCL program that invokes multiple kernels, one of which uses double-precision computations. However, when I run it on a Gen12LP HD GPU, which does not seem to support double precision, I get an exception during the build (CL_BUILD_PROGRAM_FAILURE):

error: double type is not supported on this platform
error: backend compiler failed build.

AFAIK, kernel building under SYCL is performed implicitly. Can I control somehow which kernels will be compiled under DPC++ so I can avoid the exception for devices not supporting double precision?

@AlexeySachkov
Contributor

Hi @ekondis,

> AFAIK, kernel building under SYCL is performed implicitly. Can I control somehow which kernels will be compiled under DPC++ so I can avoid the exception for devices not supporting double precision?

Yes, you can, and there are a couple of ways to do it:

  1. You can switch to building kernels explicitly, but I wouldn't recommend it because it is not really user-friendly; see [SYCL][SPEC] Tricked by spec example code, really need some one tell me what happened. :) #2704
  2. You can use the device code split feature that our compiler provides. This is probably the easiest way for you to work around the problem.

About device code split: by default, we merge all device code from an app into a single device image, which means that even if you launch only one of a hundred kernels, all of them will be sent to compilation, which might result in errors like the one you have just experienced. However, there is an option to split device code into several device images: you can choose either per-source mode, where all kernels from a single translation unit go into a separate device image, or per-kernel mode, where each kernel goes into its own device image. By distributing kernels across different device images, you can avoid errors like the one you mentioned.
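To make the grouping concrete, here is a toy model in plain C++ of how the three modes described above partition kernels into device images, and why launching one kernel can fail because of another. It is only a sketch: `Kernel`, `SplitMode`, `split`, and `launch` are hypothetical names invented for illustration, not SYCL or DPC++ APIs, and the real runtime is considerably more involved.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical, not part of SYCL or DPC++.
struct Kernel {
    std::string name;
    std::string source_file;  // translation unit that defines the kernel
    bool uses_double;         // true if the kernel needs fp64 support
};

enum class SplitMode { Off, PerSource, PerKernel };

using DeviceImage = std::vector<Kernel>;

// Group kernels into device images according to the split mode.
std::vector<DeviceImage> split(const std::vector<Kernel>& kernels, SplitMode mode) {
    std::map<std::string, DeviceImage> groups;
    for (const Kernel& k : kernels) {
        assert(!k.name.empty());  // precondition of this toy model
        std::string key;
        switch (mode) {
            case SplitMode::Off:       key = "all";         break;  // one image for everything
            case SplitMode::PerSource: key = k.source_file; break;  // one image per translation unit
            case SplitMode::PerKernel: key = k.name;        break;  // one image per kernel
        }
        groups[key].push_back(k);
    }
    std::vector<DeviceImage> images;
    for (auto& entry : groups) images.push_back(entry.second);
    return images;
}

// Launching a kernel builds the whole image that contains it. In this model
// the build fails if any kernel in that image needs fp64 and the device lacks it.
// Returns true when the build (and therefore the launch) succeeds.
bool launch(const std::vector<DeviceImage>& images,
            const std::string& kernel_name,
            bool device_has_fp64) {
    for (const DeviceImage& image : images) {
        bool contains = false;
        bool needs_fp64 = false;
        for (const Kernel& k : image) {
            if (k.name == kernel_name) contains = true;
            if (k.uses_double) needs_fp64 = true;
        }
        if (contains) return device_has_fp64 || !needs_fp64;
    }
    throw std::invalid_argument("unknown kernel: " + kernel_name);
}
```

For example, take a float kernel in `a.cpp` plus a float kernel and a double kernel in `b.cpp`, on a device without fp64. With `SplitMode::Off`, every launch fails in this model, because the single image contains the double kernel. With `SplitMode::PerSource`, the kernel from `a.cpp` launches but the float kernel in `b.cpp` still fails. With `SplitMode::PerKernel`, only the double kernel itself fails. In DPC++ the corresponding compiler option is `-fsycl-device-code-split`, which accepts values such as `per_source` and `per_kernel`.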

The flag is documented here. As an example, you can see how the SYCL-CTS tests are organized and compiled by our SYCL implementation: the combination of KhronosGroup/SYCL-CTS#55 and KhronosGroup/SYCL-CTS#42 allows us to correctly run the test for the stream class regardless of the device used.

Also, I think it is reasonable to expect the compiler to automatically distribute kernels that use optional features into separate device images. This seems to fit the recently added auto device code split mode perfectly: #2827, which is intended to automatically choose the best split option (and I've just realized that I forgot to update the documentation about the command-line option, my bad).

@ekondis
Author

ekondis commented Dec 29, 2020

Thank you for your thorough explanation.
So, in case I set a per-kernel mode then only the invoked kernels will be sent to compilation, right?

@AlexeySachkov
Contributor

> So, in case I set a per-kernel mode then only the invoked kernels will be sent to compilation, right?

Exactly

@ekondis
Author

ekondis commented Dec 30, 2020

Thanks again @AlexeySachkov.

ekondis closed this as completed Dec 30, 2020