Add option to disable half precision BLAS functions #102
mmeterel merged 3 commits into uxlfoundation:develop
mkrainiuk
left a comment
Thank you for the PR! I have two comments on these changes.
src/config.hpp.in
#cmakedefine ENABLE_MKLCPU_BACKEND
#cmakedefine ENABLE_MKLGPU_BACKEND
#cmakedefine ENABLE_NETLIB_BACKEND
#cmakedefine ENABLE_HALF_ROUTINES
I think the new flag needs to be applied to all backends, but I see changes only in cublas and mklcpu. Is there a reason why it's not propagated to the other backends?
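For illustration, a minimal sketch of how a flag like `ENABLE_HALF_ROUTINES` could gate the half-precision overloads at compile time. Only the macro name comes from the diff above; `half_t`, `gemm`, and `half_routines_enabled` are hypothetical names, and the macro is defined by hand here rather than generated from `config.hpp.in`. Under this approach every backend would need the same guard around its half overloads, which is the point raised in the review.

```cpp
#include <cstddef>
#include <string>

// In the real project this macro would come from the generated config header.
// It is defined by hand here purely for illustration.
#define ENABLE_HALF_ROUTINES

// Hypothetical stand-in for sycl::half so the sketch builds without a SYCL compiler.
struct half_t {
    unsigned short bits;
};

// Single-precision entry point: always compiled.
inline std::string gemm(const float*, const float*, float*, std::size_t) {
    return "sgemm";
}

#ifdef ENABLE_HALF_ROUTINES
// Half-precision entry point: compiled only when the option is enabled.
inline std::string gemm(const half_t*, const half_t*, half_t*, std::size_t) {
    return "hgemm";
}
#endif

// Lets callers and tests query how the library was configured.
constexpr bool half_routines_enabled() {
#ifdef ENABLE_HALF_ROUTINES
    return true;
#else
    return false;
#endif
}
```

With the macro undefined, a call that passes `half_t` pointers fails to compile instead of failing at run time, which is the behaviour a build-time switch gives you.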
This seems to be an oversight on my part; I only tested with these two backends.
This probably means more changes need to be made to the other backends as well.
However, I have my reservations about disabling half precision since it is part of SYCL Specification 1.2.1 section 5.1. Could you please remind me of your argument for not supporting half precision in hipSYCL?
> I have my reservations about disabling half precision since it is part of SYCL Specification 1.2.1 section 5.1.

Maybe more importantly, it is part of the SYCL 2020 spec, C.9.1.
If it's necessary to disable half for hipSYCL, can it be disabled implicitly when hipSYCL is in use, rather than adding an explicit option that introduces more complexity into the build system?
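One way the implicit variant could look, as a sketch only: derive the setting from a macro that the SYCL implementation predefines, so no extra CMake option is needed. This assumes hipSYCL identifies itself through `__HIPSYCL__` (worth checking against the implementation's documentation); `BLAS_HALF_AVAILABLE` and `blas_half_available` are hypothetical project-local names.

```cpp
// Assumption: the SYCL implementation predefines an identifying macro.
// __HIPSYCL__ is used here; verify the exact name for the toolchain in use.
#if defined(__HIPSYCL__)
#define BLAS_HALF_AVAILABLE 0  // hypothetical project-local macro
#else
#define BLAS_HALF_AVAILABLE 1
#endif

// Typed view of the same setting for use in ordinary C++ code.
constexpr bool blas_half_available = (BLAS_HALF_AVAILABLE != 0);
```

The trade-off is that the decision is tied to the compiler rather than to the device, so it cannot tell a target that supports fp16 from one that does not.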
I think "...all SYCL implementations..." is very clear.
It's not even defined in the core spec what the fp16 optional feature means, only in the backend specification. At the same time, backend specifications are not binding for other backends. IMO there's a gap there. The definition of the features mentioned in the core spec (fp16, fp64, atomic64) should also be defined in the core spec if they are supposed to be well-defined everywhere.
And by the way, thanks for hipSYCL!
'Tis our pleasure :)
> It's not even defined in the core spec what the fp16 optional feature means...

Can it be that its definition is inferred from LLVM?
https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point
AMDGPU is one of the currently supported targets. Out of curiosity (for my own understanding), what is the limitation in hipSYCL to support fp16? hipSYCL is also based on LLVM.
> Can it be that its definition is inferred from LLVM?

My guess is that it's an oversight that happened when content was shuffled around when we generalized the SYCL backend model for SYCL 2020. In any case, LLVM conventions are also not binding for the SYCL spec :)
> AMDGPU is one of the currently supported targets. Out of curiosity (for my own understanding), what is the limitation in hipSYCL to support fp16? hipSYCL is also based on LLVM.

It's mainly an implementation prioritization issue. On GPU it's fairly straightforward, as you point out, but on CPU we have to build some emulation or ship some C++ half library with hipSYCL.
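To give a sense of what CPU-side emulation involves, here is a minimal sketch of converting between IEEE 754 binary16 bit patterns and `float` in software. It is not hipSYCL's implementation; `half_to_float` and `float_to_half` are hypothetical names, and a full half library would also need arithmetic, comparisons, and math functions on top of this.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Decode IEEE 754 binary16 bits into a float (always exact).
inline float half_to_float(std::uint16_t h) {
    const int exp = (h >> 10) & 0x1F;
    const int mant = h & 0x3FF;
    float mag;
    if (exp == 0x1F) {
        mag = mant ? std::numeric_limits<float>::quiet_NaN()
                   : std::numeric_limits<float>::infinity();
    } else if (exp == 0) {
        mag = std::ldexp(static_cast<float>(mant), -24);  // zero or subnormal
    } else {
        mag = std::ldexp(static_cast<float>(mant | 0x400), exp - 25);
    }
    return (h & 0x8000) ? -mag : mag;
}

// Encode a float as binary16 bits, rounding to nearest with ties to even.
inline std::uint16_t float_to_half(float f) {
    std::uint32_t x;
    std::memcpy(&x, &f, sizeof x);
    const std::uint32_t sign = (x >> 16) & 0x8000u;
    const std::uint32_t mag = x & 0x7FFFFFFFu;
    std::uint32_t out;
    if (mag > 0x7F800000u) {
        out = 0x7E00u;  // NaN maps to a quiet NaN
    } else if (mag >= 0x47800000u) {
        out = 0x7C00u;  // 65536 and above, including infinity, overflow to infinity
    } else if (mag >= 0x38800000u) {
        // Result is a normal half: rebias the exponent and drop 13 mantissa bits.
        out = (mag - 0x38000000u) >> 13;
        const std::uint32_t rem = mag & 0x1FFFu;
        if (rem > 0x1000u || (rem == 0x1000u && (out & 1u))) ++out;
    } else {
        // Result is a subnormal half or zero.
        const std::uint32_t shift = 126u - (mag >> 23);
        if (shift > 24u) {
            out = 0;  // too small to round up to the smallest subnormal
        } else {
            const std::uint32_t m = (mag & 0x7FFFFFu) | 0x800000u;
            out = m >> shift;
            const std::uint32_t rem = m & ((1u << shift) - 1u);
            const std::uint32_t halfway = 1u << (shift - 1u);
            if (rem > halfway || (rem == halfway && (out & 1u))) ++out;
        }
    }
    return static_cast<std::uint16_t>(sign | out);
}
```

Widening is exact because every binary16 value is representable as a `float`; only the narrowing direction has to round.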
Perhaps it'd be useful to ask @bader or @Pennycook for clarification.
Thanks for the fruitful discussion, @illuhad! Good luck with this!
I agree that the specification is unclear here. The definition of half is something that I think needs more discussion.
There are already a few internal Khronos issues open that I think are relevant to this issue: https://gitlab.khronos.org/sycl/Specification/-/issues/322, https://gitlab.khronos.org/sycl/Specification/-/issues/455. @illuhad, if you agree that those issues are relevant, would you mind adding some more details there?
0c9abb9 to 665c253
Sorry for the long delay. Added runtime checking for fp16 type support.
::cblas_sgemm(CBLASMAJOR, transa_, transb_, m, n, k, f32_alpha, f32_a, lda, f32_b, ldb,
              f32_beta, f32_c, ldc);
// copy C back to half
// copy C back to cl::sycl::half
Could you please revert this change?
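The snippet under review follows a widen, compute, narrow pattern: the half inputs are converted to `float`, a single-precision GEMM runs, and C is copied back to half. A minimal sketch of that pattern, assuming row-major storage and no transposition. `naive_sgemm` is a hypothetical stand-in for `::cblas_sgemm`, and `Half` is a template parameter (it would be the SYCL half type in the real backend) so that the sketch builds without a SYCL toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain row-major, no-transpose SGEMM: C = alpha * A * B + beta * C.
// Hypothetical stand-in for ::cblas_sgemm.
inline void naive_sgemm(std::size_t m, std::size_t n, std::size_t k, float alpha,
                        const float* a, std::size_t lda, const float* b, std::size_t ldb,
                        float beta, float* c, std::size_t ldc) {
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) acc += a[i * lda + p] * b[p * ldb + j];
            c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j];
        }
    }
}

// Half-precision GEMM emulated through single precision.
template <typename Half>
void gemm_via_float(std::size_t m, std::size_t n, std::size_t k, Half alpha,
                    const Half* a, std::size_t lda, const Half* b, std::size_t ldb,
                    Half beta, Half* c, std::size_t ldc) {
    assert(lda >= k && ldb >= n && ldc >= n);
    std::vector<float> f32_a(m * k), f32_b(k * n), f32_c(m * n);
    // Widen A, B and C into compact float buffers.
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < k; ++p) f32_a[i * k + p] = static_cast<float>(a[i * lda + p]);
    for (std::size_t p = 0; p < k; ++p)
        for (std::size_t j = 0; j < n; ++j) f32_b[p * n + j] = static_cast<float>(b[p * ldb + j]);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j) f32_c[i * n + j] = static_cast<float>(c[i * ldc + j]);

    naive_sgemm(m, n, k, static_cast<float>(alpha), f32_a.data(), k, f32_b.data(), n,
                static_cast<float>(beta), f32_c.data(), n);

    // Copy C back to half.
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j) c[i * ldc + j] = static_cast<Half>(f32_c[i * n + j]);
}
```

Accumulating in `float` keeps intermediate sums more accurate than half arithmetic would, at the cost of temporary buffers for all three matrices.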
#include <type_traits>

// Utility function to verify that a given set of types is supported by the
// device compiler combination
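The body of that utility is not shown in the excerpt, so the following is only a sketch of what such a check could look like, not the PR's code. `mock_device`, `half_t`, `is_supported`, and `all_supported` are hypothetical names. With a real SYCL 2020 device, the per-type query would presumably go through `sycl::device::has` with `sycl::aspect::fp16` or `sycl::aspect::fp64` instead of the boolean fields used here.

```cpp
#include <type_traits>

// Hypothetical stand-ins so the sketch builds without a SYCL compiler.
struct half_t {
    unsigned short bits;
};
struct mock_device {
    bool fp16;  // would map to a device aspect query in real SYCL code
    bool fp64;
};

// Per-type check: only half and double depend on optional device features.
template <typename T>
bool is_supported(const mock_device& dev) {
    if (std::is_same<T, half_t>::value) return dev.fp16;
    if (std::is_same<T, double>::value) return dev.fp64;
    return true;
}

// True only if every type in the pack is supported on the given device.
template <typename... Ts>
bool all_supported(const mock_device& dev) {
    // The leading 'true' keeps the array non-empty for an empty pack.
    const bool results[] = {true, is_supported<Ts>(dev)...};
    for (bool ok : results) {
        if (!ok) return false;
    }
    return true;
}
```

A test or backend entry point could call `all_supported<float, half_t>(dev)` up front and skip or report an unsupported-type error, instead of failing somewhere inside the kernel.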
@sbalint98 Sorry for the delay. Could you please resolve the conflicts in src/blas/backends/mklcpu/mklcpu_level3.cxx?
1a1d2a5 to cb3dc41
Description
This PR adds functionality to disable the cuBLAS and MKLcpu BLAS functions that take half-precision arguments. Related to #99.
Checklist