Skip to content

Comments

Add option to disable half precision BLAS functions#102

Merged
mmeterel merged 3 commits intouxlfoundation:developfrom
sbalint98:ustream-half-final
Aug 18, 2021
Merged

Add option to disable half precision BLAS functions#102
mmeterel merged 3 commits intouxlfoundation:developfrom
sbalint98:ustream-half-final

Conversation

@sbalint98
Copy link
Contributor

@sbalint98 sbalint98 commented Jun 9, 2021

Description

This PR adds functionality to disable the cuBLAS and MKLcpu BLAS functions, which take half-precision arguments. related to #99

Checklist

  • Do all unit tests pass locally? Attach a log.
  • Have you formatted the code using clang-format?

Copy link
Contributor

@mkrainiuk mkrainiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for PR! I have two comments for these changes.

#cmakedefine ENABLE_MKLCPU_BACKEND
#cmakedefine ENABLE_MKLGPU_BACKEND
#cmakedefine ENABLE_NETLIB_BACKEND
#cmakedefine ENABLE_HALF_ROUTINES
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think new flag needs to be applied to all backends, but I see changes only in cublas and mklcpu, is there a reason why it's not propagated to other backends?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be an oversight on my part, I only tested with these two backends.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probable means more changes needs to be done on other backends as well.

However, I have my reservations about disabling half precision since it is part of SYCL Specification 1.2.1 section 5.1. Could you please remind me your argument for not supporting half precision in hipSYCL?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have my reservations about disabling half precision since it is part of SYCL Specification 1.2.1 section 5.1.

Maybe more importantly, it is part of the SYCL 2020 spec, C.9.1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's necessary to disable half for hipSYCL, can disabling be done implicitly when using hipSYCL rather than adding an explicit functionality, instead of introducing more complexity into the build system?

Copy link

@illuhad illuhad Jun 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "...all SYCL implementations..." is very clear.

It's not even defined in the core spec what the fp16 optional feature means, only in the backend specification. At the same time, backend specifications are not binding for other backends. IMO there's a gap there. The definition of the features mentioned in the core spec (fp16, fp64, atomic64) should also be defined in the core spec if they are supposed to be well-defined everywhere.

And by the way, thanks for hipSYCL!

'Tis our pleasure :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not even defined in the core spec what the fp16 optional feature means...

Can it be that it's definition is inferred by the LLVM?
https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point

AMDGPU is one of the currently supported targets. Out of curiosity (for my own understanding), what is the limitation in hipSYCL to support fp16? hipSYCL is also based on LLVM.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be that it's definition is inferred by the LLVM?

My guess is that it's an oversight that happened when content was shuffled around when we generalized the SYCL backend model for SYCL 2020. In any case, LLVM conventions are also not binding for the SYCL spec :)

AMDGPU is one of the currently supported targets. Out of curiosity (for my own understanding), what is the limitation in hipSYCL to support fp16? hipSYCL is also based on LLVM.

It's mainly an implementation prioritization issue. On GPU it's fairly straight forward as you point out, but on CPU we have to build some emulation or ship some C++ half library with hipSYCL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it'd be useful to ask @bader or @Pennycook for clarification.

Thanks for the fruitful discussion, @illuhad! Good luck with this!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the specification is unclear, here. The definition of half is something that I think needs more discussion.

There are already a few internal Khronos issues open that I think are relevant to this issue: https://gitlab.khronos.org/sycl/Specification/-/issues/322, https://gitlab.khronos.org/sycl/Specification/-/issues/455. @illuhad, if you agree that those issues are relevant, would you mind adding some more details there?

@sbalint98 sbalint98 force-pushed the ustream-half-final branch from 0c9abb9 to 665c253 Compare July 15, 2021 10:51
@sbalint98
Copy link
Contributor Author

Sorry for the long delay. Added runtime checking for the support for fp16 types.

::cblas_sgemm(CBLASMAJOR, transa_, transb_, m, n, k, f32_alpha, f32_a, lda, f32_b, ldb,
f32_beta, f32_c, ldc);
// copy C back to half
// copy C back to cl::sycl::half
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please revert this change?

#include <type_traits>

// Utility function to verify that a given set of types is supported by the
// device compiler combintiation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

combination (typo)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, fixed!

@mmeterel
Copy link
Contributor

@sbalint98 Sorry for the delay. Could you please resolve conflicts in src/blas/backends/mklcpu/mklcpu_level3.cxx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants