-
Notifications
You must be signed in to change notification settings - Fork 790
[SYCL][CUDA] bfloat16 in oneapi namespace and also supporting CUDA #5393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Added bfloat16 in oneapi experimental namespace. Signed-off-by: jack.kirk <jack.kirk@codeplay.com>
BTW this PR is ready for review. I had only left it as draft because there are some small outstanding issues which means it cannot be completely finished, but these should be minor issues. What we really want to know is whether this an acceptable direction for generalizing this extension to support other vendor's backends? It seems to be almost trivial to allow other backends to use this extension (the changes I made are very small). One issue is that we need to know is whether the aspect from #4266 is going to be eventually added? so we know whether we also need to support it in the CUDA backend. You may also want to take note of the bug that I found as described in my first message here ^^. I am aware that I did not yet add support to the CUDA version of the intel/llvm device test because it will require a different compilation instruction for the CUDA case. Again this is a small issue that can be addressed at a later point. |
namespace oneapi { | ||
namespace experimental { | ||
|
||
class [[sycl_detail::uses_aspects(ext_intel_bf16_conversion)]] bfloat16 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be wrapped with SYCL_DEVICE_ONLY defined, for it to be built successfully by other compilers, right? (see #5594)
@AlexeySotkin, are you planning for adding this as a separate PR?
…_BF16_CONVERSION.asciidoc
Added updated design doc switching from Intel vendor specific to oneapi extension. We are aware there is another open PR for this design doc: #5248. |
Removed aspect reference: can be added once the ext_oneapi_bfloat16 aspect is merged.
Update: As discussed the plan is to have a single bfloat16 oneapi experimental extension. As such I have renamed the existing intel experimental extension as "sycl_ext_oneapi_bfloat16.asciidoc" and made some small naming changes. If it is preferred I can (temporarily) switch this document back to its original name since that could help with reviewing the (small number of) changes in the document? Ideally this PR would be merged first, then other open PRs that add to the design document/bfloat16 functionality such as #5645 and #5248 would follow once they have been updated to reflect the new extension name etc. This PR can be tested with a draft test-suite PR that is up to date for the purpose of this functional testing: intel/llvm-test-suite#889 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is preferred I can (temporarily) switch this document back to its original name since that could help with reviewing the (small number of) changes in the document?
No need. I checked out the PR and did the diff manually.
I think this PR needs to remove the existing file sycl_ext_intel_bf16_conversion.asciidoc
, correct?
namespace experimental { | ||
|
||
class [[sycl_detail::uses_aspects(ext_intel_bf16_conversion)]] bfloat16 { | ||
class bfloat16 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, did you remove this C++ attribute intentionally?
Removing it is probably not a big problem because we don't have support in the rest of the tool chain yet to take advantage of this attribute. If there's no reason to remove it, though, it seems better to keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it was intentional. I think that it will be replaced by the oneapi aspect in #5720.
The only reason for deleting it is that the cuda backend does not have this ext_intel_bf16_conversion aspect (which as I understand it will be replaced by the oneapi aspect generally). However I can try adding it back and seeing if the cuda backend bfloat16 test passes with it included, since I'm not sure how it will behave. Shall I do this? If the cuda backend test passes then this line can remain unchanged if that is preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. I thought the aspect was renamed in this PR, but I see now that it is not.
When #5720 is merged, we should add this C++ attribute back and use the new aspect. I think you could probably keep the attribute in this PR with the old aspect name, and it would probably not hurt CUDA. The attribute will cause the front-end to put some metadata on the IR, but I think all IR passes ignore that metadata right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. I thought the aspect was renamed in this PR, but I see now that it is not.
When #5720 is merged, we should add this C++ attribute back and use the new aspect.
So is it OK to leave it as it is? I would remember to add it back when the new aspect is available....
I think you could probably keep the attribute in this PR with the old aspect name, and it would probably not hurt CUDA. The attribute will cause the front-end to put some metadata on the IR, but I think all IR passes ignore that metadata right now.
Or should I add it back now with the old aspect name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way is OK.
Yes, thanks for pointing this out. I've removed the old doc now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spec changes are OK
/verify with intel/llvm-test-suite#889 |
Hi @bader RE: #5849 (comment) I've tried this here but it still seems to be testing using the old version of bfloat16_type.cpp rather than the new version in my llvm-test-suite test. Did I do something incorrectly? Thanks |
@JackAKirk, I think you are look at wrong logs. |
Oh yes, I see that the failing tests at the bottom of this page must be from from the existing bfloat16_type.cpp test, which is expected to fail without the patch from intel/llvm-test-suite#889. In that case this I think that this PR can be considered for merge. @alexbatashev would it be possible to get a review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spec changes still look OK.
/verify with intel/llvm-test-suite#889 |
This SYCL 2020 extension proposal proposes adding bfloat16 support to the fma, fmin, fmax and fabs SYCL floating point math functions. Blocked by #5393 Co-authored-by: JackAKirk <chezjakirk@gmail.com>
I have tested these changes on the CUDA backend using a intel->oneapi namespace version of this test:
https://github.com/intel/llvm-test-suite/blob/intel/SYCL/BFloat16/bfloat16_type.cpp
There is a bug in the verify_logic function in the bfloat16_type.cpp test (C accessor is not written to) - I'm not sure how this did not lead to a failure already. With the bug fixed the test passes for the CUDA backend with this patch. I've added a draft test file that also increases the coverage to test unary minus operator here: intel/llvm-test-suite#889.
Note that the unary neg intrinsic added here that is used in unary minus will be pulled down from upstream via e.g. https://reviews.llvm.org/D117887.
Is this an acceptable direction for generalizing this extension to support other vendor's backends?