[SYCL] Fix issue with half and -fsycl-unnamed-lambda #960

AlexeySachkov · 2019-12-20T21:56:14Z

When `-fsycl-unnamed-lambda` is present, the mapping from a SYCL kernel
function to the corresponding OpenCL kernel name is done via the
`__unique_stable_name` built-in. The device compiler uses it to generate the
integration header, and the host compiler uses it to look up kernel
information there.

The problem is that we might get different results for the same SYCL
Kernel function when compiling for host and device: the issue appears if
kernel uses `half` data type which is represented as:

- `cl::sycl::detail::half_impl::half` on host
- `_Float16` on device

Actually, a similar issue exists even without `-fsycl-unnamed-lambda`, but
for that case we have a workaround in the form of
`#define _Float16 cl::sycl::detail::half_impl::half` in
`kernel_desc.hpp`, which turns the device half representation into the host one.

The same trick doesn't apply here, so the problem is fixed by doing the
following:

- for `UniqueStableMangler`, we mangle
  `cl::sycl::detail::half_impl::half` in the same way as `_Float16`, i.e.
  `DF16_`
- for `UniqueStableMangler`, `cl::sycl::detail::half_impl::half` is marked
  as non-substitutable to avoid other differences in the mangled name
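To make the mismatch concrete, here is a minimal sketch of the idea, not the actual Clang code. It assumes a toy model of Itanium nested-name mangling (`N <length><identifier>... E`) and uses `DF16_`, the Itanium ABI encoding of `_Float16`. The names `mangleNested` and `mangleStable` are hypothetical, and substitution handling is not modeled at all.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of an Itanium nested name: "N", then each component as
// <length><identifier>, then "E". Real mangling has many more rules.
std::string mangleNested(const std::vector<std::string> &Path) {
  std::string Out = "N";
  for (const std::string &Part : Path)
    Out += std::to_string(Part.size()) + Part;
  return Out + "E";
}

// Hypothetical "stable" variant: the host-side half class is special-cased
// so that it produces the same encoding as the device-side _Float16.
std::string mangleStable(const std::vector<std::string> &Path) {
  static const std::vector<std::string> HostHalf = {
      "cl", "sycl", "detail", "half_impl", "half"};
  if (Path == HostHalf)
    return "DF16_"; // Itanium encoding of _Float16
  return mangleNested(Path);
}
```

Without the special case, the host would look for a kernel whose name contains `N2cl4sycl6detail9half_impl4halfE`, while the device compiler would have emitted `DF16_` in the same position, so the lookup in the integration header would fail.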

erichkeane

OK with the approach, a few quick bits in how we detect it.

erichkeane · 2019-12-20T22:06:16Z

clang/lib/AST/ItaniumMangle.cpp

+  // (namespace) and name.
+  const CXXRecordDecl *RecTy = Ty->getAsCXXRecordDecl();
+
+  if (!RecTy)


Can this be an assert instead? I'd hate for us to think this works for other things, then it fails.

In this file I call this function with TagType, which might represent both Record and Enum. So, I would prefer if instead

erichkeane · 2019-12-20T22:07:55Z

clang/lib/AST/ItaniumMangle.cpp

+      Name = cast<CXXRecordDecl>(Ctx)->getName();
+      break;
+    case clang::Decl::Kind::Namespace:
+      Name = cast<NamespaceDecl>(Ctx)->getName();


What happens with anonymous namespaces?

I guess, getName() returns a unique string hash, so, it won't match with the requested name and the function will just return false

erichkeane · 2019-12-20T22:12:29Z

clang/lib/AST/ItaniumMangle.cpp

+    }
+    if (Name != Scope.second)
+      return false;
+    Ctx = Ctx->getParent();


Check out getEnclosingNamespaceContext. If we can get the chart to tell us what we expect out of everything (and I think all are namespaces?) you can probably just use that until you find the TU.

getEnclosingNamespaceContext doesn't seem to work properly:

For getParent(), I see the following chain of (DeclKind, Name) pairs being analyzed:

33 half, 14 half_impl, 14 detail, 14 sycl, 14 cl

While for getEnclosingNamespaceContext() it looks like:

33 half, 14 half_impl, 14 half_impl

It seems like enclosing namespace context for half_impl is half_impl, which is confusing. Probably I don't fully understand something

Hmm... interesting. My reading of the function doesn't really seem like it SHOULD do that, but it is perhaps an old enough function that it isn't terribly maintained. It just seemed to fit the need :)

Is that the entire chain? or did you 'give up' there. There is an interesting call to 'getPrimaryContext' in that function that seems like it should make sure you don't get duplicates...

Actually... looking at the logic to that function I think it (or my interpretation of it) is wrong... I think it would return half_impl forever. It seems that it never returns the parent of a namespace, just the primary declcontext for the current one (or the namespace containing a current object).

I looked at some other things that do similar work, so I now think getParent is the only way to do this.
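The difference between the two walks can be sketched with a toy model. This is not Clang's `DeclContext`: `ToyContext`, `ExpectedScope`, `matchesScopes`, `enclosingNamespace` and `HalfChain` are all hypothetical names, and `enclosingNamespace` only mimics the behaviour inferred in this thread (skip non-namespace contexts, never step out of a namespace).

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Kind { TranslationUnit, Namespace, CXXRecord };

// Toy stand-in for a declaration context with a parent link.
struct ToyContext {
  Kind K;
  std::string Name;
  const ToyContext *Parent; // nullptr for the translation unit
};

struct ExpectedScope {
  Kind K;
  std::string Name;
};

// Walk outward through Parent links and compare against the expected scopes,
// which are listed outermost first (as in the patch under review).
bool matchesScopes(const ToyContext *Ctx,
                   const std::vector<ExpectedScope> &Scopes) {
  for (auto It = Scopes.rbegin(); It != Scopes.rend(); ++It) {
    if (!Ctx || Ctx->K != It->K || Ctx->Name != It->Name)
      return false;
    Ctx = Ctx->Parent;
  }
  // Every expected scope matched; we must now be at the translation unit.
  return Ctx && Ctx->K == Kind::TranslationUnit;
}

// Toy analogue of the observed behaviour: a namespace is its own
// "enclosing namespace", so repeated calls never move outward.
const ToyContext *enclosingNamespace(const ToyContext *Ctx) {
  while (Ctx->K == Kind::CXXRecord)
    Ctx = Ctx->Parent;
  return Ctx;
}

// cl::sycl::detail::half_impl::half as a chain of toy contexts.
struct HalfChain {
  ToyContext TU{Kind::TranslationUnit, "", nullptr};
  ToyContext Cl{Kind::Namespace, "cl", &TU};
  ToyContext Sycl{Kind::Namespace, "sycl", &Cl};
  ToyContext Detail{Kind::Namespace, "detail", &Sycl};
  ToyContext HalfImpl{Kind::Namespace, "half_impl", &Detail};
  ToyContext Half{Kind::CXXRecord, "half", &HalfImpl};
};

bool demoParentWalkMatches() {
  HalfChain C;
  return matchesScopes(&C.Half, {{Kind::Namespace, "cl"},
                                 {Kind::Namespace, "sycl"},
                                 {Kind::Namespace, "detail"},
                                 {Kind::Namespace, "half_impl"},
                                 {Kind::CXXRecord, "half"}});
}

bool demoEnclosingNamespaceStalls() {
  HalfChain C;
  const ToyContext *NS = enclosingNamespace(&C.Half);
  return NS == &C.HalfImpl && enclosingNamespace(NS) == NS;
}
```

In this model the parent walk visits `half`, `half_impl`, `detail`, `sycl`, `cl` and then the translation unit, while the enclosing-namespace walk gets stuck at `half_impl`, which matches the chains reported above.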

Fznamznon · 2019-12-23T08:54:43Z

clang/lib/AST/ItaniumMangle.cpp

+        DeclContextDesc{clang::Decl::Kind::Namespace, "cl"},
+        DeclContextDesc{clang::Decl::Kind::Namespace, "sycl"},
+        DeclContextDesc{clang::Decl::Kind::Namespace, "detail"},
+        DeclContextDesc{clang::Decl::Kind::Namespace, "half_impl"},


What if we change names of classes/namespaces in SYCL headers?
This seems even worse than the way we detect accessors, because the names and namespaces of accessors are defined by the SYCL spec. The detail/half_impl nesting is NOT defined by the SYCL spec; it is only a detail of our implementation of the SYCL headers.
This "magic" is fragile, not flexible and cannot be upstreamed.

> What if we change names of classes/namespaces in SYCL headers?

Corresponding LIT test will fail and we will have to update this code too.

> This "magic" is fragile, not flexible and cannot be upstreamed.

I agree that the whole solution looks very "hacky". Any ideas how to do the same better?

We could try to define cl::sycl::half as a single wrapper whose underlying type changes depending on whether we are compiling for device or not. Then it will be the same type for the mangler. This is the same as what is done for the cl::sycl::vec classes.

What do you mean 'as a single wrapper'? As a type itself? Type aliases aren't mangled. Curious to see what the 'vec' classes solution is.

@erichkeane, the idea is to use cl::sycl::detail::half_impl::half for both device and host code to achieve the same mangling.

This might require writing relatively significant amount of code somewhere in SYCL headers, because this wrapper should encapsulate native half within it on device side:

- device compiler should be able to optimize math operations with it
- we should be able to call different device built-in functions which accept native half

I will try to do this, but I would also like to get an "LGTM" for this patch as a back-up plan in case the other approach takes a lot of time.
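A minimal sketch of the single-wrapper idea, under stated assumptions: `TOY_DEVICE_BUILD` is a hypothetical macro standing in for whatever the real headers use to detect device compilation, `toy::HostHalf` is a placeholder that stores a plain `float` instead of emulating 16-bit precision, and `toy::half` is not the actual SYCL class.

```cpp
#include <cassert>

namespace toy {

#ifdef TOY_DEVICE_BUILD
// Device side (hypothetical switch): use the native type directly so the
// compiler can optimize arithmetic and pass it to native built-ins.
using StorageT = _Float16;
#else
// Host side: a class with overloaded operators. This placeholder keeps a
// float; a real implementation would emulate 16-bit storage and rounding.
class HostHalf {
public:
  HostHalf() = default;
  HostHalf(float F) : Value(F) {}
  HostHalf &operator+=(const HostHalf &R) {
    Value += R.Value;
    return *this;
  }
  explicit operator float() const { return Value; }

private:
  float Value = 0.0f;
};
using StorageT = HostHalf;
#endif

// The one type that both host and device code name, so a mangler sees the
// same class on both sides; only the private storage differs.
class half {
public:
  half() = default;
  half(float F) : Data(F) {}
  half &operator+=(const half &R) {
    Data += R.Data;
    return *this;
  }
  friend half operator+(half L, const half &R) { return L += R; }
  explicit operator float() const { return static_cast<float>(Data); }

private:
  StorageT Data{};
};

} // namespace toy
```

Because kernels would only ever mention `toy::half`, the type name that reaches the mangler is identical for host and device, and no special case in the compiler is needed.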

Fznamznon · 2019-12-26T11:17:16Z

sycl/test/regression/fp16-with-unnamed-lambda.cpp

+  if (!D.has_extension("cl_khr_fp16"))
+    return 0; // Skip the test if halfs are not supported
+
+  cl::sycl::buffer<cl::sycl::cl_half> Buf(1);


I assume just half type also will work, right?

Actually, there is no such thing as half according to the SYCL spec, see KhronosGroup/SYCL-CTS#37

But we have such alias in our implementation and it should also work, because cl::sycl::cl_half is declared as an alias to half

Nope. SYCL spec defines half. See Table 6.1. https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf . There is half defined.

Ok, I see.

However, it says that "all standard C++ fundamental types from Table 6.1", while half is not a standard fundamental data type, see Floating point types

But half is also present in this table.

Feel free to fix the spec if you think there are some issues. :-)

Because the `half` type is not a standard C++ type and is not supported everywhere, its implementation differs between host and device: a C++ class with overloaded arithmetic operators is used on the host and `_Float16` is used on the device side. Previously, the switch between the two versions was implemented as a preprocessor macro, and having two different types caused problems with the integration header and the unnamed lambda feature, see intel#185 and intel#960.

This patch redesigns the `half` implementation so that a single wrapper data type is used as the `half` representation on both host and device sides; the differentiation between the actual host and device implementations is done under the hood of this wrapper.

Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>

When all the large const offsets masked with the same value from bit-12 to bit-23. Fold

    add x8, x0, #2031, lsl #12
    add x8, x8, #960
    ldr x9, [x8, x8]
    ldr x8, [x8, #2056]

into

    add x8, x0, #2031, lsl #12
    ldr x9, [x8, #960]
    ldr x8, [x8, #3016]

AlexeySachkov requested review from rolandschulz and erichkeane December 20, 2019 21:56

AlexeySachkov assigned Fznamznon Dec 20, 2019

erichkeane reviewed Dec 20, 2019

Fznamznon reviewed Dec 23, 2019

AlexeySachkov force-pushed the private/asachkov/unique-stable-name-for-half branch from 93d564d to bc98cc4 Compare December 23, 2019 12:57

Fznamznon approved these changes Dec 26, 2019

romanovvlad merged commit 514fc0b into intel:sycl Dec 26, 2019

AlexeySachkov mentioned this pull request Feb 3, 2020

[SYCL] Rework 'half' implementation in order to remove bunch of workarounds #1089

Merged

AlexeySachkov added a commit to AlexeySachkov/llvm that referenced this pull request Feb 11, 2020

[SYCL] Partially revert intel#960

c9d78ed

The only file left is `sycl/test/regression/fp16-with-unnamed-lambda.cpp` Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>

AlexeySachkov deleted the private/asachkov/unique-stable-name-for-half branch April 1, 2020 10:23
