-
Notifications
You must be signed in to change notification settings - Fork 768
[SYCL] Fix issue with half and -fsycl-unnamed-lambda #960
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Fix issue with half and -fsycl-unnamed-lambda #960
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK with the approach, a few quick bits in how we detect it.
// (namespace) and name. | ||
const CXXRecordDecl *RecTy = Ty->getAsCXXRecordDecl(); | ||
|
||
if (!RecTy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be an assert instead? I'd hate for us to think this works for other things, then it fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this file I call this function with TagType
, which might represent both Record and Enum. So, I would prefer if
instead
Name = cast<CXXRecordDecl>(Ctx)->getName(); | ||
break; | ||
case clang::Decl::Kind::Namespace: | ||
Name = cast<NamespaceDecl>(Ctx)->getName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens with anonymous namespaces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess, getName()
returns a unique string hash, so, it won't match with the requested name and the function will just return false
} | ||
if (Name != Scope.second) | ||
return false; | ||
Ctx = Ctx->getParent(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check out getEnclosingNamespaceContext. If we can get the chart to tell us what we expect out of everything (and I think all are namespaces?) you can probably just use that until you find the TU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getEnclosingNamespaceContext
doesn't seem to work properly:
For getParent()
, I see the following chain of (DeclKind, Name)
pairs being analyzed:
33 half
14 half_impl
14 detail
14 sycl
14 cl
While for getEnclosingNamespaceContext()
it looks like:
33 half
14 half_impl
14 half_impl
It seems like enclosing namespace context for half_impl
is half_impl
, which is confusing. Probably I don't fully understand something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... interesting. My reading of the function doesn't really seem like it SHOULD do that, but it is perhaps an old enough function that it isn't terribly maintained. It just seemed to fit the need :)
Is that the entire chain? or did you 'give up' there. There is an interesting call to 'getPrimaryContext' in that function that seems like it should make sure you don't get duplicates...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually... looking at the logic to that function I think it (or my interpretation of it) is wrong... I think it would return half_impl forever. It seems that it never returns the parent of a namespace, just the primary declcontext for the current one (or the namespace containing a current object).
I looked at some other things that do similar work, so I now think getParent is the only way to do this.
clang/lib/AST/ItaniumMangle.cpp
Outdated
DeclContextDesc{clang::Decl::Kind::Namespace, "cl"}, | ||
DeclContextDesc{clang::Decl::Kind::Namespace, "sycl"}, | ||
DeclContextDesc{clang::Decl::Kind::Namespace, "detail"}, | ||
DeclContextDesc{clang::Decl::Kind::Namespace, "half_impl"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we change names of classes/namespaces in SYCL headers?
This seems even worse than the way which we use to detect accessors, because name and namespaces of accessors is defined by SYCL spec. This set of detail/half_impl anything is NOT defined by the SYCL spec, it's only details of our implementation of SYCL headers.
This "magic" is fragile, not flexible and cannot be upstreamed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we change names of classes/namespaces in SYCL headers?
Corresponding LIT test will fail and we will have to update this code too.
This "magic" is fragile, not flexible and cannot be upstreamed.
I agree that the whole solution looks very "hacky". Any ideas how to do the same better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could try to define cl::sycl::half
as a single wrapper. But change underlying type depending on device we or not. Then it will be the same type for mangler. Same as it's done for cl::sycl::vec
classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean 'as a single wrapper'? As a type itself? Type aliases aren't mangled. Curious to see what the 'vec' classes solution is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erichkeane, the idea is to use cl::sycl::detail::half_impl::half
for both device and host code to achieve the same mangling.
This might require writing relatively significant amount of code somewhere in SYCL headers, because this wrapper should encapsulate native half within it on device side:
- device compiler should be able to optimize math operations with it
- we should be able to call different device built-in functions which accept native half
I will try to do this, but I also would like to get "LGTM" for this patch as back-up plan if the different way will take a lot of time
When `-fsycl-unnamed-lambda` is present, mapping from SYCL Kernel function to a corresponding OpenCL kernel name is done via `__unique_stable_name` built-in. It is used by device compiler to generate integration header and it is used by host compiler to find kernels information in there. The problem is that we might get different results for the same SYCL Kernel function when compiling for host and device: the issue appears if kernel uses `half` data type which is represented as: - `cl::sycl::detail::half_impl::half` on host - `_Float16` on device Actually, similar issue exists even without `-fsycl-unnamed-lambda`, but for that case we have a work-around in form of `#define _Float16 cl::sycl::detail::half_impl::half` in `kernel_desc.hpp` to turn device half representation into a host one. The same trick doesn't apply here and the problem is fixed by doing the following: - for `UniqueStableMangler`, we mangle `cl::sycl::detail::half_impl::half` in the same way as `_Float16`, i.e. `FD16_` - for `UniqueStableMandlger`, `cl::sycl::detail::half_impl::half` is marked as non-substitutable to avoid other differences in mangled name Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
93d564d
to
bc98cc4
Compare
if (!D.has_extension("cl_khr_fp16")) | ||
return 0; // Skip the test if halfs are not supported | ||
|
||
cl::sycl::buffer<cl::sycl::cl_half> Buf(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume just half
type also will work, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, there is no such thing as half
according to the SYCL spec, see KhronosGroup/SYCL-CTS#37
But we have such alias in our implementation and it should also work, because cl::sycl::cl_half
is declared as an alias to half
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. SYCL spec defines half
. See Table 6.1. https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf . There is half
defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see.
However, it says that "all standard C++ fundamental types from Table 6.1", while half
is not a standard fundamental data type, see Floating point types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But half
also is presented in this table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to fix the spec if you think there are some issues. :-)
The only file left is `sycl/test/regression/fp16-with-unnamed-lambda.cpp` Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
Because of the fact, that `half` type is not a standard C++ type and it is not supported everywhere, its implementation differs between host and device: C++ class with overloaded arithmetic operators is used on host and `_Float16` is used on device side. Previously, the switch between two version was implemented as preprocessor macro and having two different types caused some problems with integration header and unnamed lambda feature, see intel#185 and intel#960. This patch redesigned `half` implementation in a way, that single wrapper data type is used as `half` representation on both host and device sides; differentiation between actual host and device implementations is done under the hood of this wrapper. Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
…rounds (#1089) Because of the fact, that `half` type is not a standard C++ type and it is not supported everywhere, its implementation differs between host and device: C++ class with overloaded arithmetic operators is used on host and `_Float16` is used on device side. Previously, the switch between two version was implemented as preprocessor macro and having two different types caused some problems with integration header and unnamed lambda feature, see #185 and #960. This patch redesigned `half` implementation in a way, that single wrapper data type is used as `half` representation on both host and device sides; differentiation between actual host and device implementations is done under the hood of this wrapper. Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
When
-fsycl-unnamed-lambda
is present, mapping from SYCL Kernelfunction to a corresponding OpenCL kernel name is done via
__unique_stable_name
built-in. It is used by device compiler to generateintegration header and it is used by host compiler to find kernels
information in there.
The problem is that we might get different results for the same SYCL
Kernel function when compiling for host and device: the issue appears if
kernel uses
half
data type which is represented as:cl::sycl::detail::half_impl::half
on host_Float16
on deviceActually, similar issue exists even without
-fsycl-unnamed-lambda
, butfor that case we have a work-around in form of
#define _Float16 cl::sycl::detail::half_impl::half
inkernel_desc.hpp
to turn device half representation into a host one.The same trick doesn't apply here and the problem is fixed by doing the
following:
UniqueStableMangler
, we manglecl::sycl::detail::half_impl::half
in the same way as_Float16
, i.e.FD16_
cl::sycl::detail::half_impl::half
is marked as non-substitutable toavoid other differences in mangled name