Skip to content

[SYCL] Fix issue with half and -fsycl-unnamed-lambda #960

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

AlexeySachkov
Copy link
Contributor

When -fsycl-unnamed-lambda is present, mapping from SYCL Kernel
function to a corresponding OpenCL kernel name is done via
__unique_stable_name built-in. It is used by device compiler to generate
integration header and it is used by host compiler to find kernels
information in there.

The problem is that we might get different results for the same SYCL
Kernel function when compiling for host and device: the issue appears if
kernel uses half data type which is represented as:

  • cl::sycl::detail::half_impl::half on host
  • _Float16 on device

Actually, similar issue exists even without -fsycl-unnamed-lambda, but
for that case we have a work-around in form of
#define _Float16 cl::sycl::detail::half_impl::half in
kernel_desc.hpp to turn device half representation into a host one.

The same trick doesn't apply here and the problem is fixed by doing the
following:

  • for UniqueStableMangler, we mangle
    cl::sycl::detail::half_impl::half in the same way as _Float16, i.e.
    FD16_
  • cl::sycl::detail::half_impl::half is marked as non-substitutable to
    avoid other differences in mangled name

Copy link
Contributor

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK with the approach, a few quick bits in how we detect it.

// (namespace) and name.
const CXXRecordDecl *RecTy = Ty->getAsCXXRecordDecl();

if (!RecTy)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be an assert instead? I'd hate for us to think this works for other things, then it fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this file I call this function with TagType, which might represent both Record and Enum. So, I would prefer if instead

Name = cast<CXXRecordDecl>(Ctx)->getName();
break;
case clang::Decl::Kind::Namespace:
Name = cast<NamespaceDecl>(Ctx)->getName();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with anonymous namespaces?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess, getName() returns a unique string hash, so, it won't match with the requested name and the function will just return false

}
if (Name != Scope.second)
return false;
Ctx = Ctx->getParent();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check out getEnclosingNamespaceContext. If we can get the chart to tell us what we expect out of everything (and I think all are namespaces?) you can probably just use that until you find the TU.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getEnclosingNamespaceContext doesn't seem to work properly:

For getParent(), I see the following chain of (DeclKind, Name) pairs being analyzed:

33      half
14      half_impl
14      detail
14      sycl
14      cl

While for getEnclosingNamespaceContext() it looks like:

33      half
14      half_impl
14      half_impl

It seems like enclosing namespace context for half_impl is half_impl, which is confusing. Probably I don't fully understand something

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... interesting. My reading of the function doesn't really seem like it SHOULD do that, but it is perhaps an old enough function that it isn't terribly maintained. It just seemed to fit the need :)

Is that the entire chain? or did you 'give up' there. There is an interesting call to 'getPrimaryContext' in that function that seems like it should make sure you don't get duplicates...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually... looking at the logic to that function I think it (or my interpretation of it) is wrong... I think it would return half_impl forever. It seems that it never returns the parent of a namespace, just the primary declcontext for the current one (or the namespace containing a current object).

I looked at some other things that do similar work, so I now think getParent is the only way to do this.

DeclContextDesc{clang::Decl::Kind::Namespace, "cl"},
DeclContextDesc{clang::Decl::Kind::Namespace, "sycl"},
DeclContextDesc{clang::Decl::Kind::Namespace, "detail"},
DeclContextDesc{clang::Decl::Kind::Namespace, "half_impl"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we change names of classes/namespaces in SYCL headers?
This seems even worse than the way which we use to detect accessors, because name and namespaces of accessors is defined by SYCL spec. This set of detail/half_impl anything is NOT defined by the SYCL spec, it's only details of our implementation of SYCL headers.
This "magic" is fragile, not flexible and cannot be upstreamed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we change names of classes/namespaces in SYCL headers?

Corresponding LIT test will fail and we will have to update this code too.

This "magic" is fragile, not flexible and cannot be upstreamed.

I agree that the whole solution looks very "hacky". Any ideas how to do the same better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could try to define cl::sycl::half as a single wrapper. But change underlying type depending on device we or not. Then it will be the same type for mangler. Same as it's done for cl::sycl::vec classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean 'as a single wrapper'? As a type itself? Type aliases aren't mangled. Curious to see what the 'vec' classes solution is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erichkeane, the idea is to use cl::sycl::detail::half_impl::half for both device and host code to achieve the same mangling.

This might require writing relatively significant amount of code somewhere in SYCL headers, because this wrapper should encapsulate native half within it on device side:

  • device compiler should be able to optimize math operations with it
  • we should be able to call different device built-in functions which accept native half

I will try to do this, but I also would like to get "LGTM" for this patch as back-up plan if the different way will take a lot of time

When `-fsycl-unnamed-lambda` is present, mapping from SYCL Kernel
function to a corresponding OpenCL kernel name is done via
`__unique_stable_name` built-in. It is used by device compiler to generate
integration header and it is used by host compiler to find kernels
information in there.

The problem is that we might get different results for the same SYCL
Kernel function when compiling for host and device: the issue appears if
kernel uses `half` data type which is represented as:
- `cl::sycl::detail::half_impl::half` on host
- `_Float16` on device

Actually, similar issue exists even without `-fsycl-unnamed-lambda`, but
for that case we have a work-around in form of
`#define _Float16 cl::sycl::detail::half_impl::half` in
`kernel_desc.hpp` to turn device half representation into a host one.

The same trick doesn't apply here and the problem is fixed by doing the
following:
- for `UniqueStableMangler`, we mangle
  `cl::sycl::detail::half_impl::half` in the same way as `_Float16`, i.e.
  `FD16_`
- for `UniqueStableMandlger`, `cl::sycl::detail::half_impl::half` is marked as
  non-substitutable to avoid other differences in mangled name

Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
@AlexeySachkov AlexeySachkov force-pushed the private/asachkov/unique-stable-name-for-half branch from 93d564d to bc98cc4 Compare December 23, 2019 12:57
if (!D.has_extension("cl_khr_fp16"))
return 0; // Skip the test if halfs are not supported

cl::sycl::buffer<cl::sycl::cl_half> Buf(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume just half type also will work, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there is no such thing as half according to the SYCL spec, see KhronosGroup/SYCL-CTS#37

But we have such alias in our implementation and it should also work, because cl::sycl::cl_half is declared as an alias to half

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. SYCL spec defines half. See Table 6.1. https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf . There is half defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see.

However, it says that "all standard C++ fundamental types from Table 6.1", while half is not a standard fundamental data type, see Floating point types

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But half also is presented in this table.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to fix the spec if you think there are some issues. :-)

@romanovvlad romanovvlad merged commit 514fc0b into intel:sycl Dec 26, 2019
AlexeySachkov added a commit to AlexeySachkov/llvm that referenced this pull request Feb 11, 2020
The only file left is
`sycl/test/regression/fp16-with-unnamed-lambda.cpp`

Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
AlexeySachkov added a commit to AlexeySachkov/llvm that referenced this pull request Feb 11, 2020
Because of the fact, that `half` type is not a standard C++ type and it is
not supported everywhere, its implementation differs between host and
device: C++ class with overloaded arithmetic operators is used on host
and `_Float16` is used on device side.

Previously, the switch between two version was implemented as
preprocessor macro and having two different types caused some problems
with integration header and unnamed lambda feature, see intel#185 and
intel#960.

This patch redesigned `half` implementation in a way, that single
wrapper data type is used as `half` representation on both host and
device sides; differentiation between actual host and device
implementations is done under the hood of this wrapper.

Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
bader pushed a commit that referenced this pull request Feb 12, 2020
…rounds (#1089)

Because of the fact, that `half` type is not a standard C++ type and it is
not supported everywhere, its implementation differs between host and
device: C++ class with overloaded arithmetic operators is used on host
and `_Float16` is used on device side.

Previously, the switch between two version was implemented as
preprocessor macro and having two different types caused some problems
with integration header and unnamed lambda feature, see #185 and
#960.

This patch redesigned `half` implementation in a way, that single
wrapper data type is used as `half` representation on both host and
device sides; differentiation between actual host and device
implementations is done under the hood of this wrapper.

Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
@AlexeySachkov AlexeySachkov deleted the private/asachkov/unique-stable-name-for-half branch April 1, 2020 10:23
iclsrc pushed a commit that referenced this pull request Dec 5, 2023
When all the large const offsets masked with the same value from bit-12 to bit-23.
Fold
  add     x8, x0, #2031, lsl #12
  add     x8, x8, #960
  ldr     x9, [x8, x8]
  ldr     x8, [x8, #2056]

into
  add     x8, x0, #2031, lsl #12
  ldr     x9, [x8, #960]
  ldr     x8, [x8, #3016]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants