Skip to content

[SYCL] Mark kernel on host for enabling optimizations. #2186

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 5, 2020

Conversation

premanandrao
Copy link
Contributor

@premanandrao premanandrao commented Jul 27, 2020

To enable better optimizations on the host, kernel implementation
routines may be marked with the attribute [[clang::sycl_kernel]].
Previously the 'sycl_kernel' attribute was ignored on host.
The call operator of the function object passed to the SYCL kernel
is also marked with 'alwaysinline' on the host to enable these optimizations
to be more effective.

Signed-off-by: Premanand M Rao premanand.m.rao@intel.com

@premanandrao
Copy link
Contributor Author

TODO: When PR #1785 is merged, the tests need to be modified to use the new const reference syntax.

@premanandrao premanandrao requested a review from turinevgeny July 27, 2020 22:23
// Function must return void.
if (!getFunctionOrMethodResultType(D)->isVoidType()) {
S.Diag(FT->getLocation(), diag::warn_sycl_attr_return_type)
<< "sycl_kernel_impl";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need additional attribute?
sycl_kernel_impl looks fully identical to sycl_kernel attribute. Can we just use sycl_kernel attribute instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue is the restriction on the number of template arguments and the number of function arguments on a routine marked "sycl_kernel". The implementation routines that I have seen seem to need a bit more leeway than that. Can that restriction on the "sycl_kernel" routines be lifted? If so, we can reuse the same attribute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant "parameters" in both places that I said "arguments" above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those restrictions are not coming from the spec, they aim to guarantee that the right function is marked with this attribute, so it can be used to construct OpenCL kernel.
We can lift some of them if we are sure that OpenCL kernel generation won't be broken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this information, @Fznamznon! Since OpenCL kernel generation happens on the device, I will try to reuse the same attribute on the host for this PR purpose.

@premanandrao premanandrao requested a review from Fznamznon July 28, 2020 19:39
S.ConstructOpenCLKernel(FD, MC);
} else if (S.LangOpts.SYCLIsHost) {
auto CRD = (*FD->param_begin())->getType()->getAsCXXRecordDecl();
for (auto Method : CRD->methods())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (auto Method : CRD->methods())
for (auto *Method : CRD->methods())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 6152 to 6154
// Because all SYCL kernel functions are template functions - they
// have deferred instantination. We need bodies of these functions
// so we are checking for SYCL kernel attribute after instantiation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment became out of context here. Lets remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -0,0 +1,37 @@
// RUN: %clang_cc1 -fsycl -fsycl-is-host -fsyntax-only -verify %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we merge this test with existing test/SemaSYCL/kernel-attribute.cpp test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Please review.

@@ -891,6 +891,12 @@ void CodeGenFunction::StartFunction(GlobalDecl GD, QualType RetTy,
if (D && D->hasAttr<CFICanonicalJumpTableAttr>())
Fn->addFnAttr("cfi-canonical-jump-table");

if (getLangOpts().SYCLIsHost && D && D->hasAttr<SYCLKernelAttr>()) {
Fn->addFnAttr("sycl_kernel");
// Sycl kernel functions are not subject to inlining
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Sycl kernel functions are not subject to inlining
// SYCL kernel functions are not subject to inlining.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 2 to 3
// Test that the kernel implementation routine marked with 'sycl_kernel' has the attribute `sycl_kernel` in the generated LLVM IR
// and that the function object passed to the sycl kernel is marked 'alwaysinline' on the host
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please keep these lines short? i.e. 80 characters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, sorry I missed this.

if (S.LangOpts.SYCLIsDevice) {
S.ConstructOpenCLKernel(FD, MC);
} else if (S.LangOpts.SYCLIsHost) {
auto CRD = (*FD->param_begin())->getType()->getAsCXXRecordDecl();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto CRD = (*FD->param_begin())->getType()->getAsCXXRecordDecl();
auto *CRD = (*FD->param_begin())->getType()->getAsCXXRecordDecl();

However, I would prefer explicit type here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used explicit type here.

@keryell
Copy link
Contributor

keryell commented Jul 29, 2020

If the goal is to have [[always_inline]] for example, just use it inside the C++ headers and do not touch the compiler at all?

@premanandrao
Copy link
Contributor Author

If the goal is to have [[always_inline]] for example, just use it inside the C++ headers and do not touch the compiler at all?

The 'alwaysinline' attribute is applied on the call operator of the user-provided lambda or function object. I am not sure how that can be marked in the header file. If I have misunderstood you, please explain.
However the 'noinline' attribute was on the sycl kernel itself and I have removed it (in my code changes) since that can be done in the header file.

Copy link
Contributor

@Fznamznon Fznamznon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit, otherwise ok.

Comment on lines 7196 to 7197
// Function template must have at least two template parameters to be
// able to generate OpenCL SYCL kernels.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Function template must have at least two template parameters to be
// able to generate OpenCL SYCL kernels.
// Function template must have at least two template parameters so it can be used
// in OpenCL kernel generation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

To enable better optimizations on the host, kernel implementation
routines may be marked with the attribute [[clang::sycl_kernel_impl]].
The function object passed to the SYCL kernel is also marked with
alwaysinline on the host to enable these optimizations to be more
effective.

Signed-off-by: Premanand M Rao <premanand.m.rao@intel.com>
Signed-off-by: Premanand M Rao <premanand.m.rao@intel.com>
Signed-off-by: Premanand M Rao <premanand.m.rao@intel.com>
Signed-off-by: Premanand M Rao <premanand.m.rao@intel.com>
Signed-off-by: Premanand M Rao <premanand.m.rao@intel.com>
@premanandrao premanandrao force-pushed the remote_sycl_kernel_impl branch from 2eb8142 to 690af65 Compare August 3, 2020 19:35
@premanandrao
Copy link
Contributor Author

Sorry for the force-push, there were some unrelated test failures which needed changes from other commits.

@premanandrao premanandrao requested a review from Fznamznon August 3, 2020 19:38
@Fznamznon
Copy link
Contributor

Tests still failing.

@premanandrao
Copy link
Contributor Author

Tests still failing.

Yes, not sure why though. They were unrelated to my changes and a restart shows a clean run.

@premanandrao
Copy link
Contributor Author

@bader , would it be possible to merge this soon? I'd like this to get into github today if feasible. Thanks!

@bader bader merged commit 414c1e5 into intel:sycl Aug 5, 2020
premanandrao added a commit to premanandrao/llvm that referenced this pull request Sep 22, 2021
This is essentially a revert of PR intel#2186, as the supposed
optimization did not materialize.  I have removed the
functionality added in that PR, while keeping the code
cleanups.
premanandrao added a commit to premanandrao/llvm that referenced this pull request Sep 22, 2021
This is essentially a revert of PR intel#2186, as the supposed
optimization did not materialize.  I have removed the
functionality added in that PR, while keeping the code
cleanups.
dm-vodopyanov pushed a commit that referenced this pull request Sep 25, 2021
This is essentially a revert of PR #2186, as the supposed
optimization did not materialize. I have removed the
functionality added in that PR, while keeping the code
cleanups.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants