Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DAE][SYCL] Enable DAE in SYCL kernel functions #2226

Merged
merged 6 commits into from
Aug 4, 2020

Conversation

DenisBakhvalov
Copy link
Contributor

We allow eliminating dead arguments in SYCL kernel functions
even if they have external linkage.

This patch also updates information about kernel arguments
in the integration header file.

We allow eliminating dead arguments in SYCL kernel functions
even if they have external linkage.

This patch also updates information about kernel arguments
in the integration header file.
@DenisBakhvalov
Copy link
Contributor Author

I think we can ignore clang-format-check fails since it will disrupt the current formatting in files and it will look odd:

        (void) llvm::createDeadArgEliminationPass();
 -      (void) llvm::createDeadArgEliminationSYCLPass();
 +      (void)llvm::createDeadArgEliminationSYCLPass();
        (void) llvm::createDeadCodeEliminationPass();

// TODO: batch changes to multiple SYCL kernels and do one bulk update.
constexpr StringLiteral OMIT_TABLE_BEGIN("// OMIT_TABLE_BEGIN");
constexpr StringLiteral OMIT_TABLE_END("// OMIT_TABLE_END");
static void updateIntegrationHeader(StringRef SyclKernelName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK...hopefully the header file is short and/or the number of kernels is small, as we are writing the whole file on each update. Otherwise we have to go with a "F F F F" kind of table that can be modified in-place without changing the size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it will make a difference in practice. But I'm working on a patch for doing a bulk update of int-header. I would like to do this change incrementally, i.e. in a separate patch.

llvm/include/llvm/Transforms/IPO/DeadArgumentElimination.h Outdated Show resolved Hide resolved
llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp Outdated Show resolved Hide resolved
Comment on lines +784 to +787
if (!IntHeaderBuffer)
report_fatal_error("unable to read integration header file '" +
IntegrationHeaderFileName +
"': " + IntHeaderBuffer.getError().message());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you going to replace this with assert(s)?
Another option to use LLVM_DEBUG and just skip the optimization.
Hard fail like this probably not the best approach for an optimization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a hard fail. If we did not update int header it will result in a later runtime fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we did not update int header it will result in a later runtime fail.

If we keep all arguments, it should be okay. I mean if anything is wrong with pre-requisites, this pass can just do nothing instead of crashing and it won't break anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduced an early exit if IntegrationHeaderFileName is not provided.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. I think we can replace all error checking with asserts in this file.
This way we can "hard fail" on build with enabled assertions and there is no overhead on internal consistency checking on the builds w/o assertions.

if (!<cond>)
  report_fatal_error(<msg>);

->

assert(<cond> & <msg>);

I don't think we should do thorough runtime checking for integration header format. This file is auto-generated by the compiler, so it's should be validated in scope of #2236. We can validate that file format is correct with asserts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to address this in a separate PR.

llvm/test/Transforms/DeadArgElim/sycl-kernels-neg1.ll Outdated Show resolved Hide resolved
llvm/test/Transforms/DeadArgElim/sycl-kernels-neg2.ll Outdated Show resolved Hide resolved
llvm/test/Transforms/DeadArgElim/sycl-kernels.ll Outdated Show resolved Hide resolved
@DenisBakhvalov DenisBakhvalov requested a review from bader August 3, 2020 21:57
Comment on lines +784 to +787
if (!IntHeaderBuffer)
report_fatal_error("unable to read integration header file '" +
IntegrationHeaderFileName +
"': " + IntHeaderBuffer.getError().message());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. I think we can replace all error checking with asserts in this file.
This way we can "hard fail" on build with enabled assertions and there is no overhead on internal consistency checking on the builds w/o assertions.

if (!<cond>)
  report_fatal_error(<msg>);

->

assert(<cond> & <msg>);

I don't think we should do thorough runtime checking for integration header format. This file is auto-generated by the compiler, so it's should be validated in scope of #2236. We can validate that file format is correct with asserts.

@bader bader merged commit 0f33f7a into intel:sycl Aug 4, 2020
@keryell
Copy link
Contributor

keryell commented Aug 4, 2020

I am unsure to understand "Renamed SCYL kernels to SPIR kernels " ab6f272
While the tests are using SPIR, the code does not seem to be very specific to SPIR. Does this apply for CUDA PTX back-end or plain LLVM IR kernels for example?

@bader
Copy link
Contributor

bader commented Aug 4, 2020

While the tests are using SPIR, the code does not seem to be very specific to SPIR. Does this apply for CUDA PTX back-end or plain LLVM IR kernels for example?

This pass is applied to the functions with SPIR_KERNEL calling conversion and when module triple contains sycldevice: https://github.com/intel/llvm/blob/sycl/llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp#L577

If you have better wording, we are open to changes.

@keryell
Copy link
Contributor

keryell commented Aug 4, 2020

Why would you not apply this optimization on any kind of SYCL kernels?
I am not familiar with the PTX compiler flow. Is it using SPIR internally and the triple is changed later by some LLVM passes after applying this DAE optimization?
Or is this a way to slow down non-SPIR devices compared to SPIR devices? :-)

@bader
Copy link
Contributor

bader commented Aug 4, 2020

Why would you not apply this optimization on any kind of SYCL kernels?
I am not familiar with the PTX compiler flow. Is it using SPIR internally and the triple is changed later by some LLVM passes after applying this DAE optimization?

AFAIK, PTX compiler flow is also using "sycldevice" environment component, but I'm not sure about spir_kernel calling convention. Tagging @Naghasan. @keryell, what do you suggest using for "any kind of SYCL kernels" detection?

Or is this a way to slow down non-SPIR devices compared to SPIR devices? :-)

No. :-)

@Naghasan
Copy link
Contributor

Naghasan commented Aug 4, 2020

Why would you not apply this optimization on any kind of SYCL kernels?
I am not familiar with the PTX compiler flow. Is it using SPIR internally and the triple is changed later by some LLVM passes after applying this DAE optimization?

AFAIK, PTX compiler flow is also using "sycldevice" environment component

It does, it is used to trigger some SYCL specific transformations (local memory support and global offset).

but I'm not sure about spir_kernel calling convention. Tagging @Naghasan. @keryell, what do you suggest using for "any kind of SYCL kernels" detection?

Hitting a nerve here :-) AFAIK, only SPIR uses a calling convention to id a kernel entry point (along side metadata). For NVPTX, entry points are id by a metadata https://llvm.org/docs/NVPTXUsage.html#kernel-metadata and I think it requires to have the default CC (but don't quote me on this). There is a ptx_kernel CC, but it is not much used any more.

So I think this:

 bool FuncIsSpirKernel =
      CheckSpirKernels &&
      StringRef(F.getParent()->getTargetTriple()).contains("sycldevice") &&
      F.getCallingConv() == CallingConv::SPIR_KERNEL;

Could be turned into something like:

 bool FuncIsSYCLKernel =
      CheckSpirKernels &&
      StringRef(F.getParent()->getTargetTriple()).contains("sycldevice") &&
      isKernelFunction(F);

where isKernelFunction implements the logic for known backend. AFAIK there is no generic function that implement this generically.

Or is this a way to slow down non-SPIR devices compared to SPIR devices? :-)

No. :-)

I'm sure it is :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants