Skip to content

Conversation

@sergey-semenov
Copy link
Contributor

@sergey-semenov sergey-semenov commented Apr 5, 2023

Retreive kernel argument mask while creating the kernel and bundle it
together with the cached PiKernel or in the created sycl::kernel object.
This removes an extra map lookup during enqueue of cached kernels.

@sergey-semenov sergey-semenov temporarily deployed to aws April 6, 2023 12:22 — with GitHub Actions Inactive
@sergey-semenov sergey-semenov temporarily deployed to aws April 6, 2023 12:59 — with GitHub Actions Inactive
Comment on lines 369 to +371
Copy link
Contributor

@aelovikov-intel aelovikov-intel Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto [Kernel, NameToIgnore, ArgMask] = ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, I prefer the std::tie version since it allows to ignore the value without creating an unused variable.

@sergey-semenov sergey-semenov temporarily deployed to aws April 12, 2023 14:46 — with GitHub Actions Inactive
@sergey-semenov sergey-semenov temporarily deployed to aws April 12, 2023 16:16 — with GitHub Actions Inactive
@sergey-semenov sergey-semenov temporarily deployed to aws April 12, 2023 18:23 — with GitHub Actions Inactive
@sergey-semenov sergey-semenov temporarily deployed to aws April 12, 2023 19:45 — with GitHub Actions Inactive
@sergey-semenov sergey-semenov temporarily deployed to aws April 13, 2023 12:06 — with GitHub Actions Inactive
@sergey-semenov sergey-semenov temporarily deployed to aws April 13, 2023 12:48 — with GitHub Actions Inactive
Retreive kernel argument mask while creating the kernel and bundle it
together with the cached PiKernel or in the created sycl::kernel object.
This removes an extra map lookup during enqueue.
@sergey-semenov sergey-semenov changed the title [DO NOT MERGE][WIP] Test kernel argument mask changes [SYCL] Remove extra map lookup for eliminated kernel arguments Apr 18, 2023
@sergey-semenov sergey-semenov marked this pull request as ready for review April 18, 2023 15:59
@sergey-semenov sergey-semenov temporarily deployed to aws April 18, 2023 18:15 — with GitHub Actions Inactive
@sergey-semenov sergey-semenov temporarily deployed to aws April 18, 2023 20:45 — with GitHub Actions Inactive
Copy link
Contributor

@sommerlukas sommerlukas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to the fusion JIT look good overall, just one nit.

Could the eliminated arg mask of the fused kernel be attached to the new kernel/CGExecKernel directly, e.g., somewhere around line 812?

@sergey-semenov sergey-semenov temporarily deployed to aws April 19, 2023 20:44 — with GitHub Actions Inactive
@sergey-semenov sergey-semenov temporarily deployed to aws April 20, 2023 11:56 — with GitHub Actions Inactive
@sergey-semenov
Copy link
Contributor Author

Could the eliminated arg mask of the fused kernel be attached to the new kernel/CGExecKernel directly, e.g., somewhere around line 812?

@sommerlukas That probably could be done in the fused kernel case, but it would require a different approach compared to the usual case where the mask is prepared/retrieved during pi_kernel creation (and instead be done earlier during CG creation). This would only remove the initial lookup during caching though, and the focus of this patch is to reduce overhead of repeated submissions of cached kernels, so I'd rather not make this a part of this change.

@sergey-semenov
Copy link
Contributor Author

@intel/llvm-reviewers-runtime Could you please have a look?

@sergey-semenov sergey-semenov temporarily deployed to aws April 20, 2023 13:42 — with GitHub Actions Inactive
@sommerlukas
Copy link
Contributor

Could the eliminated arg mask of the fused kernel be attached to the new kernel/CGExecKernel directly, e.g., somewhere around line 812?

@sommerlukas That probably could be done in the fused kernel case, but it would require a different approach compared to the usual case where the mask is prepared/retrieved during pi_kernel creation (and instead be done earlier during CG creation). This would only remove the initial lookup during caching though, and the focus of this patch is to reduce overhead of repeated submissions of cached kernels, so I'd rather not make this a part of this change.

Ok, makes sense, thanks for the explanation.

Copy link
Contributor

@sommerlukas sommerlukas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only reviewed changes to jit_compiler.cpp?.

Copy link
Contributor

@cperkinsintel cperkinsintel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


KernelArgMask Result;
for (int I = 0; I < NBytesForSize; ++I)
SizeInBits |= static_cast<std::uint64_t>(Bytes[I]) << I * NBitsInElement;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This op is fairly simple, but a quick glance stumbles on a number of red flags.

What if the ByteArray Bytes has less than 8 elements? Or more? I'm assuming that doesn't happen in practice, but the ByteArray class itself is constructed with a Ptr and a Size.
I know the multiplication takes precedence but it parentheses around I * NBitsInElement might make it clearer.
And the static_cast is just so we can do the I= with no compiler warnings, I presume. But with all these things occurring in one line, a helpful comment might help put the reader at ease.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I'll follow up on this in another PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants