[SYCL] Remove extra map lookup for eliminated kernel arguments #8958
auto [Kernel, NameToIgnore, ArgMask] = ...
On second thought, I prefer the std::tie version, since it allows ignoring the value without creating an unused variable.
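For readers comparing the two spellings discussed here, a minimal sketch follows. `lookupKernel`, `Lookup`, and the tuple element types are hypothetical stand-ins, not the runtime's actual API; only the `[Kernel, NameToIgnore, ArgMask]` naming is taken from the thread. It requires C++17 for structured bindings.

```cpp
#include <string>
#include <tuple>

// Hypothetical result type used only to compare the two variants.
struct Lookup {
  int Kernel;
  bool HasMask;
};

// Hypothetical stand-in for a call that returns three values,
// of which the caller needs only the first and the last.
std::tuple<int, std::string, bool> lookupKernel() {
  return {7, "ignored", true};
}

// Structured bindings: every element must be given a name,
// so NameToIgnore exists as a variable even though it is never read.
Lookup viaStructuredBinding() {
  auto [Kernel, NameToIgnore, ArgMask] = lookupKernel();
  return {Kernel, ArgMask};
}

// std::tie: std::ignore discards the middle element,
// at the cost of declaring the targets up front.
Lookup viaTie() {
  int Kernel = 0;
  bool ArgMask = false;
  std::tie(Kernel, std::ignore, ArgMask) = lookupKernel();
  return {Kernel, ArgMask};
}
```

Both functions return the same values; the difference is only whether the discarded element gets a name.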
Retrieve the kernel argument mask while creating the kernel and bundle it with the cached PiKernel or store it in the created sycl::kernel object. This removes an extra map lookup during enqueue.
7c837d2 to 8fd348d
sommerlukas
left a comment
Changes to the fusion JIT look good overall, just one nit.
Could the eliminated arg mask of the fused kernel be attached to the new kernel/CGExecKernel directly, e.g., somewhere around line 812?
@sommerlukas That probably could be done in the fused kernel case, but it would require a different approach from the usual case, where the mask is prepared/retrieved during pi_kernel creation; it would instead have to be done earlier, during CG creation. This would only remove the initial lookup during caching, though, and the focus of this patch is to reduce the overhead of repeated submissions of cached kernels, so I'd rather not make this part of this change.
@intel/llvm-reviewers-runtime Could you please have a look?
Ok, makes sense, thanks for the explanation.
sommerlukas
left a comment
Only reviewed changes to jit_compiler.cpp.
cperkinsintel
left a comment
LGTM
KernelArgMask Result;
for (int I = 0; I < NBytesForSize; ++I)
  SizeInBits |= static_cast<std::uint64_t>(Bytes[I]) << I * NBitsInElement;
This op is fairly simple, but a quick glance stumbles on a number of red flags.
What if the ByteArray Bytes has fewer than 8 elements? Or more? I'm assuming that doesn't happen in practice, but the ByteArray class itself is constructed with a Ptr and a Size.
I know the multiplication takes precedence, but parentheses around I * NBitsInElement might make it clearer.
And the static_cast is just so we can do the |= with no compiler warnings, I presume. But with all these things occurring in one line, a helpful comment might put the reader at ease.
Makes sense, I'll follow up on this in another PR.
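One possible shape for that follow-up, as a sketch only: the helper name `decodeSizeInBits`, the raw pointer-plus-size signature (standing in for ByteArray), and both constant values are assumptions, not the runtime's code. It adds the length check, the parentheses, and a comment on the cast. On the cast itself: without it, `Bytes[I]` is promoted to `int`, and shifting a 32-bit `int` by 32 or more bits is undefined behavior, so the widening matters for more than warnings.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Assumed values; the thread only implies that the size prefix is 8 bytes.
constexpr std::size_t NBytesForSize = 8;
constexpr std::size_t NBitsInElement = 8;

// Hypothetical helper: decodes a little-endian 64-bit size prefix.
// Returns std::nullopt when the buffer is too short to contain the prefix.
std::optional<std::uint64_t> decodeSizeInBits(const unsigned char *Bytes,
                                              std::size_t Size) {
  if (Bytes == nullptr || Size < NBytesForSize)
    return std::nullopt;

  std::uint64_t SizeInBits = 0;
  for (std::size_t I = 0; I < NBytesForSize; ++I) {
    // Widen to 64 bits before shifting: the byte would otherwise be
    // promoted to int, and shifts of 32 bits or more would overflow it.
    SizeInBits |= static_cast<std::uint64_t>(Bytes[I]) << (I * NBitsInElement);
  }
  return SizeInBits;
}
```

A buffer longer than eight bytes is accepted here, on the assumption that the mask payload follows the prefix; whether that matches the real layout would need to be checked against the runtime.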
Retrieve the kernel argument mask while creating the kernel and bundle it
together with the cached PiKernel or in the created sycl::kernel object.
This removes an extra map lookup during enqueue of cached kernels.
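
The idea in the commit message can be illustrated with a toy cache. This is a simplified sketch, not the SYCL runtime's implementation: `FakeKernel`, `CachedKernel`, `KernelCache`, and the string keys are all hypothetical. The mask is looked up once, when the kernel is first created, and stored next to the cached handle, so repeated submissions read it without touching the mask map.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical mask type: true marks an eliminated argument.
using KernelArgMask = std::vector<bool>;

// Hypothetical stand-in for a native kernel handle.
struct FakeKernel {
  std::string Name;
};

// Cache entry bundling the kernel with its mask (nullptr if none exists).
struct CachedKernel {
  FakeKernel Kernel;
  const KernelArgMask *ArgMask;
};

class KernelCache {
public:
  void registerMask(const std::string &Name, KernelArgMask Mask) {
    Masks[Name] = std::move(Mask);
  }

  // Returns the cached entry, creating it on first use. The mask map is
  // consulted only on creation; cache hits skip it entirely.
  const CachedKernel &getOrCreate(const std::string &Name) {
    auto It = Kernels.find(Name);
    if (It != Kernels.end())
      return It->second;

    ++MaskLookups;
    auto MaskIt = Masks.find(Name);
    const KernelArgMask *Mask =
        MaskIt == Masks.end() ? nullptr : &MaskIt->second;
    return Kernels.emplace(Name, CachedKernel{FakeKernel{Name}, Mask})
        .first->second;
  }

  std::size_t maskLookups() const { return MaskLookups; }

private:
  // Pointers to unordered_map elements stay valid across rehashing,
  // so storing a pointer into Masks inside a cache entry is safe here.
  std::unordered_map<std::string, KernelArgMask> Masks;
  std::unordered_map<std::string, CachedKernel> Kernels;
  std::size_t MaskLookups = 0;
};
```

Calling `getOrCreate` twice with the same name returns the same entry and leaves `maskLookups()` at 1, which is the saving the patch is after for repeated submissions.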