-
Notifications
You must be signed in to change notification settings - Fork 801
[WIP] No handler submit #18842
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sycl
Are you sure you want to change the base?
[WIP] No handler submit #18842
Conversation
sycl/source/detail/queue_impl.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we template this on the number of dimensions, like you did for queue::submit_no_handler?
I think that would allow us to get rid of detail::NDRDescT and just pass the nd_range<D> all the way down the stack. We could get rid of padId and padRange
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review. It is a good point, that there is potential for optimization related to NDRDescT. Currently, the enqueueImpKernel scheduler function takes it as an argument, and applies some transformations, so I left it as is. I think there is work in progress to optimize this, and once it is complete, I can update the flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree. We have so much instance of ugliest code duplication added with a promise to refactor once design settles. Please do the right thing from the start.
sycl/include/sycl/queue.hpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How feasible do you think it is to basically inline submit_no_handler_impl here?
This is definitely a huge improvement over what we had before, but we're still taking compile-time information (the kernel name, kernel parameters, etc) and turning it into run-time information that crosses over into libsycl.so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, I'm not sure if we can inline this. The public header doesn't have access to the internal queue_impl object, also at some point the templated parameters need to be transformed into run-time ones. Do you have any suggestions how to handle those parameters more efficiently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might need to go through these and figure out what has to become a runtime parameter and why, and consider changing the interface to the impl object. My understanding is that some of the original decisions that were made for handler were motivated by design issues that no longer apply.
For example, I think the KernelName and KernelParamDescGetter here had to be stored as runtime information before, because the type information couldn't possibly be stored between the call to parallel_for and the deferred submission inside handler::finalize(). And since the information had to be stored in a runtime variable, there was no reason not to push the functions using that information into the impl.
Things are different with the new design, because we can submit directly to the backend without deferring anything. I think that means we can lift things out of the impl and back into the headers, where it makes sense to do so.
I haven't done an exhaustive analysis, but here's an example of the sort of thing we should do, focusing on KernelName:
- We convert
KernelTypetoKernelNamehere only so we can eventually pass it toenqueueImpKernel. enqueueImpKernelonly needs theKernelNameto find the rightur_kernel_handle_thandle.- If
enqueueImpKernel(or a new version of it) accepted aur_kernel_handle_tinstead of aKernelName, we could:- Use
KernelTypeto jump straight to aKernelNameBasedCacheT; - Look up the right
ur_kernel_handle_t; and then - Pass the
ur_kernel_handle_ttolibsycl.so.
- Use
...which would bypass the need to create a std::string or do any string operations. (We could also choose to pass just the KernelNameBasedCacheT*, instead of a ur_kernel_handle_t).
Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that means we can lift things out of the impl and back into the headers, where it makes sense to do so.
ABI stability is another major reason why we want everything in impl and not in the sycl::__class__ objects. I haven't looked into the PR yet, so I'd just say that "lifting" code is fine, lifting data is likely not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, and I'm not going to propose that we do anything other than follow our existing ABI break policy.
All I meant was that I expect there are going to be many instances where something doesn't need to be stored at all in the new design. We should take advantage of that to do as much as possible in the body of parallel_for, and pass the minimum amount of information through to the impl.
For example, instead of passing a function pointer through to the impl object to set arguments:
detail::kernel_param_desc_t (*KernelParamDescGetter)(int) = &(detail::getKernelParamDesc<KernelName>);
...
enqueueImpKernel(..., KernelParamDescGetter, ...);We could split enqueueImpKernel into a few separate functions and move anything type-dependent back into the header, something like:
// Interaction with the scheduler is probably too complicated to put in the headers.
enqueueImpKernelSchedulerParts(...);
// We have the KernelName type info here, so leverage it.
ur_kernel_handle_t Kernel = detail::getKernelNameBasedCache<KernelName>()->getKernel();
// Doing this here instead of in impl allows the compiler to unroll and optimize away the branches.
// (Because we know everything about the kernel at compile-time.)
for (int i = 0; i < detail::getKernelNumParams<KernelName>(); ++i) {
switch (detail::getKernelParamDesc<KernelName>(i))
{
case accessor:
enqueueImpKernelSetAccessorArg(Kernel, i, ...);
case std_layout:
enqueueImpKernelSetArg(Kernel, i, ...);
}
}
enqueueImpKernelSubmit(Kernel);Kernel submission is very complicated, especially in the old handler path, so I'm not sure exactly what the new path has to look like. But hopefully this gives you a bit more insight into where I'm coming from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, as I said, I agree with "lifting code" :)
f0422df to
161bef1
Compare
161bef1 to
d016855
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we call it queue::launch_grouped?
sycl/include/sycl/queue.hpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this live outside queue directly in sycl::detail? If private queue access is an issue, then maybe
struct LaunchUtils {
static blah-blah(...) {}
};
class queue {
friend struct LaunchUtils;
}
sycl/source/detail/queue_impl.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, this should be a single argument. Ideally, renamed KerneNameBasedCachePtr->TypeErasedKernelInfo. I've had a discussion with @sergey-semenov about this, he was going to prototype that change. Can you sync with him about this?
sycl/source/detail/queue_impl.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree. We have so much instance of ugliest code duplication added with a promise to refactor once design settles. Please do the right thing from the start.
sycl/source/detail/queue_impl.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the event is not ready, I think getHandle() can return NULL (if the command is a host task or not yet enqueued yet)
d016855 to
4405fe8
Compare
4405fe8 to
082e2c5
Compare
No description provided.