-
Notifications
You must be signed in to change notification settings - Fork 768
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Doc] Design doc: optional kernel features #3781
[SYCL][Doc] Design doc: optional kernel features #3781
Conversation
First public review of the design document to conform to the SYCL 2020 "optional kernel features" behavior. This mostly covers SYCL 2020 section 5.7 "Optional kernel features", but it also covers the design for the `[[sycl::requires()]]` attribute. Since a SPIR-V extension is proposed as part of this design, this commit also includes a proposed extension to SPIR-V for enabling code conditionally based on specialization constants. The SPIR-V extension is more general than required for the "optional kernel features" design because it also includes functionality that will be required for the SYCL_EXT_ONEAPI_DEVICE_IF extension to DPC++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed up until #### Representation in SPIR-V
Co-authored-by: kbobrovs <Konstantin.S.Bobrovsky@intel.com> Co-authored-by: Artem Gindinson <artem.gindinson@intel.com>
Clarify wording about when `errc::kernel_not_supported` is thrown.
Expand the design to include the case when device functions are exported from a shared library, which is a new feature proposed in intel#3210.
Address a code review comment asking how we can decorate `atomic_ref` with `[[sycl::requires()]]` only when the type is 8 bytes. Add an appendix showing how this can be done using partial specialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've only looked at OptionalDeviceFeatures.md
file, will review for the proposed SPIR-V extension later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial SPIR-V extension review, will do at least one more iteration later. So far a lot of questions to get better understanding of intended behavior in some corner cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your patience with me as this fell off my radar until @elizabethandrews pinged me on it. I've added some thoughts, but I may have more as this percolates in my brain.
* Clarify that the initial short list of requirements will be explained in detail later. * Clarify what "static call tree" means when there are function pointers. * Post-link tool now diagnoses an error if a `SYCL_EXTERNAL` function is incorrect decorated with `[[sycl::requires()]]`. * Use `OpSpecConstantFalse` instead of `OpSpecConstantTrue`.
* Clarify that the new "SYCL/requirements" property list has properties of type `BYTE_ARRAY`, which is one of the existing property types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from DPC++ Tools point of view
Alexey Sachkov proposed an optimization for comparing the types in each IR instruction with the `!intel_types_that_use_aspects` set. Capture this in a note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read through the latest draft of the document and it LGTM. @erichkeane may have additional questions or feedback, however.
Rename the property set from "SYCL/image-requirements" to "SYCL/device-requirements" to avoid confusion with the SYCL types `unsampled_image` and `sampled_image`.
We are planning to change `[[sycl::requires()]]` in the SYCL 2020 spec to omit the `has()` clause, which means the parameter to that attribute will just be a list of aspects. Make the same change to `[[sycl::uses_aspects()]]` in this design document. Also add a note that we plan to change the name of `[[sycl::requires()]]` to `[[sycl::device_has()]]`. We can update this design document again once that change is adopted into the SYCL 2020 spec.
@AlexeySachkov, @kbobrovs, @AaronBallman: Requesting re-approval since my recent commits dismissed your previous approvals. @erichkeane: Should I expect review from you as well, or is @AaronBallman covering for CFE? |
This has fallen pretty deeply on my list of things to do unfortunately, but if @AaronBallman and @elizabethandrews are happy, then so am I. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
In that case, asking also for an approval from @elizabethandrews. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DPC++ FE requirements LGTM
exception must be raised synchronously from the kernel invocation command | ||
(e.g. `parallel_for()`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor
exception must be raised synchronously from the kernel invocation command | |
(e.g. `parallel_for()`). | |
exception must be raised synchronously from the kernel submission command | |
(e.g. `submit()`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requirement is from the SYCL 2020 specification:
Of course, applications that make use of optional kernel features should ensure that a kernel using such a feature is submitted only to a device that supports the feature. If the application submits a command group using a secondary queue, then any kernel submitted from the command group should use only features that are supported by both the primary queue’s device and the secondary queue’s device. If an application fails to do this, the implementation must throw a synchronous exception with the
errc::kernel_not_supported
error code from the kernel invocation command (e.g.parallel_for()
).
In general, our philosophy is to diagnose runtime errors by throwing an exception from the API that causes the error whenever this is reasonably possible. We think this makes it easier for applications to know the cause of a failure. For example, if the exception is thrown from the submit()
call, it might not be clear if the error is from paralle_for()
or from constructing an accessor, etc.
Is there a reason why you think the exception should be delayed and thrown from submit()
rather than from parallel_for()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why you think the exception should be delayed and thrown from submit() rather than from parallel_for()?
It's related to how things are currently implemented. Currently we record everything that happens in the CG lambda(like "accessor created", "kernel execution requested") doing only basic API related checks, then when CG is executed we send this request to the scheduler component which may or may not construct low-level handles and enqueue low-level operations immediately(it depends on the situation). Most of the errors coming from the last step.
Also there are complains that queue::submit
call is taking too much time of the calling thread, so eventually we might want to move as much as we can to a separate(dispatcher) thread. If we did so we would not be able to throw synchronous exceptions for most of things that currently happen during queue::submit
anymore.
For example, if the exception is thrown from the submit() call, it might not be clear if the error is from paralle_for() or from constructing an accessor, etc.
CG lambda can have only one "action"(kernel execution or memcpy or ...), the exception "errc::kernel_not_supported" is quite specific, so I think it would be quite hard to not understand that it happens because queue::submit
asks to execute an "incompatible" kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CG lambda can have only one "action"(kernel execution or memcpy or ...), the exception "errc::kernel_not_supported" is quite specific, so I think it would be quite hard to not understand that it happens because queue::submit asks to execute an "incompatible" kernel.
I agree that it would be clear in this case. However, there are other exceptions that could be thrown in the "submit" function, especially related to accessors. The SYCL spec expects these to be thrown from the API that causes the error (e.g. from the accessor
constructor if the parameters to the accessor are invalid), not delayed and thrown from the queue::submit()
function. These exceptions would not be clear if they were delayed because the submit function could construct many accessors.
Does DPC++ currently delay these exceptions also until the kernel submission function returns?
Also there are complains that queue::submit call is taking too much time of the calling thread, so eventually we might want to move as much as we can to a separate(dispatcher) thread.
I'm curious about what we do now in the calling thread. Does the JIT compilation happen currently in the calling thread, or is this already scheduled on a separate thread? Note that we were careful to design the errc::kernel_not_supported
exception such that it could be diagnosed in the calling thread even before attempting to JIT compile the kernel, thus allowing the JIT compilation to happen on a different thread.
Note that the exception must be thrown synchronously, not delayed and thrown on | ||
the queue's asynchronous handler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please clarify why this is a requirement?
In the current implementation it's easy to throw a synchronous exception, but it can be a
restriction for the implementation in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my response to your other comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, the question about the exception type can be discussed separately.
@gmlueck I can merge it now. Would you like to revisit and update PR description or should it be used for squashed commit message? |
I'd like to change the comment you use for the squashed commit. How should I do that? Should I change the description of this PR, or type a new commit message in a comment for you to use, or something else? |
@pvchupin: I updated the PR description, so this is ready to merge now. |
Design document to conform to the SYCL 2020 "optional kernel feature"
behavior. This mostly covers SYCL 2020 section 5.7 "Optional kernel
features", but it also covers the design for the
[[sycl::requires()]]
attribute.