-
Notifications
You must be signed in to change notification settings - Fork 1
[SYCL] Documentation for the proposed design for kernel argument decomposition. #54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
A rendered version of the proposed documentation can be viewed at https://github.com/tahonermann/llvm-project/blob/sycl-upstream-fe-decomposition-design/clang/docs/SYCLSupport.rst#kernel-argument-validation-and-decomposition. |
``P`` is a class type, and ``sycl::is_device_copyable_v<P>`` is true, ``P`` shall | ||
satisfy the constraints listed in | ||
`section 3.13.1, "Device copyable" <https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec::device.copyable>`__. | ||
``A`` is passed as a bit-copyable argument. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about a case like this:
struct dc {
dc(const dc& other) {/*... not trivial ...*/}
};
template<>
struct sycl::is_device_copyable<dc> : std::true_type {};
struct udt {
float f;
dc d;
};
Do you expect this rule to apply to struct udt
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rule that governs udt
is the earlier non-union aggregate case. In this case, udt
has no subobjects that require decomposition and is therefore bit-copyable. dc
is presumed to satisfy the device copyable rules because sycl::is_device_copyable_v<dc>
is true (presumed because the copy constructor cannot be confirmed to perform a bitwise copy).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is your intent, then I think rule 2 needs to be clarified somehow. There is no mention in that rule about sub-object types that are marked is_device_copyable
. For that matter, there is no mention in that rule about sub-object types that are (or are not) trivially copyable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those cases fall into the "Otherwise ..." portion of the rule. I'll try to add some clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to clarify this. Please review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think this is confusing. Rule 4 is very detailed about what constitutes "device copyability" (e.g. it talks about the SYCL_DEVICE_COPYABLE
macro, etc.) However, rule 2 just says "All subobjects that do not require decomposition shall be device copyable", with no further explanation about what "device copyable" means. This gives the impression that the macro does not apply to rule 2, while it does apply to rule 4.
I actually think it would be clearer to omit the details about when an object is "device copyable" -- these details are in the SYCL specification. Thus, I think you could simplify rule 4.
In fact, I think rules 3, 4, and 6 could all be combined into a single rule that just says:
Otherwise, if
P
is a device copyable type, thenA
is passed as a bit-copyable argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When writing these, I was thinking of things from an implementer perspective and, in particular, what type checking needs to be done in which order. That is why I separated the checks for trivial copyability and the sycl::is_device_copyable
trait. From a specification perspective, what you suggest makes sense.
However, given recent discussion, I think we'll be pursuing a different approach. I have a proto-POC that provides the capabilities that Andrei is requesting. If discussions with him go well, I'll submit a new PR with documentation for that approach.
clang/docs/SYCLSupport.rst
Outdated
}; | ||
k(); | ||
// Destructors runs for 'k', 'k.st', 'k.a.sta[0]', and 'k.a.sta[1]'. | ||
// Destructors run for 'st2', 'asta1_2', and 'asta2_2'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If one of the argument types was explicitly marked "device copyable", would you still generate a call to its destructor here? I think you said in chat that you would, with the rationale that the SYCL spec says that it is the user's responsibility to ensure that the destructor has no effect when run on the device (section 3.13.1 "Device copyable"). I agree that this is what the spec says.
@rolandschulz do you think this is what people expect when they mark a type as device copyable? Or, do they expect that the compiler will not generate a call to the destructor in device code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I guess it would actually be up to the calling convention. If the convention is callee-destroy, then destructors for parameters will be called when the function returns. Since there is no traditional caller for the invocation of the entry point, then destructors would not be called for a caller-destroy convention. Perhaps I should change the comment to "Destructors may run" for parameters.
Eliding calls to destructors for objects that are not decomposed and passed as their own argument might be possible, but would be surprising in my opinion. In this example, k
is illustrated as a local variable and I would expect implementation to do likewise. Eliding destructor calls might be problematic; see #53 for an example of a crash that Mariya fixed by adding missing cleanup handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a note on that in the spec "Likewise, it is unspecified whether the implementation actually calls the destructor for such a class on the device since the destructor must have no effect on the device.". If I recall the discussions correctly we did that to allow either implementation. My understanding is that #53 shouldn't be needed because it should only fix a problem for wrong code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I had mentioned that note somewhere, but I don't see it now. It probably is worth explicitly mentioning somewhere.
I don't think there is anything wrong with the test case added for that issue (aside from a missing specialization of sycl::is_device_copyable<DCopyable>
). The relevant type is declared as:
struct DCopyable {
int i;
~DCopyable();
};
The requirements in section 3.13.1 don't require a trivial destructor. The requirement is that the destructor is public, not defined as deleted, and "has no effect when executed on the device".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some more specificity around destructors and added a reference to 3.13.1 regarding it being unspecified if they are called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes made to address review comments. Please review again.
``P`` is a class type, and ``sycl::is_device_copyable_v<P>`` is true, ``P`` shall | ||
satisfy the constraints listed in | ||
`section 3.13.1, "Device copyable" <https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec::device.copyable>`__. | ||
``A`` is passed as a bit-copyable argument. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to clarify this. Please review.
clang/docs/SYCLSupport.rst
Outdated
}; | ||
k(); | ||
// Destructors runs for 'k', 'k.st', 'k.a.sta[0]', and 'k.a.sta[1]'. | ||
// Destructors run for 'st2', 'asta1_2', and 'asta2_2'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some more specificity around destructors and added a reference to 3.13.1 regarding it being unspecified if they are called.
No description provided.