-
Notifications
You must be signed in to change notification settings - Fork 755
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Doc] Design specification for "device global" feature #4686
Conversation
Address review comments from @GarveyJoe.
Update the design to align with the latest version of the extension API spec. Also add specifications for underlying changes to SPIR-V and OpenCL.
Update names and wording to make it clear the the new SPIR-V decorations apply to all consumers, even though they are mostly useful for FPGA.
Add some final clarifications to the design doc before creating PR. Rename properties to align with latest review comments against API spec.
Update the unresolved issue about diagnosing invalid `device_global` declarations to reference the PR against the API spec that describes exactly what these restrictions are.
In Design Spec in an Open you talk about avoid creating unique SPIR-V properties for each of the 3 properties listed, and leveraging some common mechanism. Are the decorations listed in the SPIR-V spec just there until that open gets resolved? |
No, I was not thinking that the SPIR-V decorations would change. Rather, I was wondering if there is some common infrastructure for propagating the properties from source code, through LLVM IR, and then result in the SPIR-V decorations that are specified. |
@AlexeySachkov, @kbobrovs: you may also be interested in this design review. (I know that @romanovvlad has already seen it.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the approach is quite complicated, but provided the API the extension defines I cannot suggest anything better. The only thing is that we could try to get rid of need in low level runtimes extensions by generating special kernel which do the same what suggest extension APIs do, but it might be less performant solution.
We decided that we need some alternate way for the host to reference a global variable in the kernel other than using the exported SPIR-V name. Extend the `HostAccessINTEL` decoration to take a new `Name` parameter. This allows the host to access even a non-exported `OpVariable`. Changes are still needed to the "DeviceGlobal.md" spec to explain how this new field will be set by the compiler. Those changes will be made in a separate commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. I have some preliminary comments. Haven't taken a look at unresolved issues yet. Will do so shortly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sycl-post-link
section LGTM
Resolve the remaining open issues and address review comments: * Describe how compile-time properties are handled, using the new compile-time property design document as a basis. * Address remaining open issues, mostly with the front-end. * Address review comments that have been made so far.
Remove some extraneous commentary about the `device_image_scope` property. Details of that property are describe more thoroughly in the extension API specification.
Clarify the backend requirements for copying to / from a module scope (global) variable.
My latest commits addressed all the open issues. I'm looking especially for a review from the front-end team (@elizabethandrews or @premanandrao). @AlexeySachkov: there were a few updates to the @romanovvlad: there are also a few updates to the runtime, so please review that also. @bashbaug: can you (or someone you delegate) review the OpenCL and SPIR-V specs? I believe the changes here are all in line with what we discussed earlier. |
Since PR intel#4697 was merged, there is no need to point to that PR. Instead, just point to the extension specification for device global.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SPIR-V and OpenCL parts LGTM.
sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc
Outdated
Show resolved
Hide resolved
This addresses comments from the FPGA team.
f0d9512
sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc
Outdated
Show resolved
Hide resolved
@romanovvlad, @elizabethandrews, and @bashbaug: would you mind re-approving? I think the only changes since your last approval were a few tweaks to the SPIR-V spec (f0d9512 and 76b559c). @AlexeySachkov: Could you also re-approve? There were a few changes to the sycl-post-link tool since your last approval. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sycl-post-link
part LGTM
Assuming that no error diagnostic is issued, the `sycl-post-link` tool includes | ||
the IR definition of each `device_global` variable in the modules that | ||
reference that variable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gmlueck, I've just realized that we might want to be more precise here. In particular, at the moment sycl-post-link
clones all global variables into every produced device image:
llvm/llvm/tools/sycl-post-link/sycl-post-link.cpp
Lines 480 to 485 in 7ae7ca8
// It's not easy to trace global variable's uses inside needed functions | |
// because global variable can be used inside a combination of operators, so | |
// mark all global variables as needed and remove dead ones after cloning. | |
for (const auto &G : M.globals()) { | |
GVs.insert(&G); | |
} |
I assume that could be a problem, if we later produce device image properties with unused globals. At least we will waste some memory and probably compute time. What do you think?
P.S. I also don't think that it would be so complex to trace global variable usages if we don't need to differentiate types of the uses (i.e. reads vs writes). Traversing Use -> User chain until we encounter a function for which we already know the list of device images it will be cloned into should be relatively straightforward
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be correct to include all global variables in all modules. As you say, however, this will be wasteful. For example, consider some very large device global that is referenced by only one kernel. It would waste device memory to include this variable in all modules. This might be particularly problematic on FPGA, where device memory is more limited.
What about the current wording in the design doc should be made more precise? That wording already says that the IR for a device global should only be copied to a module if the module references the variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the current wording in the design doc should be made more precise? That wording already says that the IR for a device global should only be copied to a module if the module references the variable.
Maybe I'm misreading it, but I don't seem to find "should only be copied" part, i.e. I don't see the part which restricts sycl-post-link
from doing extra copies.
It should be correct to include all global variables in all modules. As you say, however, this will be wasteful. For example, consider some very large device global that is referenced by only one kernel. It would waste device memory to include this variable in all modules. This might be particularly problematic on FPGA, where device memory is more limited.
Ok, thanks for confirming. I think this conversation can be marked as resolved
@bader: I think all stakeholders have approved. Can this be merged? |
We recently added more checks to pre-commit validation and I'd like to make sure that this patch doesn't break documentation build. I suppose we need to add new sycl/doc/DeviceGlobal.md to sycl/doc/index.rst. Current CI checks results are almost one month old and do not cover documentation build. |
2581f60
I did the merge and added "DeviceGlobal.md" to the "index.rst". There didn't seem to be any obvious order to the design documents in the index, so I put it last. |
Add a design specification for the DPC++ extension API
SYCL_EXT_ONEAPI_DEVICE_GLOBAL
. This includes proposed SPIR-V and OpenCL extensions which are required for the implementation of this DPC++ feature.