-
Notifications
You must be signed in to change notification settings - Fork 768
[SYCL] Rework MarkDevice and children #3475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We identified during a previous review that collectKernelAttributes and the MarkDeviceFunction object are in a situation that is very difficult to maintain, so this attempts to fix that by making the ownership model clear.
FYI @jinge90, @smanna12, @premanandrao, and @AaronBallman. I think I fixed a couple of bugs I found as I went through here, but want to see if we break anything later in the process. Note there are a bunch of "TODOs" as I discovered things that we could likely do different/better, but I'd like to see the results of the pre-commit first to see if I break any of this. |
@AaronBallman and @keryell : Note the last patch that adds a test. Seemingly the constexpr-if example already works! |
@@ -10,15 +10,15 @@ | |||
} | |||
|
|||
#else | |||
[[cl::reqd_work_group_size(2, 2, 2)]] void not_direct_two() {} // expected-note {{conflicting attribute is here}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the changes here are because we are doing a DFS instead of a BFS. This obviously does not change the validity of any program.
IMO, the diagnostics are better now, since you are more likely to have the first diagnostic be 'child of the thing that added it' rather than a peer, which seems easier to debug as an end user to me. AND as added benefit, we only end up walking the call graph 1x instead of 2x.
@@ -6,13 +6,11 @@ | |||
|
|||
sycl::queue q; | |||
|
|||
// expected-note@+1{{function implemented using recursion declared here}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we were only able to figure this out by pure happenstance before, there is a bug in SemaSYCL.cpp currently where we decide that the caller of a recursive function is ALSO recursive. This is not a sound assumption, and even in the cases it is true, it doesn't catch all cases.
Fixing that ends up meaning we cannot catch the following example, but this does not change the validity of the program (since we get 1 diagnostic here instead of 2 like previously):
int func1();
int func2() { func1(); }
int func1() { func2(); }
kernel<class Foo>([](){ func1(); });
Because we have to give up on running through diagnostics, we catch that func1
is recursive, but cannot catch func2
, because we don't continue running down the list.
clang/lib/Sema/SemaSYCL.cpp
Outdated
@@ -328,6 +328,8 @@ static void collectSYCLAttributes(Sema &S, FunctionDecl *FD, | |||
SYCLIntelNoGlobalWorkOffsetAttr, SYCLSimdAttr>(A); | |||
}); | |||
|
|||
// TODO: ERICH: We should probably warn on the bottom3 as well and turn this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one I'll probably visit once I get the build-bots to agree I'm OK with my refactor so far. It seems that we should still be warning instead of silently dropping here, but even if that is a problem, I still want to make it so we have much less copy/paste here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, we actually have a number of lit tests that depend on us not diagnosing or removing the bottom 3 (now 4 after merging @zahiraam s patch). I simplified this now, but we can't combine all 4. That is perhaps a future "TODO".
clang/lib/Sema/SemaSYCL.cpp
Outdated
} | ||
} | ||
} | ||
// TODO: ERICH: This likely needs a better name and documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestions welcome :)
clang/lib/Sema/SemaSYCL.cpp
Outdated
WorkList.pop_back(); | ||
if (!Visited.insert(FD).second) | ||
continue; // We've already seen this Decl | ||
// TODO: is this too clever? Should this just be a called function? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know what you think here. See MarkDevices for how this is used.
static void PropagateAndDiagnoseDeviceAttr(Sema &S, Attr *A, | ||
FunctionDecl *SYCLKernel, | ||
FunctionDecl *KernelBody) { | ||
switch (A->getKind()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pulled the attribute propagation into its own function here, MarkDevice/MarkDevices was just plain unwieldy as a result. Most of this is a copy/paste, except it is no longer a member function and has a different tab depth.
} | ||
|
||
void Sema::MarkDevices(void) { | ||
// This Tracker object ensures that the SyclDeviceDecls collection includes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is the entry-point here. The idea is that Tracker keeps track of the info for the entire TU, ensures that all SYCL_EXTERNAL functions are here, and ensures diagnostics happen on destruction.
// This type does the actual analysis on a per-kernel basis. It does this to | ||
// make sure that we're only ever dealing with the context of a single | ||
// kernel at a time. | ||
SingleDeviceFunctionTracker T{Tracker, SYCLKernel}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found we were constantly doing a set/reset of the collections owned by the parent when we were at a point where we only cared about the current kernel. This object does the analysis work, and just rejoins its information into the parent collection when it is done.
Alright, I think this is ready for review now (as am I). Thanks in advance all! |
SYCLIntelFPGADisableLoopPipeliningAttr, | ||
SYCLIntelFPGAInitiationIntervalAttr>(A); | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Erich, can you explain what is happening with this copy_if? How are these attributes getting excluded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is copying these ONLY IF this is a 'directly-called'. These attributes are all the set from the 'old' line 575, which only added them if they were 'directly called' from the kernel. They get copied if, for instance, they are the attribute on the lambda.
clang/lib/Sema/SemaSYCL.cpp
Outdated
DeviceFunctionTracker Tracker(*this); | ||
|
||
for (Decl *D : syclDeviceDecls()) { | ||
FunctionDecl *SYCLKernel = cast<FunctionDecl>(D); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this 'auto'?
|
||
[[intel::max_work_group_size(1, 1, 1)]] // expected-note {{conflicting attribute is here}} | ||
[[intel::max_work_group_size(1, 1, 1)]] // expected-note 3 {{conflicting attribute is here}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the three conflicts? If I squint, I can see may be two :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the test output:
/export/iusers/ekeane1/workspaces/sycl/clang/test/SemaSYCL/check-notdirect-attribute-propagation.cpp:52:34: error: conflicting attributes applied to a SYCL kernel or SYCL_EXTERNAL function
parallel_for<class KernelName>([]() {}); // expected-error 3 {{conflicting attributes applied to a SYCL kernel or SYCL_EXTERNAL function}}
^
/export/iusers/ekeane1/workspaces/sycl/clang/test/SemaSYCL/check-notdirect-attribute-propagation.cpp:15:3: note: conflicting attribute is here
[[intel::max_work_group_size(1, 1, 1)]] // expected-note 3 {{conflicting attribute is here}}
^
/export/iusers/ekeane1/workspaces/sycl/clang/test/SemaSYCL/check-notdirect-attribute-propagation.cpp:13:3: note: conflicting attribute is here
[[cl::reqd_work_group_size(2, 2, 2)]] void not_direct_two() {} // expected-note 2 {{conflicting attribute is here}}
^
/export/iusers/ekeane1/workspaces/sycl/clang/test/SemaSYCL/check-notdirect-attribute-propagation.cpp:52:34: error: conflicting attributes applied to a SYCL kernel or SYCL_EXTERNAL function
parallel_for<class KernelName>([]() {}); // expected-error 3 {{conflicting attributes applied to a SYCL kernel or SYCL_EXTERNAL function}}
^
/export/iusers/ekeane1/workspaces/sycl/clang/test/SemaSYCL/check-notdirect-attribute-propagation.cpp:15:3: note: conflicting attribute is here
[[intel::max_work_group_size(1, 1, 1)]] // expected-note 3 {{conflicting attribute is here}}
^
/export/iusers/ekeane1/workspaces/sycl/clang/test/SemaSYCL/check-notdirect-attribute-propagation.cpp:21:3: note: conflicting attribute is here
[[cl::reqd_work_group_size(4, 4, 4)]] // expected-note 1 {{conflicting attribute is here}}
^
/export/iusers/ekeane1/workspaces/sycl/clang/test/SemaSYCL/check-notdirect-attribute-propagation.cpp:52:34: error: conflicting attributes applied to a SYCL kernel or SYCL_EXTERNAL function
parallel_for<class KernelName>([]() {}); // expected-error 3 {{conflicting attributes applied to a SYCL kernel or SYCL_EXTERNAL function}}
^
/export/iusers/ekeane1/workspaces/sycl/clang/test/SemaSYCL/check-notdirect-attribute-propagation.cpp:15:3: note: conflicting attribute is here
[[intel::max_work_group_size(1, 1, 1)]] // expected-note 3 {{conflicting attribute is here}}
^
/export/iusers/ekeane1/workspaces/sycl/clang/test/SemaSYCL/check-notdirect-attribute-propagation.cpp:13:3: note: conflicting attribute is here
[[cl::reqd_work_group_size(2, 2, 2)]] void not_direct_two() {} // expected-note 2 {{conflicting attribute is here}}
It conflicts with 'not_direct_two', func_three, then not-direct-2 does again. This is a side-effect of us not removing these attributes and leaving them in place. This is consistent with the existing implementation, except in this case we hit it 2x (instead of the others conflicting with eachother) due to the Depth-first nature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How awful would it be to also make this more const
correct given that you're reworking much of it anyway?
Can you point out some examples of where you mean? I tried to add FYI: just building a patch to fix the comments you mentioned, so I didn't just resolve them for no reason :) |
clang/lib/Sema/SemaSYCL.cpp
Outdated
MarkDeviceFunction(Sema &S) | ||
: RecursiveASTVisitor<MarkDeviceFunction>(), SemaRef(S) {} | ||
DiagDeviceFunction( | ||
Sema &S, const llvm::SmallPtrSetImpl<FunctionDecl *> &RecursiveFuncs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping we could make RecursiveFuncs
hold const FunctionDecl *
because nothing mutates the functions once they get passed in here. However, I see why that's really annoying (because we take the container rather than using a more general interface with iterators), so I suppose this is as const correct as we're likely to get.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, DeviceFunctions cannot be const I think, but the RecursiveFuncs I can make const. I'll do that, thanks!
if (Existing->getXDimVal(Ctx) != RWGSA->getXDimVal(Ctx) || | ||
Existing->getYDimVal(Ctx) != RWGSA->getYDimVal(Ctx) || | ||
Existing->getZDimVal(Ctx) != RWGSA->getZDimVal(Ctx)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't specific to your review, but I think there's a bug here -- we're not checking to see how this attribute is spelled and so we may be comparing things we don't expect to compare. e.g., the OpenCL spelling for this attribute has argument order X, Y, Z and the SYCL spelling has argument order Z, Y, X.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol... that is depressing. I just moved this code though.
Are we sure that the SemaDeclAttr part doesn't correct the order of these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's why this is an existing bug. I'm pretttttty sure that SemaDeclAttr isn't correcting the order of those because we need to keep the order sensible for AST dumping, pretty printing, etc. IIRC, we tried that route but had to roll it back. @smanna12 may know more (or be able to look into whether this is a real issue or just me being paranoid).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a bug with the existing implementation. I plan to refactor this attribute (separate SYCL vs OpenCL spelling).
} | ||
case attr::Kind::SYCLIntelMaxWorkGroupSize: { | ||
auto *SIMWGSA = cast<SYCLIntelMaxWorkGroupSizeAttr>(A); | ||
if (auto *Existing = SYCLKernel->getAttr<ReqdWorkGroupSizeAttr>()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar concern here about not caring about the way the attribute was spelled.
clang/lib/Sema/SemaSYCL.cpp
Outdated
case attr::Kind::SYCLIntelFPGADisableLoopPipelining: | ||
case attr::Kind::SYCLIntelFPGAInitiationInterval: | ||
case attr::Kind::SYCLSimd: { | ||
if ((A->getKind() == attr::Kind::SYCLSimd) && KernelBody && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit messy -- put this logic in the attr::Kind::SYCLSimd
case, then use LLVM_FALLTHROUGH
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof... i didn't notice that when moving this. THAT is simple enough and offensive enough I'll fix it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I think this generally LGTM. We can address the open question about reqd_work_group_size
behavior in follow-ups.
Last time I use the web app for this :)
Only change with the merge was the conflict with the recursion becoming an error (so a different message). This needs re-review, but the change is quite minimal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am good with the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you for this cleanup!
/summary:run |
Still need to do the code gen, but that is broken at least until the rebase on intel#3475, which is awaiting merge.
* upstream/sycl: (39 commits) [CI] Switch to default clang-format version. (intel#3540) [Driver][NFC] Cleanup some option setting for SYCL offload (intel#3542) [GitHub Actions] Update main branch sync schedule [SYCL][NFC] Fix potential namespace conflicts with PSTL in tuple.hpp (intel#3541) [SYCL] Bump sycl library minor version (intel#3538) [SYCL][CUDA] Implemented cuda_piextUSMEnqueueMemAdvise (intel#3365) [SYCL][FPGA] Add mutual diagnostic of max_concurrency attribute in conjunction of disable_loop_pipelining attribute (intel#3512) [SYCL] [MATRIX] Enable joint_matrix_load, joint_matrix_store, and joint_matrix_mad for AMX (intel#3503) [ESIMD] Skip rewriting functions used through function pointers (intel#3527) [SYCL] Fix address space for spec constants buffer (intel#3521) [SYCL] Correct the tablegen for checking mutually exclusive stmt attrs (intel#3519) [SYCL][PI][L0][NFC] Refactor setting of LastCommandEvent (intel#3528) [SYCL] Fix group local memory sharing issue (intel#3489) [SYCL][NFC] Fix post-commit failure (intel#3532) [SYCL][Doc] Remove extension mechanism (intel#3526) [SYCL] Move sycl.hpp in install directory and adjust driver to match (intel#3523) [SYCL][ESIMD] Update ESIMD docs to address recent user comments: (intel#3516) [NFCI][SYCL] Correct -fdeclare-spirv-builtins to use marshalling (intel#3515) [SYCL] Rework MarkDevice and children (intel#3475) [SYCL] Fix StringLiteral Ctor issue from intel#3504. (intel#3520) ...
We identified during a previous review that collectKernelAttributes and
the MarkDeviceFunction object are in a situation that is very difficult
to maintain, so this attempts to fix that by making the ownership model
clear.