Skip to content

[SYCL] Rework MarkDevice and children #3475

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Apr 9, 2021
Merged

Conversation

erichkeane
Copy link
Contributor

We identified during a previous review that collectKernelAttributes and
the MarkDeviceFunction object are in a situation that is very difficult
to maintain, so this attempts to fix that by making the ownership model
clear.

We identified during a previous review that collectKernelAttributes and
the MarkDeviceFunction object are in a situation that is very difficult
to maintain, so this attempts to fix that by making the ownership model
clear.
@erichkeane
Copy link
Contributor Author

FYI @jinge90, @smanna12, @premanandrao, and @AaronBallman. I think I fixed a couple of bugs I found as I went through here, but want to see if we break anything later in the process. Note there are a bunch of "TODOs" as I discovered things that we could likely do different/better, but I'd like to see the results of the pre-commit first to see if I break any of this.

@erichkeane
Copy link
Contributor Author

@AaronBallman and @keryell : Note the last patch that adds a test. Seemingly the constexpr-if example already works!

@@ -10,15 +10,15 @@
}

#else
[[cl::reqd_work_group_size(2, 2, 2)]] void not_direct_two() {} // expected-note {{conflicting attribute is here}}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the changes here are because we are doing a DFS instead of a BFS. This obviously does not change the validity of any program.

IMO, the diagnostics are better now, since you are more likely to have the first diagnostic be 'child of the thing that added it' rather than a peer, which seems easier to debug as an end user to me. AND as added benefit, we only end up walking the call graph 1x instead of 2x.

@@ -6,13 +6,11 @@

sycl::queue q;

// expected-note@+1{{function implemented using recursion declared here}}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we were only able to figure this out by pure happenstance before, there is a bug in SemaSYCL.cpp currently where we decide that the caller of a recursive function is ALSO recursive. This is not a sound assumption, and even in the cases it is true, it doesn't catch all cases.

Fixing that ends up meaning we cannot catch the following example, but this does not change the validity of the program (since we get 1 diagnostic here instead of 2 like previously):

int func1();
int func2() { func1(); }
int func1() { func2(); }
kernel<class Foo>([](){ func1(); });

Because we have to give up on running through diagnostics, we catch that func1 is recursive, but cannot catch func2, because we don't continue running down the list.

@@ -328,6 +328,8 @@ static void collectSYCLAttributes(Sema &S, FunctionDecl *FD,
SYCLIntelNoGlobalWorkOffsetAttr, SYCLSimdAttr>(A);
});

// TODO: ERICH: We should probably warn on the bottom3 as well and turn this
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one I'll probably visit once I get the build-bots to agree I'm OK with my refactor so far. It seems that we should still be warning instead of silently dropping here, but even if that is a problem, I still want to make it so we have much less copy/paste here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, we actually have a number of lit tests that depend on us not diagnosing or removing the bottom 3 (now 4 after merging @zahiraam s patch). I simplified this now, but we can't combine all 4. That is perhaps a future "TODO".

}
}
}
// TODO: ERICH: This likely needs a better name and documentation.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestions welcome :)

WorkList.pop_back();
if (!Visited.insert(FD).second)
continue; // We've already seen this Decl
// TODO: is this too clever? Should this just be a called function?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know what you think here. See MarkDevices for how this is used.

static void PropagateAndDiagnoseDeviceAttr(Sema &S, Attr *A,
FunctionDecl *SYCLKernel,
FunctionDecl *KernelBody) {
switch (A->getKind()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pulled the attribute propagation into its own function here, MarkDevice/MarkDevices was just plain unwieldy as a result. Most of this is a copy/paste, except it is no longer a member function and has a different tab depth.

}

void Sema::MarkDevices(void) {
// This Tracker object ensures that the SyclDeviceDecls collection includes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is the entry-point here. The idea is that Tracker keeps track of the info for the entire TU, ensures that all SYCL_EXTERNAL functions are here, and ensures diagnostics happen on destruction.

// This type does the actual analysis on a per-kernel basis. It does this to
// make sure that we're only ever dealing with the context of a single
// kernel at a time.
SingleDeviceFunctionTracker T{Tracker, SYCLKernel};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found we were constantly doing a set/reset of the collections owned by the parent when we were at a point where we only cared about the current kernel. This object does the analysis work, and just rejoins its information into the parent collection when it is done.

@erichkeane erichkeane marked this pull request as ready for review April 6, 2021 13:40
@erichkeane
Copy link
Contributor Author

Alright, I think this is ready for review now (as am I). Thanks in advance all!

SYCLIntelFPGADisableLoopPipeliningAttr,
SYCLIntelFPGAInitiationIntervalAttr>(A);
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Erich, can you explain what is happening with this copy_if? How are these attributes getting excluded?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is copying these ONLY IF this is a 'directly-called'. These attributes are all the set from the 'old' line 575, which only added them if they were 'directly called' from the kernel. They get copied if, for instance, they are the attribute on the lambda.

DeviceFunctionTracker Tracker(*this);

for (Decl *D : syclDeviceDecls()) {
FunctionDecl *SYCLKernel = cast<FunctionDecl>(D);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this 'auto'?


[[intel::max_work_group_size(1, 1, 1)]] // expected-note {{conflicting attribute is here}}
[[intel::max_work_group_size(1, 1, 1)]] // expected-note 3 {{conflicting attribute is here}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the three conflicts? If I squint, I can see may be two :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the test output:

/export/iusers/ekeane1/workspaces/sycl/clang/test/SemaSYCL/check-notdirect-attribute-propagation.cpp:52:34: error: conflicting attributes applied to a SYCL kernel or SYCL_EXTERNAL function
  parallel_for<class KernelName>([]() {}); // expected-error 3 {{conflicting attributes applied to a SYCL kernel or SYCL_EXTERNAL function}}
                                 ^
/export/iusers/ekeane1/workspaces/sycl/clang/test/SemaSYCL/check-notdirect-attribute-propagation.cpp:15:3: note: conflicting attribute is here
[[intel::max_work_group_size(1, 1, 1)]] // expected-note 3 {{conflicting attribute is here}}
  ^
/export/iusers/ekeane1/workspaces/sycl/clang/test/SemaSYCL/check-notdirect-attribute-propagation.cpp:13:3: note: conflicting attribute is here
[[cl::reqd_work_group_size(2, 2, 2)]] void not_direct_two() {} // expected-note 2 {{conflicting attribute is here}}
  ^
/export/iusers/ekeane1/workspaces/sycl/clang/test/SemaSYCL/check-notdirect-attribute-propagation.cpp:52:34: error: conflicting attributes applied to a SYCL kernel or SYCL_EXTERNAL function
  parallel_for<class KernelName>([]() {}); // expected-error 3 {{conflicting attributes applied to a SYCL kernel or SYCL_EXTERNAL function}}
                                 ^
/export/iusers/ekeane1/workspaces/sycl/clang/test/SemaSYCL/check-notdirect-attribute-propagation.cpp:15:3: note: conflicting attribute is here
[[intel::max_work_group_size(1, 1, 1)]] // expected-note 3 {{conflicting attribute is here}}
  ^
/export/iusers/ekeane1/workspaces/sycl/clang/test/SemaSYCL/check-notdirect-attribute-propagation.cpp:21:3: note: conflicting attribute is here
[[cl::reqd_work_group_size(4, 4, 4)]] // expected-note 1 {{conflicting attribute is here}}
  ^
/export/iusers/ekeane1/workspaces/sycl/clang/test/SemaSYCL/check-notdirect-attribute-propagation.cpp:52:34: error: conflicting attributes applied to a SYCL kernel or SYCL_EXTERNAL function
  parallel_for<class KernelName>([]() {}); // expected-error 3 {{conflicting attributes applied to a SYCL kernel or SYCL_EXTERNAL function}}
                                 ^
/export/iusers/ekeane1/workspaces/sycl/clang/test/SemaSYCL/check-notdirect-attribute-propagation.cpp:15:3: note: conflicting attribute is here
[[intel::max_work_group_size(1, 1, 1)]] // expected-note 3 {{conflicting attribute is here}}
  ^
/export/iusers/ekeane1/workspaces/sycl/clang/test/SemaSYCL/check-notdirect-attribute-propagation.cpp:13:3: note: conflicting attribute is here
[[cl::reqd_work_group_size(2, 2, 2)]] void not_direct_two() {} // expected-note 2 {{conflicting attribute is here}}

It conflicts with 'not_direct_two', func_three, then not-direct-2 does again. This is a side-effect of us not removing these attributes and leaving them in place. This is consistent with the existing implementation, except in this case we hit it 2x (instead of the others conflicting with eachother) due to the Depth-first nature.

Copy link
Contributor

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How awful would it be to also make this more const correct given that you're reworking much of it anyway?

@erichkeane
Copy link
Contributor Author

erichkeane commented Apr 7, 2021

How awful would it be to also make this more const correct given that you're reworking much of it anyway?

Can you point out some examples of where you mean? I tried to add const in a few places, but some of the dependencies make that difficult/impossible. If you have some offensive examples, I can at least try to work harder on those. For example, RecursiveASTVisitor's traverse functions for some reason are non-const, which made little sense to me. Additionally, we end up 'modifying' the function decls in a few places (like by adding or removing attributes!) so it tends to be difficult in other places too.

FYI: just building a patch to fix the comments you mentioned, so I didn't just resolve them for no reason :)

MarkDeviceFunction(Sema &S)
: RecursiveASTVisitor<MarkDeviceFunction>(), SemaRef(S) {}
DiagDeviceFunction(
Sema &S, const llvm::SmallPtrSetImpl<FunctionDecl *> &RecursiveFuncs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping we could make RecursiveFuncs hold const FunctionDecl * because nothing mutates the functions once they get passed in here. However, I see why that's really annoying (because we take the container rather than using a more general interface with iterators), so I suppose this is as const correct as we're likely to get.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, DeviceFunctions cannot be const I think, but the RecursiveFuncs I can make const. I'll do that, thanks!

Comment on lines +3520 to +3522
if (Existing->getXDimVal(Ctx) != RWGSA->getXDimVal(Ctx) ||
Existing->getYDimVal(Ctx) != RWGSA->getYDimVal(Ctx) ||
Existing->getZDimVal(Ctx) != RWGSA->getZDimVal(Ctx)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't specific to your review, but I think there's a bug here -- we're not checking to see how this attribute is spelled and so we may be comparing things we don't expect to compare. e.g., the OpenCL spelling for this attribute has argument order X, Y, Z and the SYCL spelling has argument order Z, Y, X.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol... that is depressing. I just moved this code though.

Are we sure that the SemaDeclAttr part doesn't correct the order of these?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's why this is an existing bug. I'm pretttttty sure that SemaDeclAttr isn't correcting the order of those because we need to keep the order sensible for AST dumping, pretty printing, etc. IIRC, we tried that route but had to roll it back. @smanna12 may know more (or be able to look into whether this is a real issue or just me being paranoid).

Copy link
Contributor

@smanna12 smanna12 Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a bug with the existing implementation. I plan to refactor this attribute (separate SYCL vs OpenCL spelling).

}
case attr::Kind::SYCLIntelMaxWorkGroupSize: {
auto *SIMWGSA = cast<SYCLIntelMaxWorkGroupSizeAttr>(A);
if (auto *Existing = SYCLKernel->getAttr<ReqdWorkGroupSizeAttr>()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar concern here about not caring about the way the attribute was spelled.

case attr::Kind::SYCLIntelFPGADisableLoopPipelining:
case attr::Kind::SYCLIntelFPGAInitiationInterval:
case attr::Kind::SYCLSimd: {
if ((A->getKind() == attr::Kind::SYCLSimd) && KernelBody &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit messy -- put this logic in the attr::Kind::SYCLSimd case, then use LLVM_FALLTHROUGH?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof... i didn't notice that when moving this. THAT is simple enough and offensive enough I'll fix it :)

AaronBallman
AaronBallman previously approved these changes Apr 8, 2021
Copy link
Contributor

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think this generally LGTM. We can address the open question about reqd_work_group_size behavior in follow-ups.

Last time I use the web app for this :)
@erichkeane
Copy link
Contributor Author

Only change with the merge was the conflict with the recursion becoming an error (so a different message). This needs re-review, but the change is quite minimal.

Copy link
Contributor

@premanandrao premanandrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am good with the changes.

Copy link
Contributor

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for this cleanup!

@bader
Copy link
Contributor

bader commented Apr 9, 2021

/summary:run

erichkeane pushed a commit to erichkeane/llvm that referenced this pull request Apr 9, 2021
Still need to do the code gen, but that is broken at least until the
rebase on intel#3475, which is awaiting merge.
@bader bader merged commit 551f528 into intel:sycl Apr 9, 2021
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request Apr 14, 2021
* upstream/sycl: (39 commits)
  [CI] Switch to default clang-format version. (intel#3540)
  [Driver][NFC] Cleanup some option setting for SYCL offload (intel#3542)
  [GitHub Actions] Update main branch sync schedule
  [SYCL][NFC] Fix potential namespace conflicts with PSTL in tuple.hpp (intel#3541)
  [SYCL] Bump sycl library minor version (intel#3538)
  [SYCL][CUDA] Implemented cuda_piextUSMEnqueueMemAdvise (intel#3365)
  [SYCL][FPGA] Add mutual diagnostic of max_concurrency attribute in conjunction of disable_loop_pipelining attribute (intel#3512)
  [SYCL] [MATRIX] Enable joint_matrix_load, joint_matrix_store, and joint_matrix_mad for AMX (intel#3503)
  [ESIMD] Skip rewriting functions used through function pointers (intel#3527)
  [SYCL] Fix address space for spec constants buffer (intel#3521)
  [SYCL] Correct the tablegen for checking mutually exclusive stmt attrs (intel#3519)
  [SYCL][PI][L0][NFC] Refactor setting of LastCommandEvent (intel#3528)
  [SYCL] Fix group local memory sharing issue (intel#3489)
  [SYCL][NFC] Fix post-commit failure (intel#3532)
  [SYCL][Doc] Remove extension mechanism (intel#3526)
  [SYCL] Move sycl.hpp in install directory and adjust driver to match (intel#3523)
  [SYCL][ESIMD] Update ESIMD docs to address recent user comments: (intel#3516)
  [NFCI][SYCL] Correct -fdeclare-spirv-builtins to use marshalling (intel#3515)
  [SYCL] Rework MarkDevice and children (intel#3475)
  [SYCL] Fix StringLiteral Ctor issue from intel#3504. (intel#3520)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants