Skip to content

WIP: [SYCL] Diagnose violations from function object passed to the kernel #654

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 25 additions & 0 deletions clang/lib/Sema/Sema.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1386,6 +1386,13 @@ Sema::Diag(SourceLocation Loc, const PartialDiagnostic& PD) {
static void emitCallStackNotes(Sema &S, FunctionDecl *FD) {
auto FnIt = S.DeviceKnownEmittedFns.find(FD);
while (FnIt != S.DeviceKnownEmittedFns.end()) {
if (S.getLangOpts().SYCLIsDevice &&
FnIt->second.FD->hasAttr<SYCLKernelAttr>()) {
// Skip over the routines with sycl_kernel attributes
// in the traceback as they are likely from SYCL headers.
FnIt = S.DeviceKnownEmittedFns.find(FnIt->second.FD);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this setting the next loop to the thing you just made sure has the kernel attribute? Shouldn't you be setting it to the 'next' one, or at least checking something different? What is the contents of DeviceKnownEmittedFns? Using a std::pair for that seems like a mistake here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, what it does is find that this routine has the kernel attribute, and then tries to find its caller.
DeviceKnownEmittedFns is a map (llvm::DenseMap), with callee paired with its known caller and its location.

I am not very comfortable changing that data structure (used elsewhere too) and definitely don't want to change it as part of this change.

continue;
}
DiagnosticBuilder Builder(
S.Diags.Report(FnIt->second.Loc, diag::note_called_by));
Builder << FnIt->second.FD;
Expand Down Expand Up @@ -1533,6 +1540,24 @@ void Sema::markKnownEmitted(
}
}

// Function object calls are not walked above.
Copy link
Contributor

@Fznamznon Fznamznon Sep 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please explain why "Function object calls are not walked above"?
I don't see anything specific for function objects in the following code.

Copy link
Contributor Author

@premanandrao premanandrao Sep 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function object call expression appears in the IR as:

CXXMethodDecl 0x55fbe8077ef0 <exc_func.cpp:18:44, col:55> col:41 operator() 'void () const' inline
`-CompoundStmt 0x55fbe8078040 <col:46, col:55>
  `-CallExpr 0x55fbe8078020 <col:48, col:52> 'void'
    `-ImplicitCastExpr 0x55fbe8078008 <col:48> 'void (*)()' <FunctionToPointerDecay>
      `-DeclRefExpr 0x55fbe8077fe8 <col:48> 'void ()' lvalue Function 0x55fbe804d308 'bar' 'void ()'

This kind of call is not in the Call Graph created for the device, so I need to walk it here. It is the absence of this that makes the exception code such as in the example test added, not be diagnosed. This is what the code below this comment does. I am not sure what you mean by "I don't see anything specific". The code below, basically the block below "if (CS)", extracts the underlying function from the call expression.

Copy link
Contributor

@Fznamznon Fznamznon Sep 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I understand, thank you very much for the explanation.
Do you know why such calls are missing in the device Call Graph?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I understand, thank you very much for the explanation.
Do you know why such calls are missing in the device Call Graph?

I really don't know. That's the honest answer. :-)
This code was leveraged from existing implementation for CUDA; perhaps this does not make sense there? On the other hand, I do not see any explicit comments stating that this is deliberate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't know. That's the honest answer. :-)

I think it would be great if we understand the problem before making any changes, isn't it? :)
AFAIK, more or less modern versions of CUDA support lambda functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd very much like to understand the scope of the required callgraph changes before allowing said fix me. Investigation here to see what change would be necessary would be appreciated.

Note uses of callgraph are well tested, so the risk of breaking something is low. The lack of testing is more for unused code paths.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will investigate fixing the call-graph and report back in the next few days.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't actually use the clang::CallGrpah like we were discussing above. "DeviceCallGraph" is simply a dense map.

I see that OpenMP and CUDA do similar things, do they have the same problem? If so, is there a more generic solution for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I dug in. This isn't the solution. The problem is that CheckSYCLCall is getting the caller incorrectly.

getCurFunctionDecl gets the wrong thing. It will get the function containing the lambda at definition time, rather than the lambda operator(). In CodeGen there is a fantastic alternative here that is about a 6 character replacement (I think it is getCurBlockDecl()), but it isn't exactly as easy here.

I believe the best way (after looking through SEMA) is to change it to getCurLexicalContext, then check if that DeclContext is a FunctionOrMethod. If so, use it as the Caller FunctionDecl.

That said, that brings up the following issue:

int foo() { throw 1; }
struct S {
    int bar = foo();
};
int main() { kernel_single_task<class fake_kernel>([](){ S s; });

That ALSO has the problem, except we do our reporting at a functiondecl->caller level, when the declcontext can actually be a CXXRecordDecl as well. Presumably, the devicecallgraph needs to have a NamedDecl (or even a Decl?, heck, perhaps a DeclContext?!) as its caller, since functions can be called at namespace, global, or class scope.

Consider:

int foo() { throw 1; }
int global_var = foo(); // always executed, even on device.  Probably needs to diagnose as well!

Ugg... Thought of another example, so I'm not sure LEXICAL context is the right one. Experimentation is needed.

struct S {
    static int f;
};
S::f = foo(); // always executed, but lexical context is namespace/TU decl.  Really, it should be part of 'S'.  

I wonder if we need to do some work later to get the actual declaration context in those cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Erich! The test cases are illuminating!
I will take a cut a redesigning how we do the device diagnostics.

// Extract the call expression from the statement block.
FunctionDecl *FD = C.Callee->getDefinition();
const CompoundStmt *CS = FD ? dyn_cast_or_null<CompoundStmt>(FD->getBody())
: nullptr;
if (CS)
for (auto *I : CS->body()) {
FunctionDecl *D = nullptr;
CallExpr *CE = dyn_cast<CallExpr>(I);
if (CE)
D = dyn_cast_or_null<FunctionDecl>(CE->getCalleeDecl());
if (!D || Seen.count(D) || IsKnownEmitted(S, D))
continue;
Seen.insert(D);
Worklist.push_back(
{/* Caller = */ C.Callee, /* Callee = */ D, CE->getBeginLoc()});
}

// Add all functions called by Callee to our worklist.
auto CGIt = S.DeviceCallGraph.find(C.Callee);
if (CGIt == S.DeviceCallGraph.end())
Expand Down
12 changes: 0 additions & 12 deletions clang/lib/Sema/SemaSYCL.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -354,18 +354,6 @@ class MarkDeviceFunction : public RecursiveASTVisitor<MarkDeviceFunction> {
return true;
}

bool VisitGCCAsmStmt(GCCAsmStmt *S) {
SemaRef.Diag(S->getBeginLoc(), diag::err_sycl_restrict)
<< Sema::KernelUseAssembly;
return true;
}

bool VisitMSAsmStmt(MSAsmStmt *S) {
SemaRef.Diag(S->getBeginLoc(), diag::err_sycl_restrict)
<< Sema::KernelUseAssembly;
return true;
}

// The call graph for this translation unit.
CallGraph SYCLCG;
// The set of functions called by a kernel function.
Expand Down
15 changes: 15 additions & 0 deletions clang/test/SemaSYCL/diag-fobj-call.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
// RUN: %clang_cc1 -fcxx-exceptions -fsycl-is-device -fsyntax-only -verify %s

void bar() { throw 5; } // expected-no-error
void foo() { throw 10; } // expected-error {{SYCL kernel cannot use exceptions}}

template <typename name, typename Func>
__attribute__((sycl_kernel)) void kernel_single_task(Func kernelFunc) {
kernelFunc();
}

int main() {
// expected-note@+1 {{called by 'operator()'}}
kernel_single_task<class fake_kernel>([]() { foo(); });
bar();
}
1 change: 1 addition & 0 deletions clang/test/SemaSYCL/inline-asm.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ __attribute__((sycl_kernel)) void kernel_single_task(Func kernelFunc) {

int main() {
foo();
// expected-note@+1 {{called by 'operator()'}}
kernel_single_task<class fake_kernel>([]() { bar(); });
return 0;
}
1 change: 0 additions & 1 deletion clang/test/SemaSYCL/sycl-restrict.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -188,7 +188,6 @@ __attribute__((sycl_kernel)) void kernel_single_task(Func kernelFunc) {
kernelFunc();
a_type ab;
a_type *p;
// expected-note@+1 {{called by 'kernel_single_task'}}
use2(ab, p);
}

Expand Down