-
Notifications
You must be signed in to change notification settings - Fork 793
WIP: [SYCL] Diagnose violations from function object passed to the kernel #654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Scan the body of the routine passed as a function object to the SYCL kernel. The call graph traversed previously to emit deferred diagnostics was missing the passed function object. Signed-off-by: Premanand M Rao <premanand.m.rao@intel.com>
FnIt->second.FD->hasAttr<SYCLKernelAttr>()) { | ||
// Skip over the routines with sycl_kernel attributes | ||
// in the traceback as they are likely from SYCL headers. | ||
FnIt = S.DeviceKnownEmittedFns.find(FnIt->second.FD); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this setting the next loop to the thing you just made sure has the kernel attribute? Shouldn't you be setting it to the 'next' one, or at least checking something different? What is the contents of DeviceKnownEmittedFns? Using a std::pair for that seems like a mistake here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, what it does is find that this routine has the kernel attribute, and then tries to find its caller.
DeviceKnownEmittedFns is a map (llvm::DenseMap), with callee paired with its known caller and its location.
I am not very comfortable changing that data structure (used elsewhere too) and definitely don't want to change it as part of this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to understand the changes you have made, The commit message says "Scan the body of the routine passed as a function object to the SYCL kernel. "
Let's clarify terminology, at least...
I don't see any test cases which tests passing a function object to the SYCL kernel.
So, under "function object passed to the SYCL kernel" you mean lambda or functor, passed as an argument to the function with sycl_kernel
attribute, right?
That's what he means, yes. Also, I vastly prefer we not use the word "functor" as that has additional meaning in functional programming languages that is prefer not to get into. In standard C++ we prefer to call these function objects to prevent that confusion. |
Yes, thanks Erich. |
@@ -1533,6 +1540,24 @@ void Sema::markKnownEmitted( | |||
} | |||
} | |||
|
|||
// Function object calls are not walked above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please explain why "Function object calls are not walked above"?
I don't see anything specific for function objects in the following code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function object call expression appears in the IR as:
CXXMethodDecl 0x55fbe8077ef0 <exc_func.cpp:18:44, col:55> col:41 operator() 'void () const' inline
`-CompoundStmt 0x55fbe8078040 <col:46, col:55>
`-CallExpr 0x55fbe8078020 <col:48, col:52> 'void'
`-ImplicitCastExpr 0x55fbe8078008 <col:48> 'void (*)()' <FunctionToPointerDecay>
`-DeclRefExpr 0x55fbe8077fe8 <col:48> 'void ()' lvalue Function 0x55fbe804d308 'bar' 'void ()'
This kind of call is not in the Call Graph created for the device, so I need to walk it here. It is the absence of this that makes the exception code such as in the example test added, not be diagnosed. This is what the code below this comment does. I am not sure what you mean by "I don't see anything specific". The code below, basically the block below "if (CS)", extracts the underlying function from the call expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I understand, thank you very much for the explanation.
Do you know why such calls are missing in the device Call Graph?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I understand, thank you very much for the explanation.
Do you know why such calls are missing in the device Call Graph?
I really don't know. That's the honest answer. :-)
This code was leveraged from existing implementation for CUDA; perhaps this does not make sense there? On the other hand, I do not see any explicit comments stating that this is deliberate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't know. That's the honest answer. :-)
I think it would be great if we understand the problem before making any changes, isn't it? :)
AFAIK, more or less modern versions of CUDA support lambda functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd very much like to understand the scope of the required callgraph changes before allowing said fix me. Investigation here to see what change would be necessary would be appreciated.
Note uses of callgraph are well tested, so the risk of breaking something is low. The lack of testing is more for unused code paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I will investigate fixing the call-graph and report back in the next few days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't actually use the clang::CallGrpah like we were discussing above. "DeviceCallGraph" is simply a dense map.
I see that OpenMP and CUDA do similar things, do they have the same problem? If so, is there a more generic solution for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I dug in. This isn't the solution. The problem is that CheckSYCLCall is getting the caller incorrectly.
getCurFunctionDecl gets the wrong thing. It will get the function containing the lambda at definition time, rather than the lambda operator(). In CodeGen there is a fantastic alternative here that is about a 6 character replacement (I think it is getCurBlockDecl()), but it isn't exactly as easy here.
I believe the best way (after looking through SEMA) is to change it to getCurLexicalContext, then check if that DeclContext is a FunctionOrMethod. If so, use it as the Caller FunctionDecl.
That said, that brings up the following issue:
int foo() { throw 1; }
struct S {
int bar = foo();
};
int main() { kernel_single_task<class fake_kernel>([](){ S s; });
That ALSO has the problem, except we do our reporting at a functiondecl->caller level, when the declcontext can actually be a CXXRecordDecl as well. Presumably, the devicecallgraph needs to have a NamedDecl (or even a Decl?, heck, perhaps a DeclContext?!) as its caller, since functions can be called at namespace, global, or class scope.
Consider:
int foo() { throw 1; }
int global_var = foo(); // always executed, even on device. Probably needs to diagnose as well!
Ugg... Thought of another example, so I'm not sure LEXICAL context is the right one. Experimentation is needed.
struct S {
static int f;
};
S::f = foo(); // always executed, but lexical context is namespace/TU decl. Really, it should be part of 'S'.
I wonder if we need to do some work later to get the actual declaration context in those cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Erich! The test cases are illuminating!
I will take a cut a redesigning how we do the device diagnostics.
@premanandrao Is this PR still work in progress? |
Thanks for the ping. I will close this PR and put in a new PR when I have a fix. |
Scan the body of the routine passed as a function object to the SYCL
kernel. The call graph traversed previously to emit deferred
diagnostics was missing the passed function object.
Signed-off-by: Premanand M Rao premanand.m.rao@intel.com