[SYCL] Add new auto device code split mode #2827
/summary:run
I would argue that the current title is somewhat misleading: before revising the actual changes, I've mistaken this for "always run the split unless
Let's also add tests.
/summary:run
The driver part LGTM.
This patch introduces a new device code split mode, `auto`, which is intended to automatically select the best device code split mode and apply it. At the moment, `auto` is equivalent to `per_source` in most cases and to `off` when function pointers are present.
Added LIT tests. Update heuristic to disable device code split when indirect calls are present in the input module.
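For illustration, the heuristic described above can be sketched roughly as follows. This is not the code from the patch: the `Toy*` types and `selectScopeAutomatically` are hypothetical stand-ins for the LLVM module, function, and instruction classes the real tool works with, and the only rule modeled is "any indirect call disables splitting, otherwise split per source".

```cpp
#include <vector>

// Hypothetical, simplified stand-ins for the IR the real tool inspects.
struct ToyInstruction {
  bool IsCall;          // true if this instruction is a call
  bool HasDirectCallee; // false for a call through a function pointer
};

struct ToyFunction {
  std::vector<ToyInstruction> Body;
};

struct ToyModule {
  std::vector<ToyFunction> Functions;
};

// Mirrors the idea of KernelMapEntryScope from the diff, reduced to two cases.
enum class ToySplitScope {
  PerSource, // one device image per translation unit
  Global     // a single image for all kernels, i.e. no split
};

// Assumed reading of the heuristic: an indirect call anywhere in the module
// means the callee set is unknown, so the module is kept in one piece.
ToySplitScope selectScopeAutomatically(const ToyModule &M) {
  for (const ToyFunction &F : M.Functions)
    for (const ToyInstruction &I : F.Body)
      if (I.IsCall && !I.HasDirectCallee)
        return ToySplitScope::Global;
  return ToySplitScope::PerSource;
}
```

A module with only direct calls would resolve to `PerSource` under this sketch, while adding a single call through a function pointer switches the result to `Global`.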
4175932 to 73faebb
@@ -289,6 +291,42 @@ enum KernelMapEntryScope {
  Scope_Global // single entry in the map for all kernels
};

static KernelMapEntryScope selectDeviceCodeSplitScopeAutomatically(Module &M) {
  // Here we can employ various heuristics to decide which way to split kernels
There are more comments than code here :) I think the idea is pretty straightforward and the description could be shortened.
I've re-structured comments a bit in edfa44c
  if (IROutputOnly && DoSplit) {
    errs() << "error: -" << SplitMode.ArgStr << " can't be used with -"
           << IROutputOnly.ArgStr << "\n";
  if (IROutputOnly && (DoSplit && SplitMode != SPLIT_AUTO)) {
Why is SPLIT_AUTO allowed with IR output only?
> Why is SPLIT_AUTO allowed with IR output only?

Technically, this is me being a bit lazy to dig further into the driver and make changes there. Basically, there are compilation flows (related to AOT) where `sycl-post-link` is invoked with the `-ir-output-only` option. With my changes, `-split=` is almost always passed to `sycl-post-link` (unless split was explicitly disabled), and that triggered errors in some LIT tests.
OK, so the `auto` heuristic in the presence of `-ir-output-only` is fixed to "don't split"? Please add a comment then, and mention it in the user-visible description too.
> OK, so the `auto` heuristic in the presence of `-ir-output-only` is fixed to "don't split"? Please add a comment then, and mention it in the user-visible description too.

That's correct; I've updated the tool's help message in edfa44c.
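As a rough illustration of the rule agreed on in this thread, the option check and the resulting behaviour could be modeled like this. The `ToyOptions` struct and both helper functions are hypothetical; only the condition itself and the shape of the error message are taken from the diff above, and the option names `split` and `ir-output-only` are assumed from the discussion.

```cpp
#include <string>

// Hypothetical model of the relevant sycl-post-link options.
enum class ToySplitMode { PerSource, PerKernel, Auto };

struct ToyOptions {
  bool IROutputOnly; // models -ir-output-only
  bool DoSplit;      // models the presence of -split
  ToySplitMode Mode; // models the value given to -split=
};

// Returns an error message for an invalid combination, or an empty string.
// The condition matches the updated check in the diff: an explicit split
// mode conflicts with IR-only output, but `auto` is let through.
std::string checkOptions(const ToyOptions &O) {
  if (O.IROutputOnly && (O.DoSplit && O.Mode != ToySplitMode::Auto))
    return "error: -split can't be used with -ir-output-only";
  return "";
}

// Whether splitting can happen at all for a valid set of options.
// With IR-only output, `auto` is fixed to "don't split".
bool maySplit(const ToyOptions &O) {
  if (!O.DoSplit || O.IROutputOnly)
    return false;
  return true;
}
```

Under this model, `-split=auto` together with `-ir-output-only` is accepted but never splits, which is the behaviour the help message is meant to document.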
Co-authored-by: Artem Gindinson <artem.gindinson@intel.com>
Since `auto` value introduction for `-fsycl-device-code-split` in commit 184d258 (intel#2827), it stands as the default code split mechanism. Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
This improves SPV_KHR_untyped_pointers extension. Removing struct type from special handling (translate as typed pointer) allowed to fix `spirv-val` error in `CXX/global-ctor.cl` test:
```
error: line 88: OpFunctionCall Argument <id> '25[%this1]'s type does not match Function <id> '11[%_ptr_Generic_class_Something]'s parameter type.
%30 = OpFunctionCall %void %_ZNU3AS49SomethingC2Ei %this1 %26
```
Other changes allow to translate structs in a new way without violating validation or test checks.
Original commit: KhronosGroup/SPIRV-LLVM-Translator@15fd1cc50e12465