[SYCL] Implement hierarchical parallelism API. #221

kbobrovs · 2019-06-15T20:41:51Z

This is the first part of SYCL hierarchical parallelism implementation. It
implements all related APIs:

h_item class
group::parallel_for_work_item functions
handler::parallel_for_work_group functions

It is able to run workloads which use these APIs but do not contain data
or code with group-visible side effects between the work group and work
item scopes. Upcoming patches will implement the remaining part: private_data, data and code work-group level "scoping".

Signed-off-by: Konstantin Bobrovsky konstantin.s.bobrovsky@intel.com

agozillon · 2019-06-17T21:56:19Z

sycl/include/CL/sycl/detail/common.hpp

+// is to better support handling of situations where there must be a loop nest
+// over a multi-dimensional space - it allows to avoid generating unnecessary
+// outer loops like 'for (int z=0; z<1; z++)' in case of 1D and 2D iteration
+// spaces or writing specializations of the algorighms for 1D, 2D and 3D cases.


Not particularly relevant as it's understandable either way but "algorighms" -> "algorithms"

Thanks for the catch. Will change

agozillon · 2019-06-17T22:48:37Z

sycl/include/CL/sycl/handler.hpp

-  template <typename KernelType, int Dims>
-  void parallel_for(nd_range<Dims> ExecutionRange, KernelType KernelFunc) {
-    parallel_for<KernelType, KernelType, Dims>(ExecutionRange, KernelFunc);
+  template <typename KernelName = std::nullptr_t, typename KernelType, int Dims>


Possibly a reason behind it that I'm not seeing at the moment, but is there a reason the dimension is named "Dims" here and "dimensions" in kernel_parallel_for_work_group and in the other parallel_for_work_group interface (https://github.com/intel/llvm/pull/221/files#diff-234ddc8d342a3b24bb61d130262e9624R665). Perhaps worth renaming the other two that use "dimensions" to "Dims" to keep consistency. Maybe just me being pedantic though!

No particular reason. Will make it "Dims" - I agree, should be uniform.

agozillon · 2019-06-17T23:34:53Z

sycl/include/CL/sycl/detail/spirv_glue.hpp

+
+#include "CL/__spirv/spirv_vars.hpp"
+
+#define DEFINE_INIT_SIZES(POSTFIX)                                             \


This may be a silly request, but would it be possible to call this something other than spirv_glue.hpp, as at the moment we currently use these functions as a sort of top level API that offloads to SPIRV/SPIR intrinsics based on what we compile for. e.g: https://github.com/triSYCL/sycl/blob/sycl/unified/next/sycl/include/CL/__spir/spir_vars.hpp I'm aware the names are SPIRV oriented though.

Maybe this isn't the place for that though or the best implementation on our end (I did have a slightly more complex/generic implementation in a prior commit but it was a bit overly complex for our current needs), but it would be nice to have a sort of top level id/size/group etc. API that calls the underlying implementation be it OpenCL SPIR calls, SPIRV calls or something else entirely. Hopefully this is a somewhat reasonable sounding idea/request.

Good point. We should not be tied to SPIRV, as there can be other models or even runtimes on the device side underneath. Is it correct interpretation of your concern? How about device_rt_intrin.hpp or device_rt_builtins.hpp? For now there is __spirv* variant only, later implementation for more native RTs can be added. @bader, please also comment.

That is indeed a correct interpretation of my concern! device_rt_builtins.hpp sounds fine by me but if you want to call it device_rt_intrin.hpp that's also all good in my book. Thanks for taking my concern into consideration!

I'm totally agree with @agozillon that spirv_glue is unfortunate name - even w/o applying his "generalization idea".
I think current implementation of these functions can reside in spirv_vars header.

I'm all in for using built-ins to get id/size information.
@agozillon, feel free to propose any changes simplifying triSYCL development. I think we can re-use https://github.com/triSYCL/sycl/blob/sycl/unified/next/sycl/include/CL/__spir/spir_vars.hpp and have two different implementations of __spir_ocl_get_* built-ins:

#if __COMPILE_TO_SPIRV__ typedef size_t size_t_vec __attribute__((ext_vector_type(3))); extern "C" const __constant size_t_vec __spirv_BuiltInGlobalSize; extern "C" const __constant size_t_vec __spirv_BuiltInGlobalInvocationId; extern "C" const __constant size_t_vec __spirv_BuiltInWorkgroupSize; extern "C" const __constant size_t_vec __spirv_BuiltInLocalInvocationId; extern "C" const __constant size_t_vec __spirv_BuiltInWorkgroupId; extern "C" const __constant size_t_vec __spirv_BuiltInGlobalOffset; // __spir_ocl_get_* implementations can be generated by macro function constexpr __spir_ocl_get_global_id(int dim) { switch (dim) { case 0: return __spirv_BuiltInGlobalInvocationId.x; case 1: return __spirv_BuiltInGlobalInvocationId.y; case 2: return __spirv_BuiltInGlobalInvocationId.z; } } ... #elif __XILINX_SPECIFIC_IMPLEMENTATION__ // some other implementation of __spir_ocl_get_* #endif

@bader, I already made the change to device_rt_intrin.hpp, which seems to address @agozillon's concerns. Are you OK if we live with this name for now?

@bader, OK, will do

Something along those lines would be useful I think, I don't necessarily mind what form it takes and the top level function doesn't need to be named _spir_ocl_get* it could be _sycl_get* if you wish, in our case that's just my cheap/cheat way of working around conflicts with user functions with the same mangling as OpenCL SPIR built-ins like get_global_id etc.

If you guys are happy with an abstraction like that (similar to our current approach) it's good with us, but perhaps someone has a more generic approach? As for the most part it would be nice to have an API for these functions that allows anyone to add there own underlying implementation, not necessarily just SPIR/SPIRV built-ins!

I had a slightly more generic and perhaps less sane implementation in a previous commit: triSYCL/sycl@489f98d#diff-0e3fe0cefdf466dc994182d2e123ba01 that may be more useful? I reverted it as we don't really have need for other built-ins just yet, so I can't really justify forcing someone else to maintain it... and a separate file approach allowed me to avoid frequent merge conflicts.

In either case, I don't want to be a bother and hold up these commits for too long, it can perhaps be a separate merge/issue discussion. For now I'm happy with either, the renaming of the header to something less SPIRV centric or the movement of the functions to reside in spirv_vars. Whatever works for you guys!

Introducing _sycl_get* (or __sycl_get*) C APIs (C mangling assumed, right?) sounds like a good idea. Those would be generic SYCL "intrinsics" which must be resolved by actual SYCL device Back-end, e.g. by linking with _device.bc which will provide implementations of __sycl_get* for . So far we have __spirv* serving in this role, with a nice side effect that we don't need to provide _device.bc, because SPIRV translator knows how to lower them. But the problem with current __spirv* is that the interface is based on variables rather than functions, which is less robust.

This sounds like a separate important task which needs little more discussion, so I suggest that for the purpose of hierarchical parallelism I follow what @bader suggested - move init functions in the spirv_vars.hpp for now removing the device_rt_intrin.hpp, then we can refactor along the lines @agozillon proposed.
?

Yes, that sounds like a separate task. I am happy with the change of removing the new header and moving the functions into spirv_vars.hpp for now, that seems fine.

bader

My comments for the first two patches.

sycl/include/CL/sycl/group.hpp

bader · 2019-06-19T12:32:19Z

sycl/include/CL/sycl/detail/spirv_glue.hpp

+
+#include "CL/__spirv/spirv_vars.hpp"
+
+#define DEFINE_INIT_SIZES(POSTFIX)                                             \


I'm totally agree with @agozillon that spirv_glue is unfortunate name - even w/o applying his "generalization idea".
I think current implementation of these functions can reside in spirv_vars header.

I'm all in for using built-ins to get id/size information.
@agozillon, feel free to propose any changes simplifying triSYCL development. I think we can re-use https://github.com/triSYCL/sycl/blob/sycl/unified/next/sycl/include/CL/__spir/spir_vars.hpp and have two different implementations of __spir_ocl_get_* built-ins:

#if __COMPILE_TO_SPIRV__ typedef size_t size_t_vec __attribute__((ext_vector_type(3))); extern "C" const __constant size_t_vec __spirv_BuiltInGlobalSize; extern "C" const __constant size_t_vec __spirv_BuiltInGlobalInvocationId; extern "C" const __constant size_t_vec __spirv_BuiltInWorkgroupSize; extern "C" const __constant size_t_vec __spirv_BuiltInLocalInvocationId; extern "C" const __constant size_t_vec __spirv_BuiltInWorkgroupId; extern "C" const __constant size_t_vec __spirv_BuiltInGlobalOffset; // __spir_ocl_get_* implementations can be generated by macro function constexpr __spir_ocl_get_global_id(int dim) { switch (dim) { case 0: return __spirv_BuiltInGlobalInvocationId.x; case 1: return __spirv_BuiltInGlobalInvocationId.y; case 2: return __spirv_BuiltInGlobalInvocationId.z; } } ... #elif __XILINX_SPECIFIC_IMPLEMENTATION__ // some other implementation of __spir_ocl_get_* #endif

sycl/include/CL/sycl/handler.hpp

keryell

Looks good.
Just a few comments...

keryell · 2019-06-19T18:29:29Z

sycl/include/CL/sycl/detail/common.hpp

+  // Terminal part of the implementation.
+  template <int DIM, template <int> class LoopBoundTy, typename FuncTy,
+            template <int> class LoopIndexTy = LoopBoundTy>
+  static ALWAYS_INLINE detail::enable_if_t<DIM == 0> iterate_impl(


It is not clear to me while you need std::enable_if and cannot rely on normal template standardization.
Something similar to https://github.com/triSYCL/triSYCL/blob/master/include/CL/sycl/parallelism/detail/parallelism.hpp#L119 ?

Right, this would be more elegant.

Interface remained the same, implementation will now use struct specialization for DIM==0 instead of enable_if_t to terminate recursion

keryell · 2019-06-19T18:39:33Z

sycl/include/CL/sycl/detail/cg.hpp

    for (int I = Dims_; I < 3; ++I) {
      GlobalSize[I] = 1;
      LocalSize[I] = LocalSize[0] ? 1 : 0;
      GlobalOffset[I] = 0;
+      NumWorkGroups[I] = 0;


Design question: is there any reason to have hard-coded 3D explicit descriptors.
It looks to me it is a hard-coded SPIR-V/OpenCL constraint that leaks out at a higher level and pollute the abstraction.
Would it make more sense to have only the required number of dimensions everywhere and just in the back-end (the PI stuff?) have some conversion to higher-level dimensions if required by enqueue-ing some kernel or whatever?

This sounds interesting. Maybe it is worth to open an issue and discuss the redesign there? @romanovvlad - please chime in.

This patch just follows the existing design here.

We always use 3 dimensions to get rid of template parameters, so we can easily pass it to the sycl library. We could probably create 3 different types for 3 dimensions, all of these types would be inherited from common interface, but I think it's too complex for this simple structure.

I understand your point.
On the long term we can do some optimization if we realize that it means more resources on the device to store this useless information and so on.

These fields are stored on host side only.

Ah, not that bad, then.

sycl/include/CL/sycl/detail/cg.hpp

keryell · 2019-06-19T18:42:49Z

sycl/include/CL/sycl/detail/cg.hpp

+  /// Number of workgroups, used to record the number of workgroups from the
+  /// simplest form of parallel_for_work_group. If set, all other fields must be
+  /// zero
+  sycl::range<3> NumWorkGroups;


Magical 3... It could be Dims

as @romanovvlad wrote:

We always use 3 dimensions to get rid of template parameters

sycl/include/CL/sycl/detail/cg.hpp

keryell · 2019-06-19T19:02:12Z

sycl/include/CL/sycl/item.hpp

@@ -20,6 +20,8 @@ struct Builder;
 }
 template <int dimensions> struct id;


Probably everything is a class instead of a struct now

id is still a struct

keryell · 2019-06-19T19:02:31Z

sycl/include/CL/sycl/item.hpp

@@ -86,6 +88,7 @@ template <int dimensions = 1, bool with_offset = true> struct item {
  // For call constructor inside conversion operator
  friend struct item<dimensions, false>;


Probably everything along is a class instead of a struct now

item is still a struct

It is always good to have some feedback from the implementors.
If you find some discrepancies in the spec, feel free to file an issue or even better a PR :-) on https://github.com/KhronosGroup/SYCL-Docs
Note that the published PDF might be out-dated compared to the source files.

sycl/test/hier_par/hier_par_basic.cpp

keryell · 2019-06-19T19:09:32Z

sycl/test/hier_par/hier_par_basic.cpp

+      // - h_item::get_global_id
+      // The global size is not known, so kernel code needs to adapt
+      // dynamically, hence the complex loop bound and index computation.
+      std::memset(ptr, 0, range_length * sizeof(ptr[0]));


At least it is useful to have some tests using C-style and pointers to be sure SYCL can work with C programs... :-)

Right. Let's say the test aims to be closer to what usual SYCL developer would write or port :-)

You are depressing me ;-)

bader

Overall looks good, but there are a few thing we can improve.

sycl/include/CL/sycl/detail/common.hpp

sycl/test/hier_par/hier_par_basic.cpp

sycl/source/detail/scheduler/commands.cpp

sycl/include/CL/sycl/id.hpp

kbobrovs · 2019-06-23T18:39:14Z

As suggested by @bader, I split this PR into two:
#239
kbobrovs#1

The last one depends on the previous one, which is not yet in intel/llvm, so I could not create it in intel/llvm. I'm was not sure how to best handle this, so I added all reviewers as collaborators into my repo, so that you guys can review there. I hope I addressed all outstanding comments from this PR.
@keryell, @agozillon, @romanovvlad - please take another look.

kbobrovs · 2019-06-27T06:33:49Z

Please review the last two commits (not sure why other commits got into this PR):

[SYCL] Move getNextPowerOfTwo body into the header to avoid link errors.
-- this one fixes some existing problem which prevented me from testing
[SYCL] Implement hierarchical parallelism API.
-- this one is moved from my private branch PR ([SYCL] Implement hierarchical parallelism API. kbobrovs/llvm#1) back here per @bader 's request. All comments are addressed.

kbobrovs · 2019-06-27T17:01:52Z

[SYCL] Move getNextPowerOfTwo body into the header to avoid link errors.
-- this one fixes some existing problem which prevented me from testing

This one was fixed by others, so please review only 1 commit - [SYCL] Implement hierarchical parallelism API.

bader

Overall looks good.
A few minor issues.
Please, also finish resolving issues from my previous review - there a couple of them left.

sycl/test/hier_par/hier_par_basic.cpp

sycl/include/CL/sycl/detail/cg.hpp

sycl/include/CL/sycl/group.hpp

sycl/test/hier_par/hier_par_basic.cpp

kbobrovs · 2019-06-30T05:55:07Z

Had to revert template <... template <int> typename ...> back to template <... template <int> class ...>, because the former leads to warning that it is C++17 feature and one of test failures.

bader · 2019-07-01T10:02:10Z

@kbobrovs, could you resolve the conflict with recently committed patches, please?

This is the first part of SYCL hierarchical parallelism implementation. It implements main related APIs: - h_item class - group::parallel_for_work_item functions - handler::parallel_for_work_group functions It is able to run workloads which use these APIs but do not contain data or code with group-visible side effects between the work group and work item scopes. Signed-off-by: Konstantin S Bobrovsky <konstantin.s.bobrovsky@intel.com>

kbobrovs · 2019-07-02T05:54:13Z

@kbobrovs, could you resolve the conflict with recently committed patches, please?

ok, done

romanovvlad assigned bader Jun 16, 2019

agozillon reviewed Jun 17, 2019

View reviewed changes

kbobrovs force-pushed the sycl branch from 34dd24f to 0da066d Compare June 18, 2019 21:44

bader reviewed Jun 19, 2019

View reviewed changes

keryell reviewed Jun 19, 2019

View reviewed changes

bader requested changes Jun 21, 2019

View reviewed changes

This was referenced Jun 21, 2019

Fixes enabling hierarchical parallelism API. #239

Merged

[SYCL] Implement hierarchical parallelism API. kbobrovs/llvm#1

Closed

kbobrovs closed this Jun 24, 2019

kbobrovs force-pushed the sycl branch from 0da066d to c05692e Compare June 24, 2019 04:59

kbobrovs mentioned this pull request Jun 26, 2019

[SYCL] Abstraction layer for device-side SIMT intrinsics like getting global id #253

Closed

kbobrovs reopened this Jun 27, 2019

kbobrovs force-pushed the sycl branch from add3411 to 4212a8a Compare June 27, 2019 17:00

bader reviewed Jun 28, 2019

View reviewed changes

sycl/test/hier_par/hier_par_basic.cpp Outdated Show resolved Hide resolved

sycl/test/hier_par/hier_par_basic.cpp Outdated Show resolved Hide resolved

sycl/include/CL/sycl/detail/cg.hpp Outdated Show resolved Hide resolved

sycl/include/CL/sycl/group.hpp Outdated Show resolved Hide resolved

kbobrovs force-pushed the sycl branch 2 times, most recently from 5829664 to 49455d0 Compare June 28, 2019 19:26

bader reviewed Jun 28, 2019

View reviewed changes

sycl/test/hier_par/hier_par_basic.cpp Outdated Show resolved Hide resolved

bader mentioned this pull request Jun 28, 2019

[SYCL] Image_accessor Host Implementation #271

Merged

kbobrovs force-pushed the sycl branch 3 times, most recently from eb6444d to 90b703d Compare June 28, 2019 22:44

bader previously approved these changes Jun 29, 2019

View reviewed changes

kbobrovs dismissed bader’s stale review via 1414345 June 29, 2019 19:02

kbobrovs force-pushed the sycl branch from 90b703d to 1414345 Compare June 29, 2019 19:02

bader previously approved these changes Jun 29, 2019

View reviewed changes

kbobrovs dismissed bader’s stale review via 62e436d July 2, 2019 05:53

kbobrovs force-pushed the sycl branch from 1414345 to 62e436d Compare July 2, 2019 05:53

bader approved these changes Jul 2, 2019

View reviewed changes

bader merged commit 4646fc1 into intel:sycl Jul 2, 2019


		#include "CL/__spirv/spirv_vars.hpp"

		#define DEFINE_INIT_SIZES(POSTFIX) \

		@@ -20,6 +20,8 @@ struct Builder;
		}
		template <int dimensions> struct id;

		@@ -86,6 +88,7 @@ template <int dimensions = 1, bool with_offset = true> struct item {
		// For call constructor inside conversion operator
		friend struct item<dimensions, false>;

[SYCL] Implement hierarchical parallelism API. #221

[SYCL] Implement hierarchical parallelism API. #221

Uh oh!

Conversation

kbobrovs commented Jun 15, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

agozillon Jun 17, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

agozillon Jun 20, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bader left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

keryell left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

agozillon Jun 17, 2019 •

edited

Loading

agozillon Jun 20, 2019 •

edited

Loading

bader left a comment •

edited

Loading