[SYCL][Docs] Add design for moving to the new offloading model #8658

mdtoguchi · 2023-03-15T00:52:27Z

The community has moved to a different model for performing offloading during the link step. This document describes the move away from our current offloading model, in which the driver performs all of the device link and compilation steps, to one in which these actions are handled by a separate tool that performs a more 'conglomerate' link step.

bader

@mdtoguchi, the document looks good.
I think it's ready for review, so I suggest converting this draft to a pull request.
🔥🚀

bader · 2023-03-17T22:34:54Z

sycl/doc/design/OffloadDesign.md

+to be controlled by the `clang-linker-wrapper`.  There are controlling options
+that are available to the user which need to be provided to the wrapper to
+understand which device libraries are not desired by the end user.


"not desired"? Can you give an example, please?
I thought "device library" is a separate file, so we can just remove the name of the library from the command line.

I don't know how much it is used in practice, but the behavior is controlled by the existing `-fsycl-device-lib=arg` option. So maybe the wording 'not desired' is incorrect, and this should simply state that a control is available and can be used.

mdtoguchi · 2023-03-21T13:59:00Z

@bader, thanks for the comments. I will integrate them and add information about the decision to allow for the ability to read the older format before marking this as ready.

ajaykumarkannan · 2023-04-10T18:02:16Z

sycl/doc/design/OffloadDesign.md

+
+It is expected that all new binaries generated with the updated offloading
+model will represent the embedded fat object format, moving away from the
+`clang-offload-bundler` usage.  We will not support a mixing of fat object


This could be a problem for users that want to use existing third party libraries with newer libraries. I'm not sure if that's a use case that is worth addressing here.

The ability to read in the older format should allow for the mixing of old and new libraries. It is not expected to be able to have both old and new format objects in a single archive however.

That's the situation I'm slightly worried about if a third party vendor doesn't upgrade the format. Is there a mechanism for warning the user when mixing the two types, and is there a way to upgrade an existing library to the newer format?

Please note that backwards compatibility is not guaranteed to be preserved forever, and there are cases where the application has to be recompiled anyway in order to work with a newer runtime.

It is not expected to be able to have both old and new format objects in a single archive however.

@mdtoguchi, I wonder, though, why can't we support that? If we anyway scan each archive member to handle it in accordance with its type, why can't we detect old/new format at that stage and act on it in a correct way? Am I missing something?

We could support that, but we would need to get down to the archive-member level and unbundle/extract each member individually. Would there be cases where folks want to take old objects, compile new objects, and put them all into one archive, or even just add new objects to an old archive?

I think mostly just linking old archives with new archives, but it's a contrived case that we may be able to live without.
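For illustration, the per-member detection discussed in this thread could look roughly like the sketch below. It is only a heuristic and not the actual implementation: it searches raw bytes rather than parsing object sections, the function names are hypothetical, and the two magic values are assumptions based on my reading of upstream LLVM (the `clang-offload-bundler` marker string and the magic bytes of the embedded offload binary), so they should be verified before use.

```python
# Heuristic sketch only: a real tool would parse object-file sections
# instead of searching raw bytes.

# Assumed marker string used by clang-offload-bundler (old format).
BUNDLER_MAGIC = b"__CLANG_OFFLOAD_BUNDLE__"
# Assumed magic bytes that start an embedded offload binary (new format).
PACKAGER_MAGIC = bytes([0x10, 0xFF, 0x10, 0xAD])


def classify_member(data: bytes) -> str:
    """Return 'old', 'new', 'mixed', or 'host-only' for one archive member."""
    has_old = BUNDLER_MAGIC in data
    has_new = PACKAGER_MAGIC in data
    if has_old and has_new:
        return "mixed"
    if has_old:
        return "old"
    if has_new:
        return "new"
    return "host-only"


def classify_archive(members: dict[str, bytes]) -> dict[str, str]:
    """Classify every member so each can be routed to the matching extractor."""
    return {name: classify_member(blob) for name, blob in members.items()}


def needs_warning(kinds: dict[str, str]) -> bool:
    """True when a single archive holds both old- and new-format objects."""
    found = set(kinds.values())
    return "mixed" in found or {"old", "new"} <= found
```

A check like `needs_warning` is one possible place to emit the diagnostic asked about above when an archive mixes the two formats.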

ajaykumarkannan · 2023-04-10T18:22:04Z

sycl/doc/design/OffloadDesign.md

+
+| Option Name                  | Purpose                                      |
+|------------------------------|----------------------------------------------|
+| `--fpga-tool-deps=<arg>`     | Comma separated list of dependency files used for FPGA hardware compiles using `aoc` |


Is this list only ever needed for FPGA compile? Or can it be generalized to something like --backend-tool-deps and populated only when necessary?

As I think of this, the dependency file that is generated during the compilation should be added to the final binary before the link is called. This would mean that this option is not necessary. It also implies we need to call out the ability to add the dependency file to the fat binary so it can be used during the link/call to aoc.

ajaykumarkannan · 2023-04-10T18:23:42Z

sycl/doc/design/OffloadDesign.md

+#### spir64_fpga support
+
+Compilation behaviors involving AOT for FPGA involve an additional call to
+the either `aoc` (for Hardware) or `opencl-aot` (for Simulation).  This call


opencl-aot is only needed for what we call emulation (x86 emulation of the user kernel). aoc is used for both hardware and simulation (cycle-accurate simulation of the generated Verilog code) platforms.

Thanks - I will correct this.

ajaykumarkannan · 2023-04-10T18:26:32Z

sycl/doc/design/OffloadDesign.md

+command will be processed by a new options to the wrapper,
+`--fpga-tool-arg=<arg>`
+
+The FPGA target also has support for additional generated binaries that


I just want to clarify something here. With this new model, it sounds like it's possible to package binaries like aoco, aocr, or aocx into the fat object files that the link stage can pick up, correct? If yes, then I'm not sure if Diagram 1 is exactly right, since it sounds like you may have an "Offline Compile" stage there. Unless you're implying that there could be an AOT compile stage as part of the "Device Compile".

The case of using aocx/aoco/aocr files in object files/archives is not explicitly called out and the diagram is more a reflection of the general AOT compilation case. The usage of the FPGA specific device types should be spelled out more succinctly.

gmlueck · 2023-04-11T17:50:31Z

sycl/doc/design/OffloadDesign.md

@@ -0,0 +1,326 @@
+# Implementation design for offloading model


Adding a global comment here, so the conversation can be threaded.

It seems like there is a lot of overlap with this design and the change we plan to do for AOT for the optional kernel features. I think this design document should at least describe how those AOT changes will be reflected in this new offload design. We could still implement them as separate phases if we want, though I suspect it will be easier to do some of the work together.

In particular, I think we should consider adding the following "preparatory work" to this design, which will make the optional kernel features support easier:

- Fully support the interface where specific device names are provided via the `-fsycl-targets` command line option. Currently, we allow only a single device name like `-fsycl-targets=intel_gpu_pvc`. We should expand this to allow multiple device names such as `-fsycl-targets=intel_gpu_pvc,intel_gpu_dg1`.

- Currently, we invoke ocloc once for all GPU targets. This should be changed to invoke it separately for each GPU device on the `-fsycl-targets` command line option.

- Currently, the output from ocloc (i.e. native code for all GPU targets) is bundled as one offload region whose type is `PI_DEVICE_BINARY_TYPE_NATIVE`. Once we invoke ocloc separately for each GPU device, we will have a separate offload region for each GPU device named on the `-fsycl-targets` command line option. We need a more device-specific label for these offload regions than `PI_DEVICE_BINARY_TYPE_NATIVE`.

- The SYCL runtime needs to be adjusted to find the appropriate offload region for the target device.

I think this will affect the offload design in several ways, but one that comes to mind is the --gen-tool-arg option that you propose. This is probably not granular enough because clang-linker-wrapper will invoke ocloc several times, once for each GPU device. Do we need some way to specify different arguments for each invocation?

Good points - current proposal is very singular in nature. In order to support the multiple targets, this will need to be broken down into meaningful pieces where we can specify an arbitrary number of different devices and given each device, a separate ocloc call is performed. This should also provide a way of passing different additional options to each individual ocloc call, effectively enabling support for items like -Xsycl-target-backend=intel_gpu_pvc <opts>
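As a rough sketch of the "one ocloc call per device" idea, the snippet below splits a `-fsycl-targets` value into one compile step per GPU device and attaches any per-target backend options. The function name, the dictionary layout, and the rule that the device name is the target name with the `intel_gpu_` prefix removed are assumptions made for illustration, not part of the proposal.

```python
def plan_gpu_compiles(sycl_targets: str, backend_opts: dict[str, str]) -> list[dict]:
    """Build one AOT compile step per GPU device named in -fsycl-targets.

    backend_opts maps a target name to the option string that would have been
    given through -Xsycl-target-backend=<target> (hypothetical representation).
    """
    prefix = "intel_gpu_"
    plans = []
    for target in sycl_targets.split(","):
        target = target.strip()
        if not target.startswith(prefix):
            continue  # non-GPU targets are out of scope for this sketch
        plans.append({
            "target": target,
            # Assumption: the device name is the target without the prefix.
            "device": target[len(prefix):],
            "extra_opts": backend_opts.get(target, ""),
        })
    return plans
```

Each entry in the returned list would correspond to one separate ocloc invocation and one separately wrapped device binary.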

gmlueck · 2023-04-11T17:52:59Z

sycl/doc/design/OffloadDesign.md

+`-Xsycl-target-backend=spir64_x86_64 <opts>` command will be processed by a new
+option to the wrapper, `--cpu-tool-arg=<arg>`
+
+### Integration of the sycl-aspect-filter


Did you intend to leave this section empty?

FWIW, I think we will change the "optional kernel features" design, so that we do not need the sycl-aspect-filter tool. Instead, I think sycl-post-link can do the filtering itself and emit different LLVM bitcode files for each device when in AOT mode.

Maybe this section should just be deleted?

It was kind of a placeholder, but I didn't do anything with it. I will remove.

artemrad

Does this change include the development of developer tools that will allow us to extract intermediate files from the executable?

mdtoguchi · 2023-04-11T22:47:17Z

Does this change include the development of developer tools that will allow us to extract intermediate files from the executable?

@artemrad, there is an existing tool named clang-offload-extract which we can retain to allow for extraction of embedded target images. clang-offload-packager can be used to unbundle/extract from generated fat objects.

asudarsa · 2023-04-13T23:20:49Z

sycl/doc/design/images/OffloadGeneralFlow.png

Thanks Mike for the documentation. Looks good overall. One comment. llvm-spirv translation stage is missing inside the Linker Wrapper. Please add that here and also mention it in the text.

Thanks

ajaykumarkannan

Generally, I think the proposal looks good to me.

gmlueck · 2023-05-16T20:35:51Z

sycl/doc/design/OffloadDesign.md

+|--------|---------------|----------------|----------------------------|
+| CPU    | spir64_x86_64 | opencl-aot     | `--cpu-tool-arg=<arg>`     |
+| GPU    | spir64_gen    | ocloc          | `--gen-tool-arg=<arg>`     |
+| FPGA   | spir64_fpga   | aoc/opencl-aot | `--fpga-tool-arg=<arg>`    |


How does clang-linker-wrapper know whether it needs to invoke ocloc, opencl-aot, etc.? I guess it must look for the presence of the --gen-tool-arg, --cpu-tool-arg, etc. options? What if there are no "arguments" that need to be passed to that tool? Is this an issue, or do we always have some arguments that need to be specified?

I think the offline compiler is selected by the target triple.

I have added some clarifying information. The clang-linker-wrapper is responsible for full discovery of targets based on findings within the binaries. 431dff7
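To make the "selected by the target triple" point concrete, here is a small sketch in which the tool is chosen from the triple found in the binary, independent of whether any `--*-tool-arg` option was passed. The triple-to-tool pairs come from the table quoted above; the function name and the `fpga_emulation` flag are hypothetical, and the FPGA split follows the earlier comment that `opencl-aot` is only used for emulation.

```python
from typing import Optional

# Triple -> offline compiler, taken from the table in the design document.
AOT_TOOLS = {
    "spir64_x86_64": "opencl-aot",
    "spir64_gen": "ocloc",
}


def select_tool(triple: str, fpga_emulation: bool = False) -> Optional[str]:
    """Pick the offline compiler for a device image based on its triple.

    Returns None when this sketch knows of no offline compile step for the
    triple. No tool argument option is needed to trigger the selection.
    """
    if triple == "spir64_fpga":
        # aoc handles hardware and simulation; opencl-aot handles emulation.
        return "opencl-aot" if fpga_emulation else "aoc"
    return AOT_TOOLS.get(triple)
```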

gmlueck · 2023-05-16T21:02:59Z

sycl/doc/design/OffloadDesign.md

+*Example: clang-linker-wrapper options*
+
+Each OCLOC call will be represented as a separate device binary that is
+individually wrapped and linked into the final executable.


How does clang-linker-wrapper know how many times to invoke ocloc? How does it know which GPU target is associated with each ocloc invocation?

Is the idea that clang-linker-wrapper will count the number of --gen-tool-arg options and parse the value to find the "-device pvc" part?

Would it make sense to change the syntax to make it easier to find the device name, something like:

--gen-tool-arg=pvc "-options extraopt_pvc"

Something else to keep in mind ... my thought is that we will change sycl-post-link to emit a separate device image for each GPU target. This is another reason why clang-linker-wrapper will need to know the set of GPU targets.

Good point - we will need to provide a way to differentiate more than just the arch (spir64_gen vs spir64_x86_64). I like your idea of taking the known syntax of -opt=<target> "opts" which we currently use for -Xsycl-target-backend
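A minimal sketch of how the suggested `--gen-tool-arg=<device> "<opts>"` form could be parsed is shown below. It assumes the option string always follows as the next argument, mirroring `-Xsycl-target-backend`; the function name is hypothetical and the syntax that is finally adopted may differ.

```python
def parse_gen_tool_args(argv: list[str]) -> dict[str, list[str]]:
    """Collect option strings per GPU device from the wrapper command line."""
    prefix = "--gen-tool-arg="
    per_device: dict[str, list[str]] = {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith(prefix):
            device = arg[len(prefix):]
            if i + 1 >= len(argv):
                raise ValueError(f"missing option string after {arg}")
            # The argument after the flag is the option string for this device.
            per_device.setdefault(device, []).append(argv[i + 1])
            i += 2
        else:
            i += 1  # unrelated argument, e.g. an input object
    return per_device
```

With this shape, the set of keys gives the wrapper the list of GPU devices, and therefore the number of ocloc invocations, without having to search the option string for `-device`.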

gmlueck · 2023-05-16T21:30:11Z

sycl/doc/design/OffloadDesign.md

+It is expected that the wrap information that is generated to be wrapped
+around the device binary will match current wrapping information that is used
+for the exiting offload model.  The wrapping in the old model is using the
+`clang-offload-wrapper` tool.


In order to implement the AOT part of "optional kernel features", we need to somehow annotate each device binary with an identifier for its GPU target. For example, a device binary created for --gen-tool-arg=pvc needs to be annotated as a "PVC" binary. My thought was to use the GMDID for this annotation. For example, the GMDID for pvc is 12.60.7 (see this listing).

I think we currently annotate AOT binaries with PI_DEVICE_BINARY_TYPE_NATIVE. Is this part of the "wrapping" you describe here? Is there anything about this design that will make it difficult to change this to the GMDID?

We would have to provide the GMDID mapping in the driver (and continue to update as new ID values come along) so when a user wants intel_gpu_pvc the GMDID will be used and embedded into corresponding information tagged with the device binary within the fat object. When the clang-linker-wrapper is extracting, it will know to use the information as tagged accordingly to set the proper device with the ocloc call.

Rather than embedding the list of GMDID's in the driver, I think we can include the GMDID for each target in the device config file that we are designing (#9371). Once that happens, the driver will use the config file information to determine the set of legal targets for -fsycl-targets and also get the associated GMDID from there.
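The lookup described here might be sketched as follows. The in-memory `DEVICE_CONFIG` layout and both function names are hypothetical stand-ins for whatever the device config file ends up providing; the only value taken from this discussion is the GMDID `12.60.7` for pvc.

```python
# Hypothetical in-memory view of the device config file. Only the pvc
# GMDID is taken from the discussion; the structure is an assumption.
DEVICE_CONFIG = {
    "intel_gpu_pvc": {"gmdid": "12.60.7"},
}


def gmdid_for_target(target: str, config: dict = DEVICE_CONFIG) -> str:
    """Validate a -fsycl-targets entry and return its GMDID."""
    entry = config.get(target)
    if entry is None:
        raise ValueError(f"unknown target for -fsycl-targets: {target}")
    return entry["gmdid"]


def tag_device_image(target: str, image: bytes, config: dict = DEVICE_CONFIG) -> dict:
    """Attach the GMDID to a device image before it is embedded in a fat object."""
    return {"target": target, "gmdid": gmdid_for_target(target, config), "image": image}
```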

aelovikov-intel

LGTM but it's somewhat outside of my area of expertise. @sergey-semenov , @steffenlarsen do you want to take a look as well?

asudarsa · 2023-07-20T20:31:10Z

sycl/doc/design/OffloadDesign.md

Suggested change

-The device libraries that are linked in is provided by the driver. The driver
-is responsible for letting the `clang-linker-wrapper` know what device libraries
-are required to be linked in as well as the location.
+A list of device libraries that need to be linked in with user code is provided by the driver. The driver is also responsible for letting the `clang-linker-wrapper` know the location of the device libraries.

asudarsa · 2023-07-20T20:33:47Z

sycl/doc/design/OffloadDesign.md

Do we need this option? Can you please give an example of the files we expect to pass here? Thanks

@asudarsa, the libraries here are the ones that are part of the first llvm-link call which does not use --only-needed. Basically device libraries that are required in full. Perhaps this isn't needed for SYCL though but more for OpenMP. I'll clean this out here.

asudarsa · 2023-07-20T20:34:20Z

sycl/doc/design/OffloadDesign.md

Suggested change

-| `--device-library-location=<arg>` | The location in which the device libraries reside to be used during compilation |
+| `--device-library-location=<arg>` | The location in which the device libraries reside |

asudarsa · 2023-07-20T20:36:11Z

sycl/doc/design/OffloadDesign.md

Suggested change

-*Table: Options to control device libraries*
+*Table: Options to pass device libraries to the clang-linker-wrapper*

asudarsa · 2023-07-20T20:39:18Z

sycl/doc/design/OffloadDesign.md

IIUC

Suggested change

-The device libraries are controlled via the `-fno-sycl-device-lib=arg` option
-where the driver determines based on this option which libraries to tell the
-linker wrapper to pull in.
+The driver also passes `-fno-sycl-device-lib=arg` option to the clang-linker-wrapper. This option is used to determine the exact set of device libraries that need to be pulled in.

This shouldn't be passed to the linker wrapper. As the driver controls what device libraries are to be linked in, the driver can determine which ones are associated with the `-fno-sycl-device-lib` option.
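A small sketch of that driver-side decision follows: the driver resolves `-fno-sycl-device-lib` itself and hands only the resulting library list to the wrapper. The group names mirror values I believe the option accepts (`libc`, `libm-fp32`, `libm-fp64`, `all`), but the file names and the function name are purely hypothetical.

```python
# Hypothetical mapping from option values to device library files.
DEVICE_LIB_GROUPS = {
    "libc": ["devlib-c.bc"],
    "libm-fp32": ["devlib-m-fp32.bc"],
    "libm-fp64": ["devlib-m-fp64.bc"],
}


def select_device_libs(no_device_lib: str = "") -> list[str]:
    """Return the device libraries the driver would pass to the wrapper.

    no_device_lib is the comma-separated value of -fno-sycl-device-lib,
    or an empty string when the option was not given.
    """
    disabled = {v.strip() for v in no_device_lib.split(",") if v.strip()}
    if "all" in disabled:
        return []
    libs = []
    for group, files in DEVICE_LIB_GROUPS.items():
        if group not in disabled:
            libs.extend(files)
    return libs
```

In this arrangement the wrapper never sees `-fno-sycl-device-lib`; it only receives the already-filtered list plus the library location.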

asudarsa · 2023-07-20T20:40:32Z

sycl/doc/design/OffloadDesign.md

Do we want to repeat this option here?

This corresponds to the 'list' of device libraries which are now controlled by the driver. I will remove this.

asudarsa

few changes requested. Please address. Thanks

asudarsa

LGTM. Thanks

mdtoguchi · 2023-08-22T17:32:33Z

@intel/llvm-gatekeepers, can this be merged?

bader reviewed Mar 17, 2023

mdtoguchi added 3 commits March 27, 2023 17:08

Add some information in regards to SPIR-V based device in fat objects

8098ad2

Address a few more comments from review

9fbf1ae

Additional information regarding support of older bundled format

54b1258

mdtoguchi marked this pull request as ready for review March 28, 2023 01:41

mdtoguchi requested a review from a team as a code owner March 28, 2023 01:41

Add doc to toc

a8d3caa

mdtoguchi requested a review from a team as a code owner March 28, 2023 17:08

bader approved these changes Mar 31, 2023

ajaykumarkannan reviewed Apr 10, 2023

AlexeySachkov approved these changes Apr 11, 2023

gmlueck reviewed Apr 11, 2023

artemrad reviewed Apr 11, 2023

Merge remote-tracking branch 'intel_llvm/sycl' into offload-design

56d82ff

asudarsa reviewed Apr 13, 2023

mdtoguchi added 6 commits April 13, 2023 17:04

Elaborate on ability to signify multiple spir64_gen device calls

b5dd9a2

Improve information in regards to FPGA specific AOT compiles

1bace07

Update images to svg, add info for post-link and translate tools

e58f78f

Merge remote-tracking branch 'intel_llvm/sycl' into offload-design

9eeb10f

Adjust image as it was not being rendered consistently

256edeb

Improve documentation considering FPGA behaviors and unique AOC binaries

52b9a1c

mdtoguchi requested review from ajaykumarkannan, asudarsa and gmlueck May 10, 2023 19:07

ajaykumarkannan approved these changes May 15, 2023

gmlueck reviewed May 16, 2023

AlexeySachkov mentioned this pull request May 30, 2023

[SYCLNATIVECPU] Initial Native CPU plug-in implementation #9635

Merged

mdtoguchi mentioned this pull request Jun 7, 2023

[SYCL][Docs] Add design document for Device Config File #9371

Merged

AlexeySachkov mentioned this pull request Jun 15, 2023

C++ Modules Do Not Work with SYCL #9245

Open

mdtoguchi added 6 commits July 10, 2023 17:32

Add enhanced syntax for passing options for spir64_gen targets

b164ce5

Update device library usage model

744c50e

Adjust a number of smaller review comments - modify llvm-foreach usage

20e2396

Merge remote-tracking branch 'intel_llvm/sycl' into offload-design

2bf4a8a

Review adjustments

7ae098d

Clean up some smaller topics regarding fat objects and move them Add offload-packager usage information

Update fat object generation image

bc17fe5

mdtoguchi requested a review from a team as a code owner July 11, 2023 17:44

mdtoguchi requested review from aelovikov-intel and asudarsa July 11, 2023 17:44

aelovikov-intel approved these changes Jul 11, 2023

asudarsa reviewed Jul 20, 2023

asudarsa requested changes Jul 21, 2023

mdtoguchi added 2 commits July 20, 2023 17:59

Merge remote-tracking branch 'intel_llvm/sycl' into offload-design

1da136d

Address a few more items found in review

b1eb941

mdtoguchi requested a review from asudarsa July 21, 2023 01:32

AlexeySachkov mentioned this pull request Jul 24, 2023

opencl-aot fails to compile SYCL kernels with an unsupported subgroup size #10531

Open

asudarsa approved these changes Aug 1, 2023

asudarsa reviewed Aug 1, 2023

mdtoguchi added 3 commits August 1, 2023 17:54

Adjust option name for library location

4cb67a5

Merge remote-tracking branch 'intel_llvm/sycl' into offload-design

cc8fc05

Merge branch 'sycl' into offload-design

e7ad9f7

aelovikov-intel merged commit 41b02b1 into intel:sycl Aug 22, 2023

