[Doc] Add design doc for dynamic linking of device code feature #3210

Fznamznon · 2021-02-12T13:43:42Z

No description provided.

sycl/doc/SharedLibraries.md

pvchupin · 2021-02-12T20:24:21Z

@kbobrovs, please review

kbobrovs

part I of review

sycl/doc/SharedLibraries.md

Co-authored-by: kbobrovs <Konstantin.S.Bobrovsky@intel.com> Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>

sycl/doc/SharedLibraries.md

Fznamznon · 2021-03-05T16:17:22Z

@romanovvlad, @vladimirlaz, I'd like to get some review for the runtime and cache sections. Feel free to waterfall it to someone from the runtime team if you think they know the related parts of the runtime better.

sycl/doc/SharedLibraries.md

vladimirlaz · 2021-03-10T09:39:58Z

sycl/doc/SharedLibraries.md

+Programs that contain only `SYCL_EXTERNAL` functions will be cached only in
+compiled state, so they can be linked with other programs during dependency
+resolution.


Suggested change

Programs that contain only `SYCL_EXTERNAL` functions will be cached only in

compiled state, so they can be linked with other programs during dependency

resolution.

Programs that contain only `SYCL_EXTERNAL` functions will be cached only when exported

functions are used by images with kernels and identified by `main` image Id.

So once two kernels from different images use functions from the same `library` image, it will

be duplicated in cache after linking with main programs during dependency

resolution.

Okay, there can be two cases when SYCL_EXTERNAL-only images are cached in the in-memory cache:

1. Inside the main image, i.e. after it is linked with all its dependencies (the case that you described in your suggestion).

2. Non-linked SYCL_EXTERNAL-only images in compiled state. This can happen when some "main" image requests a SYCL_EXTERNAL-only image as a dependency: we create a program for the SYCL_EXTERNAL-only image, compile it, and store it in the cache, so we don't have to do it again when some other main image requests the same SYCL_EXTERNAL image (the case that I tried to describe here). The idea was taken from the existing cache for standard device libraries (see

llvm/sycl/source/detail/context_impl.hpp

Line 177 in 86716c5

std::map<std::pair<DeviceLibExt, RT::PiDevice>, RT::PiProgram>

).

Does it make sense?

vladimirlaz · 2021-03-10T09:50:57Z

sycl/doc/SharedLibraries.md

+In case when "main" image have imports information, device image hash should be
+created from all device images that are necessary to build it, i.e. hash out
+of "main" device image and set of 'SYCL_EXTERNAL'-only images that define all
+symbols imported by "main device image.


The current approach assumes storing the final binary after linking, which means:

- duplication of library images in every main program;

- on a change of the library, all main images in the persistent cache are invalidated and JIT is initiated.

Maybe it is reasonable to store library images in the cache separately from the main image? It can be done for both the in-memory and persistent caches, or only for the persistent cache, depending on the weight of the link operation for binary programs.

Maybe it is reasonable to store library images in the cache separately from the main image? It can be done for both the in-memory and persistent caches, or only for the persistent cache, depending on the weight of the link operation for binary programs.

Do you mean compiling the "main" image and the "library" image to binaries and storing them on disk separately? It sounds reasonable (real dynamic libraries actually work this way); however, I don't think we have devices that support linking of native device binaries yet, so right now it doesn't seem possible.

sycl/doc/SharedLibraries.md

Co-authored-by: vladimirlaz <vladimir.lazarev@intel.com>

Fznamznon · 2021-03-24T13:38:03Z

@kbobrovs @AlexeySachkov , could you please take a look again?

sycl/doc/SharedLibraries.md

kbobrovs

Thanks! LGTM. We can add more __SYCL_PI_DEVICE_BINARY_TARGET* formats if we want to distinguish between flavors of e.g. spirv64_gen native format, I assume.

smaslov-intel

LGTM.

Fznamznon · 2021-05-19T18:47:43Z

@s-kanaev , @AlexeySachkov , could you please take a look and approve again? There were some changes.

Expand the design to include the case when device functions are exported from a shared library, which is a new feature proposed in intel#3210.

Fznamznon · 2021-05-24T09:58:57Z

Ping also @gmlueck , I remember you requested that I state the requirements for backends; I did it under the PI section. Please take a look.

gmlueck

Ping also @gmlueck , I remember you requested that I state the requirements for backends; I did it under the PI section. Please take a look.

Thanks for adding that.

I added a smaller comment asking which exception we throw when there is a link error, but I approved anyway. Please respond, though, about the exception. I also added a comment about the cache, but we might choose to wait on that if we are planning to restructure the cache later.

gmlueck · 2021-05-24T15:08:02Z

sycl/doc/SharedLibraries.md

+   {2, ...} => program 1
+```
+However the library code will be compiled twice if kernel from the library
+was enqueued before kernels from the application, i.e. in such case:


If you wanted to address this weakness (about compiling twice), I think you could do so by adding a "program state" to cache key. To see how this would work, first consider a simple case where there are no shared libraries. Here are the PI operations that are required:

    pi_program1 = piProgramCreate(/*spir-v from OSMod 1*/);
    /* possibly set values of spec constants */
    piProgramCompile(pi_program1, dev, opts);
    pi_program2 = piProgramLink(dev, opts, {pi_program1});

Note that we're calling piProgramCompile() followed by piProgramLink(), rather than just calling piProgramBuild(). The reason for doing this will become clear later.

Since we are adding a "program state" to the cache key, we can cache both pi_program1 and pi_program2: The result of piProgramCompile() is an object state program image, while the result of piProgramLink() is an executable state program image. Here are the entries in the cache:

    [OSMod 1, spec consts, opts, dev, object] => pi_program1
    [OSMod 1, spec consts, opts, dev, executable] => pi_program2

I think this is the same as what you propose, except I'm adding the "program state" to the cache key and I've cached pi_program1 in addition to pi_program2.

Any future request to run a kernel from OS module 1 on the same device with the same spec constants and the same build options will search the cache for [OSMod 1, spec consts, opts, dev, executable], and it will find the pi_program2 that we already built.

Now let's consider the case where there is device code in a shared library that needs to be online linked. Using your example above, here are the PI operations that need to happen:

    pi_program1 = piProgramCreate(/*spir-v from OSMod 1*/);
    /* possibly set values of spec constants */
    piProgramCompile(pi_program1, dev, opts);
    pi_program2 = piProgramCreate(/*spir-v from OSMod 2*/);
    /* possibly set values of spec constants */
    piProgramCompile(pi_program2, dev, opts);
    pi_program3 = piProgramLink(dev, opts, {pi_program1, pi_program2});

And we can put the following items in the cache:

    [OSMod 1, spec consts, opts, dev, object] => pi_program1
    [OSMod 2, spec consts, opts, dev, object] => pi_program2
    [OSMod 1, spec consts, opts, dev, executable] => pi_program3
    [OSMod 2, spec consts, opts, dev, executable] => pi_program3

Again, I think this is the same as what you propose except I have the new "program state" element in the key, and I'm caching some more pi_program objects.

If there is a subsequent attempt to submit ExternalKernel, the program manager will see that ExternalKernel comes from OSMod 2, and it will find pi_program3 in the cache. Again, this is exactly the same as your proposal.

Now, let's consider the case when ExternalKernel is submitted first. The PI operations are:

    pi_program1 = piProgramCreate(/*spir-v from OSMod 2*/);
    /* possibly set values of spec constants */
    piProgramCompile(pi_program1, dev, opts);
    pi_program2 = piProgramLink(dev, opts, {pi_program1});

And the cache will be:

    [OSMod 2, spec consts, opts, dev, object] => pi_program1
    [OSMod 2, spec consts, opts, dev, executable] => pi_program2

Now, the application submits InternalKernel, and the PI operations are:

    pi_program3 = piProgramCreate(/*spir-v from OSMod 1*/);
    /* possibly set values of spec constants */
    piProgramCompile(pi_program3, dev, opts);
    pi_program1 = /* found in cache: [OSMod 2, spec consts, opts, dev, object]*/;
    pi_program4 = piProgramLink(dev, opts, {pi_program3, pi_program1});

Notice that the redundant compilation of "OSMod 2" is eliminated because the result was in the cache.

The idea of storing mid-state (compiled) programs in the cache sounds good.
Do piProgramCompile and piProgramLink produce a distinct program object instead of modifying the source one? If so, the idea is quite feasible.

One of the concerns here might be an overhead of storing object programs in the cache and performing a build with two steps instead of single one. The overhead here is only a slight bit of memory consumption for map plus some time for inserting another cache entry into map tree. As for performing build in two steps instead of one (I am not really sure about it), the piProgramBuild could be a bit more optimized in time domain than piProgramCompile plus piProgramLink due to internals of backend. On the other hand, this overhead (if any) is sort of "one-shot" one.

Do piProgramCompile and piProgramLink produce a distinct program object instead of modifying the source one?

piProgramLink creates a new program object, but piProgramCompile does not. The logic I propose above only depends on piProgramLink creating a new program object.

As for performing build in two steps instead of one (I am not really sure about it), the piProgramBuild could be a bit more optimized in time domain than piProgramCompile plus piProgramLink due to internals of backend.

Yes, this is a good point. Looking at the implementation of piProgramLink in the Level Zero PI plugin, I think that code should check for the case when there is only one input. We can optimize that case because it is not necessary to call zeModuleDynamicLink.

For OpenCL, we need to ask the team that implements the OpenCL driver whether it is much more efficient to call clBuildProgram vs. calling both clCompileProgram and clLinkProgram.

Optimization of piProgramCompile & piProgramLink seems a bit out of the scope of this PR. I might include an improvement with object state programs in the cache though. Should I?

I do not have a strong opinion on expanding the cache to include object state as part of this design vs. doing it as a separate task. If you decide not to do it as part of this design, I think you should enter a Jira to make sure it does happen at some point, though. If you do this, the Jira could contain a link to this conversation (or copy the conversation into the Jira).

I think you should not split the calls to piProgramBuild() into separate piProgramCompile() / piProgramLink() calls unless you check that this won't introduce a performance regression. The fix I mention above to the Level Zero implementation of piProgramLink() is probably a very small 3-liner. You should also ask the OpenCL team about the efficiency of calling clBuildProgram vs. calling both clCompileProgram and clLinkProgram, though.
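The "program state" cache key discussed in this thread can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ProgramCache`, `getObject`, `getExecutable`, and the integer program IDs are hypothetical names invented for illustration, and the PI calls are modeled by counters rather than invoked. It replays the "ExternalKernel first" scenario and shows that OS module 2 is compiled only once.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// All names below are hypothetical stand-ins, not DPC++ runtime or PI APIs.
enum class ProgramState { Object, Executable };
using ProgramId = int;
// Key layout from the discussion: OS module, spec constants, build options,
// device, and the proposed "program state".
using CacheKey = std::tuple<int, std::string, std::string, int, ProgramState>;

struct ProgramCache {
  std::map<CacheKey, ProgramId> Entries;
  ProgramId NextId = 1;
  int CompileCalls = 0; // counts modeled piProgramCompile() calls
  int LinkCalls = 0;    // counts modeled piProgramLink() calls

  // Returns the object-state program for OSMod, compiling it at most once.
  ProgramId getObject(int OSMod, const std::string &Spec,
                      const std::string &Opts, int Dev) {
    CacheKey Key{OSMod, Spec, Opts, Dev, ProgramState::Object};
    auto It = Entries.find(Key);
    if (It != Entries.end())
      return It->second;
    ++CompileCalls;
    ProgramId Id = NextId++;
    Entries.emplace(Key, Id);
    return Id;
  }

  // Returns the executable for Mods.front() (the module that owns the
  // requested kernel); the remaining entries are the modules it imports from.
  ProgramId getExecutable(const std::vector<int> &Mods, const std::string &Spec,
                          const std::string &Opts, int Dev) {
    CacheKey MainKey{Mods.front(), Spec, Opts, Dev, ProgramState::Executable};
    auto It = Entries.find(MainKey);
    if (It != Entries.end())
      return It->second;
    for (int Mod : Mods)
      getObject(Mod, Spec, Opts, Dev); // a cache hit skips recompilation
    ++LinkCalls;
    ProgramId Linked = NextId++;
    // Assumption: an executable entry that already exists is not overwritten.
    for (int Mod : Mods)
      Entries.emplace(CacheKey{Mod, Spec, Opts, Dev, ProgramState::Executable},
                      Linked);
    return Linked;
  }
};

// Replays the "ExternalKernel first" scenario and returns the compile count.
int compilesWhenLibraryKernelRunsFirst() {
  ProgramCache Cache;
  Cache.getExecutable({2}, "", "", 0);    // ExternalKernel from OS module 2
  Cache.getExecutable({1, 2}, "", "", 0); // InternalKernel from OS module 1
  return Cache.CompileCalls;
}
```

In this model the scenario performs two compilations (one per OS module) and two links; without the object-state entries, OS module 2 would be compiled a second time when InternalKernel is submitted.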

gmlueck · 2021-05-24T15:13:17Z

sycl/doc/SharedLibraries.md

+image. If they match some imported symbols then these matched symbols will be
+marked as resolved. The procedure repeats until all imported symbols are marked
+as resolved. In case all available device images are viewed, but some imported
+symbols remain unresolved, exception will be thrown.


What exception will be thrown? If we want to use one of the existing exception codes, I think it should either be errc::runtime or errc::build. (See Table 136 in the SYCL 2020 spec.) I think I have a slight preference for errc::runtime, so that we reserve errc::build for exceptions that happen when the application specifically asks to online compile or link a kernel bundle.

errc::build seems more suitable to me, since OpenCL build APIs usually emit a build error when the incoming program has unresolved symbols. In addition, in regular C++, errors about unresolved symbols are usually emitted at build time. However, this is not a strong preference; if we want to use errc::runtime, that is OK with me.
@s-kanaev , @romanovvlad , any opinion?

I was just looking at the DPC++ code in program_manager. It looks like we currently throw compile_program_error if there is a JIT-time compilation error for a kernel. This is most similar to the SYCL 2020 errc::build exception code, so I presume that is what we will throw in the future. Therefore, it probably does make sense to throw errc::build if there are unresolved symbols.

errc::build sounds suitable here as we tried and failed to link several programs (which is part of build process). It's a pity the spec doesn't have link error code which would fit here even better.
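The resolution procedure quoted at the top of this thread can be sketched as a standalone loop. This is a minimal model, not the runtime's implementation: `DeviceImage` and `resolveImports` are hypothetical names, `std::runtime_error` stands in for a `sycl::exception` carrying `errc::build`, and pulling in the imports of each chosen image is an assumption about how "the procedure repeats".

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of a device image: the symbols it defines and imports.
struct DeviceImage {
  std::set<std::string> Exported;
  std::set<std::string> Imported;
};

// Returns the indices (into Available) of the images needed to resolve every
// symbol imported by Main, in the order they were chosen. Throws when all
// available images were viewed and some imported symbol is still unresolved.
std::vector<std::size_t>
resolveImports(const DeviceImage &Main,
               const std::vector<DeviceImage> &Available) {
  std::set<std::string> Defined = Main.Exported;
  std::set<std::string> Unresolved = Main.Imported;
  std::vector<bool> Used(Available.size(), false);
  std::vector<std::size_t> Chosen;
  bool Progress = true;
  while (!Unresolved.empty() && Progress) {
    Progress = false;
    for (std::size_t I = 0; I < Available.size(); ++I) {
      if (Used[I])
        continue;
      const DeviceImage &Img = Available[I];
      bool Matched = false;
      for (const std::string &Sym : Img.Exported)
        Matched |= Unresolved.erase(Sym) > 0; // mark the symbol as resolved
      if (!Matched)
        continue;
      Used[I] = true;
      Progress = true;
      Chosen.push_back(I);
      Defined.insert(Img.Exported.begin(), Img.Exported.end());
      // Assumption: a chosen image's own imports must be resolved as well.
      for (const std::string &Sym : Img.Imported)
        if (Defined.count(Sym) == 0)
          Unresolved.insert(Sym);
    }
  }
  if (!Unresolved.empty())
    throw std::runtime_error("unresolved symbol: " + *Unresolved.begin());
  return Chosen;
}
```

A caller would link the main image with the returned images; the thrown error is the point where the thread's consensus would surface `errc::build`.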

s-kanaev

Didn't finish the review yet.

s-kanaev · 2021-05-25T11:43:43Z

sycl/doc/SharedLibraries.md

+allows to link device code dynamically at runtime, such as in the scenarios
+above.
+
+## Requirements:


NIT:

Suggested change

## Requirements:

## Requirements

s-kanaev · 2021-05-25T12:57:40Z

sycl/doc/SharedLibraries.md

+
+### sycl-post-link changes
+
+To support dynamic linking of device code , `sycl-post-link` performs 2 main


NIT:

Suggested change

To support dynamic linking of device code , `sycl-post-link` performs 2 main

In order to support dynamic linking of device code, `sycl-post-link` performs 2 main

s-kanaev · 2021-05-25T13:26:55Z

sycl/doc/SharedLibraries.md

+Mapping of extension strings and formats that can be linked:
+| Device image format | Extension string | Meaning |
+|---------------------|------------------|---------|
+| __SYCL_PI_DEVICE_BINARY_TARGET_SPIRV64 | "pi_ext_spirv64_linking" | Linking of SPIR-V 64-bit programs is supported|
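As a rough illustration of how such a mapping might be consulted, the sketch below looks up the extension string for an image format and checks it against a device's extension list. Only the string "pi_ext_spirv64_linking" comes from the table; the `ImageFormat` enum, both function names, and the assumption that extensions arrive as one space-separated string are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical enum standing in for the device image format macros.
enum class ImageFormat { SPIRV64, Other };

// Returns the extension string listed for Format, or "" when none is listed.
std::string linkingExtensionFor(ImageFormat Format) {
  return Format == ImageFormat::SPIRV64 ? "pi_ext_spirv64_linking" : "";
}

// Assumes the device reports its extensions as one space-separated string.
bool canLinkOnDevice(ImageFormat Format, const std::string &DeviceExtensions) {
  const std::string Wanted = linkingExtensionFor(Format);
  if (Wanted.empty())
    return false;
  std::istringstream Stream(DeviceExtensions);
  std::string Ext;
  while (Stream >> Ext)
    if (Ext == Wanted)
      return true;
  return false;
}
```

Comparing whole tokens rather than searching for a substring avoids a false match on an extension whose name merely contains the wanted string.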


s-kanaev · 2021-05-26T07:06:55Z

sycl/doc/SharedLibraries.md

+  - AOT compilers must allow to compile SPIR-V modules with unresolved symbols
+  and produce device code in format that can be linked in run time and allows
+  to reduce JIT overhead
+  - OpenCL program binary type CL_PROGRAM_BINARY_TYPE_[COMPILED_OBJECT/LIBRARY]


NIT:

Suggested change

- OpenCL program binary type CL_PROGRAM_BINARY_TYPE_[COMPILED_OBJECT/LIBRARY]

- OpenCL program binary type `CL_PROGRAM_BINARY_TYPE_[COMPILED_OBJECT/LIBRARY]`

s-kanaev · 2021-05-26T07:10:36Z

sycl/doc/SharedLibraries.md

+image. If they match some imported symbols then these matched symbols will be
+marked as resolved. The procedure repeats until all imported symbols are marked
+as resolved. In case all available device images are viewed, but some imported
+symbols remain unresolved, exception will be thrown.


errc::build sounds suitable here as we tried and failed to link several programs (which is part of build process). It's a pity the spec doesn't have link error code which would fit here even better.

s-kanaev · 2021-05-26T07:27:49Z

sycl/doc/SharedLibraries.md

+SYCL_EXTERNAL void LibFunc();
+
+Q.submit([&](cl::sycl::handler &CGH) {
+CGH.parallel_for<InternalKernel>( ... )


NIT

Suggested change

CGH.parallel_for<InternalKernel>( ... )

CGH.parallel_for<InternalKernel>( ... )

s-kanaev · 2021-05-26T07:28:00Z

sycl/doc/SharedLibraries.md

+    // 2. Prepared program is used to enqueue kernel
+
+Q.submit([&](cl::sycl::handler &CGH) {
+handler.parallel_for([] { LibFunc(); }); // Prepared program is used to enqueue kernel


NIT:

Suggested change

handler.parallel_for([] { LibFunc(); }); // Prepared program is used to enqueue kernel

handler.parallel_for([] { LibFunc(); }); // Prepared program is used to enqueue kernel

s-kanaev · 2021-05-26T07:33:59Z

sycl/doc/SharedLibraries.md

+```
+I.e. each kernel name is mapped to a set of tuples that consists of OS module,
+spec constant values, JIT compiler options and device. Then concrete tuple is
+mapped to a program object. Several tuples can be mapped to a same program


Suggested change

mapped to a program object. Several tuples can be mapped to a same program

mapped to a program object. Several tuples can be mapped to the same program

s-kanaev · 2021-05-26T07:35:01Z

sycl/doc/SharedLibraries.md

+I.e. each kernel name is mapped to a set of tuples that consists of OS module,
+spec constant values, JIT compiler options and device. Then concrete tuple is
+mapped to a program object. Several tuples can be mapped to a same program
+object, they are created during process of compilation and symbols resolution


"they" refers to programs here.

Suggested change

object, they are created during process of compilation and symbols resolution

object. These tuples are created during process of compilation and symbols resolution

Or

Suggested change

object, they are created during process of compilation and symbols resolution

object. The program duplicating tuples are created during process of compilation and symbols resolution

s-kanaev · 2021-05-26T07:41:56Z

sycl/doc/SharedLibraries.md

+for concrete device image. When some program is made through linking of several
+programs created from device images that come from different OS modules,
+for each OS module in cache will be created a tuple with corresponding OS module
+id.


Suggested change

for concrete device image. When some program is made through linking of several

programs created from device images that come from different OS modules,

for each OS module in cache will be created a tuple with corresponding OS module

id.

for concrete device image. When some program is result of linking several programs from device images with different OS modules, a tuple is created for each OS module ID. These tuples are used as nested cache entries after kernel name.

s-kanaev · 2021-05-26T07:52:52Z

sycl/doc/SharedLibraries.md

+   {2, ...} => program 1
+```
+However the library code will be compiled twice if kernel from the library
+was enqueued before kernels from the application, i.e. in such case:


The idea of storing mid-state (compiled) programs in the cache sounds good.
Do piProgramCompile and piProgramLink produce a distinct program object instead of modifying the source one? If so, the idea is quite feasible.

One of the concerns here might be an overhead of storing object programs in the cache and performing a build with two steps instead of single one. The overhead here is only a slight bit of memory consumption for map plus some time for inserting another cache entry into map tree. As for performing build in two steps instead of one (I am not really sure about it), the piProgramBuild could be a bit more optimized in time domain than piProgramCompile plus piProgramLink due to internals of backend. On the other hand, this overhead (if any) is sort of "one-shot" one.

s-kanaev · 2021-05-26T07:59:34Z

sycl/doc/SharedLibraries.md

+In case when "main" image have imports information, device image hash should be
+created from all device images that are necessary to build it, i.e. hash out
+of "main" device image and set of images that define all
+symbols imported by "main" device image.


How is the hash of several device images obtained?
Is it obtained from a string which is the result of appending the device images?
Then, I believe, the order of device images should be defined and persistent across runs of the same application.

That is a good point. With the current scheme we cannot guarantee a persistent order of device images. We could sort device images by some feature (such as size or the set of defined symbols) before creating the hash string. However, the search algorithm itself doesn't prevent different images from being used to resolve dependencies, since it is assumed that the same symbol defined in several device images of the same format has the same definition. When a different device image is chosen to resolve a dependency, the result of linking is technically not the same program.
So, does sorting device images before creating the hash string sound OK?

So, does sorting device images before creating the hash string sound OK?

In order to get a cross-run persistent order of device images, they should be sorted. That's right. Size can't work as the sorting feature, as it may be equal for two arbitrary images. A stable sort algorithm won't help here, as the input order of images isn't persistent across multiple runs.
// even a low probability of image sizes matching makes us reject this idea

The set of defined symbols sounds good enough. Though, using the OS module ID (a mere number) as the sorting feature is a more optimized way.

Though, using the OS module ID (a mere number) as the sorting feature is a more optimized way.

Do you mean the OS module handle? Isn't it just a pointer?

llvm/sycl/include/CL/sycl/detail/os_util.hpp

Line 48 in 7d5ee05

using OSModuleHandle = intptr_t;

I'm not really sure that I understand how getting of OS module handle works, but can it change from run to run?

Do you mean the OS module handle? Isn't it just a pointer?

My bad, then.

@s-kanaev , do you mind if I make this addition and apply your other NIT comments in a follow-up patch? I think it will take a lot of time to get all the approvals again.

I don't mind at all.
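The ordering concern in this thread can be sketched as follows: sort the images on a run-independent feature before combining them, so the hash does not depend on the order in which images were found. Everything here is an illustrative assumption. `ImageBlob` and `hashImages` are hypothetical names, the sort key (defined symbols, then bytes as a tie-break) follows the suggestion above rather than any settled design, and FNV-1a is swapped in only because it is deterministic across runs; the thread does not say which hash function is used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical image model: a canonical symbol list plus the raw image bytes.
struct ImageBlob {
  std::string DefinedSymbols; // e.g. "Helper,LibFunc", already sorted
  std::string Bytes;
};

// 64-bit FNV-1a, used here only as a deterministic stand-in hash.
std::uint64_t fnv1a(const std::string &Data) {
  std::uint64_t Hash = 14695981039346656037ull;
  for (unsigned char C : Data) {
    Hash ^= C;
    Hash *= 1099511628211ull;
  }
  return Hash;
}

// Hashes a set of images independently of the order they were passed in.
std::uint64_t hashImages(std::vector<ImageBlob> Images) {
  std::sort(Images.begin(), Images.end(),
            [](const ImageBlob &A, const ImageBlob &B) {
              return std::tie(A.DefinedSymbols, A.Bytes) <
                     std::tie(B.DefinedSymbols, B.Bytes);
            });
  std::string Combined;
  for (const ImageBlob &Img : Images) {
    // Length prefix keeps "ab"+"c" distinct from "a"+"bc".
    Combined += std::to_string(Img.Bytes.size());
    Combined += ':';
    Combined += Img.Bytes;
  }
  return fnv1a(Combined);
}
```

Sorting on image contents instead of the OS module handle sidesteps the question raised above about whether the handle, being a pointer-sized value, is stable from run to run.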

Fznamznon · 2021-05-31T08:04:25Z

@bader , could you please take a look and merge if this is ok? I intend to merge the current state of the doc and apply rest of the comments in a follow up patch.

bader · 2021-05-31T08:58:29Z

sycl/doc/SharedLibraries.md

+std::cout << out[i] << “ “;
+
+// lib.cpp
+int SYCL_EXTERNAL LibDeviceFunc(int i) {


Suggested change

int SYCL_EXTERNAL LibDeviceFunc(int i) {

SYCL_EXTERNAL int LibDeviceFunc(int i) {

To align with the declaration from the app.cpp. I hope SYCL_EXTERNAL position is not important for dynamic linking.

bader · 2021-05-31T08:59:50Z

sycl/doc/SharedLibraries.md

+clang++ -fsycl lib.cpp -shared -o helpers.so
+clang++ -fsycl app.cpp -lhelpers -o a.out
+./a.out
+Output: 0 2 4 6…


Suggested change

Output: 0 2 4 6…

Output: 0 2 4 6 ...

Let's avoid using non-ASCII symbols (just in case).

bader · 2021-05-31T14:58:18Z

sycl/doc/SharedLibraries.md

+above.
+
+## Requirements:
+User's device code that consists of some device API (`SYCL_EXTERNAL` functions),


Suggested change

User's device code that consists of some device API (`SYCL_EXTERNAL` functions),

User's device code

bader · 2021-05-31T15:01:12Z

sycl/doc/SharedLibraries.md

+
+## Requirements:
+User's device code that consists of some device API (`SYCL_EXTERNAL` functions),
+is compiled into some form and it is not linked statically with device code of


Suggested change

is compiled into some form and it is not linked statically with device code of

can be compiled into some form and not linked statically with device code of

bader · 2021-05-31T15:02:32Z

sycl/doc/SharedLibraries.md

+## Requirements:
+User's device code that consists of some device API (`SYCL_EXTERNAL` functions),
+is compiled into some form and it is not linked statically with device code of
+application. It can be a shared library with embedded device image or a


Suggested change

application. It can be a shared library with embedded device image or a

application. It can be embedded as a device image into a shared library or

bader · 2021-05-31T15:03:16Z

sycl/doc/SharedLibraries.md

+User's device code that consists of some device API (`SYCL_EXTERNAL` functions),
+is compiled into some form and it is not linked statically with device code of
+application. It can be a shared library with embedded device image or a
+separate device image supplied with properties attached. This code is linked


Suggested change

separate device image supplied with properties attached. This code is linked

supplied as a separate device image with attached properties. This code is linked

bader · 2021-05-31T16:54:45Z

@bader , could you please take a look and merge if this is ok? I intend to merge the current state of the doc and apply rest of the comments in a follow up patch.

I've started reading the document, but I realized that I'd like to reword/clarify quite a lot. I'm okay to merge this version and continue improving it, but I think we should make it clear for readers.
Could you add a note to the document that it's not the final version, please?

Fznamznon · 2021-05-31T17:15:40Z

@bader , could you please take a look and merge if this is ok? I intend to merge the current state of the doc and apply rest of the comments in a follow up patch.

I've started reading the document, but I realized that I'd like to reword/clarify quite a lot. I'm okay to merge this version and continue improving it, but I think we should make it clear for readers.
Could you add a note to the document that it's not the final version, please?

Sure, added in 7b60419 .

bader · 2021-06-02T08:20:26Z

@Fznamznon, this change breaks Doxygen documentation build - https://github.com/intel/llvm/runs/2715240192?check_suite_focus=true.

Warning, treated as error:
/home/runner/work/llvm/llvm/repo/sycl/doc/SharedLibraries.md:document isn't included in any toctree

Please, fix it ASAP.
@alexbatashev, FYI.

Fznamznon · 2021-06-02T08:49:12Z

@Fznamznon, this change breaks Doxygen documentation build - https://github.com/intel/llvm/runs/2715240192?check_suite_focus=true.
Warning, treated as error:
/home/runner/work/llvm/llvm/repo/sycl/doc/SharedLibraries.md:document isn't included in any toctree
Please, fix it ASAP.
@alexbatashev, FYI.

Looking into it.

[Doc] Add design doc for shared device libraries feature

a27bcbf

Fznamznon added the Documentation label Feb 12, 2021

Fznamznon requested review from AlexeySachkov and kbobrovs February 12, 2021 13:43

Fznamznon requested a review from pvchupin as a code owner February 12, 2021 13:43

AlexeySachkov reviewed Feb 12, 2021

kbobrovs reviewed Feb 14, 2021

AGindinson self-requested a review February 14, 2021 14:40

kbobrovs reviewed Feb 14, 2021

Fznamznon and others added 2 commits February 18, 2021 10:24

Apply suggestions from code review

6fe222d

Co-authored-by: kbobrovs <Konstantin.S.Bobrovsky@intel.com> Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>

Apply minor comment, fix a typo

604909c

Fznamznon commented Mar 3, 2021

Rename the feature, add runtime section

df953fc

Fznamznon changed the title ~~[Doc] Add design doc for shared device libraries feature~~ [Doc] Add design doc for dynamic linking of device code feature Mar 5, 2021

Apply suggestions from code review

702e1a4

Fznamznon requested review from romanovvlad and vladimirlaz March 5, 2021 16:14

Fznamznon requested review from kbobrovs and AlexeySachkov March 5, 2021 17:15

vladimirlaz reviewed Mar 10, 2021

keryell reviewed Mar 11, 2021

Apply suggestions from code review

60054b1

Co-authored-by: vladimirlaz <vladimir.lazarev@intel.com>

Fznamznon requested a review from gmlueck April 5, 2021 07:46

gmlueck reviewed Apr 5, 2021

Fznamznon added 2 commits April 9, 2021 17:52

Do not separate SYCL_EXTERNAL functions from kernels

459730b

Merge remote-tracking branch 'fork/shlibs-doc' into shlibs-doc

7f95079

Fznamznon requested a review from s-kanaev April 9, 2021 15:01

Fznamznon requested a review from kbobrovs April 26, 2021 10:55

Modify PI section

8ba2c92

Fznamznon requested a review from bader as a code owner May 18, 2021 09:17

kbobrovs previously approved these changes May 19, 2021

smaslov-intel previously approved these changes May 19, 2021

gmlueck added a commit to gmlueck/llvm that referenced this pull request May 21, 2021

Expand design to include exported device functions

cda2d8f

Expand the design to include the case when device functions are exported from a shared library, which is a new feature proposed in intel#3210.

bader requested a review from AlexeySachkov May 24, 2021 09:13

AlexeySachkov previously approved these changes May 24, 2021

gmlueck previously approved these changes May 24, 2021

s-kanaev reviewed May 25, 2021

s-kanaev reviewed May 26, 2021

bader reviewed May 31, 2021

Add a note that it is not a final version

7b60419

Fznamznon dismissed stale reviews from gmlueck, AlexeySachkov, smaslov-intel, and kbobrovs via 7b60419 May 31, 2021 17:14

bader merged commit dbc6b57 into intel:sycl May 31, 2021

romanovvlad mentioned this pull request Jun 23, 2021

[SYCL] Implementation of fallback assert #3767

Merged

