Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Doc] Add design doc for dynamic linking of device code feature #3210

Merged
merged 19 commits into from
May 31, 2021

Conversation

Fznamznon
Copy link
Contributor

No description provided.

@Fznamznon Fznamznon added the Documentation Missing documentation for the code, compiler or runtime features, etc. label Feb 12, 2021
@Fznamznon Fznamznon requested a review from pvchupin as a code owner February 12, 2021 13:43
@pvchupin
Copy link
Contributor

@kbobrovs, please review

Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

part I of review

@AGindinson AGindinson self-requested a review February 14, 2021 14:40
Fznamznon and others added 2 commits February 18, 2021 10:24
Co-authored-by: kbobrovs <Konstantin.S.Bobrovsky@intel.com>
Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
@Fznamznon Fznamznon changed the title [Doc] Add design doc for shared device libraries feature [Doc] Add design doc for dynamic linking of device code feature Mar 5, 2021
@Fznamznon
Copy link
Contributor Author

@romanovvlad, @vladimirlaz, I'd like to get some review for runtime and cache sections, feel free to waterfall it to someone from runtime team, if you think the know related parts of runtime better.

Comment on lines 267 to 269
Programs that contain only `SYCL_EXTERNAL` functions will be cached only in
compiled state, so they can be linked with other programs during dependency
resolution.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Programs that contain only `SYCL_EXTERNAL` functions will be cached only in
compiled state, so they can be linked with other programs during dependency
resolution.
Programs that contain only `SYCL_EXTERNAL` functions will be cached only when exported
functions are used by images with kernels and identified by `main` image Id.
So once two kernel from different images use functions from the same `library` image it will
be duplicated in cache after linking with main programs during dependency
resolution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, there can be two cases when SYCL_EXTERNAL-only images are cached in in-memory cache:

  • Inside the main image, i.e. after it is linked with all its dependencies (the case that you described in your suggestion)
  • Non-linked SYCL_EXTERNAL-only images in compiled state. It can be done when some "main" image requested some SYCL_EXTERNAL-only image as dependency, we created a program for SYCL_EXTERNAL-only image, compiled it and stored in cache, so we don't have to do it again when some other main requests the same SYCL_EXTERNAL image. (the case that I tried to describe here). The idea was taken from existing cache for standard device libraries (see
    std::map<std::pair<DeviceLibExt, RT::PiDevice>, RT::PiProgram>
    ).

Does it make sense?

In case when "main" image have imports information, device image hash should be
created from all device images that are necessary to build it, i.e. hash out
of "main" device image and set of 'SYCL_EXTERNAL'-only images that define all
symbols imported by "main device image.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current approach assume storing final binary after linking it means:

  • duplication of library images in every main program;
  • on change of library all main images in persistent cache are invalidated and JIT is initiated.
    May be it is reasonable to store library images in cache separately from main image? It can be done both for in-memory and persistent or only for persistent cache depending on weight of link operation for binary programs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be it is reasonable to store library images in cache separately from main image? It can be done both for in-memory and persistent or only for persistent cache depending on weight of link operation for binary programs.

Do you mean compile to binary "main" image and "library" image and store them on disk separately? It sounds reasonable (the real dynamic libraries actually work this way), however I don't think we have devices that support linking of native device binaries yet, so right now It doesn't seem possible.

Co-authored-by: vladimirlaz <vladimir.lazarev@intel.com>
@Fznamznon
Copy link
Contributor Author

@kbobrovs @AlexeySachkov , could you please take a look again?

@Fznamznon Fznamznon requested a review from gmlueck April 5, 2021 07:46
@Fznamznon Fznamznon requested a review from s-kanaev April 9, 2021 15:01
@Fznamznon Fznamznon requested a review from kbobrovs April 26, 2021 10:55
@Fznamznon Fznamznon requested a review from bader as a code owner May 18, 2021 09:17
kbobrovs
kbobrovs previously approved these changes May 19, 2021
Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM. We can add more __SYCL_PI_DEVICE_BINARY_TARGET* formats if we want to distinguish between flavors of e.g. spirv64_gen native format, I assume.

smaslov-intel
smaslov-intel previously approved these changes May 19, 2021
Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@Fznamznon
Copy link
Contributor Author

@s-kanaev , @AlexeySachkov , could you please take a look and give an approve again? There was some changes.

gmlueck added a commit to gmlueck/llvm that referenced this pull request May 21, 2021
Expand the design to include the case when device functions are
exported from a shared library, which is a new feature proposed
in intel#3210.
@bader bader requested a review from AlexeySachkov May 24, 2021 09:13
@Fznamznon
Copy link
Contributor Author

Ping also @gmlueck , I remember your requested to state requirements to backends, I did it under PI section. Please take a look.

AlexeySachkov
AlexeySachkov previously approved these changes May 24, 2021
gmlueck
gmlueck previously approved these changes May 24, 2021
Copy link
Contributor

@gmlueck gmlueck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping also @gmlueck , I remember your requested to state requirements to backends, I did it under PI section. Please take a look.

Thanks for adding that.

I added a smaller comment asking which exception we throw when there is a link error, but I approved anyway. Please respond, though about the exception. I also added a comment about the cache, but we might choose to wait on that if we are planning to restructure the cache later.

{2, ...} => program 1
```
However the library code will be compiled twice if kernel from the library
was enqueued before kernels from the application, i.e. in such case:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you wanted to address this weakness (about compiling twice), I think you could do so by adding a "program state" to cache key. To see how this would work, first consider a simple case where there are no shared libraries. Here are the PI operations that are required:

pi_program1 = piProgramCreate(/*spir-v from OSMod 1*/);
/* possibly set values of spec constants */
piProgramCompile(pi_program1, dev, opts);
pi_program2 = piProgramLink(dev, opts, {pi_program1});

Note that we're calling piProgramCompile() followed by piProgramLink(), rather than just calling piProgramBuild(). The reason for doing this will become clear later.

Since we are adding a "program state" to the cache key, we can cache both pi_program1 and pi_program2: The result of piProgramCompile() is an object state program image, while the result of piProgramLink() is an executable state program image. Here are the entries in the cache:

[OSMod 1, spec consts, opts, dev, object] => pi_program1
[OSMod 1, spec consts, opts, dev, executable] => pi_program2

I think this is the same as what you propose, except I'm adding the "program state" to the cache key and I've cached pi_program1 in addition to pi_program2.

Any future request to run a kernel from OS module 1 on the same device with the same spec constants and the same build options will search the cache for [OSMod 1, spec consts, dev, executable], and it will find the pi_program2 that we already built.

Now let's consider the case where there is device code in a shared library that needs to be online linked. Using your example above, here are the PI operations that need to happen:

pi_program1 = piProgramCreate(/*spir-v from OSMod 1*/);
/* possibly set values of spec constants */
piProgramCompile(pi_program1, dev, opts);

pi_program2 = piProgramCreate(/*spir-v from OSMod 2*/);
/* possibly set values of spec constants */
piProgramCompile(pi_program2, dev, opts);

pi_program3 = piProgramLink(dev, opts, {pi_program1, pi_program2});

And we can put the following items in the cache:

[OSMod 1, spec consts, opts, dev, object] => pi_program1
[OSMod 2, spec consts, opts, dev, object] => pi_program2
[OSMod 1, spec consts, opts, dev, executable] => pi_program3
[OSMod 2, spec consts, opts, dev, executable] => pi_program3

Again, I think this is the same as what you propose except I have the new "program state" element in the key, and I'm caching some more pi_program objects.

If there is a subsequent attempt to submit ExternalKernel, the program manager will see that ExternalKernel comes from OSMod 2, and it will find pi_program3 in the cache. Again, this is exactly the same as your proposal.

Now, let's consider the case when ExternalKernel is submitted first. The PI operations are:

pi_program1 = piProgramCreate(/*spir-v from OSMod 2*/);
/* possibly set values of spec constants */
piProgramCompile(pi_program1, dev, opts);
pi_program2 = piProgramLink(dev, opts, {pi_program1});

And the cache will be:

[OSMod 2, spec consts, opts, dev, object] => pi_program1
[OSMod 2, spec consts, opts, dev, executable] => pi_program2

Now, the application submits InternalKernel, and the PI operations are:

pi_program3 = piProgramCreate(/*spir-v from OSMod 1*/);
/* possibly set values of spec constants */
piProgramCompile(pi_program3, dev, opts);

pi_program1 = /* found in cache: [OSMod 2, spec consts, opts, dev, object]*/;

pi_program4 = piProgramLink(dev, opts, {pi_program3, pi_program1});

Notice that the redundant compilation of "OSMod 2" is eliminated because the result was in the cache.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of storing mid-state (compiled) programs in the cache sounds good.
Do piProgramCompile and piProgramLink produce a distinct program object instead of modifying the source one? If so, the idea is quite feasible.

One of the concerns here might be an overhead of storing object programs in the cache and performing a build with two steps instead of single one. The overhead here is only a slight bit of memory consumption for map plus some time for inserting another cache entry into map tree. As for performing build in two steps instead of one (I am not really sure about it), the piProgramBuild could be a bit more optimized in time domain than piProgramCompile plus piProgramLink due to internals of backend. On the other hand, this overhead (if any) is sort of "one-shot" one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do piProgramCompile and piProgramLink produce a distinct program object instead of modifying the source one?

piProgramLink creates a new program object, but piProgramCompile does not. The logic I propose above only depends on piProgramLink creating a new program object.

As for performing build in two steps instead of one (I am not really sure about it), the piProgramBuild could be a bit more optimized in time domain than piProgramCompile plus piProgramLink due to internals of backend.

Yes, this is a good point. Looking at the implementation of piProgramLink in the Level Zero PI plugin, I think that code should check for the case when there is only one input. We can optimize that case because it is not necessary to call zeModuleDynamicLink.

For OpenCL, we need to ask the team that implements the OpenCL driver whether it is much more efficient to call clBuildProgram vs. calling both clCompileProgram and clLinkProgram.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimization of piProgramCompile & piProgramLink seems a bit out of the scope of this PR. I might include an improvement with object state programs in the cache though. Should I?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have a strong opinion on expanding the cache to include object state as part of this design vs. doing it as a separate task. If you decide not to do it as part of this design, I think you should enter a Jira to make sure it does happen at some point, though. If you do this, the Jira could contain a link to this conversation (or copy the conversation into the Jira).

I think you should not split the calls to piProgramBuild() into separate piProgramCompile() / piProgramLink() calls unless you check that this won't introduce a performance regression. The fix I mention above to the Level Zero implementation of piProgramLink() is probably a very small 3-liner. You should also ask the OpenCL team about the efficiency of calling clBuildProgram vs. calling both clCompileProgram and clLinkProgram, though.

image. If they match some imported symbols then these matched symbols will be
marked as resolved. The procedure repeats until all imported symbols are marked
as resolved. In case all available device images are viewed, but some imported
symbols remain unresolved, exception will be thrown.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exception will be thrown? If we want to use one of the existing exception codes, I think it should either be errc::runtime or errc::build. (See Table 136 in the SYCL 2020 spec.) I think I have a slight preference for errc::runtime, so that we reserve errc::build for exceptions that happen when the application specifically asks to online compile or link a kernel bundle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errc::build seems more suitable for me, since usually OpenCL build APIs emit build error when incoming program has unresolved symbols. In addition, in regular c++ errors about unresolved symbols are usually emitted during build time. However this is not a strong preference, if we want to use errc::runtime, that is ok for me.
@s-kanaev , @romanovvlad , any opinion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just looking at the DPC++ code in program_manager. It looks like we currently throw compile_program_error if there is a JIT-time compilation error for a kernel. This is most similar to the SYCL 2020 errc::build exception code, so I presume that is what we will throw in the future. Therefore, it probably does make sense to throw errc::build if there are unresolved symbols.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errc::build sounds suitable here as we tried and failed to link several programs (which is part of build process). It's a pity the spec doesn't have link error code which would fit here even better.

Copy link
Contributor

@s-kanaev s-kanaev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't finish the review yet.

allows to link device code dynamically at runtime, such as in the scenarios
above.

## Requirements:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT:

Suggested change
## Requirements:
## Requirements


### sycl-post-link changes

To support dynamic linking of device code , `sycl-post-link` performs 2 main
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT:

Suggested change
To support dynamic linking of device code , `sycl-post-link` performs 2 main
In order to support dynamic linking of device code, `sycl-post-link` performs 2 main

Mapping of extension strings and formats that can be linked:
| Device image format | Extension string | Meaning |
|---------------------|------------------|---------|
| __SYCL_PI_DEVICE_BINARY_TARGET_SPIRV64 | "pi_ext_spirv64_linking" | Linking of SPIR-V 64-bit programs is supported|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT:

Suggested change
| __SYCL_PI_DEVICE_BINARY_TARGET_SPIRV64 | "pi_ext_spirv64_linking" | Linking of SPIR-V 64-bit programs is supported|
| `__SYCL_PI_DEVICE_BINARY_TARGET_SPIRV64` | "pi_ext_spirv64_linking" | Linking of SPIR-V 64-bit programs is supported|

- AOT compilers must allow to compile SPIR-V modules with unresolved symbols
and produce device code in format that can be linked in run time and allows
to reduce JIT overhead
- OpenCL program binary type CL_PROGRAM_BINARY_TYPE_[COMPILED_OBJECT/LIBRARY]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT:

Suggested change
- OpenCL program binary type CL_PROGRAM_BINARY_TYPE_[COMPILED_OBJECT/LIBRARY]
- OpenCL program binary type `CL_PROGRAM_BINARY_TYPE_[COMPILED_OBJECT/LIBRARY]`

image. If they match some imported symbols then these matched symbols will be
marked as resolved. The procedure repeats until all imported symbols are marked
as resolved. In case all available device images are viewed, but some imported
symbols remain unresolved, exception will be thrown.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errc::build sounds suitable here as we tried and failed to link several programs (which is part of build process). It's a pity the spec doesn't have link error code which would fit here even better.

SYCL_EXTERNAL void LibFunc();

Q.submit([&](cl::sycl::handler &CGH) {
CGH.parallel_for<InternalKernel>( ... )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT

Suggested change
CGH.parallel_for<InternalKernel>( ... )
CGH.parallel_for<InternalKernel>( ... )

// 2. Prepared program is used to enqueue kernel

Q.submit([&](cl::sycl::handler &CGH) {
handler.parallel_for([] { LibFunc(); }); // Prepared program is used to enqueue kernel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT:

Suggested change
handler.parallel_for([] { LibFunc(); }); // Prepared program is used to enqueue kernel
handler.parallel_for([] { LibFunc(); }); // Prepared program is used to enqueue kernel

```
I.e. each kernel name is mapped to a set of tuples that consists of OS module,
spec constant values, JIT compiler options and device. Then concrete tuple is
mapped to a program object. Several tuples can be mapped to a same program
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mapped to a program object. Several tuples can be mapped to a same program
mapped to a program object. Several tuples can be mapped to the same program

I.e. each kernel name is mapped to a set of tuples that consists of OS module,
spec constant values, JIT compiler options and device. Then concrete tuple is
mapped to a program object. Several tuples can be mapped to a same program
object, they are created during process of compilation and symbols resolution
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"they" refers to programs here.

Suggested change
object, they are created during process of compilation and symbols resolution
object. These tuples are created during process of compilation and symbols resolution

Or

Suggested change
object, they are created during process of compilation and symbols resolution
object. The program duplicating tuples are created during process of compilation and symbols resolution

Comment on lines +417 to +420
for concrete device image. When some program is made through linking of several
programs created from device images that come from different OS modules,
for each OS module in cache will be created a tuple with corresponding OS module
id.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for concrete device image. When some program is made through linking of several
programs created from device images that come from different OS modules,
for each OS module in cache will be created a tuple with corresponding OS module
id.
for concrete device image. When some program is result of linking several programs from device images with different OS modules, a tuple is created for each OS module ID. These tuples are used as nested cache entries after kernel name.

{2, ...} => program 1
```
However the library code will be compiled twice if kernel from the library
was enqueued before kernels from the application, i.e. in such case:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of storing mid-state (compiled) programs in the cache sounds good.
Do piProgramCompile and piProgramLink produce a distinct program object instead of modifying the source one? If so, the idea is quite feasible.

One of the concerns here might be an overhead of storing object programs in the cache and performing a build with two steps instead of single one. The overhead here is only a slight bit of memory consumption for map plus some time for inserting another cache entry into map tree. As for performing build in two steps instead of one (I am not really sure about it), the piProgramBuild could be a bit more optimized in time domain than piProgramCompile plus piProgramLink due to internals of backend. On the other hand, this overhead (if any) is sort of "one-shot" one.

Comment on lines +474 to +477
In case when "main" image have imports information, device image hash should be
created from all device images that are necessary to build it, i.e. hash out
of "main" device image and set of images that define all
symbols imported by "main" device image.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How the hash of several device images is obtained?
Is it obtained from a string which is result of appending the device images?
Then, I believe, order of device images should be defined and persistent across runs of the same application.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good point. With the current scheme we cannot guarantee persistent order of device images. We could actually sort device images by some feature (like size, set of defined symbols and etc.) before creating a hash string. But the algorithm of search itself doesn't prevent using of different images to resolve dependencies, since it is assumed that the same symbol defined in several device images of the same format should have the same definition. However when some other device image is chosen to resolve dependency, it won't be technically the same program as the result of linking.
So, does sorting of device images before creation of hash string sounds ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, does sorting of device images before creation of hash string sounds ok?

In order to get cross-run persistent order of device images they should be sorted. That's right. Size can't work as sorting feature as it may be equal for two arbitrary images. Having stable sort algorithm won't help here as the input order of images isn't persistent across multiple runs.
// having even low probability of image size matching makes us reject this idea

Set of defined symbols sounds good enough. Though, using OS module ID (a mere number) as sorting feature is of more optimized way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though, using OS module ID (a mere number) as sorting feature is of more optimized way.

Do you mean OS module handle? Isn't just a pointer?

using OSModuleHandle = intptr_t;

I'm not really sure that I understand how getting of OS module handle works, but can it change from run to run?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean OS module handle? Isn't just a pointer?

My bad, then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@s-kanaev , do you mind if I make this addition and apply other NIT comments from you in the follow-up patch? I think it will take very much time to get all the approvals again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind at all.

@Fznamznon
Copy link
Contributor Author

@bader , could you please take a look and merge if this is ok? I intend to merge the current state of the doc and apply rest of the comments in a follow up patch.

std::cout << out[i] << “ “;

// lib.cpp
int SYCL_EXTERNAL LibDeviceFunc(int i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int SYCL_EXTERNAL LibDeviceFunc(int i) {
SYCL_EXTERNAL int LibDeviceFunc(int i) {0

To align with the declaration from the app.cpp. I hope SYCL_EXTERNAL position is not important for dynamic linking.

clang++ -fsycl lib.cpp -shared -o helpers.so
clang++ -fsycl app.cpp -lhelpers -o a.out
./a.out
Output: 0 2 4 6…
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Output: 0 2 4 6
Output: 0 2 4 6 ...

Let's avoid using non-ASCII symbols (just in case).

above.

## Requirements:
User's device code that consists of some device API (`SYCL_EXTERNAL` functions),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
User's device code that consists of some device API (`SYCL_EXTERNAL` functions),
User's device code


## Requirements:
User's device code that consists of some device API (`SYCL_EXTERNAL` functions),
is compiled into some form and it is not linked statically with device code of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
is compiled into some form and it is not linked statically with device code of
can be compiled into some form and not linked statically with device code of

## Requirements:
User's device code that consists of some device API (`SYCL_EXTERNAL` functions),
is compiled into some form and it is not linked statically with device code of
application. It can be a shared library with embedded device image or a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
application. It can be a shared library with embedded device image or a
application. It can be embedded as a device image into a shared library or

User's device code that consists of some device API (`SYCL_EXTERNAL` functions),
is compiled into some form and it is not linked statically with device code of
application. It can be a shared library with embedded device image or a
separate device image supplied with properties attached. This code is linked
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
separate device image supplied with properties attached. This code is linked
supplied as a separate device image with attached properties. This code is linked

@bader
Copy link
Contributor

bader commented May 31, 2021

@bader , could you please take a look and merge if this is ok? I intend to merge the current state of the doc and apply rest of the comments in a follow up patch.

I've started reading the document, but I realized that I'd like to re-reword/clarify quite a lot. I'm okay to merge this version and continue improving it, but I think we should make it clear for readers.
Could you add a note to the document that it's not final version, please?

@Fznamznon
Copy link
Contributor Author

@bader , could you please take a look and merge if this is ok? I intend to merge the current state of the doc and apply rest of the comments in a follow up patch.

I've started reading the document, but I realized that I'd like to re-reword/clarify quite a lot. I'm okay to merge this version and continue improving it, but I think we should make it clear for readers.
Could you add a note to the document that it's not final version, please?

Sure, added in 7b60419 .

@bader bader merged commit dbc6b57 into intel:sycl May 31, 2021
@bader
Copy link
Contributor

bader commented Jun 2, 2021

@Fznamznon, this change breaks Doxygen documentation build - https://github.com/intel/llvm/runs/2715240192?check_suite_focus=true.

Warning, treated as error:
/home/runner/work/llvm/llvm/repo/sycl/doc/SharedLibraries.md:document isn't included in any toctree

Please, fix it ASAP.
@alexbatashev, FYI.

@Fznamznon
Copy link
Contributor Author

@Fznamznon, this change breaks Doxygen documentation build - https://github.com/intel/llvm/runs/2715240192?check_suite_focus=true.

Warning, treated as error:
/home/runner/work/llvm/llvm/repo/sycl/doc/SharedLibraries.md:document isn't included in any toctree

Please, fix it ASAP.
@alexbatashev, FYI.

Looking into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Missing documentation for the code, compiler or runtime features, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.