-
Notifications
You must be signed in to change notification settings - Fork 766
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Doc] Add design doc for dynamic linking of device code feature #3210
Conversation
@kbobrovs, please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
part I of review
Co-authored-by: kbobrovs <Konstantin.S.Bobrovsky@intel.com> Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
@romanovvlad, @vladimirlaz, I'd like to get some review for runtime and cache sections, feel free to waterfall it to someone from runtime team, if you think the know related parts of runtime better. |
sycl/doc/SharedLibraries.md
Outdated
Programs that contain only `SYCL_EXTERNAL` functions will be cached only in | ||
compiled state, so they can be linked with other programs during dependency | ||
resolution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Programs that contain only `SYCL_EXTERNAL` functions will be cached only in | |
compiled state, so they can be linked with other programs during dependency | |
resolution. | |
Programs that contain only `SYCL_EXTERNAL` functions will be cached only when exported | |
functions are used by images with kernels and identified by `main` image Id. | |
So once two kernel from different images use functions from the same `library` image it will | |
be duplicated in cache after linking with main programs during dependency | |
resolution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, there can be two cases when SYCL_EXTERNAL
-only images are cached in in-memory cache:
- Inside the
main
image, i.e. after it is linked with all its dependencies (the case that you described in your suggestion) - Non-linked
SYCL_EXTERNAL
-only images in compiled state. It can be done when some "main" image requested someSYCL_EXTERNAL
-only image as dependency, we created a program forSYCL_EXTERNAL
-only image, compiled it and stored in cache, so we don't have to do it again when some othermain
requests the sameSYCL_EXTERNAL
image. (the case that I tried to describe here). The idea was taken from existing cache for standard device libraries (seellvm/sycl/source/detail/context_impl.hpp
Line 177 in 86716c5
std::map<std::pair<DeviceLibExt, RT::PiDevice>, RT::PiProgram>
Does it make sense?
sycl/doc/SharedLibraries.md
Outdated
In case when "main" image have imports information, device image hash should be | ||
created from all device images that are necessary to build it, i.e. hash out | ||
of "main" device image and set of 'SYCL_EXTERNAL'-only images that define all | ||
symbols imported by "main device image. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current approach assume storing final binary after linking it means:
- duplication of library images in every
main
program; - on change of library all
main
images in persistent cache are invalidated and JIT is initiated.
May be it is reasonable to store libraryimages
in cache separately frommain
image? It can be done both for in-memory and persistent or only for persistent cache depending onweight
of link operation for binary programs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be it is reasonable to store library images in cache separately from main image? It can be done both for in-memory and persistent or only for persistent cache depending on weight of link operation for binary programs.
Do you mean compile to binary "main" image and "library" image and store them on disk separately? It sounds reasonable (the real dynamic libraries actually work this way), however I don't think we have devices that support linking of native device binaries yet, so right now It doesn't seem possible.
Co-authored-by: vladimirlaz <vladimir.lazarev@intel.com>
@kbobrovs @AlexeySachkov , could you please take a look again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! LGTM. We can add more __SYCL_PI_DEVICE_BINARY_TARGET* formats if we want to distinguish between flavors of e.g. spirv64_gen native format, I assume.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@s-kanaev , @AlexeySachkov , could you please take a look and give an approve again? There was some changes. |
Expand the design to include the case when device functions are exported from a shared library, which is a new feature proposed in intel#3210.
Ping also @gmlueck , I remember your requested to state requirements to backends, I did it under PI section. Please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping also @gmlueck , I remember your requested to state requirements to backends, I did it under PI section. Please take a look.
Thanks for adding that.
I added a smaller comment asking which exception we throw when there is a link error, but I approved anyway. Please respond, though about the exception. I also added a comment about the cache, but we might choose to wait on that if we are planning to restructure the cache later.
{2, ...} => program 1 | ||
``` | ||
However the library code will be compiled twice if kernel from the library | ||
was enqueued before kernels from the application, i.e. in such case: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you wanted to address this weakness (about compiling twice), I think you could do so by adding a "program state" to cache key. To see how this would work, first consider a simple case where there are no shared libraries. Here are the PI operations that are required:
pi_program1 = piProgramCreate(/*spir-v from OSMod 1*/);
/* possibly set values of spec constants */
piProgramCompile(pi_program1, dev, opts);
pi_program2 = piProgramLink(dev, opts, {pi_program1});
Note that we're calling piProgramCompile()
followed by piProgramLink()
, rather than just calling piProgramBuild()
. The reason for doing this will become clear later.
Since we are adding a "program state" to the cache key, we can cache both pi_program1
and pi_program2
: The result of piProgramCompile()
is an object state program image, while the result of piProgramLink()
is an executable state program image. Here are the entries in the cache:
[OSMod 1, spec consts, opts, dev, object] => pi_program1
[OSMod 1, spec consts, opts, dev, executable] => pi_program2
I think this is the same as what you propose, except I'm adding the "program state" to the cache key and I've cached pi_program1
in addition to pi_program2
.
Any future request to run a kernel from OS module 1 on the same device with the same spec constants and the same build options will search the cache for [OSMod 1, spec consts, dev, executable]
, and it will find the pi_program2
that we already built.
Now let's consider the case where there is device code in a shared library that needs to be online linked. Using your example above, here are the PI operations that need to happen:
pi_program1 = piProgramCreate(/*spir-v from OSMod 1*/);
/* possibly set values of spec constants */
piProgramCompile(pi_program1, dev, opts);
pi_program2 = piProgramCreate(/*spir-v from OSMod 2*/);
/* possibly set values of spec constants */
piProgramCompile(pi_program2, dev, opts);
pi_program3 = piProgramLink(dev, opts, {pi_program1, pi_program2});
And we can put the following items in the cache:
[OSMod 1, spec consts, opts, dev, object] => pi_program1
[OSMod 2, spec consts, opts, dev, object] => pi_program2
[OSMod 1, spec consts, opts, dev, executable] => pi_program3
[OSMod 2, spec consts, opts, dev, executable] => pi_program3
Again, I think this is the same as what you propose except I have the new "program state" element in the key, and I'm caching some more pi_program
objects.
If there is a subsequent attempt to submit ExternalKernel
, the program manager will see that ExternalKernel
comes from OSMod 2
, and it will find pi_program3
in the cache. Again, this is exactly the same as your proposal.
Now, let's consider the case when ExternalKernel
is submitted first. The PI operations are:
pi_program1 = piProgramCreate(/*spir-v from OSMod 2*/);
/* possibly set values of spec constants */
piProgramCompile(pi_program1, dev, opts);
pi_program2 = piProgramLink(dev, opts, {pi_program1});
And the cache will be:
[OSMod 2, spec consts, opts, dev, object] => pi_program1
[OSMod 2, spec consts, opts, dev, executable] => pi_program2
Now, the application submits InternalKernel
, and the PI operations are:
pi_program3 = piProgramCreate(/*spir-v from OSMod 1*/);
/* possibly set values of spec constants */
piProgramCompile(pi_program3, dev, opts);
pi_program1 = /* found in cache: [OSMod 2, spec consts, opts, dev, object]*/;
pi_program4 = piProgramLink(dev, opts, {pi_program3, pi_program1});
Notice that the redundant compilation of "OSMod 2" is eliminated because the result was in the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of storing mid-state (compiled) programs in the cache sounds good.
Do piProgramCompile
and piProgramLink
produce a distinct program object instead of modifying the source one? If so, the idea is quite feasible.
One of the concerns here might be an overhead of storing object
programs in the cache and performing a build with two steps instead of single one. The overhead here is only a slight bit of memory consumption for map plus some time for inserting another cache entry into map tree. As for performing build in two steps instead of one (I am not really sure about it), the piProgramBuild
could be a bit more optimized in time domain than piProgramCompile
plus piProgramLink
due to internals of backend. On the other hand, this overhead (if any) is sort of "one-shot" one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do piProgramCompile and piProgramLink produce a distinct program object instead of modifying the source one?
piProgramLink
creates a new program object, but piProgramCompile
does not. The logic I propose above only depends on piProgramLink
creating a new program object.
As for performing build in two steps instead of one (I am not really sure about it), the piProgramBuild could be a bit more optimized in time domain than piProgramCompile plus piProgramLink due to internals of backend.
Yes, this is a good point. Looking at the implementation of piProgramLink
in the Level Zero PI plugin, I think that code should check for the case when there is only one input. We can optimize that case because it is not necessary to call zeModuleDynamicLink
.
For OpenCL, we need to ask the team that implements the OpenCL driver whether it is much more efficient to call clBuildProgram
vs. calling both clCompileProgram
and clLinkProgram
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimization of piProgramCompile
& piProgramLink
seems a bit out of the scope of this PR. I might include an improvement with object state programs in the cache though. Should I?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not have a strong opinion on expanding the cache to include object state as part of this design vs. doing it as a separate task. If you decide not to do it as part of this design, I think you should enter a Jira to make sure it does happen at some point, though. If you do this, the Jira could contain a link to this conversation (or copy the conversation into the Jira).
I think you should not split the calls to piProgramBuild()
into separate piProgramCompile()
/ piProgramLink()
calls unless you check that this won't introduce a performance regression. The fix I mention above to the Level Zero implementation of piProgramLink()
is probably a very small 3-liner. You should also ask the OpenCL team about the efficiency of calling clBuildProgram
vs. calling both clCompileProgram
and clLinkProgram
, though.
image. If they match some imported symbols then these matched symbols will be | ||
marked as resolved. The procedure repeats until all imported symbols are marked | ||
as resolved. In case all available device images are viewed, but some imported | ||
symbols remain unresolved, exception will be thrown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exception will be thrown? If we want to use one of the existing exception codes, I think it should either be errc::runtime
or errc::build
. (See Table 136 in the SYCL 2020 spec.) I think I have a slight preference for errc::runtime
, so that we reserve errc::build
for exceptions that happen when the application specifically asks to online compile or link a kernel bundle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errc::build
seems more suitable for me, since usually OpenCL build APIs emit build error when incoming program has unresolved symbols. In addition, in regular c++ errors about unresolved symbols are usually emitted during build time. However this is not a strong preference, if we want to use errc::runtime
, that is ok for me.
@s-kanaev , @romanovvlad , any opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just looking at the DPC++ code in program_manager
. It looks like we currently throw compile_program_error
if there is a JIT-time compilation error for a kernel. This is most similar to the SYCL 2020 errc::build
exception code, so I presume that is what we will throw in the future. Therefore, it probably does make sense to throw errc::build
if there are unresolved symbols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errc::build
sounds suitable here as we tried and failed to link several programs (which is part of build process). It's a pity the spec doesn't have link
error code which would fit here even better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't finish the review yet.
allows to link device code dynamically at runtime, such as in the scenarios | ||
above. | ||
|
||
## Requirements: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT:
## Requirements: | |
## Requirements |
|
||
### sycl-post-link changes | ||
|
||
To support dynamic linking of device code , `sycl-post-link` performs 2 main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT:
To support dynamic linking of device code , `sycl-post-link` performs 2 main | |
In order to support dynamic linking of device code, `sycl-post-link` performs 2 main |
Mapping of extension strings and formats that can be linked: | ||
| Device image format | Extension string | Meaning | | ||
|---------------------|------------------|---------| | ||
| __SYCL_PI_DEVICE_BINARY_TARGET_SPIRV64 | "pi_ext_spirv64_linking" | Linking of SPIR-V 64-bit programs is supported| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT:
| __SYCL_PI_DEVICE_BINARY_TARGET_SPIRV64 | "pi_ext_spirv64_linking" | Linking of SPIR-V 64-bit programs is supported| | |
| `__SYCL_PI_DEVICE_BINARY_TARGET_SPIRV64` | "pi_ext_spirv64_linking" | Linking of SPIR-V 64-bit programs is supported| |
- AOT compilers must allow to compile SPIR-V modules with unresolved symbols | ||
and produce device code in format that can be linked in run time and allows | ||
to reduce JIT overhead | ||
- OpenCL program binary type CL_PROGRAM_BINARY_TYPE_[COMPILED_OBJECT/LIBRARY] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT:
- OpenCL program binary type CL_PROGRAM_BINARY_TYPE_[COMPILED_OBJECT/LIBRARY] | |
- OpenCL program binary type `CL_PROGRAM_BINARY_TYPE_[COMPILED_OBJECT/LIBRARY]` |
image. If they match some imported symbols then these matched symbols will be | ||
marked as resolved. The procedure repeats until all imported symbols are marked | ||
as resolved. In case all available device images are viewed, but some imported | ||
symbols remain unresolved, exception will be thrown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errc::build
sounds suitable here as we tried and failed to link several programs (which is part of build process). It's a pity the spec doesn't have link
error code which would fit here even better.
SYCL_EXTERNAL void LibFunc(); | ||
|
||
Q.submit([&](cl::sycl::handler &CGH) { | ||
CGH.parallel_for<InternalKernel>( ... ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT
CGH.parallel_for<InternalKernel>( ... ) | |
CGH.parallel_for<InternalKernel>( ... ) |
// 2. Prepared program is used to enqueue kernel | ||
|
||
Q.submit([&](cl::sycl::handler &CGH) { | ||
handler.parallel_for([] { LibFunc(); }); // Prepared program is used to enqueue kernel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT:
handler.parallel_for([] { LibFunc(); }); // Prepared program is used to enqueue kernel | |
handler.parallel_for([] { LibFunc(); }); // Prepared program is used to enqueue kernel |
``` | ||
I.e. each kernel name is mapped to a set of tuples that consists of OS module, | ||
spec constant values, JIT compiler options and device. Then concrete tuple is | ||
mapped to a program object. Several tuples can be mapped to a same program |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mapped to a program object. Several tuples can be mapped to a same program | |
mapped to a program object. Several tuples can be mapped to the same program |
I.e. each kernel name is mapped to a set of tuples that consists of OS module, | ||
spec constant values, JIT compiler options and device. Then concrete tuple is | ||
mapped to a program object. Several tuples can be mapped to a same program | ||
object, they are created during process of compilation and symbols resolution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"they" refers to programs here.
object, they are created during process of compilation and symbols resolution | |
object. These tuples are created during process of compilation and symbols resolution |
Or
object, they are created during process of compilation and symbols resolution | |
object. The program duplicating tuples are created during process of compilation and symbols resolution |
for concrete device image. When some program is made through linking of several | ||
programs created from device images that come from different OS modules, | ||
for each OS module in cache will be created a tuple with corresponding OS module | ||
id. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for concrete device image. When some program is made through linking of several | |
programs created from device images that come from different OS modules, | |
for each OS module in cache will be created a tuple with corresponding OS module | |
id. | |
for concrete device image. When some program is result of linking several programs from device images with different OS modules, a tuple is created for each OS module ID. These tuples are used as nested cache entries after kernel name. |
{2, ...} => program 1 | ||
``` | ||
However the library code will be compiled twice if kernel from the library | ||
was enqueued before kernels from the application, i.e. in such case: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of storing mid-state (compiled) programs in the cache sounds good.
Do piProgramCompile
and piProgramLink
produce a distinct program object instead of modifying the source one? If so, the idea is quite feasible.
One of the concerns here might be an overhead of storing object
programs in the cache and performing a build with two steps instead of single one. The overhead here is only a slight bit of memory consumption for map plus some time for inserting another cache entry into map tree. As for performing build in two steps instead of one (I am not really sure about it), the piProgramBuild
could be a bit more optimized in time domain than piProgramCompile
plus piProgramLink
due to internals of backend. On the other hand, this overhead (if any) is sort of "one-shot" one.
In case when "main" image have imports information, device image hash should be | ||
created from all device images that are necessary to build it, i.e. hash out | ||
of "main" device image and set of images that define all | ||
symbols imported by "main" device image. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How the hash of several device images is obtained?
Is it obtained from a string which is result of appending the device images?
Then, I believe, order of device images should be defined and persistent across runs of the same application.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point. With the current scheme we cannot guarantee persistent order of device images. We could actually sort device images by some feature (like size, set of defined symbols and etc.) before creating a hash string. But the algorithm of search itself doesn't prevent using of different images to resolve dependencies, since it is assumed that the same symbol defined in several device images of the same format should have the same definition. However when some other device image is chosen to resolve dependency, it won't be technically the same program as the result of linking.
So, does sorting of device images before creation of hash string sounds ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, does sorting of device images before creation of hash string sounds ok?
In order to get cross-run persistent order of device images they should be sorted. That's right. Size can't work as sorting feature as it may be equal for two arbitrary images. Having stable sort algorithm won't help here as the input order of images isn't persistent across multiple runs.
// having even low probability of image size matching makes us reject this idea
Set of defined symbols sounds good enough. Though, using OS module ID (a mere number) as sorting feature is of more optimized way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though, using OS module ID (a mere number) as sorting feature is of more optimized way.
Do you mean OS module handle? Isn't just a pointer?
using OSModuleHandle = intptr_t; |
I'm not really sure that I understand how getting of OS module handle works, but can it change from run to run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean OS module handle? Isn't just a pointer?
My bad, then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@s-kanaev , do you mind if I make this addition and apply other NIT comments from you in the follow-up patch? I think it will take very much time to get all the approvals again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind at all.
@bader , could you please take a look and merge if this is ok? I intend to merge the current state of the doc and apply rest of the comments in a follow up patch. |
std::cout << out[i] << “ “; | ||
|
||
// lib.cpp | ||
int SYCL_EXTERNAL LibDeviceFunc(int i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int SYCL_EXTERNAL LibDeviceFunc(int i) { | |
SYCL_EXTERNAL int LibDeviceFunc(int i) {0 |
To align with the declaration from the app.cpp
. I hope SYCL_EXTERNAL
position is not important for dynamic linking.
clang++ -fsycl lib.cpp -shared -o helpers.so | ||
clang++ -fsycl app.cpp -lhelpers -o a.out | ||
./a.out | ||
Output: 0 2 4 6… |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Output: 0 2 4 6… | |
Output: 0 2 4 6 ... |
Let's avoid using non-ASCII symbols (just in case).
above. | ||
|
||
## Requirements: | ||
User's device code that consists of some device API (`SYCL_EXTERNAL` functions), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User's device code that consists of some device API (`SYCL_EXTERNAL` functions), | |
User's device code |
|
||
## Requirements: | ||
User's device code that consists of some device API (`SYCL_EXTERNAL` functions), | ||
is compiled into some form and it is not linked statically with device code of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is compiled into some form and it is not linked statically with device code of | |
can be compiled into some form and not linked statically with device code of |
## Requirements: | ||
User's device code that consists of some device API (`SYCL_EXTERNAL` functions), | ||
is compiled into some form and it is not linked statically with device code of | ||
application. It can be a shared library with embedded device image or a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
application. It can be a shared library with embedded device image or a | |
application. It can be embedded as a device image into a shared library or |
User's device code that consists of some device API (`SYCL_EXTERNAL` functions), | ||
is compiled into some form and it is not linked statically with device code of | ||
application. It can be a shared library with embedded device image or a | ||
separate device image supplied with properties attached. This code is linked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separate device image supplied with properties attached. This code is linked | |
supplied as a separate device image with attached properties. This code is linked |
I've started reading the document, but I realized that I'd like to re-reword/clarify quite a lot. I'm okay to merge this version and continue improving it, but I think we should make it clear for readers. |
7b60419
Sure, added in 7b60419 . |
@Fznamznon, this change breaks Doxygen documentation build - https://github.com/intel/llvm/runs/2715240192?check_suite_focus=true.
Please, fix it ASAP. |
Looking into it. |
No description provided.