Skip to content

[SYCL]Device lib default link #2277

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 22 commits into from
Closed

Conversation

jinge90
Copy link
Contributor

@jinge90 jinge90 commented Aug 7, 2020

This PR is the draft patch to enable default device library linking in DPCPP program compiling. The basic idea is:

  1. Link the pre-built device libraries with user's device code in llvm-link phase.
  2. Remove the "unused" function in final device image, those "unused" functions are introduced by step Enable execution on GPU and ACC #1, this is done by adding "--dead-code-removal" option in sycl-post-link. This option just walks through all SPIR functions in device image and will remove those functions which are not invoked at all.

After doing this, we will simplify the usage of device library. Supposing we have a source file requiring std::complex math in SYCL device, previous building command:
clang++ -fsycl -c test_complex.cpp -o test_complex.o
clang++ -fsycl test_complex.o $sycl_lib_path/libsycl-cmath.o $sycl_lib_path/libsycl-complex.o -o test_complex.out
Now, the building command is:
clang++ -fsycl test_complex.cpp -o test_complex.out

Both jit compilation and AOT are supported and for jit, only "wrapper" libraries are linked as default, for AOT, "wrapper" and "fallback" libraries are linked as default.
Currently, the default linking is not enabled for NVPTX and intelfpga mode because: (1).there are not strong requirement; (2). I don't have environment for testing.

Thank you very much.

StringRef LibLoc, LibSysUtils;
if (isMSVCEnv) {
LibLoc = Args.MakeArgString(TC->getDriver().Dir + "/../bin");
LibSysUtils = "libsycl-msvc";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are the device libs for Windows built in the bin directory as opposed to the lib directory like Linux?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe archive libraries on Windows are not supposed to be in bin directory. bin is for DLLs and executables. For OpenMP the BC libraries are located in lib (as they are considered archive librareis), and SPV libraries are located in bin (as the are considered 'dynamic' libraries, i.e. DLLs). Anyway, I think the BC libraries must not be located in bin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @mdtoguchi and @vzakhari
Currently, the device libraries on Windows platform are located in bin/ folder for DPC++ compiler. You can find the details in :
https://github.com/intel/llvm/blob/sycl/libdevice/cmake/modules/SYCLLibdevice.cmake#L209
So, I followed it in this patch. I think the reason why put device libraries in bin/ on Windows is we want to put all 'sycl-related' libraries together with libsycl.so/dll? @asavonic , please feel free to correct me.
Thank you very much.

for (const StringRef &Lib : sycl_libs) {
SmallString<128> LibName(LibLoc);
llvm::sys::path::append(LibName, Lib);
llvm::sys::path::replace_extension(LibName, ".o");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.o is not a standard object name on Windows (typically it is .obj). Probably can be discussed/addressed separately, but having the naming be inconsistent with expectations is confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @mdtoguchi
As we discussed above, current device libraries use ".o" for both Linux and Windows. I agree that ".obj" should be used for Windows and I think we can address separately in another PR.
Thank you very much.

@jinge90
Copy link
Contributor Author

jinge90 commented Aug 10, 2020

Hi, all.
There are several lit tests failures in check-clang, they are expected failures. If we finally approved solution in this PR, I will update the tests.
Thank you very much.

@jinge90
Copy link
Contributor Author

jinge90 commented Aug 10, 2020

/summary:run

Signed-off-by: gejin <ge.jin@intel.com>
@bd4
Copy link

bd4 commented Aug 21, 2020

Having run into the issue of std::complex multiplication not working in device code without an extra lib link (which I discovered how to do only by recursive grep on the source tree and finding the test cases), I think this is a great usability improvement.

@bader
Copy link
Contributor

bader commented Aug 22, 2020

2. Remove the "unused" function in final device image, those "unused" functions are introduced by step #1, this is done by adding "--dead-code-removal" option in sycl-post-link. This option just walks through all SPIR functions in device image and will remove those functions which are not invoked at all.

I think adding -only-needed flag to llvm-link tool does the same.

@jinge90 jinge90 force-pushed the DeviceLib_default_link branch from 01077c4 to d436af2 Compare August 24, 2020 01:41
Signed-off-by: gejin <ge.jin@intel.com>
Signed-off-by: gejin <ge.jin@intel.com>
@jinge90
Copy link
Contributor Author

jinge90 commented Aug 24, 2020

  1. Remove the "unused" function in final device image, those "unused" functions are introduced by step Enable execution on GPU and ACC #1, this is done by adding "--dead-code-removal" option in sycl-post-link. This option just walks through all SPIR functions in device image and will remove those functions which are not invoked at all.

I think adding -only-needed flag to llvm-link tool does the same.

Hi, @bader , llvm-link --only-needed does the similar thing but llvm-link can't handle sycl specific scenario. I tried to use llvm-link --only-needed to avoid linking unnecessary symbols, for some simple tests, it can meet our requirements. However, I also got a bunch of failures. The root causes of those failures are same, kernel functions will be marked as "not needed" and will not be linked into final device image. Those failed cases will report runtime error saying something like "... kernel doesn't exist".
llvm-link only includes support for 'general link' and doesn't take any sycl specifics into consideration. Currently, in our patch, we will skip all "SPIR_KERNEL" functions when doing 'dead code removal', so kernel functions will not be deleted by mistake. And we also skip all "SPIR_FUNC" with "indrect-reference" attribute, they are not called by any other function but may be called via function pointer.
We can change llvm-link to make --only-needed to work for our requirements but I think it is not friendly for upstream.
Thank you very much.

@bader
Copy link
Contributor

bader commented Aug 24, 2020

@jinge90, I think these are FE bugs. Functions must have proper linkage type to be linked by generic LLVM tools (https://llvm.org/docs/LangRef.html#linkage-types).
Linking is not something "SYCL specific".

@jinge90
Copy link
Contributor Author

jinge90 commented Aug 24, 2020

@jinge90, I think these are FE bugs. Functions must have proper linkage type to be linked by generic LLVM tools (https://llvm.org/docs/LangRef.html#linkage-types).
Linking is not something "SYCL specific".

Hi, @bader
You are correct, linking is not "SYCL specific". I can reproduce this issue in non-SYCL program, all Kernel functions has "weak_odr" linkage, if we have 2 LLVM BC files: a.bc and b.bc, each of them has one function with "weak_odr" linkage, when linking them using: "llvm-link --only-needed a.bc b.bc -o c.bc", the final "c.bc" file will only include the function from a.bc.
If we have multiple kernel functions, using "--only-needed" will lead to some kernel missing in final device image.
Thank you very much.

@bader
Copy link
Contributor

bader commented Aug 24, 2020

all Kernel functions has "weak_odr" linkage

This sounds strange. User kernels has default linkage (see https://github.com/intel/llvm/blob/sycl/clang/lib/Sema/SemaSYCL.cpp#L1512). I think you mean kernels defined in the library. Right? If so, I guess only "fallback" implementation should have "weak_odr" linkage, so linker can override it if another definition with stronger linkage type is provided.

if we have 2 LLVM BC files: a.bc and b.bc, each of them has one function with "weak_odr" linkage, when linking them using: "llvm-link --only-needed a.bc b.bc -o c.bc", the final "c.bc" file will only include the function from a.bc.

That behavior sounds right to me, but I expect only one definition to have "weak_odr" linkage type.

If we have multiple kernel functions, using "--only-needed" will lead to some kernel missing in final device image.

That's something I didn't get. I suppose with have the same symbol (e.g. foo) defined in a.bc and b.bc and linker leaves only one definition as you describe above, why "some kernel missing"?

@bader
Copy link
Contributor

bader commented Aug 24, 2020

Hi, all.
I want to consult with you how to improve the user experience for device libraries, we may have following options:

  1. Make device libraries to be linked with kernel as default and provide a option "-fno-sycl-devicelib" to disable the default link, this can provide some flexibility to users.(This is what this patch is doing now)
  2. Add an option like "-fsycl-devicelib-enable", all device libraries are not linked automatically but users can add "-fsycl-option-enable" to involve them during building.
  3. Others?
    Do you have any comments or good idea?
    Thank you very much.

I guess we want users to have "similar to native CPU programming" experience if possible. Option 1 seems to be aligned with clang/gcc on linking standard C/C++ library.
Do you envision any issues with implementing option (1) for DPC++?

@jinge90
Copy link
Contributor Author

jinge90 commented Aug 25, 2020

/summary:run

@jinge90
Copy link
Contributor Author

jinge90 commented Aug 25, 2020

all Kernel functions has "weak_odr" linkage

This sounds strange. User kernels has default linkage (see https://github.com/intel/llvm/blob/sycl/clang/lib/Sema/SemaSYCL.cpp#L1512). I think you mean kernels defined in the library. Right? If so, I guess only "fallback" implementation should have "weak_odr" linkage, so linker can override it if another definition with stronger linkage type is provided.

I tried several simple tests and dumped the device IR, all kernel functions are marked as "weak_odr", is it expected behavior?
Following code snippet is from as simple test:

define weak_odr dso_local spir_kernel void @_ZTS10KernelTest(float addrspace(1)....
In fact, device libraries (both wrapper and fallback) doesn't include any kernel functions, we just provide a set of "SPIR_FUNC" with "weak" linkage.

if we have 2 LLVM BC files: a.bc and b.bc, each of them has one function with "weak_odr" linkage, when linking them using: "llvm-link --only-needed a.bc b.bc -o c.bc", the final "c.bc" file will only include the function from a.bc.

That behavior sounds right to me, but I expect only one definition to have "weak_odr" linkage type.

If we have multiple kernel functions, using "--only-needed" will lead to some kernel missing in final device image.

That's something I didn't get. I suppose with have the same symbol (e.g. foo) defined in a.bc and b.bc and linker leaves only one definition as you describe above, why "some kernel missing"?

In my test, both "a.bc" and "b.bc" only include one "weak_odr" function but the functions are totally different(different name, arg list....), after using "llvm-link --only-needed", one of the functions was ignored and not linked into final LLVM bc.
For example, a.bc has one function "_Z3fooPi", b.bc has one function "_Z3bari", both functions are "weak_odr".
If I use: "llvm-link --only-needed a.bc b.bc -o c.bc", c.bc will only include _Z3fooPi, b.bc's function is lost.
The scenario is almost same for SYCL, in multiple source compilation, we have multiple LLVM bc files and each of them includes one kernel function with "weak_odr" linkage, when using llvm-link --only-needed to link all LLVM bc, some LLVM bc's kernel function will be ignored and not linked into final device image.

In fact this problem is not related to "weak_odr" or any linkage type, it is due to llvm-link's algorithm for implementing "--only-needed". We can reproduce this issue without "weak_odr". When we use: "llvm-link --only-needed src-1.bc, src-2.bc...src-n.bc -o dst.bc", llvm-link's workflow is like following:
initial dst.bc is empty
move functions/globals in src-1.bc into dst.bc
for (idx from 2 to n) {
for (f in src-idx.bc's function list) {
if (f is dependent on by dst.bc)
f is linked into dst.bc
else
f will be ignored
}
}
The algorithm makes the input sources' sequence very important and it has the pre-assumption that the functions to be linked must be dependent on by other functions linked previously. So, if we have multiple SYCL kernels located in multiple LLVM BC, the pre-assumption may be broken and some SYCL kernel functions will be skipped by "--only-needed" option.
Thanks very much.

@jinge90
Copy link
Contributor Author

jinge90 commented Aug 25, 2020

Hi, all.
I want to consult with you how to improve the user experience for device libraries, we may have following options:

  1. Make device libraries to be linked with kernel as default and provide a option "-fno-sycl-devicelib" to disable the default link, this can provide some flexibility to users.(This is what this patch is doing now)
  2. Add an option like "-fsycl-devicelib-enable", all device libraries are not linked automatically but users can add "-fsycl-option-enable" to involve them during building.
  3. Others?
    Do you have any comments or good idea?
    Thank you very much.

I guess we want users to have "similar to native CPU programming" experience if possible. Option 1 seems to be aligned with clang/gcc on linking standard C/C++ library.
Do you envision any issues with implementing option (1) for DPC++?

Currently, I haven't envisioned any issues for (1) but there are many scenarios. Current patch supports jit and AOT but "-fintelfpga" and "NVPXT" are not supported as I don't have environment for testing for those targets.
Thanks very much.

@jinge90 jinge90 requested review from vladimirlaz and removed request for rbegam August 26, 2020 03:25
alexbatashev
alexbatashev previously approved these changes Aug 28, 2020
Copy link
Contributor

@alexbatashev alexbatashev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SYCL RT changes LGTM

Signed-off-by: gejin <ge.jin@intel.com>
Signed-off-by: Ge Jin <ge.jin@intel.com>
Signed-off-by: Ge Jin <ge.jin@intel.com>
Signed-off-by: Ge Jin <ge.jin@intel.com>
@jinge90
Copy link
Contributor Author

jinge90 commented Aug 31, 2020

Hi, @bader
I have updated the patch, the latest version will support "-fintelfpga" option. If developer uses "-fintelfpga" to build the code, both the wrapper and fallback libraries will be linked. The lit test for "-fintelfpga" mode is also added.

Signed-off-by: gejin <ge.jin@intel.com>
Signed-off-by: gejin <ge.jin@intel.com>
@@ -21,11 +21,11 @@

/// -fintelfpga -fsycl-link tests
// RUN: touch %t.o
// RUN: %clangxx -### -target x86_64-unknown-linux-gnu -fsycl -fintelfpga -fsycl-link %t.o -o libfoo.a 2>&1 \
// RUN: %clangxx -### -target x86_64-unknown-linux-gnu -fsycl -fno-sycl-devicelib -fintelfpga -fsycl-link %t.o -o libfoo.a 2>&1 \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @mdtoguchi and @bader
Currently, I added "-fno-sycl-devicelib" to disable devicelib default link in order to not break those lit tests. As those tests depend on the compilation workflow.
I will add some lit test in driver to test devicelib link step later.
Thanks very much.

@jinge90
Copy link
Contributor Author

jinge90 commented Sep 1, 2020

/summary:run

@jinge90
Copy link
Contributor Author

jinge90 commented Sep 1, 2020

We will pend this PR decided to split the work into several steps, the first step is for default link: #2400
Thank you very much.

@vladimirlaz
Copy link
Contributor

@jinge90 . do you still need this PR (#2400 has been merged)? Could you please close this PR if it is not needed anymore?

@jinge90 jinge90 closed this Sep 21, 2020
jsji pushed a commit that referenced this pull request Feb 1, 2024
Add support for load/store operations for a cooperative matrix such that original matrix shape is known and implementations are able to reason about how to deal with the out of bounds.

CapabilityCooperativeMatrixCheckedInstructionsINTEL = 6192
CooperativeMatrixLoadCheckedINTEL = 6193
CooperativeMatrixStoreCheckedINTEL = 6194

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@b62cb55
kbenzie pushed a commit to kbenzie/intel-llvm that referenced this pull request Feb 17, 2025
[Spec] fix urKernelSuggestMaxCooperativeGroupCountExp
Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025
[Spec] fix urKernelSuggestMaxCooperativeGroupCountExp
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants