ocloc missing symbols from libigc.so #265

Open
stef opened this issue Mar 3, 2020 · 17 comments
Labels
build issue distro Distribution specific questions IGC Issue related to IGC

Comments

@stef

stef commented Mar 3, 2020

howdy,

i'm trying to compile this fine project for alpine linux, and along the way i ran into a bunch of roadblocks with this (and the following issues). i'll try to document them and validate my fixes. first issue:

Error loading the Generic builtin resource
Build failed with error code: -11
Command was: /home/s/tasks/aports/ugly/compute-runtime/src/compute-runtime-20.08.15750/build/bin/ocloc -q -file scheduler.cl -device bdw -cl-intel-greater-than-4GB-buffer-required -64 -out_dir /home/s/tasks/aports/ugly/compute-runtime/src/compute-runtime-20.08.15750/build/bin/scheduler/x64/gen8 -cpp_file -options -I/usr/include/igc -I/usr/include/igc/cif -I/usr/include/igc/ocl_igc_shared/executable_format -I/usr/include/igc/ocl_igc_shared/device_enqueue -I ../gen8 -cl-kernel-arg-info -cl-std=CL2.0 -cl-intel-disable-a64WA
make[2]: *** [igdrcl_lib_release/scheduler/CMakeFiles/scheduler_Gen8core.dir/build.make:62: bin/scheduler/x64/gen8/scheduler_Gen8core.bin] Error 245
make[2]: Leaving directory '/home/s/tasks/aports/ugly/compute-runtime/src/compute-runtime-20.08.15750/build'
make[1]: *** [CMakeFiles/Makefile2:6914: igdrcl_lib_release/scheduler/CMakeFiles/scheduler_Gen8core.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

fixed by adding igc to the target_link_libraries of the offline_compiler (this patch also adds the libraries, unwind and execinfo, for symbols like backtrace et al. that musl does not provide)

diff -Nurw compute-runtime-20.08.15750/offline_compiler/CMakeLists.txt src/compute-runtime-20.08.15750/offline_compiler/CMakeLists.txt
--- compute-runtime-20.08.15750/offline_compiler/CMakeLists.txt   2020-02-29 00:33:03.068525017 +0000
+++ src/compute-runtime-20.08.15750/offline_compiler/CMakeLists.txt  2020-02-29 00:41:59.361882810 +0000
@@ -140,7 +140,7 @@
 endif()

 if(UNIX)
-  target_link_libraries(ocloc dl pthread)
+  target_link_libraries(ocloc dl unwind execinfo igc)
 endif()

 set_target_properties(ocloc PROPERTIES FOLDER "offline_compiler")

it seems that linking libigc.so fixes this problem, but i wonder why libigc is missing at all. is it dynamically loaded, and something goes wrong during that? or is it ok to just add libigc as a target_link_library?

thanks for any insights.

@stef stef changed the title trying to build for alpine linux trying to build for alpine linux - ocloc missing symbols from libigc.so Mar 3, 2020
@stef stef changed the title trying to build for alpine linux - ocloc missing symbols from libigc.so cloc missing symbols from libigc.so Mar 3, 2020
@stef stef changed the title cloc missing symbols from libigc.so ocloc missing symbols from libigc.so Mar 3, 2020
@JacekDanecki
Contributor

JacekDanecki commented Mar 4, 2020

IGC libraries are being loaded by ocloc using dlopen, see

Can you run ocloc manually with the LD_DEBUG=libs environment variable set?
Since you link ocloc with the igc library, I suspect there could be a problem with symbol conflicts. You can also run ocloc with LD_DEBUG=all set to see how symbols are resolved.

@stef
Author

stef commented Mar 4, 2020

musl does not implement LD_DEBUG. i'll strace ocloc instead and see what libs it loads.
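Since musl has no LD_DEBUG, one in-process alternative to strace is to enumerate the loaded objects with dl_iterate_phdr, which both glibc and musl provide. This is a minimal sketch, assuming you can drop a hypothetical helper such as loadedObjects() into a debug build and call it right after igc is dlopened. It only lists what is mapped; it says nothing about how symbols are resolved.

```cpp
#include <link.h>  // dl_iterate_phdr, dl_phdr_info
#include <cstddef>
#include <string>
#include <vector>

// Callback invoked once per loaded object (executable, libraries, vdso).
static int collectName(struct dl_phdr_info *info, std::size_t /*size*/, void *data) {
    auto *names = static_cast<std::vector<std::string> *>(data);
    // The main executable is usually reported with an empty name.
    names->emplace_back(info->dlpi_name ? info->dlpi_name : "");
    return 0;  // returning 0 continues the iteration
}

// Hypothetical debugging helper: names of all currently loaded objects,
// in the order the dynamic linker reports them.
std::vector<std::string> loadedObjects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collectName, &names);
    return names;
}
```

Printing each entry to stderr after the dlopen call gives roughly the same list as the strace output, but from inside the process.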

@stef
Author

stef commented Mar 4, 2020

what's loaded:

"/usr/lib/libunwind.so.8"
"/usr/lib/libexecinfo.so.1"
"/usr/lib/libstdc++.so.6"
"/usr/lib/libgcc_s.so.1"
"/usr/lib/libigdfcl.so.1"
"/usr/lib/libLLVM-9.so"
"/usr/lib/libopencl-clang.so.9"
"/usr/lib/libffi.so.6"
"/lib/libz.so.1"
"/usr/lib/libxml2.so.2"
"/usr/lib/liblzma.so.5"
"/usr/lib/libigc.so.1"

strangely libigc gets loaded, but i still get the same error as in the issue description above. however, if i add LD_PRELOAD=/usr/lib/libigc.so.1 to the ocloc invocation, ocloc succeeds and returns without output.

@stef
Author

stef commented Mar 4, 2020

it looks like libigc.so is mapped correctly, as seen in strace:

open("/usr/lib/libigc.so.1", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)     = 0
fstat(3, {st_mode=S_IFREG|0755, st_size=28370792, ...}) = 0
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\241\r\0\0\0\0\0"..., 960) = 960
mmap(NULL, 28577792, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f3031a96000
mmap(0x7f3031b6b000, 7450624, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0xd5000) = 0x7f3031b6b000
mmap(0x7f3032286000, 1400832, PROT_READ, MAP_PRIVATE|MAP_FIXED, 3, 0x7f0000) = 0x7f3032286000
mmap(0x7f30323dc000, 18853888, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x945000) = 0x7f30323dc000
mmap(0x7f30335a5000, 204800, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f30335a5000
close(3)                          = 0

@JacekDanecki
Contributor

"Error loading the Generic builtin resource" this message came from IGC, see line 951

@stef
Author

stef commented Mar 4, 2020

when i run ocloc with LD_PRELOAD=/usr/lib/libigc.so and strace it, then i get this:

open("/usr/lib/libigc.so.1", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)     = 0
fstat(3, {st_mode=S_IFREG|0755, st_size=28370792, ...}) = 0
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\241\r\0\0\0\0\0"..., 960) = 960
mmap(NULL, 28577792, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ff64758a000
mmap(0x7ff64765f000, 7450624, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0xd5000) = 0x7ff64765f000
mmap(0x7ff647d7a000, 1400832, PROT_READ, MAP_PRIVATE|MAP_FIXED, 3, 0x7f0000) = 0x7ff647d7a000
mmap(0x7ff647ed0000, 18853888, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x945000) = 0x7ff647ed0000
mmap(0x7ff649099000, 204800, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ff649099000
close(3)                          = 0

looks the same as in #265 (comment), except that libigc is now loaded first instead of last

@stef
Author

stef commented Mar 4, 2020

> "Error loading the Generic builtin resource" this message came from IGC, see line 951

yes, and it is fixed by linking/preloading libigc into ocloc, i traced that.

@JacekDanecki
Contributor

Can you verify that IGC was built correctly and linked with the builtins? I've seen IGC builds succeed even though there were problems with the builtins.

@stef
Author

stef commented Mar 4, 2020

> "Error loading the Generic builtin resource" this message came from IGC, see line 951

llvm::LoadBufferFromResource is failing and returning NULL here: https://github.com/intel/intel-graphics-compiler/blob/67351f4e52f52eb2e4d68a9a40599a77733b5603/IGC/AdaptorOCL/OCL/LoadBuffer.cpp#L59

when libigc is not linked/preloaded but only dlopened

@stef
Author

stef commented Mar 4, 2020

> Can you verify IGC was built correctly, linked with builtins? I've observed successful IGC builds although there were problems with builtins.

sure, how do i verify builtins?

@JacekDanecki
Contributor

On an Ubuntu 20.04 system, igc 1.0.3390 exports these builtin symbols:

$ nm -DC /usr/lib/x86_64-linux-gnu/libigc.so | grep _igc_bif_
00000000018dc020 D _igc_bif_BC_120
00000000018dc000 D _igc_bif_BC_120_size
0000000001957680 D _igc_bif_BC_121
000000000195766c D _igc_bif_BC_121_size
00000000019d3be0 D _igc_bif_BC_122
00000000019d3bdc D _igc_bif_BC_122_size
$ dpkg -l libigc | grep libigc
ii  libigc         1.0.3390-1~ppa1~focal1 amd64        Intel(R) Graphics Compiler

@stef
Author

stef commented Mar 4, 2020

confirmed, i have this:

% nm -DC /usr/lib/libigc.so | grep _igc_bif_
0000000000aa7020 D _igc_bif_BC_120
0000000000aa7000 D _igc_bif_BC_120_size
0000000000b226e0 D _igc_bif_BC_121
0000000000b226d8 D _igc_bif_BC_121_size
0000000000b9ecc0 D _igc_bif_BC_122
0000000000b9eca8 D _igc_bif_BC_122_size
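nm only inspects the file on disk. A runtime counterpart asks dlsym whether the builtin symbols resolve through the library's own handle. This is a sketch under the assumption that the library can be opened by path. exportsAll() is a hypothetical helper, and the path and symbol names in the usage comment are taken from the listings above.

```cpp
#include <dlfcn.h>
#include <string>
#include <vector>

// Hypothetical helper: true if every name resolves via dlsym on `handle`.
// A handle-based lookup searches that object (and its dependencies), so it
// does not depend on whether the object was opened with RTLD_GLOBAL.
bool exportsAll(void *handle, const std::vector<std::string> &names) {
    if (handle == nullptr) return false;
    for (const auto &name : names) {
        if (dlsym(handle, name.c_str()) == nullptr) return false;
    }
    return true;
}

// Usage (path as on the Alpine system above):
//   void *igc = dlopen("/usr/lib/libigc.so.1", RTLD_LAZY | RTLD_LOCAL);
//   bool ok = exportsAll(igc, {"_igc_bif_BC_120", "_igc_bif_BC_120_size"});
```

If this succeeds while IGC's own lookup still fails, the symbols are present and the problem lies in the lookup scope rather than in the build.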

@JacekDanecki
Contributor

It's weird, maybe dlopen works differently on musl.
ocloc uses dlopen with RTLD_LAZY | RTLD_DEEPBIND to load fcl and igc, and igc then loads the builtin symbols using dlsym with RTLD_DEFAULT.
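To make the lookup pattern concrete, here is a minimal sketch of resolving a builtin blob and its `_size` companion through RTLD_DEFAULT. It assumes the `_igc_bif_BC_<n>` / `_igc_bif_BC_<n>_size` naming from the nm output above. BuiltinBlob and findBuiltinGlobal() are hypothetical names, and IGC's actual code may differ. On musl, RTLD_DEFAULT appears to cover only objects in the global scope. If so, this lookup fails for a library that was dlopened without RTLD_GLOBAL, even when the code runs inside that very library.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical pair of addresses for one embedded builtin module.
struct BuiltinBlob {
    const void *data = nullptr;  // e.g. address of _igc_bif_BC_120
    const void *size = nullptr;  // e.g. address of _igc_bif_BC_120_size
    bool found() const { return data != nullptr && size != nullptr; }
};

// Looks up "<base>" and "<base>_size" in the default lookup scope,
// mirroring the dlsym(RTLD_DEFAULT, ...) pattern described above.
BuiltinBlob findBuiltinGlobal(const std::string &base) {
    BuiltinBlob blob;
    blob.data = dlsym(RTLD_DEFAULT, base.c_str());
    blob.size = dlsym(RTLD_DEFAULT, (base + "_size").c_str());
    return blob;
}
```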

@stef
Author

stef commented Mar 4, 2020

after consulting with the fine people of #musl the suggestion was to add RTLD_GLOBAL to this line:

https://github.com/intel/compute-runtime/blob/master/shared/source/os_interface/linux/os_library_linux.cpp#L33

and it seems to work.
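The flag change can be sketched as follows, assuming a loader shaped roughly like the one in os_library_linux.cpp. libraryLoadFlags() and openLibrary() are hypothetical names, not the actual compute-runtime code. RTLD_DEEPBIND is guarded because not every libc defines it; musl's dlfcn.h, for one, does not appear to.

```cpp
#include <dlfcn.h>

// Hypothetical: flags for loading fcl/igc. RTLD_GLOBAL is the addition
// suggested above; it places the library's symbols in the global scope so
// that a later dlsym(RTLD_DEFAULT, ...) can find them.
int libraryLoadFlags() {
    int flags = RTLD_LAZY | RTLD_GLOBAL;
#ifdef RTLD_DEEPBIND
    flags |= RTLD_DEEPBIND;  // only where the libc provides it
#endif
    return flags;
}

// Hypothetical wrapper; returns nullptr on failure (details via dlerror()).
void *openLibrary(const char *path) {
    return dlopen(path, libraryLoadFlags());
}
```

The trade-off is that with RTLD_GLOBAL, igc's symbols can also satisfy undefined symbols of libraries loaded afterwards. That is the kind of symbol conflict raised earlier in the thread.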

@AdamCetnerowski AdamCetnerowski added build issue distro Distribution specific questions labels Mar 4, 2020
@JacekDanecki
Contributor

Yeah, I've prepared a simple reproducer on Alpine, and it looks like the RTLD_GLOBAL flag is required so that dlsym(RTLD_DEFAULT, ...) called inside a library loaded by dlopen can find that library's own symbols.

~ # cat foo.cpp 
#include <stdio.h>
#include <dlfcn.h>

extern "C" {
int foo();
int boo();
}

typedef int (*fun_boo)();  // boo() returns int, so match its signature

int foo()
{
        void *m = RTLD_DEFAULT;

        printf("foo\n");
        fun_boo libboo = (fun_boo) dlsym(m, "boo");
        if (!libboo) {
                printf("%s\n", dlerror());
                return 22;
        }
        libboo();

        return 0;
}

int boo()
{
        printf("boo\n");
        return 0;
}

~ # cat main.cpp 
#include <dlfcn.h>
#include <stdio.h>

typedef int (*foo)();

int main()
{
        int ret;

        void * lib = dlopen("libfoo.so", RTLD_LAZY | RTLD_GLOBAL);
        if (!lib) {
                printf("%s\n", dlerror());
                return 2;
        }
        foo libfoo = (foo) dlsym(lib, "foo");
        if (!libfoo) {
                printf("%s\n", dlerror());
                dlclose(lib);  // release the handle on the error path too
                return 22;
        }

        ret = libfoo();
        dlclose(lib);

        return ret;
}

~ # cat Makefile 
all:
        clang++ -g -fPIC -shared -o libfoo.so foo.cpp
        clang++ -g -o main main.cpp -ldl
~ # export LD_LIBRARY_PATH=`pwd`
~ # ./main
foo
boo

When I remove the RTLD_GLOBAL flag from dlopen, dlsym fails to find boo:

~ # ./main
foo
Symbol not found: boo

The same test works correctly on Fedora without the RTLD_GLOBAL flag.
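Another way to keep the reproducer's foo() independent of how RTLD_DEFAULT behaves is to have the library look itself up through its own handle instead of relying on RTLD_GLOBAL. This is a minimal sketch, assuming dladdr() and RTLD_NOLOAD are available; both exist in glibc and musl when _GNU_SOURCE is defined, which g++ does by default. handleForAddress() is a hypothetical helper, not something IGC or compute-runtime is known to do.

```cpp
#include <dlfcn.h>

// Hypothetical helper: a new handle to the already-loaded object that
// contains `addr`, or nullptr. The caller should dlclose() it when done.
void *handleForAddress(const void *addr) {
    Dl_info info{};
    if (addr == nullptr || dladdr(addr, &info) == 0 || info.dli_fname == nullptr)
        return nullptr;
    // RTLD_NOLOAD: succeed only if the object is already loaded; this bumps
    // its reference count rather than loading a second copy.
    return dlopen(info.dli_fname, RTLD_LAZY | RTLD_NOLOAD);
}

// Usage inside foo.cpp's foo(), replacing the RTLD_DEFAULT lookup:
//   void *self = handleForAddress(reinterpret_cast<const void *>(&foo));
//   fun_boo libboo = self ? (fun_boo) dlsym(self, "boo") : nullptr;
//   ... call libboo(), then dlclose(self);
```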

@stef
Author

stef commented Mar 4, 2020

with a few more minor changes i managed to compile most of it now. however, at the "end" i get random test failures, like:

Running igdrcl_tests 1x6x16 in /home/s/tasks/aports/ugly/compute-runtime/src/compute-runtime-20.08.15750/build/bin/tgllp
cd /home/s/tasks/aports/ugly/compute-runtime/src/compute-runtime-20.08.15750/build/bin && /usr/bin/cmake -E env GTEST_OUTPUT=xml:test_logs/test_details_tgllp_1_6_16.xml /home/s/tasks/aports/ugly/compute-runtime/src/compute-runtime-20.08.15750/build/bin/igdrcl_tests --product tgllp --slices 1 --subslices 6 --eu_per_ss 16 --gtest_catch_exceptions=1 --gtest_repeat=1 --gtest_shuffle --gtest_random_seed=0 --disable_default_listener
product family: tgllp (29)
set timeout to: 45
Iteration: 1. random_seed: 31532

unknown file: Failure
C++ exception with description "Abort was called at 121 line in file /home/s/tasks/aports/ugly/compute-runtime/src/compute-runtime-20.08.15750/core/memory_manager/gfx_partition.cpp" thrown in the test body.
[  FAILED  ][ TGLLP ][ 31532 ] DeviceGenEngineTest.givenNonHwCsrModeWhenGetEngineThenDefaultEngineIsReturned

unknown file: Failure
C++ exception with description "Abort was called at 121 line in file /home/s/tasks/aports/ugly/compute-runtime/src/compute-runtime-20.08.15750/core/memory_manager/gfx_partition.cpp" thrown in the test body.
[  FAILED  ][ TGLLP ][ 31532 ] DeviceGenEngineTest.whenCreateDeviceThenInternalEngineHasDefaultType

unknown file: Failure
C++ exception with description "Abort was called at 121 line in file /home/s/tasks/aports/ugly/compute-runtime/src/compute-runtime-20.08.15750/core/memory_manager/gfx_partition.cpp" thrown in the test body.
[  FAILED  ][ TGLLP ][ 31532 ] DeviceGenEngineTest.givenHwCsrModeWhenGetEngineThenDedicatedForInternalUsageEngineIsReturned
SIGSEGV on: CommandEncodeSemaphore.whenAddingMiSemaphoreCommandThenExpectCompareFieldsAreSetCorrectly
Child aborted
make[2]: *** [unit_tests/CMakeFiles/run_tgllp_unit_tests.dir/build.make:62: run_tgllp_unit_tests] Error 1

the patches i applied to get this far can be seen in: aports-ugly/aports@ee97443

@stef
Author

stef commented Mar 4, 2020

i'm happy to either elaborate on the patches from the previous comment in this issue (and then we could rename the issue to something like 'porting to alpine linux'), or open separate issues for these patches, whichever is more convenient for you.


4 participants