[CUDA] Unexpected Failing and Passing Tests #1603

Closed
fwyzard opened this issue Apr 29, 2020 · 5 comments
Labels
bug Something isn't working cuda CUDA back-end

Comments

@fwyzard
Contributor

fwyzard commented Apr 29, 2020

On a local build of the latest code base (tag: 20200429), running make -j16 check-sycl-cuda gives both unexpected failures (under abi/, devicelib/ and tools/) and unexpected passes (under usm/).

Machine configuration:

$ cat /etc/redhat-release 
CentOS Linux release 8.1.1911 (Core) 

$ uname -a
Linux patatrack01 4.18.0-147.6.el8.x86_64 #1 SMP Tue Oct 15 15:19:32 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ clinfo -l
Platform #0: Intel(R) OpenCL HD Graphics
 `-- Device #0: Intel(R) Gen9 HD Graphics NEO
Platform #1: Intel(R) OpenCL
 `-- Device #0: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz

$ nvidia-smi -L
GPU 0: Tesla K40c (UUID: GPU-10345550-a45d-0c69-f985-b66167d07fc1)

Build configuration

SYCL_BASE="$PWD"

mkdir -p minimal
cd minimal

cmake \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" \
  -DLLVM_EXTERNAL_PROJECTS="llvm-spirv;sycl;opencl-aot" \
  -DLLVM_ENABLE_PROJECTS="clang;llvm-spirv;sycl;opencl-aot;libclc" \
  -DLLVM_EXTERNAL_SYCL_SOURCE_DIR=$SYCL_BASE/llvm/sycl \
  -DLLVM_EXTERNAL_LLVM_SPIRV_SOURCE_DIR=$SYCL_BASE/llvm/llvm-spirv \
  -DLLVM_ENABLE_ASSERTIONS=OFF \
  -DSYCL_ENABLE_WERROR=OFF \
  -DBUILD_SHARED_LIBS=OFF \
  -DLLVM_BUILD_LLVM_DYLIB=OFF \
  -DLLVM_LINK_LLVM_DYLIB=OFF \
  -DLLVM_LIBDIR_SUFFIX=64 \
  -DLIBCLC_TARGETS_TO_BUILD="nvptx64--;nvptx64--nvidiacl" \
  -DSYCL_BUILD_PI_CUDA=ON \
  -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
  -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF \
  $SYCL_BASE/llvm/llvm

make -j`nproc` -k sycl-toolchain

Test configuration

cd minimal
make -j`nproc` check-sycl-cuda

Test summary

********************
Failing Tests (11):
  SYCL :: abi/sycl_symbols_linux.dump
  SYCL :: devicelib/c99_complex_math_fp64_test.cpp
  SYCL :: devicelib/c99_complex_math_test.cpp
  SYCL :: devicelib/cmath_fp64_test.cpp
  SYCL :: devicelib/cmath_test.cpp
  SYCL :: devicelib/math_fp64_test.cpp
  SYCL :: devicelib/math_override_test.cpp
  SYCL :: devicelib/math_test.cpp
  SYCL :: devicelib/std_complex_math_fp64_test.cpp
  SYCL :: devicelib/std_complex_math_test.cpp
  SYCL :: tools/abi_check_negative.dump

********************
Unexpected Passing Tests (16):
  SYCL :: usm/allocator_vector.cpp
  SYCL :: usm/allocator_vector_fail.cpp
  SYCL :: usm/allocatorll.cpp
  SYCL :: usm/badmalloc.cpp
  SYCL :: usm/depends_on.cpp
  SYCL :: usm/dmemll.cpp
  SYCL :: usm/hmemll.cpp
  SYCL :: usm/memadvise.cpp
  SYCL :: usm/memcpy.cpp
  SYCL :: usm/memset.cpp
  SYCL :: usm/mixed.cpp
  SYCL :: usm/mixed2.cpp
  SYCL :: usm/mixed2template.cpp
  SYCL :: usm/mixed_queue.cpp
  SYCL :: usm/queue_wait.cpp
  SYCL :: usm/smemll.cpp


Testing Time: 68.86s
  Unsupported Tests  :  65
  Expected Passes    : 136
  Expected Failures  :  32
  Unexpected Failures:  11
  Unexpected Passes  :  16

The full log is attached: check-sycl-cuda-log.txt.

@fwyzard
Contributor Author

fwyzard commented Apr 29, 2020

These

$ env LLVM_BIN_PATH=/data/user/fwyzard/sycl/minimal/bin/ python /data/user/fwyzard/sycl/llvm/sycl/tools//abi_check.py --mode check_symbols --reference /data/user/fwyzard/sycl/llvm/sycl/test/abi/sycl_symbols_linux.dump /data/user/fwyzard/sycl/minimal/lib//libsycl.so
/data/user/fwyzard/sycl/minimal/bin/llvm-readobj: error: '/data/user/fwyzard/sycl/minimal/lib//libsycl.so': No such file or directory
Traceback (most recent call last):
  File "/data/user/fwyzard/sycl/llvm/sycl/tools//abi_check.py", line 112, in <module>
    main()
  File "/data/user/fwyzard/sycl/llvm/sycl/tools//abi_check.py", line 106, in main
    check_symbols(args.reference, args.target_library)
  File "/data/user/fwyzard/sycl/llvm/sycl/tools//abi_check.py", line 74, in check_symbols
    "-t", target_path])
  File "/usr/lib64/python2.7/subprocess.py", line 223, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['/data/user/fwyzard/sycl/minimal/bin/llvm-readobj', '-t', '/data/user/fwyzard/sycl/minimal/lib//libsycl.so']' returned non-zero exit status 1

and

$ not env LLVM_BIN_PATH=/data/user/fwyzard/sycl/minimal/bin/ python /data/user/fwyzard/sycl/llvm/sycl/tools//abi_check.py --mode check_symbols --reference /data/user/fwyzard/sycl/llvm/sycl/test/tools/abi_check_negative.dump /data/user/fwyzard/sycl/minimal/lib//libsycl.so | FileCheck /data/user/fwyzard/sycl/llvm/sycl/test/tools/abi_check_negative.dump
/data/user/fwyzard/sycl/minimal/bin/llvm-readobj: error: '/data/user/fwyzard/sycl/minimal/lib//libsycl.so': No such file or directory
Traceback (most recent call last):
  File "/data/user/fwyzard/sycl/llvm/sycl/tools//abi_check.py", line 112, in <module>
    main()
  File "/data/user/fwyzard/sycl/llvm/sycl/tools//abi_check.py", line 106, in main
    check_symbols(args.reference, args.target_library)
  File "/data/user/fwyzard/sycl/llvm/sycl/tools//abi_check.py", line 74, in check_symbols
    "-t", target_path])
  File "/usr/lib64/python2.7/subprocess.py", line 223, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['/data/user/fwyzard/sycl/minimal/bin/llvm-readobj', '-t', '/data/user/fwyzard/sycl/minimal/lib//libsycl.so']' returned non-zero exit status 1
FileCheck error: '<stdin>' is empty.
FileCheck command line:  FileCheck /data/user/fwyzard/sycl/llvm/sycl/test/tools/abi_check_negative.dump

are due to the tests looking for libsycl.so in the wrong directory (lib/ instead of lib64/), i.e. disregarding -DLLVM_LIBDIR_SUFFIX=64.

I think the fix is either to change the tests to use %sycl_libs_dir instead of %llvm_build_lib_dir, or to change sycl/test/lit.site.cfg.py.in so that %llvm_build_lib_dir takes LLVM_LIBDIR_SUFFIX into account.

@fwyzard
Contributor Author

fwyzard commented Apr 29, 2020

> I think the fix is either to change the tests to use %sycl_libs_dir instead of %llvm_build_lib_dir, or to change sycl/test/lit.site.cfg.py.in so that %llvm_build_lib_dir takes LLVM_LIBDIR_SUFFIX into account.

Both approaches fix the two tests.
Updating the tests to use %sycl_libs_dir instead of %llvm_build_lib_dir seems the least intrusive change.
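
The second option amounts to appending the suffix whenever the library directory is computed. A minimal sketch of that idea in Python, the language of the lit configuration; the helper name `build_lib_dir` and its parameters are hypothetical and are not taken from sycl/test/lit.site.cfg.py.in:

```python
import posixpath


def build_lib_dir(build_root, libdir_suffix=""):
    """Return the library directory of a build tree.

    libdir_suffix plays the role of LLVM_LIBDIR_SUFFIX: an empty string
    gives <build_root>/lib, while "64" gives <build_root>/lib64.
    """
    return posixpath.join(build_root, "lib" + libdir_suffix)
```

With the suffix from this build, `build_lib_dir("/data/user/fwyzard/sycl/minimal", "64")` points at the lib64 directory rather than the lib directory seen in the failing commands.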

@fwyzard
Contributor Author

fwyzard commented Apr 29, 2020

All the devicelib tests fail in a similar way:

********************
FAIL: SYCL :: devicelib/c99_complex_math_fp64_test.cpp (109 of 260)
******************** TEST 'SYCL :: devicelib/c99_complex_math_fp64_test.cpp' FAILED ********************
Script:
--
: 'RUN: at line 2';   /data/user/fwyzard/sycl/minimal/bin/clang --driver-mode=g++ -fsycl -c /data/user/fwyzard/sycl/llvm/sycl/test/devicelib/c99_complex_math_fp64_test.cpp -o /data/user/fwyzard/sycl/minimal/tools/sycl/test/devicelib/Output/c99_complex_math_fp64_test.cpp.tmp.o
: 'RUN: at line 3';   /data/user/fwyzard/sycl/minimal/bin/clang --driver-mode=g++ -fsycl /data/user/fwyzard/sycl/minimal/tools/sycl/test/devicelib/Output/c99_complex_math_fp64_test.cpp.tmp.o /data/user/fwyzard/sycl/minimal/./lib64/libsycl-complex-fp64.o -o /data/user/fwyzard/sycl/minimal/tools/sycl/test/devicelib/Output/c99_complex_math_fp64_test.cpp.tmp.out
--
Exit Code: 1

Command Output (stdout):
--
$ ":" "RUN: at line 2"
$ "/data/user/fwyzard/sycl/minimal/bin/clang" "--driver-mode=g++" "-fsycl" "-c" "/data/user/fwyzard/sycl/llvm/sycl/test/devicelib/c99_complex_math_fp64_test.cpp" "-o" "/data/user/fwyzard/sycl/minimal/tools/sycl/test/devicelib/Output/c99_complex_math_fp64_test.cpp.tmp.o"
$ ":" "RUN: at line 3"
$ "/data/user/fwyzard/sycl/minimal/bin/clang" "--driver-mode=g++" "-fsycl" "/data/user/fwyzard/sycl/minimal/tools/sycl/test/devicelib/Output/c99_complex_math_fp64_test.cpp.tmp.o" "/data/user/fwyzard/sycl/minimal/./lib64/libsycl-complex-fp64.o" "-o" "/data/user/fwyzard/sycl/minimal/tools/sycl/test/devicelib/Output/c99_complex_math_fp64_test.cpp.tmp.out"
# command stderr:
clang-11: error: no such file or directory: '/data/user/fwyzard/sycl/minimal/./lib64/libsycl-complex-fp64.o'

I'm not sure why my build does not have the files:

/data/user/fwyzard/sycl/minimal/./lib64/libsycl-cmath.o
/data/user/fwyzard/sycl/minimal/./lib64/libsycl-cmath-fp64.o
/data/user/fwyzard/sycl/minimal/./lib64/libsycl-complex.o
/data/user/fwyzard/sycl/minimal/./lib64/libsycl-complex-fp64.o

Would it be a reasonable alternative to link directly with libsycl.so instead of looking for the individual .o files?

For example, patches like this

diff --git a/sycl/test/devicelib/c99_complex_math_fp64_test.cpp b/sycl/test/devicelib/c99_complex_math_fp64_test.cpp
index c039025b111..8f3aab174b0 100644
--- a/sycl/test/devicelib/c99_complex_math_fp64_test.cpp
+++ b/sycl/test/devicelib/c99_complex_math_fp64_test.cpp
@@ -1,6 +1,5 @@
 // UNSUPPORTED: windows
-// RUN: %clangxx -fsycl -c %s -o %t.o
-// RUN: %clangxx -fsycl %t.o %sycl_libs_dir/libsycl-complex-fp64.o -o %t.out
+// RUN: %clangxx -fsycl %s -lsycl -o %t.out
 #include <CL/sycl.hpp>
 #include <cassert>
 #include <complex.h>

seem to fix all the devicelib-related unexpected failures.
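
Before changing the tests, it may help to confirm which of the device library objects are actually absent from the build tree. A minimal sketch, assuming only the four file names listed above; the helper name `missing_devicelib_objects` is hypothetical:

```python
import os

# Object files that the devicelib tests expect, as listed in this comment.
DEVICELIB_OBJECTS = (
    "libsycl-cmath.o",
    "libsycl-cmath-fp64.o",
    "libsycl-complex.o",
    "libsycl-complex-fp64.o",
)


def missing_devicelib_objects(libs_dir):
    """Return the expected device library objects not found in libs_dir."""
    return [
        name
        for name in DEVICELIB_OBJECTS
        if not os.path.isfile(os.path.join(libs_dir, name))
    ]
```

An empty result for the lib64 directory would mean the objects were built and the failure lies elsewhere; a full list suggests the target that produces them was never part of the build.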

@fwyzard
Contributor Author

fwyzard commented Apr 30, 2020

I fixed the build configuration to

cmake \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_ENABLE_EH=ON \
  -DLLVM_ENABLE_PIC=ON \
  -DLLVM_ENABLE_RTTI=ON \
  -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" \
  -DLLVM_EXTERNAL_PROJECTS="sycl;llvm-spirv;opencl-aot;xpti;libdevice" \
  -DLLVM_EXTERNAL_SYCL_SOURCE_DIR=$SYCL_BASE/llvm/sycl \
  -DLLVM_EXTERNAL_LLVM_SPIRV_SOURCE_DIR=$SYCL_BASE/llvm/llvm-spirv \
  -DLLVM_EXTERNAL_XPTI_SOURCE_DIR=$SYCL_BASE/llvm/xpti \
  -DLLVM_EXTERNAL_LIBDEVICE_SOURCE_DIR=$SYCL_BASE/llvm/libdevice \
  -DLLVM_ENABLE_PROJECTS='clang;llvm-spirv;sycl;opencl-aot;xpti;libdevice;libclc' \
  -DLIBCLC_TARGETS_TO_BUILD='nvptx64--;nvptx64--nvidiacl' \
  -DSYCL_BUILD_PI_CUDA=ON \
  -DSYCL_ENABLE_WERROR=OFF \
  -DSYCL_ENABLE_XPTI_TRACING=ON \
  -DCMAKE_INSTALL_PREFIX=$SYCL_BASE/minimal/install \
  -DSYCL_INCLUDE_TESTS=ON \
  -DLLVM_ENABLE_DOXYGEN=OFF \
  -DLLVM_ENABLE_SPHINX=OFF \
  -DBUILD_SHARED_LIBS=OFF \
  -DLLVM_BUILD_LLVM_DYLIB=OFF \
  -DLLVM_LINK_LLVM_DYLIB=OFF \
  -DLLVM_LIBDIR_SUFFIX=64 \
  -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.1 \
  -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF \
  $SYCL_BASE/llvm/llvm

Now

@bader
Contributor

bader commented Oct 11, 2020

There are no unexpected check-sycl-cuda test results anymore, so I'm closing this issue.
It's recommended to use the configuration script from the buildbot directory for the cmake configuration.

@bader closed this as completed Oct 11, 2020