crash on Linux x86_64 (cmake build, git 8f7e986184afb) #1806

Closed
ayounes-nviso opened this issue Oct 10, 2018 · 10 comments · Fixed by #1808

Comments

@ayounes-nviso

Hello,

With the latest develop (8f7e986) and a cmake build, I can reproduce a crash on several Ubuntu releases (16.04, 17.10 and 18.04). There is no crash with an earlier version (02ef20a). Here is the backtrace:

Thread 1 "nv3dfi_video_cl" received signal SIGILL, Illegal instruction.
0x00007fffe4b11383 in sgemm_nn (args=0x7fffffffc4c0, range_m=0x0, range_n=0x0,
sa=0x7fffdcf1d000, sb=0x7fffdd03d000, dummy=0)
at src/driver/level3/level3.c:254
254 if ( alpha[0] == ZERO
(gdb) bt
#0 0x00007fffe4b11383 in sgemm_nn (args=0x7fffffffc4c0, range_m=0x0, range_n=0x0, sa=0x7fffdcf1d000, sb=0x7fffdd03d000, dummy=0)
at src/driver/level3/level3.c:254
#1 0x00007fffe4b1112c in cblas_sgemm (order=CblasRowMajor, TransA=CblasNoTrans, TransB=CblasNoTrans, m=10, n=4200, k=27, alpha=1, a=0x555555d22340, lda=27, b=0x555556236110, ldb=4200, beta=0, c=0x5555561e1b90, ldc=4200)
at src/interface/gemm.c:422

@ayounes-nviso
Author

In Release mode I get a different backtrace.
Additional info: I am compiling and linking against a static library, libopenblas.a.

Program received signal SIGILL, Illegal instruction.
0x00007fffe4d27fa2 in alloc_mmap ()
(gdb) bt
#0 0x00007fffe4d27fa2 in alloc_mmap ()
#1 0x00007fffe4d283ab in blas_memory_alloc ()
#2 0x00007fffe4cec06f in gotoblas_init ()
#3 0x00007ffff7de5733 in call_init (env=0x7fffffffdbb8, argv=0x7fffffffdb88, argc=5, l=<optimized out>) at dl-init.c:72
#4 0x00007ffff7de5733 in _dl_init (main_map=0x7ffff7ffe170, argc=5, argv=0x7fffffffdb88, env=0x7fffffffdbb8) at dl-init.c:119
#5 0x00007ffff7dd60ca in _dl_start_user () at /lib64/ld-linux-x86-64.so.2
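
The frames above show gotoblas_init being reached from _dl_init via _dl_start_user, that is, while the dynamic loader is running initializers and before main starts. A minimal sketch of that mechanism follows, assuming GCC or Clang and their `constructor` function attribute. The names `fake_library_init` and `library_initialized` are hypothetical stand-ins and are not OpenBLAS symbols.

```c
#include <stdio.h>

/* Set by the constructor below. Stays 0 only if the toolchain ignores
   the constructor attribute (assumption: GCC or Clang honours it). */
static int init_ran = 0;

/* Hypothetical stand-in for a library initializer such as gotoblas_init.
   The loader (or the C runtime startup code) calls it before main(). */
__attribute__((constructor))
static void fake_library_init(void)
{
    init_ran = 1;
}

/* Returns 1 if the initializer has already run, 0 otherwise. */
int library_initialized(void)
{
    return init_ran;
}

/* Prints the state; when called from main() the initializer is done. */
void print_init_state(void)
{
    printf("initializer ran before main: %s\n",
           library_initialized() ? "yes" : "no");
}
```

Because such an initializer runs unconditionally at load time, an illegal instruction anywhere in its call chain would crash the program even if it never calls a BLAS routine.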

@martin-frbg
Collaborator

What is your hardware? Luckily the "good" and "bad" versions appear to be only six days apart, so this should be easy to track down, but the only changes directly affecting SGEMM were for SkylakeX.

@ayounes-nviso
Author

Currently on an old Dell laptop, Sandy Bridge I think:
Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz

@martin-frbg
Collaborator

Weird. Do both compiling and running happen on this system? (I had a slight hope that you might have built the library on Haswell without including DYNAMIC_ARCH support for other/older processors.)

@brada4
Contributor

brada4 commented Oct 10, 2018

https://valid.x86.fr/ynbz2s
extmodel is not handled, but should be Sandybridge from the picture?

@ayounes-nviso
Author

Yes, compiling and running both happen on this same system, and DYNAMIC_ARCH is set to ON.

But the same crash happened on another system (Xeon or something), so I don't think this crash is specific to Sandybridge.

My guess is that it is related to the recent cmake changes for x64?
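
For context, DYNAMIC_ARCH is meant to let one library carry kernels for several CPU generations and pick one at run time. The following is only a rough sketch of that idea, not OpenBLAS code: `dot_generic`, `dot_wide` and `select_dot_kernel` are hypothetical names, both kernels are plain C stand-ins, and the CPU check assumes the GCC/Clang builtin `__builtin_cpu_supports` on x86.

```c
#include <stddef.h>

/* Signature shared by all kernel variants in this sketch. */
typedef float (*dot_kernel)(const float *, const float *, size_t);

/* Portable baseline kernel: safe on every CPU. */
static float dot_generic(const float *x, const float *y, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Stand-in for an AVX-512 kernel. In a real build, only the source file
   holding this variant would be compiled with -march=skylake-avx512. */
static float dot_wide(const float *x, const float *y, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Choose a kernel based on what the running CPU reports. */
dot_kernel select_dot_kernel(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        return dot_wide;
#endif
    return dot_generic;
}
```

Run-time selection like this only protects older CPUs if the dispatcher and all shared code are themselves compiled for the baseline instruction set.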

@ayounes-nviso
Author

The crash also happened yesterday on these platforms:
Docker running on i7 9xx (Nehalem Class Core i7), gcc-6
Ubuntu 17.10 running on i7-6500U CPU (Skylake?), gcc-7

@martin-frbg
Collaborator

Err, it is now dawning on me that the change to add -march=skylake-avx512 "where required" may actually be adding it unconditionally rather than only for that specific target... I wonder why this did not break the CI builds though. Could you try removing the two lines from system_check.cmake that I added in #1798?
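
One way to test this hypothesis on an affected machine is to compare what the compiler was allowed to emit with what the CPU reports. The sketch below assumes GCC or Clang on x86: `__AVX512F__` is the macro those compilers define when AVX-512F code generation is enabled (for example through -march=skylake-avx512), and `__builtin_cpu_supports` queries the running CPU. The function names are hypothetical.

```c
#include <stdio.h>

/* 1 if this translation unit was compiled with AVX-512F enabled
   (e.g. via -march=skylake-avx512), 0 otherwise. */
int built_with_avx512f(void)
{
#ifdef __AVX512F__
    return 1;
#else
    return 0;
#endif
}

/* 1 if the running CPU reports AVX-512F, 0 otherwise (or if the
   builtin is unavailable on this compiler/architecture). */
int cpu_has_avx512f(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") ? 1 : 0;
#else
    return 0;
#endif
}

/* A binary built for AVX-512 may raise SIGILL on a CPU without it. */
int sigill_risk(void)
{
    return built_with_avx512f() && !cpu_has_avx512f();
}

/* Prints a one-line summary of the two checks. */
void report_avx512_mismatch(void)
{
    printf("compiled for AVX-512F: %d, CPU has AVX-512F: %d, SIGILL risk: %d\n",
           built_with_avx512f(), cpu_has_avx512f(), sigill_risk());
}
```

Compiling this file with the same flags as the library and calling `report_avx512_mismatch()` would show whether the flag leaked into a non-SkylakeX build. One caveat: if the flag is applied, the check itself could in principle be compiled into AVX-512 code, so a crash inside it would also count as evidence.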

@ayounes-nviso
Author

If I remove those two lines, I get the compilation failure back (see #1797).

@martin-frbg
Collaborator

Duh. But these lines actually belong in system.cmake, like

if (DEFINED TARGET AND "${TARGET}" STREQUAL "SKYLAKEX" AND NOT NO_AVX512)
  set (CCOMMON_OPT "${CCOMMON_OPT} -march=skylake-avx512")
  set (FCOMMON_OPT "${FCOMMON_OPT} -march=skylake-avx512")
endif()

somewhere around line 42, before the if (DEFINED TARGET)
At least I think so until proven otherwise ...
