Compile OpenBLAS run test Segmentation fault #1097

Closed
xrjrunning opened this issue Feb 13, 2017 · 19 comments

@xrjrunning

xrjrunning commented Feb 13, 2017

I have searched the related issues but could not resolve this problem. Any help or clue would be highly appreciated.
When compiling OpenBLAS on a virtual machine with the command
make TARGET=NEHALEM

the test run fails with this error:
gfortran -g -Wall -m64 -g -o sblat3 sblat3.o ../libopenblas_nehalemp-r0.2.20.dev.a -lm -lpthread -lm -lpthread
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat1

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x7fed09697162
#1 0x7fed0969789e
#2 0x7fed089959df
#3 0x7fed089e5781
#4 0x7fed089e53f5
#5 0x7fed096994a9
#6 0x403d93
#7 0x7fed0898284c
#8 0x401828

After adding the make options DEBUG=1 NO_LAPACK=1, gdb reports:
__strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:42
42 ../sysdeps/x86_64/multiarch/../strlen.S: no such file or directory.
in ../sysdeps/x86_64/multiarch/../strlen.S
Missing separate debuginfos, use: debuginfo-install libgcc-4.4.7-17.el6.x86_64
(gdb) bt
#0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:42
#1 0x00007ffff6e2a3f6 in __strdup (s=0xffffffffffffedfa <Address 0xffffffffffffedfa out of bounds>) at strdup.c:41
#2 0x00007ffff7ade4aa in _gfortrani_find_addr2line () at ../../../libgfortran/runtime/main.c:178
#3 0x0000000000403d94 in main ()
#4 0x00007ffff6dc784d in __libc_start_main (main=0x403d63, argc=1, ubp_av=0x7fffffffe488, init=, fini=, rtld_fini=,
stack_end=0x7fffffffe478) at libc-start.c:258
#5 0x0000000000401829 in _start ()

my environment:
os:CentOS release 6.5 2.6.32-642.6.2.el6.x86_64
gcc:4.8.5
gfortran:4.8.5
cpuinfo:
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 6
model name : QEMU Virtual CPU version 2.3.0
stepping : 3
microcode : 1
cpu MHz : 2599.998
cache size : 4096 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 ss syscall nx lm rep_good unfair_spinlock pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes xsave avx hypervisor lahf_lm tpr_shadow vnmi flexpriority ept
bogomips : 5199.99
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

(processors 1 through 7 report the same values as processor 0, except that processor, physical id, apicid and initial apicid each equal the processor number)

@brada4
Contributor

brada4 commented Feb 13, 2017

Please unmask CPUID in KVM/QEMU so that automatic CPU detection works.
Your CPU has AVX, so TARGET=Sandybridge is probably a better match.
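
As a rough illustration of that reasoning, the sketch below maps the `flags` line from /proc/cpuinfo to a TARGET name. The function name `suggest_target` and the flag-to-target mapping are assumptions made for this example; this is not how OpenBLAS itself detects the CPU.

```shell
# Print a rough TARGET suggestion for a space-separated CPU flag list.
# The mapping is a simplification assumed for this example only.
suggest_target() {
    # Pad with spaces so each flag can be matched as a whole word.
    case " $1 " in
        *" avx2 "*)   echo HASWELL ;;
        *" avx "*)    echo SANDYBRIDGE ;;
        *" sse4_2 "*) echo NEHALEM ;;
        *)            echo unknown ;;
    esac
}

# Usage on Linux (takes the flags of the first listed processor):
#   suggest_target "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```

With the flags posted above (avx present, avx2 absent) this picks SANDYBRIDGE.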

@xrjrunning
Author

Thanks, @brada4. I'm using an EC2 host and don't know how to unmask CPUID in KVM/QEMU. I have tried TARGET=Sandybridge and run into the same problem.

@martin-frbg
Collaborator

Very strange. How much memory does your EC2 instance have (though this should not be an issue)? The backtrace appears to come from the innards of gfortran, trying to print the source line where the original error occurred. If your setup starts qemu itself, you can try passing it the "-cpu host" option, but if you are only connecting to an already running qemu you are probably out of luck.

@brada4
Contributor

brada4 commented Feb 13, 2017

Running that debuginfo-install command will add the debuginfo packages needed for correct glibc/libgcc backtraces (the next run will probably ask for more).

@brada4
Contributor

brada4 commented Feb 13, 2017

Libgcc is 4.4.7 while you claim gcc is 4.8.5.
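
A mismatch like this can be checked with a small comparison of the two version strings. The sketch below is only an illustration: the helper names `major_minor` and `versions_match` are made up for this example, and it assumes both tools report at least a major.minor version.

```shell
# Reduce a version string to its major.minor part, e.g. 4.4.7-17.el6 -> 4.4
major_minor() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeed only if both version strings share the same major.minor part.
versions_match() {
    [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}

# Usage on an RPM-based system:
#   versions_match "$(gcc -dumpversion)" "$(rpm -q --qf '%{VERSION}' libgcc)" \
#       || echo "gcc and the packaged libgcc are different versions"
```

A mismatch is not proof of a broken setup on its own, since a compiler built from source does not update the RPM database, but it is a hint that two toolchains are installed side by side.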

@xrjrunning
Author

@martin-frbg, my EC2 instance has 16 GB of memory, so memory is probably not the cause, and I'm only connecting to an already running qemu.

@xrjrunning
Author

@brada4, the gcc version is indeed 4.8.5; someone else upgraded it from 4.4.7. I will check why libgcc is 4.4.7.
gcc --version
gcc (GCC) 4.8.5
Copyright © 2015 Free Software Foundation, Inc.

@brada4
Contributor

brada4 commented Feb 14, 2017

Fix your compiler. The built-in CentOS 6 compiler (4.4.7) works perfectly fine.

@xrjrunning
Author

It seems to be related to the compiler. gfortran can compile a simple Fortran example program, but running the compiled program hits the same problem. I don't know why; I reinstalled gcc and the problem remains. Does anyone have a clue? I have struggled with this problem for a day.

Starting program: /root/test
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?

Program received signal SIGSEGV, Segmentation fault.
__strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:31
31 ../sysdeps/x86_64/multiarch/../strlen.S: no such file or directory.
in ../sysdeps/x86_64/multiarch/../strlen.S
(gdb) bt
#0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:31
#1 0x00007ffff70473f6 in __strdup (s=0xffffffffffffee11 <Address 0xffffffffffffee11 out of bounds>) at strdup.c:41
#2 0x00007ffff7ade4aa in _gfortrani_find_addr2line () at ../../../libgfortran/runtime/main.c:178
#3 0x00000000004008d9 in main ()
#4 0x00007ffff6fe484d in __libc_start_main (main=0x4008aa, argc=1, ubp_av=0x7fffffffe488, init=, fini=, rtld_fini=,
stack_end=0x7fffffffe478) at libc-start.c:258
#5 0x00000000004006f9 in _start ()

@martin-frbg
Collaborator

What does "gfortran -v" return? gfortran or gcc-gfortran could be a separate package in your distribution, so reinstalling gcc alone may not be sufficient.

@xrjrunning
Author

@martin-frbg gfortran -v returns the following:
gfortran -v

COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-unknown-linux-gnu/4.8.5/lto-wrapper
Target:x86_64-unknown-linux-gnu
configuration:../configure --prefix=/usr
thread model:posix
gcc version 4.8.5 (GCC)

I reinstalled gcc from source, and that source package contains gfortran.

@martin-frbg
Collaborator

Is the libgcc package also the correct version now? (Though the initial system hint about installing the debuginfo for version "4.4.7-17" may have been misleading if gcc was updated from source rather than via the package mechanism.) Does running "ldd" on your test program show libraries from the expected path?
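
To make the ldd check easier to read, the resolved paths can be pulled out of its output. This is a minimal sketch; the helper name `ldd_resolved` is made up for this example, and it assumes the usual `name => /path (address)` layout of ldd output.

```shell
# Read `ldd` output on stdin and print "name path" for every library
# that resolved to an absolute path. Lines without "=>" (the vdso and
# the dynamic loader) are skipped.
ldd_resolved() {
    awk '$2 == "=>" && $3 ~ /^\// { print $1, $3 }'
}

# Usage:
#   ldd ./test | ldd_resolved
```

Any runtime library that resolves outside the directory where the new compiler installed its own libraries is worth a closer look.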

@brada4
Contributor

brada4 commented Feb 14, 2017

You probably still have two copies of the compilers and libraries around. Red Hat and CentOS provide newer compilers via 'devtoolset', see https://softwarecollections.org/
The test program crashes well before OpenBLAS gets loaded.

@xrjrunning
Author

@martin-frbg, libgcc is the correct version now, and ldd shows libraries from the expected path.
The Fortran test program ends in a segmentation fault. Its ldd result:
ldd test
linux-vdso.so.1 (0x00007ffdfd9e6000)
libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007fedb75e8000)
libm.so.6 => /lib64/libm.so.6 (0x00007fedb72e9000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fedb70d3000)
libquadmath.so.0 => /usr/lib64/libquadmath.so.0 (0x00007fedb6e98000)
libc.so.6 => /lib64/libc.so.6 (0x00007fedb6ae8000)
/lib64/ld-linux-x86-64.so.2 (0x00007fedb7913000)
The hello world C program runs successfully. Its ldd result:
ldd hello
linux-vdso.so.1 (0x00007ffc7e4cb000)
libc.so.6 => /lib64/libc.so.6 (0x00007f2383d8e000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2384150000)

The content of the Fortran test program:
program main

write(*,*) "\nThis is the first test program : Fortran 95,by Gfortran\n"

stop
end

@xrjrunning
Author

@brada4, yes, the problem is in the Fortran environment: even a simple Fortran example without OpenBLAS fails to run. Probably two copies of the compilers and libraries are still around, but the ldd result doesn't show that.

@brada4
Contributor

brada4 commented Feb 15, 2017

I suggest resetting the system and getting back the standard working Fortran compiler.
Replacing RPM-installed libraries with home-built ones always leads to RPM and YUM errors later (nothing to do with OpenBLAS, just a lesson of life).
You can download OpenBLAS as packaged by Fedora (built with the standard compiler):
$ sudo yum install epel-release
$ sudo yum install openblas-devel

@martin-frbg
Collaborator

If you cannot remove the compiler mixup for some reason, look in /usr/lib/x86_64-linux-gnu (or wherever you expect the 4.8.5 compiler to be installed) to see if there is a libgfortran.so.3 that differs from the one in /usr/lib64. If so, try export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libgfortran.so.3 before running your test program.
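
One way to look for a differing copy is to list every matching file together with a checksum. This is a sketch only; the function name `list_lib_copies` is made up for this example, and the directories in the usage line are simply the two mentioned above.

```shell
# List regular files named "<name>*" under the given directories, each
# prefixed by its cksum checksum and size, so differing builds stand out.
# Symlinks such as libgfortran.so.3 -> libgfortran.so.3.0.0 are skipped
# and only the file they point to is listed.
list_lib_copies() {
    name=$1
    shift
    find "$@" -type f -name "${name}*" -exec cksum {} + 2>/dev/null | sort
}

# Usage:
#   list_lib_copies libgfortran.so.3 /usr/lib64 /usr/lib/x86_64-linux-gnu
```

If two lines show different checksums, the system has two different builds of the library, and LD_PRELOAD can then be pointed at the one that belongs to the compiler used for the build.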

@xrjrunning
Author

@martin-frbg @brada4 Thanks very much for all your help. I think the cause of the problem is an incompatibility between CentOS 6.5 and glibc 2.17. When I tried glibc 2.18, all the system commands segfaulted. So I will try resetting the system, or maybe try CentOS 7.
Thanks for your help, guys!

@martin-frbg
Collaborator

Closing as not an OpenBLAS bug then, good luck :-)
