
Intermittent segfault from getenv #716

Closed
dedalusj opened this issue Dec 11, 2015 · 12 comments

@dedalusj

I am seeing intermittent segmentation faults from OpenBLAS when it is used through numpy.

I got a core dump and the backtrace is the following:

#0  0x00007fba2b950d5d in getenv () from /lib64/libc.so.6
#1  0x00007fba21ce7e21 in blas_set_parameter () from /opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/core/../../../../libopenblas.so.0
#2  0x00007fba21ce6d91 in blas_memory_alloc () from /opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/core/../../../../libopenblas.so.0
#3  0x00007fba21ce74e5 in blas_thread_server () from /opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/core/../../../../libopenblas.so.0
#4  0x00007fba2c0cef18 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fba2b9fdb2d in clone () from /lib64/libc.so.6

I noticed the same backtrace coming from various code paths.

The OpenBLAS version is 0.2.14. In case it helps, I installed it through the conda packaging system, where it is build number 3.

@brada4
Contributor

brada4 commented Dec 11, 2015

There should be one more line above the backtrace (bt), and the output of (gdb) thread apply all bt may be more useful.
It looks like your libc.so.6 is corrupt, or OpenBLAS was built against a radically different libc version.

@martin-frbg
Collaborator

It could also be that the actual memory corruption happens before this, and the code only blows up at the next malloc. Can you update your OpenBLAS to 0.2.15 (or even a snapshot of the git develop branch), and/or run your code under valgrind in the hope of catching any earlier corruption?

@xianyi
Collaborator

xianyi commented Dec 11, 2015

Interesting. I have never seen a segfault from getenv before.

@brada4
Contributor

brada4 commented Dec 11, 2015

@martin-frbg this thread is newly created; some other thread is already in a weird state.

@martin-frbg
Collaborator

@brada4 yes I see that. My take is that memory management got trashed before this point - which went unnoticed until the next allocation, which happens to be the tiny amount of memory needed for the OMP_NUM_THREADS value or whatever getenv() is sent to fetch on creation of a new thread. So the current backtrace only tells us that there is a serious problem "somewhere", perhaps not even through OpenBLAS' fault.

@dedalusj
Author

I have pasted the output of "(gdb) thread apply all bt" below. We also installed the glibc debug symbols on the Amazon EC2 instances, so we now get better information about the crash in getenv.

Thread 2 (Thread 0x7f3911c4c700 (LWP 27553)):
#0  0x00007f3910d62255 in _xstat () from /lib64/libc.so.6
#1  0x00007f3911766420 in stat (__statbuf=0x7fff271b06a0, __path=0x235a210 "/opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/core/arrayprint")
    at /usr/include/sys/stat.h:436
#2  isdir (path=0x235a210 "/opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/core/arrayprint") at Python/import.c:133
#3  find_module (fullname=0x23527e0 "numpy.core.arrayprint", subname=<optimized out>, path=<optimized out>, buf=0x235b220 "arrayprint", buflen=4097, p_fp=0x7fff271b1750, p_loader=0x7fff271b1758)
    at Python/import.c:1501
#4  0x00007f3911767f48 in import_submodule (mod=0x7f3908bedf68, subname=0x23527eb "arrayprint", fullname=0x23527e0 "numpy.core.arrayprint") at Python/import.c:2693
#5  0x00007f39117681f4 in load_next (mod=0x7f3908bedf68, altmod=0x7f3908bedf68, p_name=<optimized out>, buf=0x23527e0 "numpy.core.arrayprint", p_buflen=0x7fff271b1810) at Python/import.c:2519
#6  0x00007f3911768820 in import_module_level (level=<optimized out>, fromlist=0x7f3908329960, locals=<optimized out>, globals=<optimized out>, name=0x0) at Python/import.c:2228
#7  PyImport_ImportModuleLevel (name=<optimized out>, globals=<optimized out>, locals=<optimized out>, fromlist=0x7f3908329960, level=<optimized out>) at Python/import.c:2292
#8  0x00007f391174814f in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
#9  0x00007f391169ed23 in PyObject_Call (func=0x7f3911c3efc8, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2546
#10 0x00007f3911748633 in PyEval_CallObjectWithKeywords (func=0x7f3911c3efc8, arg=0x7f390832b770, kw=<optimized out>) at Python/ceval.c:4219
#11 0x00007f391174d29e in PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2622
#12 0x00007f3911752a2e in PyEval_EvalCodeEx (co=0x7f3908345930, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3582
#13 0x00007f3911752b42 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at Python/ceval.c:669
#14 0x00007f3911764a82 in PyImport_ExecCodeModuleEx (name=0x2327130 "numpy.core.numeric", co=0x7f3908345930, 
    pathname=0x23063c0 "/opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/core/numeric.pyc") at Python/import.c:713
#15 0x00007f39117671ce in load_source_module (name=0x2327130 "numpy.core.numeric", 
    pathname=0x23063c0 "/opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/core/numeric.pyc", fp=<optimized out>) at Python/import.c:1103
#16 0x00007f3911767f81 in import_submodule (mod=0x7f3908bedf68, subname=0x7f3908c07564 "numeric", fullname=0x2327130 "numpy.core.numeric") at Python/import.c:2704
#17 0x00007f39117684bc in ensure_fromlist (mod=0x7f3908bedf68, fromlist=0x7f3908c0f0d0, buf=0x2327130 "numpy.core.numeric", buflen=10, recursive=0) at Python/import.c:2610
#18 0x00007f391176895c in import_module_level (level=<optimized out>, fromlist=0x7f3908c0f0d0, locals=<optimized out>, globals=<optimized out>, name=0x0) at Python/import.c:2273
#19 PyImport_ImportModuleLevel (name=<optimized out>, globals=<optimized out>, locals=<optimized out>, fromlist=0x7f3908c0f0d0, level=<optimized out>) at Python/import.c:2292
#20 0x00007f391174814f in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
#21 0x00007f391169ed23 in PyObject_Call (func=0x7f3911c3efc8, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2546
#22 0x00007f3911748633 in PyEval_CallObjectWithKeywords (func=0x7f3911c3efc8, arg=0x7f3908e2ed10, kw=<optimized out>) at Python/ceval.c:4219
#23 0x00007f391174d29e in PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2622
#24 0x00007f3911752a2e in PyEval_EvalCodeEx (co=0x7f3908bebb30, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3582
#25 0x00007f3911752b42 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at Python/ceval.c:669
#26 0x00007f3911764a82 in PyImport_ExecCodeModuleEx (name=0x23000c0 "numpy.core", co=0x7f3908bebb30, 
    pathname=0x23030f0 "/opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/core/__init__.pyc") at Python/import.c:713
#27 0x00007f39117671ce in load_source_module (name=0x23000c0 "numpy.core", pathname=0x23030f0 "/opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/core/__init__.pyc", 
    fp=<optimized out>) at Python/import.c:1103
#28 0x00007f3911767a2a in load_package (name=0x23000c0 "numpy.core", pathname=<optimized out>) at Python/import.c:1170
#29 0x00007f3911767f81 in import_submodule (mod=0x7f3908c0e210, subname=0x23000c6 "core", fullname=0x23000c0 "numpy.core") at Python/import.c:2704
#30 0x00007f39117681f4 in load_next (mod=0x7f3908c0e210, altmod=0x7f3908c0e210, p_name=<optimized out>, buf=0x23000c0 "numpy.core", p_buflen=0x7fff271b23b0) at Python/import.c:2519
#31 0x00007f3911768860 in import_module_level (level=<optimized out>, fromlist=0x7f3911a05cd0 <_Py_NoneStruct>, locals=<optimized out>, globals=<optimized out>, name=0x7f3908bedf07 "numeric")
    at Python/import.c:2236
#32 PyImport_ImportModuleLevel (name=<optimized out>, globals=<optimized out>, locals=<optimized out>, fromlist=0x7f3911a05cd0 <_Py_NoneStruct>, level=<optimized out>) at Python/import.c:2292
#33 0x00007f391174814f in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
#34 0x00007f391169ed23 in PyObject_Call (func=0x7f3911c3efc8, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2546
#35 0x00007f3911748633 in PyEval_CallObjectWithKeywords (func=0x7f3911c3efc8, arg=0x7f3908e2ecb0, kw=<optimized out>) at Python/ceval.c:4219
#36 0x00007f391174d29e in PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2622
#37 0x00007f3911752a2e in PyEval_EvalCodeEx (co=0x7f3908c0a5b0, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3582
#38 0x00007f3911752b42 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at Python/ceval.c:669
#39 0x00007f3911764a82 in PyImport_ExecCodeModuleEx (name=0x22f7e50 "numpy.lib.type_check", co=0x7f3908c0a5b0, 
    pathname=0x22fbe90 "/opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/lib/type_check.pyc") at Python/import.c:713
#40 0x00007f39117671ce in load_source_module (name=0x22f7e50 "numpy.lib.type_check", 
    pathname=0x22fbe90 "/opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/lib/type_check.pyc", fp=<optimized out>) at Python/import.c:1103
#41 0x00007f3911767f81 in import_submodule (mod=0x7f3908bed8d8, subname=0x22f7e5a "type_check", fullname=0x22f7e50 "numpy.lib.type_check") at Python/import.c:2704
#42 0x00007f39117681f4 in load_next (mod=0x7f3908bed8d8, altmod=0x7f3908bed8d8, p_name=<optimized out>, buf=0x22f7e50 "numpy.lib.type_check", p_buflen=0x7fff271b2950) at Python/import.c:2519
#43 0x00007f3911768820 in import_module_level (level=<optimized out>, fromlist=0x7f3908c02790, locals=<optimized out>, globals=<optimized out>, name=0x0) at Python/import.c:2228
#44 PyImport_ImportModuleLevel (name=<optimized out>, globals=<optimized out>, locals=<optimized out>, fromlist=0x7f3908c02790, level=<optimized out>) at Python/import.c:2292
#45 0x00007f391174814f in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
#46 0x00007f391169ed23 in PyObject_Call (func=0x7f3911c3efc8, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2546
#47 0x00007f3911748633 in PyEval_CallObjectWithKeywords (func=0x7f3911c3efc8, arg=0x7f3908e2eb30, kw=<optimized out>) at Python/ceval.c:4219
#48 0x00007f391174d29e in PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2622
#49 0x00007f3911752a2e in PyEval_EvalCodeEx (co=0x7f3908beb530, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3582
#50 0x00007f3911752b42 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at Python/ceval.c:669
#51 0x00007f3911764a82 in PyImport_ExecCodeModuleEx (name=0x22f3b00 "numpy.lib", co=0x7f3908beb530, 
    pathname=0x22f6b30 "/opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/lib/__init__.pyc") at Python/import.c:713
#52 0x00007f39117671ce in load_source_module (name=0x22f3b00 "numpy.lib", pathname=0x22f6b30 "/opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/lib/__init__.pyc", 
    fp=<optimized out>) at Python/import.c:1103
#53 0x00007f3911767a2a in load_package (name=0x22f3b00 "numpy.lib", pathname=<optimized out>) at Python/import.c:1170
#54 0x00007f3911767f81 in import_submodule (mod=0x7f3908c0e210, subname=0x22f3b06 "lib", fullname=0x22f3b00 "numpy.lib") at Python/import.c:2704
#55 0x00007f39117681f4 in load_next (mod=0x7f3908c0e210, altmod=0x7f3908c0e210, p_name=<optimized out>, buf=0x22f3b00 "numpy.lib", p_buflen=0x7fff271b2f40) at Python/import.c:2519
#56 0x00007f3911768860 in import_module_level (level=<optimized out>, fromlist=0x7f3908c02110, locals=<optimized out>, globals=<optimized out>, name=0x0) at Python/import.c:2236
#57 PyImport_ImportModuleLevel (name=<optimized out>, globals=<optimized out>, locals=<optimized out>, fromlist=0x7f3908c02110, level=<optimized out>) at Python/import.c:2292
#58 0x00007f391174814f in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
#59 0x00007f391169ed23 in PyObject_Call (func=0x7f3911c3efc8, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2546
#60 0x00007f3911748633 in PyEval_CallObjectWithKeywords (func=0x7f3911c3efc8, arg=0x7f3908e2e830, kw=<optimized out>) at Python/ceval.c:4219
#61 0x00007f391174d29e in PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2622
#62 0x00007f3911752a2e in PyEval_EvalCodeEx (co=0x7f3908beb430, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3582
#63 0x00007f3911752b42 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at Python/ceval.c:669
#64 0x00007f3911764a82 in PyImport_ExecCodeModuleEx (name=0x22b7380 "numpy.add_newdocs", co=0x7f3908beb430, 
    pathname=0x2290560 "/opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/add_newdocs.pyc") at Python/import.c:713
#65 0x00007f39117671ce in load_source_module (name=0x22b7380 "numpy.add_newdocs", pathname=0x2290560 "/opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/add_newdocs.pyc", 
    fp=<optimized out>) at Python/import.c:1103
#66 0x00007f3911767f81 in import_submodule (mod=0x7f3908c0e210, subname=0x7f3908bf5fe4 "add_newdocs", fullname=0x22b7380 "numpy.add_newdocs") at Python/import.c:2704
#67 0x00007f39117684bc in ensure_fromlist (mod=0x7f3908c0e210, fromlist=0x7f3908be8650, buf=0x22b7380 "numpy.add_newdocs", buflen=5, recursive=0) at Python/import.c:2610
#68 0x00007f391176895c in import_module_level (level=<optimized out>, fromlist=0x7f3908be8650, locals=<optimized out>, globals=<optimized out>, name=0x0) at Python/import.c:2273
#69 PyImport_ImportModuleLevel (name=<optimized out>, globals=<optimized out>, locals=<optimized out>, fromlist=0x7f3908be8650, level=<optimized out>) at Python/import.c:2292
#70 0x00007f391174814f in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
#71 0x00007f391169ed23 in PyObject_Call (func=0x7f3911c3efc8, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2546
#72 0x00007f3911748633 in PyEval_CallObjectWithKeywords (func=0x7f3911c3efc8, arg=0x7f3911afffb0, kw=<optimized out>) at Python/ceval.c:4219
#73 0x00007f391174d29e in PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2622
#74 0x00007f3911752a2e in PyEval_EvalCodeEx (co=0x7f3909508230, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3582
#75 0x00007f3911752b42 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at Python/ceval.c:669
#76 0x00007f3911764a82 in PyImport_ExecCodeModuleEx (name=0x2222130 "numpy", co=0x7f3909508230, 
    pathname=0x22b6370 "/opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/__init__.pyc") at Python/import.c:713
#77 0x00007f39117671ce in load_source_module (name=0x2222130 "numpy", pathname=0x22b6370 "/opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/__init__.pyc", 
    fp=<optimized out>) at Python/import.c:1103
#78 0x00007f3911767a2a in load_package (name=0x2222130 "numpy", pathname=<optimized out>) at Python/import.c:1170
#79 0x00007f3911767f81 in import_submodule (mod=0x7f3911a05cd0 <_Py_NoneStruct>, subname=0x2222130 "numpy", fullname=0x2222130 "numpy") at Python/import.c:2704
#80 0x00007f3911768235 in load_next (mod=0x7f39095047f8, altmod=0x7f3911a05cd0 <_Py_NoneStruct>, p_name=<optimized out>, buf=0x2222120 "sonar_intensity.numpy", p_buflen=0x7fff271b3ae0)
    at Python/import.c:2523
#81 0x00007f3911768820 in import_module_level (level=<optimized out>, fromlist=0x7f3911a05cd0 <_Py_NoneStruct>, locals=<optimized out>, globals=<optimized out>, name=0x0) at Python/import.c:2228
#82 PyImport_ImportModuleLevel (name=<optimized out>, globals=<optimized out>, locals=<optimized out>, fromlist=0x7f3911a05cd0 <_Py_NoneStruct>, level=<optimized out>) at Python/import.c:2292
#83 0x00007f391174814f in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
#84 0x00007f391169ed23 in PyObject_Call (func=0x7f3911c3efc8, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2546
#85 0x00007f3911748633 in PyEval_CallObjectWithKeywords (func=0x7f3911c3efc8, arg=0x7f39092cb838, kw=<optimized out>) at Python/ceval.c:4219
#86 0x00007f391174d29e in PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2622
#87 0x00007f3911752a2e in PyEval_EvalCodeEx (co=0x7f39095020b0, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3582
#88 0x00007f3911752b42 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at Python/ceval.c:669
#89 0x00007f3911764a82 in PyImport_ExecCodeModuleEx (name=0x7f3908e538c4 "sonar_intensity.determine_bins", co=0x7f39095020b0, 
    pathname=0x7f3911aeda64 "/opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/sidescananalysis-9.7.1.29-py2.7.egg/sonar_intensity/determine_bins.py") at Python/import.c:713
#90 0x00007f39117a052e in zipimporter_load_module (obj=<optimized out>, args=<optimized out>) at ./Modules/zipimport.c:360
#91 0x00007f391169ed23 in PyObject_Call (func=0x7f3908e47d40, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2546
#92 0x00007f391169ee11 in call_function_tail (callable=0x7f3908e47d40, args=0x7f3908e33090) at Objects/abstract.c:2578
#93 0x00007f39116a31b8 in PyObject_CallMethod (o=<optimized out>, name=<optimized out>, format=<optimized out>) at Objects/abstract.c:2653
#94 0x00007f3911767f81 in import_submodule (mod=0x7f39095047f8, subname=0x22259e0 "determine_bins", fullname=0x22259d0 "sonar_intensity.determine_bins") at Python/import.c:2704
#95 0x00007f39117681f4 in load_next (mod=0x7f39095047f8, altmod=0x7f39095047f8, p_name=<optimized out>, buf=0x22259d0 "sonar_intensity.determine_bins", p_buflen=0x7fff271b4140) at Python/import.c:2519
#96 0x00007f3911768860 in import_module_level (level=<optimized out>, fromlist=0x7f3911aa06d0, locals=<optimized out>, globals=<optimized out>, name=0x0) at Python/import.c:2236
#97 PyImport_ImportModuleLevel (name=<optimized out>, globals=<optimized out>, locals=<optimized out>, fromlist=0x7f3911aa06d0, level=<optimized out>) at Python/import.c:2292
#98 0x00007f391174814f in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
#99 0x00007f391169ed23 in PyObject_Call (func=0x7f3911c3efc8, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2546
#100 0x00007f3911748633 in PyEval_CallObjectWithKeywords (func=0x7f3911c3efc8, arg=0x7f39092cb7e0, kw=<optimized out>) at Python/ceval.c:4219
#101 0x00007f391174d29e in PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2622
#102 0x00007f3911752a2e in PyEval_EvalCodeEx (co=0x7f3911a9d0b0, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3582
#103 0x00007f3911752b42 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at Python/ceval.c:669
#104 0x00007f3911764a82 in PyImport_ExecCodeModuleEx (name=0x7f39094f4fa4 "sonar_intensity", co=0x7f3911a9d0b0, 
    pathname=0x7f3911aedd44 "/opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/sidescananalysis-9.7.1.29-py2.7.egg/sonar_intensity/__init__.py") at Python/import.c:713
#105 0x00007f39117a052e in zipimporter_load_module (obj=<optimized out>, args=<optimized out>) at ./Modules/zipimport.c:360
#106 0x00007f391169ed23 in PyObject_Call (func=0x7f3908e47d88, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2546
#107 0x00007f391169ee11 in call_function_tail (callable=0x7f3908e47d88, args=0x7f39094f0350) at Objects/abstract.c:2578
#108 0x00007f39116a31b8 in PyObject_CallMethod (o=<optimized out>, name=<optimized out>, format=<optimized out>) at Objects/abstract.c:2653
#109 0x00007f3911767f81 in import_submodule (mod=0x7f3911a05cd0 <_Py_NoneStruct>, subname=0x22b98e0 "sonar_intensity", fullname=0x22b98e0 "sonar_intensity") at Python/import.c:2704
#110 0x00007f39117681f4 in load_next (mod=0x7f3911a05cd0 <_Py_NoneStruct>, altmod=0x7f3911a05cd0 <_Py_NoneStruct>, p_name=<optimized out>, buf=0x22b98e0 "sonar_intensity", p_buflen=0x7fff271b47a0)
    at Python/import.c:2519
#111 0x00007f3911768820 in import_module_level (level=<optimized out>, fromlist=0x7f3911b10b10, locals=<optimized out>, globals=<optimized out>, name=0x0) at Python/import.c:2228
#112 PyImport_ImportModuleLevel (name=<optimized out>, globals=<optimized out>, locals=<optimized out>, fromlist=0x7f3911b10b10, level=<optimized out>) at Python/import.c:2292
#113 0x00007f391174814f in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
#114 0x00007f391169ed23 in PyObject_Call (func=0x7f3911c3efc8, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2546
#115 0x00007f3911748633 in PyEval_CallObjectWithKeywords (func=0x7f3911c3efc8, arg=0x7f3911b01c58, kw=<optimized out>) at Python/ceval.c:4219
#116 0x00007f391174d29e in PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2622
#117 0x00007f3911752a2e in PyEval_EvalCodeEx (co=0x7f3911b02030, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3582
#118 0x00007f3911752b42 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at Python/ceval.c:669
#119 0x00007f3911764a82 in PyImport_ExecCodeModuleEx (name=0x7f3911b1655c "sonar_signal", co=0x7f3911b02030, 
    pathname=0x7f3911af2444 "/opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/sidescananalysis-9.7.1.29-py2.7.egg/sonar_signal.py") at Python/import.c:713
#120 0x00007f39117a052e in zipimporter_load_module (obj=<optimized out>, args=<optimized out>) at ./Modules/zipimport.c:360
#121 0x00007f391169ed23 in PyObject_Call (func=0x7f3911b0d560, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2546
#122 0x00007f391169ee11 in call_function_tail (callable=0x7f3911b0d560, args=0x7f3911af6390) at Objects/abstract.c:2578
#123 0x00007f39116a31b8 in PyObject_CallMethod (o=<optimized out>, name=<optimized out>, format=<optimized out>) at Objects/abstract.c:2653
#124 0x00007f3911767f81 in import_submodule (mod=0x7f3911a05cd0 <_Py_NoneStruct>, subname=0x22004a0 "sonar_signal", fullname=0x22004a0 "sonar_signal") at Python/import.c:2704
#125 0x00007f39117681f4 in load_next (mod=0x7f3911a05cd0 <_Py_NoneStruct>, altmod=0x7f3911a05cd0 <_Py_NoneStruct>, p_name=<optimized out>, buf=0x22004a0 "sonar_signal", p_buflen=0x7fff271b4e00)
    at Python/import.c:2519
#126 0x00007f3911768820 in import_module_level (level=<optimized out>, fromlist=0x7f3911b52c90, locals=<optimized out>, globals=<optimized out>, name=0x0) at Python/import.c:2228
#127 PyImport_ImportModuleLevel (name=<optimized out>, globals=<optimized out>, locals=<optimized out>, fromlist=0x7f3911b52c90, level=<optimized out>) at Python/import.c:2292
#128 0x00007f391174814f in builtin___import__ (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at Python/bltinmodule.c:49
#129 0x00007f391169ed23 in PyObject_Call (func=0x7f3911c3efc8, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2546
#130 0x00007f3911748633 in PyEval_CallObjectWithKeywords (func=0x7f3911c3efc8, arg=0x7f3911b3f788, kw=<optimized out>) at Python/ceval.c:4219
#131 0x00007f391174d29e in PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2622
#132 0x00007f3911752a2e in PyEval_EvalCodeEx (co=0x7f3911afd330, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3582
#133 0x00007f3911752b42 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at Python/ceval.c:669
#134 0x00007f3911773050 in run_mod (arena=0x2166370, flags=0x7fff271b5320, locals=0x7f3911be0168, globals=0x7f3911be0168, filename=<optimized out>, mod=0x2209f20) at Python/pythonrun.c:1370
#135 PyRun_FileExFlags (fp=0x220e8c0, filename=<optimized out>, start=<optimized out>, globals=0x7f3911be0168, locals=0x7f3911be0168, closeit=1, flags=0x7fff271b5320) at Python/pythonrun.c:1356
#136 0x00007f391177322f in PyRun_SimpleFileExFlags (fp=0x220e8c0, filename=0x7fff271b741e "/opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/bin/SonarSignal", closeit=1, flags=0x7fff271b5320)
    at Python/pythonrun.c:948
#137 0x00007f3911788b74 in Py_Main (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:645
#138 0x00007f3910cae7d5 in __libc_start_main () from /lib64/libc.so.6
#139 0x0000000000400649 in _start ()

Thread 1 (Thread 0x7f3906922700 (LWP 27554)):
#0  __GI_getenv (name=0x7f3907894dc5 "TO_BLOCK_FACTOR") at getenv.c:89
#1  0x00007f3907059e21 in blas_set_parameter () from /opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/core/../../../../libopenblas.so.0
#2  0x00007f3907058d91 in blas_memory_alloc () from /opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/core/../../../../libopenblas.so.0
#3  0x00007f39070594e5 in blas_thread_server () from /opt/apps/sidescananalysis-9.7.1-29-gc2e684d+dev/lib/python2.7/site-packages/numpy/core/../../../../libopenblas.so.0
#4  0x00007f3911440f18 in start_thread (arg=0x7f3906922700) at pthread_create.c:308
#5  0x00007f3910d6fb2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

@dedalusj
Author

It seems that OpenBLAS is initialised on a thread and calls getenv to read the value of GOTO_BLOCK_FACTOR. The problem is that getenv does not appear to be thread safe: if another thread calls setenv at the same time, OpenBLAS ends up segfaulting.

On a different occasion I got the following backtraces (there is more in thread 4 but I pasted the important bits):

Thread 4 (Thread 0x7fd3f2de9700 (LWP 12337)):
#0  0x00007fd3f1ea3ce7 in __GI___libc_realloc (oldmem=oldmem@entry=0x1d4d880, bytes=496) at malloc.c:3000
#1  0x00007fd3f1e5ff50 in __add_to_environ (name=0x7fff3eab2ce0 "AWS_IAM_HOME", value=value@entry=0x0, combined=combined@entry=0x7fd3e9d415ac "AWS_IAM_HOME=/opt/aws/apitools/iam", replace=replace@entry=1)
    at setenv.c:142
#2  0x00007fd3f1e5fdfd in putenv (string=0x7fd3e9d415ac "AWS_IAM_HOME=/opt/aws/apitools/iam") at putenv.c:78
#3  0x00007fd3f292aff7 in posix_putenv (self=<optimized out>, args=0x7fd3e9fd8f38) at ./Modules/posixmodule.c:7213
#4  0x00007fd3f28eee15 in call_function (oparg=<optimized out>, pp_stack=0x7fff3eab2e78) at Python/ceval.c:4350
#5  PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2987
#6  0x00007fd3f28efa2e in PyEval_EvalCodeEx (co=0x7fd3f2d95530, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=3, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3582

Thread 1 (Thread 0x7fd3e85af700 (LWP 12340)):
#0  __GI_getenv (name=0x7fd3e930bdc5 "TO_BLOCK_FACTOR") at getenv.c:89
#1  0x00007fd3e8ad0e21 in blas_set_parameter () from /opt/apps/sidescananalysis-9.7.1-42-g1266628+dev/lib/python2.7/site-packages/numpy/core/../../../../libopenblas.so.0
#2  0x00007fd3e8acfd91 in blas_memory_alloc () from /opt/apps/sidescananalysis-9.7.1-42-g1266628+dev/lib/python2.7/site-packages/numpy/core/../../../../libopenblas.so.0
#3  0x00007fd3e8ad04e5 in blas_thread_server () from /opt/apps/sidescananalysis-9.7.1-42-g1266628+dev/lib/python2.7/site-packages/numpy/core/../../../../libopenblas.so.0
#4  0x00007fd3f25ddf18 in start_thread (arg=0x7fd3e85af700) at pthread_create.c:308
#5  0x00007fd3f1f0cb2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

@c42f

c42f commented Dec 14, 2015

Looking into this with @dedalusj, I think the detailed chain is something like:

  1. python loads libopenblas.so
  2. This runs gotoblas_init() through gcc's __attribute__ ((constructor(101))) automatic initialization mechanism
  3. gotoblas_init() calls through to blas_thread_init()
  4. blas_thread_init() creates a bunch of worker threads using pthread_create(), passing blas_thread_server() as the start_routine function
  5. The init thread returns, and we bubble back into python code
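
As a rough illustration of steps 2 to 5, here is a minimal sketch of a library constructor that starts a worker thread at load time. It is not OpenBLAS code: `lib_init`, `worker_main` and `wait_for_worker` are hypothetical names, and a single thread stands in for the pool that blas_thread_init() creates.

```c
#include <assert.h>
#include <pthread.h>

static pthread_t worker;      /* hypothetical stand-in for the worker pool */
static int worker_started;    /* 1 once pthread_create() has succeeded */
static int worker_joined;     /* 1 once the worker has been joined */
static int worker_ran;        /* set by the worker thread itself */

/* Start routine, playing the role of blas_thread_server(). */
static void *worker_main(void *arg)
{
    int *flag = arg;
    *flag = 1;
    return NULL;
}

/* Runs automatically when the object is loaded, before main(). */
__attribute__((constructor(101)))
static void lib_init(void)
{
    worker_started =
        (pthread_create(&worker, NULL, worker_main, &worker_ran) == 0);
    /* The constructor returns here while the worker may still be starting. */
}

/* Join the worker (once) and report whether it actually ran. */
int wait_for_worker(void)
{
    if (worker_started && !worker_joined) {
        int rc = pthread_join(worker, NULL);
        assert(rc == 0);
        (void)rc;
        worker_joined = 1;
    }
    return worker_joined ? worker_ran : 0;
}
```

The point of the sketch is only that the constructor returns to the caller while the worker is still running, which is what opens the window for the race described next.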

Now the race condition - the following occur simultaneously:

  • Some random python code (boto3 in one case we encountered) causes a call to setenv()
  • One of the openblas threads gets into blas_thread_server() which hits getenv() via the call chain you see in the backtrace above.

Now, getenv() is not thread safe, so I think the bug here is ultimately in blas_set_parameter: any environment variables should be read on the main thread calling gotoblas_init(), rather than in the worker threads.

Reading the glibc source, setenv can realloc the array of environment variables; if it does this, getenv will be left with a stale pointer while traversing the environment array. To fit in with this story, in one of our core files, the variable ep is stale - see https://github.com/lattera/glibc/blob/a2f34833b1042d5d8eeb263b4cf4caaea138c4ad/stdlib/getenv.c#L89
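
To make the stale pointer concrete, here is a simplified sketch of an unlocked getenv-style scan. It is an assumption about the general shape of such a lookup, not glibc's actual implementation, and `scan_env` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look up `name` in a NULL-terminated array of "NAME=value" strings.
 * Hypothetical helper for illustration only; it is not glibc's getenv(). */
const char *scan_env(char **envp, const char *name)
{
    assert(name != NULL);
    size_t len = strlen(name);
    if (envp == NULL || len == 0)
        return NULL;

    /* `ep` walks the array with no lock held. If another thread reallocates
     * the array while this loop is running, `ep` may be left pointing into
     * memory that has already been freed. */
    for (char **ep = envp; *ep != NULL; ++ep) {
        if (strncmp(*ep, name, len) == 0 && (*ep)[len] == '=')
            return *ep + len + 1;   /* points at the value part */
    }
    return NULL;
}
```

Nothing in the loop protects `ep` from a concurrent realloc of the array, which matches the stale `ep` we saw in the core file.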

@c42f

c42f commented Dec 14, 2015

If all of the above is correct, I think the fix is something like this: call getenv() in gotoblas_init(), stash the environment variables in some global state, and refer to that state from blas_set_parameter() so that getenv() is never called again.
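
A minimal sketch of that idea, assuming the settings can be parsed as positive integers. The names `env_cache_init`, `cached_block_factor`, `cached_num_threads` and `parse_positive_int` are hypothetical, not existing OpenBLAS functions, and only the two variables mentioned in this thread are read.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical snapshot of the settings, filled in once at init time. */
static struct {
    int initialized;
    int block_factor;  /* from GOTO_BLOCK_FACTOR, 0 if unset or invalid */
    int num_threads;   /* from OMP_NUM_THREADS, 0 if unset or invalid */
} env_cache;

/* Parse a positive decimal integer; return 0 for NULL, non-numeric,
 * non-positive, or overflowing input. */
int parse_positive_int(const char *value)
{
    if (value == NULL)
        return 0;
    char *end = NULL;
    errno = 0;
    long parsed = strtol(value, &end, 10);
    if (end == value || *end != '\0' || errno == ERANGE)
        return 0;
    if (parsed <= 0 || parsed > INT_MAX)
        return 0;
    return (int)parsed;
}

/* Call once from the initializing thread, before any worker is created. */
void env_cache_init(void)
{
    env_cache.block_factor = parse_positive_int(getenv("GOTO_BLOCK_FACTOR"));
    env_cache.num_threads  = parse_positive_int(getenv("OMP_NUM_THREADS"));
    env_cache.initialized  = 1;
}

/* Worker threads read the snapshot and never call getenv() themselves. */
int cached_block_factor(void)
{
    assert(env_cache.initialized);
    return env_cache.block_factor;
}

int cached_num_threads(void)
{
    assert(env_cache.initialized);
    return env_cache.num_threads;
}
```

In this sketch gotoblas_init() would call `env_cache_init()` before blas_thread_init(), and blas_set_parameter() would use the accessors instead of getenv().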

@xianyi
Collaborator

xianyi commented Mar 8, 2016

@c42f, thank you for the suggestion. I will work on this issue.

@xianyi xianyi self-assigned this Mar 8, 2016
@c42f

c42f commented Mar 11, 2016

@xianyi thanks a lot. I'm fairly satisfied we had the root cause in the above, but I'm afraid we never managed to get a minimal reproducible test case so testing will be difficult. @dedalusj do we still have the infrastructure in place to rerun our large scale code when the changes here flow through into numpy?

@dedalusj
Author

@c42f Sure no problem.
