Segfault with Julia MKL build (library conflict) #443

Open
EthanAnderes opened this issue Oct 11, 2017 · 18 comments

Comments

@EthanAnderes

With a recent upgrade of anaconda I'm getting seg faults with PyCall (and PyPlot). Reading the other recent issues it appears some are having the same problem but my output is a bit different and the workarounds for those don't seem to be helping any. Hope I'm not adding noise to something that is already known.

If I run the code from 6423 I just get a straight seg fault.

julia> using PyCall

julia> pyimport("numpy.linalg")["inv"]([2 1; 1 2])
Segmentation fault: 11

Calling NumPy directly from Python works fine.

Python 3.6.2 |Anaconda, Inc.| (default, Sep 21 2017, 18:29:43)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.linalg.inv(np.matrix('2 1; 1 2'))
matrix([[ 0.66666667, -0.33333333],
        [-0.33333333,  0.66666667]])

I've tried to work around it with what appears to work for some people. However I'm still getting seg faults. In particular, I tried the following without success.

julia> Libdl.dlopen("/Users/ethananderes/Software/anaconda3/lib/libiomp5.dylib")
julia> Libdl.dlopen("/Users/ethananderes/Software/anaconda3/lib/libmkl_intel_thread.dylib")
install_name_tool -change @rpath/libiomp5.dylib @loader_path/libiomp5.dylib /Users/ethananderes/Software/anaconda3/lib/libmkl_intel_thread.dylib

install_name_tool -change @rpath/libiomp5.dylib @loader_path/libiomp5.dylib /Users/ethananderes/Software/anaconda3/lib/libiomp5.dylib

Any ideas what is going on here?

Some possibly relevant info:

julia> using PyCall

julia> PyCall.libpython
"/Users/ethananderes/Software/anaconda3/lib/libpython3.6m"

julia> versioninfo()
Julia Version 0.6.1-pre.92
Commit 389b23cf6e* (2017-10-07 01:18 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin17.0.0)
  CPU: Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz
  WORD_SIZE: 64
  BLAS: libmkl_rt
  LAPACK: libmkl_rt
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)
@EthanAnderes
Author

... reference for anyone having a similar problem. I finally found a workaround that might point to where the problem is.

If I downgrade just conda's mkl (using conda install mkl=2017.0.4) then PyCall and PyPlot no longer segfault.

Remark 1: I was only seeing the segfault with Julia compiled with MKL for BLAS and FFT (and with PyCall set to use my local anaconda3 install). Downgrading conda's mkl fixes things, which makes me wonder whether there is some MKL interaction between Julia and conda (??)

Remark 2: running conda install mkl=2017.0.4 downgrades a few other packages as well. In particular, mkl-service, numpy, scikit-learn and scipy. Just a heads up to others.

Remark 3: The only reason I took a stab at downgrading conda's mkl is that I happened to read this... just for reference to those who know more about this stuff.

@stevengj
Member

Probably Julia and NumPy are linked to incompatible versions of MKL... there's not much to do about this other than (a) make sure they use the same MKL version or (b) don't use MKL in one or both of them (e.g. switch one to use OpenBLAS).

@EthanAnderes
Author

Yeah, although I'm not sure I understand why NumPy is reaching out to the MKL library Julia is linked to. For example, when I compile Julia with OpenBLAS, NumPy has no problem calling its own MKL.

Anyway, I've got a working solution now, and hopefully the MKL incompatibility will work itself out as NumPy updates its MKL... so I'm fine with closing this. Thanks!

@stevengj
Member

stevengj commented Jan 17, 2018

When Julia is compiled with OpenBLAS, all of the BLAS symbols have a special suffix so that they don't conflict with other BLAS libraries (JuliaLang/julia#8734). We can't (easily) do this with MKL because we don't compile MKL ourselves.
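
As a rough illustration of the point above, here is a toy model in plain Python of a flat symbol namespace in which the first definition of a name wins. It is only a sketch: real dynamic linkers are more involved and differ between platforms, the library names `blas_ilp64` and `blas_lp64` are hypothetical, and the `64_` suffix is used here as an assumed example of a renaming suffix.

```python
def load(symbol_table, library, symbols):
    """Toy model of a flat symbol namespace: the first definition of a name wins."""
    for name in symbols:
        symbol_table.setdefault(name, library)


# Two BLAS builds exporting the same name: every later lookup of "dgemm_"
# resolves to whichever library was loaded first.
flat = {}
load(flat, "blas_ilp64", ["dgemm_"])
load(flat, "blas_lp64", ["dgemm_"])
print(flat["dgemm_"])         # blas_ilp64

# With a suffix on one build's symbols, both definitions stay reachable.
suffixed = {}
load(suffixed, "blas_ilp64", ["dgemm_64_"])  # assumed suffix, for illustration
load(suffixed, "blas_lp64", ["dgemm_"])
print(suffixed["dgemm_"])     # blas_lp64
print(suffixed["dgemm_64_"])  # blas_ilp64
```

In the unsuffixed case the second library's caller silently ends up in the first library's routine, which is the kind of conflict described in this thread.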

@stevengj stevengj changed the title Seg fault: 11 with anaconda upgrade Segfault with Julia MKL build (library conflict) Jan 17, 2018
@stevengj
Member

See also #65.

@mattcbro

I'm getting the same sort of segfault in MKL as soon as I try to use PyPlot:

julia> include("calsim.jl")

signal (11): Segmentation fault
while loading /data/projects/Maestro/calsim.jl, in expression starting on line 59
mkl_blas_avx2_dgemm_kernel_nocopy_NN_b0 at /home/matt/programs/juliapro/JuliaPro-0.6.2.2/Julia/bin/../lib/libmkl_avx2.so (unknown line)
Allocations: 14076406 (Pool: 14074321; Big: 2085); GC: 32

Line 59 is the first call to figure() within PyPlot.
Unfortunately the downgrade of mkl using conda did not work for me. I noticed the environment shipped with julia 0.6.2.2 didn't have mkl installed for anaconda either, so I tried to install the recommended mkl to see if that would fix it.

As a result I have a broken PyCall. I'm going to try the non MKL version since the MKL version of juliapro-0.6.2.2 is not working for me.

@carstenbauer
Contributor

@stevengj Has it been verified that linking NumPy and Julia to the same MKL libraries works smoothly? I just tried exactly this and still get segfaults.

(I built Julia and linked it against my manually installed MKL. I built numpy and linked it against the same MKL installation (specified the path in site.cfg and verified it afterwards in python shell).)

@stevengj
Member

stevengj commented Jul 17, 2018

@crstnbr, doesn't Julia link the ILP64 MKL interface by default, whereas numpy uses the LP64 interface (numpy/numpy#5906)?

I think you probably need to compile Julia with USE_BLAS64=0 in order to use the LP64 MKL if you want it to be compatible with Numpy.
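
To see why mixing the two interfaces can crash rather than just return wrong numbers, here is a minimal sketch in plain Python (no MKL involved). It assumes a little-endian memory layout and BLAS-style integer arguments passed by reference; the layout of two adjacent 32-bit values is made up for illustration. A caller that writes 32-bit integers (LP64) hands its memory to a callee that reads 64-bit integers (ILP64), so the callee picks up neighbouring bytes as part of the dimension.

```python
import struct

# Hypothetical LP64 caller: two adjacent 32-bit integers in memory,
# e.g. a matrix dimension n = 3 followed by some unrelated value 3.
lp64_memory = struct.pack("<ii", 3, 3)

# What an LP64 callee reads at that address: a 32-bit integer.
(n_as_int32,) = struct.unpack_from("<i", lp64_memory, 0)

# What an ILP64 callee reads at the same address: a 64-bit integer that
# swallows the neighbouring 4 bytes as its high word.
(n_as_int64,) = struct.unpack_from("<q", lp64_memory, 0)

print(n_as_int32)  # 3
print(n_as_int64)  # 12884901891, i.e. 3 + 3 * 2**32
```

A routine that believes a 3×3 matrix has roughly 1.3×10¹⁰ rows will read far outside the array, which is consistent with the segfaults reported here.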

@JobJob
Contributor

JobJob commented Dec 18, 2018

@crstnbr I had your exact problem - same MKL lib used in numpy as was used to compile julia, but segfault with:

using PyCall
py"""
import numpy as np

A1 = np.random.random((3,3))
A2 = np.random.random((3,3))
np.matmul(A1,A2)
"""

even though the same code worked in Python (i.e. the same Python that PyCall was built against, as set via ENV["PYTHON"]).

It now seems to be fixed by adding USE_BLAS64=0 to my Make.user file (as per the comment immediately above) and recompiling Julia, i.e. just a make -j8 julia. I didn't even have to make clean.

Thanks @stevengj !

@JobJob
Contributor

JobJob commented Dec 19, 2018

Oh, and the other thing I had to do earlier was add the directory that contains libiomp5.so to my LD_LIBRARY_PATH (the path was different from the MKL path), with something like

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/compilers_and_libraries_2018.2.199/linux/compiler/lib/intel64_lin

in my ~/.profile
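
A quick way to check whether the setting above took effect is to look for the library in each directory on LD_LIBRARY_PATH. The helper below is a small sketch with a made-up name (`find_on_path`). It only inspects the environment variable, so it says nothing about rpaths or the system linker cache, which the dynamic loader also consults.

```python
import os


def find_on_path(filename, search_path):
    """Return every directory in a path-separated list that contains filename."""
    hits = []
    for directory in search_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


# Directories from LD_LIBRARY_PATH that contain libiomp5.so, in search order.
print(find_on_path("libiomp5.so", os.environ.get("LD_LIBRARY_PATH", "")))
```

An empty list means no directory on LD_LIBRARY_PATH contains the file; more than one entry means several copies are visible, and the earliest listed directory is searched first.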

@tkf
Member

tkf commented Dec 19, 2018

Can the ABI incompatibility be detected before segfault? Does Julia have runtime access for the build option USE_BLAS64? What about for Numpy? It would be nice if we can print an informative error.

I guess we could at least run a Julia subprocess at build time and have it run a NumPy operation that is known to segfault with an incompatible ABI. But it's not a great option, as NumPy can be updated at any moment. Also, NumPy in a virtualenv (#578) cannot be supported this way.
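
The subprocess idea can be sketched generically. The snippet below is written in Python rather than Julia and is not PyCall's build code: it runs a piece of code in a child process and reports whether the child exited normally or was killed by a signal. The function name `probe` is hypothetical, the crash is simulated by the child sending SIGSEGV to itself, and the negative-return-code convention assumes a POSIX system.

```python
import signal
import subprocess
import sys


def probe(code):
    """Run `code` in a fresh Python process and classify how it ended."""
    result = subprocess.run([sys.executable, "-c", code], capture_output=True)
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        # On POSIX, a return code of -N means the child was killed by signal N.
        return "killed by " + signal.Signals(-result.returncode).name
    return "failed with exit code %d" % result.returncode


# Stand-in for the real check: the child sends SIGSEGV to itself.
SIMULATED_CRASH = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"

print(probe("pass"))
print(probe(SIMULATED_CRASH))
```

In a real check the child would be the process that loads both BLAS libraries and runs a small matrix product. The parent survives either way and can print an informative error instead of crashing.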

@carstenbauer
Contributor

carstenbauer commented Dec 19, 2018

@JobJob: Thanks for your post! I can confirm that USE_BLAS64 = 0 solved this issue! It even worked (as far as I checked) with standard numpy binaries.

However, I can't find much information on what USE_BLAS64 = 0 actually does and what side effects, apart from solving this issue, it has. Does anyone know more or can point me somewhere?

@stevengj
Member

stevengj commented Dec 19, 2018

@crstnbr, USE_BLAS64=0 means that it assumes that the BLAS (matrix-multiplication etc.) library is compiled to use 32-bit integers for its interfaces. This means that you can't do linear-algebra operations on matrices or vectors with more than 2³¹–1 (≈2×10⁹) elements, even on a 64-bit machine.

(The reason for this mess is that the BLAS interface was defined to use integer sizes in Fortran in the days when everyone thought that the default integer size would match the address size on the machine. In the upgrade to 64-bit architectures, however, integer stayed 32 bits, and so did the default BLAS interface. Most BLAS libraries give the option of compiling with 64-bit integers instead, but since the symbols are not renamed the 64-bit and 32-bit libraries conflict. With OpenBLAS, we solved this problem by renaming the symbols, but this was not an option with MKL.)
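
For a sense of scale, the arithmetic behind that limit can be worked out directly. This sketch takes the element-count limit stated above at face value and assumes double-precision (8-byte) entries.

```python
import math

MAX_INT32 = 2**31 - 1  # largest value a 32-bit signed integer can hold

# Largest n for which an n-by-n matrix has at most MAX_INT32 elements.
largest_square = math.isqrt(MAX_INT32)

# Memory needed for MAX_INT32 double-precision (8-byte) elements, in GiB.
gib = MAX_INT32 * 8 / 2**30

print(MAX_INT32)       # 2147483647
print(largest_square)  # 46340
print(round(gib, 1))   # 16.0
```

So under that reading the restriction only starts to matter for dense square matrices larger than about 46340×46340, or roughly 16 GiB of doubles.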

@JobJob
Contributor

JobJob commented Dec 19, 2018

And Numpy uses the 32-bit integers (LP64), whereas Julia without USE_BLAS64=0 uses 64-bit integers (ILP64) right?

I saw you linked this issue somewhere numpy/numpy#5906 - oh just above :D, got a bit lost in the nest of related issues I waded through while trying to solve this.

@RoyiAvital

RoyiAvital commented Apr 22, 2020

@stevengj, regarding your statement:

@crstnbr, USE_BLAS64=0 means that it assumes that the BLAS (matrix-multiplication etc.) library is compiled to use 32-bit integers for its interfaces. This means that you can't do linear-algebra operations on matrices or vectors with more than 2³¹–1 (≈2×10⁹) elements, even on a 64-bit machine.

What about sparse matrices? In that case, how do we count the elements: as the number of nonzero elements or as the size of the matrix?

@carstenbauer
Contributor

Sparse matrices aren't handled by LAPACK/BLAS but by SuiteSparse, so they shouldn't be affected by this. In the case of a regular matrix, it's the size of the matrix that matters.

(Please correct me if I'm wrong.)

@ghost

ghost commented Aug 28, 2021

Hi guys. I also face this problem. Can anyone tell me how to set USE_BLAS64 = 0? I just have very limited programming knowledge and I am new to julia. Thanks.

@carstenbauer
Contributor

Easy solution should be to use Julia 1.7 (beta) and MKL.jl.


7 participants