This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

MXNet MKL conflicts with other MKL optimized packages #12661

Closed
leezu opened this issue Sep 25, 2018 · 4 comments

@leezu
Contributor

leezu commented Sep 25, 2018

Description

When MXNet is built with MKL as its BLAS library, importing and using mxnet in Python together with other Python packages that link against MKL separately crashes the program, because Intel does not support having two copies of MKL linked into one process. NumPy is one example of a package that is incompatible with MXNet built with MKL. Could this be because MKL is statically linked?

This is a serious issue, as it makes it impossible to use an MKL-optimized MXNet build together with an MKL-optimized NumPy build.
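One way to probe the static-linking question is to look for separately shipped runtime libraries in each package's install directory. The sketch below is a hypothetical helper, not part of MXNet or NumPy, and the file-name patterns are assumptions taken from the library names in the backtrace further down. A runtime that is statically linked into another shared object will not show up as a separate file, so an empty result for the MXNet directory would be consistent with static linking.

```python
from pathlib import Path

# Assumed name patterns, based on the libraries named in the backtrace
# (libiomp5.so, libmkl_intel_thread.so, libmkl_intel_lp64.so).
RUNTIME_PATTERNS = ("libiomp5*.so*", "libmkl*.so*")


def find_bundled_runtimes(root, patterns=RUNTIME_PATTERNS):
    """Return sorted paths under `root` whose file names match any pattern.

    Hypothetical helper: it only finds runtimes shipped as separate shared
    objects; a copy statically linked into another .so is invisible here.
    """
    root = Path(root)
    found = set()
    for pattern in patterns:
        # rglob searches `root` and all of its subdirectories.
        found.update(p for p in root.rglob(pattern) if p.is_file())
    return sorted(found)
```

For this report, the two directories of interest would be `/home/ubuntu/.local/lib/python3.6/site-packages/mxnet` (from the MXNet info below) and `/home/ubuntu/anaconda3/envs/python3/lib` (where NumPy's `libiomp5.so` resolves to, per the backtrace).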

Environment info (Required)

% python diagnose.py
----------Python Info----------
Version      : 3.6.4
Compiler     : GCC 7.2.0
Build        : ('default', 'Jan 16 2018 18:10:19')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 10.0.1
Directory    : /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.3.0
Directory    : /home/ubuntu/.local/lib/python3.6/site-packages/mxnet
Commit Hash   : b3be92f4a48bce62a5a8424271871c2f81c8f7f1
----------System Info----------
Platform     : Linux-4.4.0-1067-aws-x86_64-with-debian-stretch-sid
system       : Linux
node         : ip-172-31-91-127
release      : 4.4.0-1067-aws
version      : #77-Ubuntu SMP Mon Aug 27 13:22:03 UTC 2018
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               2699.804
CPU max MHz:           3000.0000
CPU min MHz:           1200.0000
BogoMIPS:              4600.13
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0020 sec, LOAD: 0.3378 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1504 sec, LOAD: 0.3767 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1830 sec, LOAD: 0.3789 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0292 sec, LOAD: 0.1328 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0022 sec, LOAD: 0.0899 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0036 sec, LOAD: 0.0250 sec.


I'm using Python.

Error Message:

OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
zsh: abort (core dumped)  python test.py

Examining the core file shows that the crash occurs as soon as NumPy tries to execute an MKL-optimized routine (after MXNet has already loaded MKL):

#0  0x00007f902cb0a428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f902cb0c02a in __GI_abort () at abort.c:89
#2  0x00007f8e93927c53 in __kmp_abort_process () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libiomp5.so
#3  0x00007f8e939162fb in __kmp_fatal () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libiomp5.so
#4  0x00007f8e93926068 in __kmp_register_library_startup() () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libiomp5.so
#5  0x00007f8e93926d86 in __kmp_middle_initialize () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libiomp5.so
#6  0x00007f8e93910dae in omp_get_num_procs@OMP_1.0 () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libiomp5.so
#7  0x00007f8e9176989d in mkl_serv_domain_get_max_threads () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_thread.so
#8  0x00007f8e9182a81c in mkl_blas_dsyrk () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_thread.so
#9  0x00007f8e90bcc35b in dsyrk_ () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_lp64.so
#10 0x00007f8e90bfeb44 in cblas_dsyrk () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_lp64.so
#11 0x00007f902780683c in syrk.constprop () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#12 0x00007f9027905a9a in cblas_matrixproduct () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#13 0x00007f90278d8c19 in PyArray_MatrixProduct2 () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#14 0x00007f90278d9a58 in array_matrixproduct () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
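The backtrace shows NumPy's MKL initializing the `libiomp5.so` from the Anaconda environment while another copy was already initialized. On Linux, one way to check which copies are mapped into the process is to parse `/proc/self/maps`. This is a minimal sketch assuming Linux; `loaded_copies` is a hypothetical helper name, not an MXNet or NumPy API.

```python
import os


def loaded_copies(library_name, maps_text=None):
    """Return the distinct file paths mapped into a process for `library_name`.

    Parses Linux /proc/<pid>/maps text (the current process by default).
    Each line has the fields: address perms offset dev inode [pathname].
    More than one returned path means more than one copy is loaded.
    """
    if maps_text is None:
        with open("/proc/self/maps") as f:
            maps_text = f.read()
    paths = set()
    for line in maps_text.splitlines():
        # maxsplit=5 keeps a pathname containing spaces in one field.
        fields = line.split(maxsplit=5)
        if len(fields) == 6:
            path = fields[5].strip()
            if os.path.basename(path) == library_name:
                paths.add(path)
    return sorted(paths)
```

Calling `loaded_copies("libiomp5.so")` after `import mxnet` and `import numpy` but before `np.dot(...)` should, if both copies are already mapped at that point, return two paths from different directories.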

Minimum reproducible example

Steps to reproduce

In [1]: import mxnet as mx

In [2]: mxnd = mx.nd.zeros((1000,1000))

In [3]: mx.nd.dot(mxnd, mxnd)
Out[3]:

[[ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 ...,
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]
<NDArray 1000x1000 @cpu(0)>

In [4]: import numpy as np

In [5]: npnd = np.zeros((1000,1000))

In [6]: np.dot(npnd, npnd)
OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
zsh: abort (core dumped)  ipython
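Because the abort takes down the whole interpreter (as with `ipython` above), it can help to run the reproducer in a child process and inspect how it exited. A minimal standard-library sketch: `run_isolated`, `aborted` and `REPRO` are hypothetical names, and `REPRO` restates the steps above as a script, with `wait_to_read()` added so MXNet's asynchronous `dot` finishes before NumPy runs. The `env_overrides` argument can pass `KMP_DUPLICATE_LIB_OK=TRUE`, the workaround the error message itself calls unsafe and unsupported; it is not a fix.

```python
import os
import signal
import subprocess
import sys

# The IPython steps above as a standalone script.
REPRO = """
import mxnet as mx
mxnd = mx.nd.zeros((1000, 1000))
mx.nd.dot(mxnd, mxnd).wait_to_read()
import numpy as np
npnd = np.zeros((1000, 1000))
np.dot(npnd, npnd)
print("no crash")
"""


def run_isolated(code, env_overrides=None, timeout=600):
    """Run `code` in a fresh interpreter so an abort does not kill the caller.

    Returns (returncode, stderr). On POSIX, a negative return code is the
    number of the signal that killed the child.
    """
    env = dict(os.environ)
    env.update(env_overrides or {})
    proc = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (Python 3.6 compatible)
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def aborted(returncode):
    """True if the child was killed by SIGABRT, as with the OMP Error #15 abort."""
    return returncode == -signal.SIGABRT
```

For example, `rc, err = run_isolated(REPRO)` followed by `aborted(rc) and "OMP: Error #15" in err` would identify this crash without losing the parent session.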

@vandanavk
Contributor

@mxnet-label-bot [Build, MKL]

@fhieber
Contributor

fhieber commented Sep 25, 2018

Related to #8532?

@vandanavk
Contributor

@fhieber Did the PR #11148 fix the issue for you?

@leezu
Contributor Author

leezu commented Sep 25, 2018

@fhieber Yes, let's track this in #8532 only.

@leezu leezu closed this as completed Sep 25, 2018