
mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL #8532

Closed
fhieber opened this issue Nov 3, 2017 · 40 comments

Comments

@fhieber
Contributor

fhieber commented Nov 3, 2017

We have observed crashes with any MKL-enabled pip package of mxnet-0.12.0 when combined with numpy installed through conda (which by default also uses MKL).

In this case, MXNet training crashes with the following error message:

OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

Numpy from conda links against libmkl_rt.so, distributed through conda:

libmkl_rt.so => /opt/conda/lib/python3.6/site-packages/numpy/core/../../../../libmkl_rt.so (0x00007f05256e0000)
	libmkl_rt.so => /opt/conda/lib/python3.6/site-packages/numpy/linalg/../../../../libmkl_rt.so (0x00007f367e1d5000)
	libmkl_rt.so => /opt/conda/lib/python3.6/site-packages/numpy/linalg/../../../../libmkl_rt.so (0x00007fd39e751000)

whereas MXNet links to its own .so:

ldd /opt/conda/lib/python3.6/site-packages/mxnet/libmxnet.so
[...]
	libmklml_intel.so => /opt/conda/lib/python3.6/site-packages/mxnet/libmklml_intel.so (0x00007f8c85b94000)
	libiomp5.so => /opt/conda/lib/python3.6/site-packages/mxnet/libiomp5.so (0x00007f8c857f1000)
[...]

This prevents people from using numpy w/ MKL in combination with mxnet-mkl==0.12.0.
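For anyone debugging a similar setup, the clash can be confirmed by listing the OpenMP/MKL dependencies of both packages side by side. A minimal check, assuming the /opt/conda prefix from the output above (the exact numpy .so names vary by version):

for lib in /opt/conda/lib/python3.6/site-packages/numpy/core/*.so \
           /opt/conda/lib/python3.6/site-packages/mxnet/libmxnet.so; do
    echo "== $lib"
    # list every OpenMP/MKL runtime the shared object pulls in
    ldd "$lib" | grep -E 'iomp|gomp|mkl' || echo "   (no OpenMP/MKL dependencies)"
done

If more than one distinct libiomp5.so (or a mix of libiomp5 and libgomp) shows up across the two packages, that is the situation the OMP error above complains about.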

@zhreshold
Member

@szha

@szha
Member

szha commented Nov 3, 2017

@ykim362 what's your recommendation on this? MKL 2018 doesn't distribute a .a (static library), which means that I can't hide any of its symbols when loading.

@srochel
Contributor

srochel commented Nov 4, 2017

@b0noI - how do we resolve the issue on DLAMI?

@ykim362
Contributor

ykim362 commented Nov 4, 2017

@szha I am discussing with MKL team to figure out how to handle this situation. I will get back to you, soon.

@ykim362
Contributor

ykim362 commented Nov 6, 2017

@fhieber Could you provide your numpy (conda) version?

@cjolivier01
Member

@szha, can we link libiomp5.a statically and then strip the symbols from the final libmxnet.so? I am theorizing (I think correctly) that two completely separate versions of OpenMP can coexist when they apply only to two different modules.

@larroy
Contributor

larroy commented Jan 6, 2018

@cjolivier01 how can this work when the data structures passed to the library's functions differ? This would only work if both versions of OpenMP guaranteed binary compatibility. Is that the case?

@cjolivier01
Member

cjolivier01 commented Jan 6, 2018

@larroy Not sure what you're trying to say here. If the symbols are stripped from libmxnet.so, then the OpenMP library used by libnumpy.so (or whatever it's called) and libmxnet.so (with its internal OpenMP pool) won't see each other at all, much less call into each other.

What you'll get is numpy using a separate thread pool from libmxnet.so, which probably doesn't matter much because they generally won't be running in parallel anyway.
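To make the idea concrete, here is a sketch of what the static-link-and-hide approach could look like; it is only a sketch, since it assumes a libiomp5.a built with -fPIC (which, per later comments, Intel does not ship) and the object-file list is a placeholder:

g++ -shared -o libmxnet.so *.o \
    -Wl,-Bstatic -liomp5 -Wl,-Bdynamic \
    -Wl,--exclude-libs,libiomp5.a   # keep OpenMP symbols out of the dynamic symbol table

# verify that nothing OpenMP-related is exported from the result
nm -D libmxnet.so | grep -Ei ' omp_| __kmpc_' || echo "no exported OpenMP symbols"

With the symbols localized this way, the runtime copy inside libmxnet.so and the libiomp5.so loaded by numpy would not resolve against each other, which is the coexistence described above.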

@ykim362
Contributor

ykim362 commented Jan 6, 2018

@cjolivier01 When I discussed this issue with the MKL team, it did not look very feasible to build a static MKLML given that it has many binary dependencies. Another concern is that two separate OpenMP runtimes (one from numpy and one from MXNet with MKLML) could degrade performance. We are running several experiments; I will post the results next week.

@larroy
Contributor

larroy commented Jan 6, 2018

@cjolivier01 I see. I thought the problem was linking with the same symbols. I found this related to the issue: https://software.intel.com/en-us/inference-engine-devguide-known-issues

@ykim362
Contributor

ykim362 commented Jan 10, 2018

@cjolivier01 From the experiments, the behavior varies case by case: some combinations of versions work fine, some do not. So the workaround @larroy pointed to (the LD_PRELOAD environment variable) might be a good option. Also, it is recommended to use MKL-enabled MXNet with non-MKL Python packages to get the best performance for neural network workloads.
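For reference, the LD_PRELOAD workaround amounts to forcing a single OpenMP runtime to be loaded first so that both numpy and MXNet bind to it. A sketch, assuming conda's copy of the library (the path depends on your install, and train.py is a placeholder for whatever script you run):

export LD_PRELOAD=/opt/conda/lib/libiomp5.so
python train.py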

@szha
Member

szha commented Jan 10, 2018

@ykim362 our goal is to be able to provide a good out-of-the-box experience for MKL-enabled mxnet. Though the workaround might be acceptable for experienced users, we cannot control the environment in which mxnet runs for pip users. Thus, I'm still leaning towards the approach @cjolivier01 proposed above, if possible.

Do you mind sharing more about the dependencies of MKLML to help figure out the feasibility of static linking?

@ykim362
Contributor

ykim362 commented Jan 10, 2018

@szha I totally agree with you regarding the out-of-the-box experience! Our concern, though, is that static linking doesn't solve the problem. How about trying the rpath compile option?

@fhieber Could you provide some more detailed information so we can reproduce this situation? The behavior varies, so I would like to test with exactly the same environment.

@szha
Member

szha commented Jan 10, 2018

@ykim362 could you elaborate? I'm already using '${ORIGIN}' in rpath for MKL builds and shipping the MKL shared objects along with libmxnet.so. I don't think this prevents another OpenMP runtime from being loaded first, so it doesn't solve our problem. Are you referring to a different solution based on rpath?
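For context, the rpath setup described above can be inspected with readelf; this is a sketch using the pip-package path shown earlier in the thread:

readelf -d /opt/conda/lib/python3.6/site-packages/mxnet/libmxnet.so | grep -E 'RPATH|RUNPATH'
# expected to show something like: (RUNPATH)  Library runpath: [$ORIGIN]

A $ORIGIN runpath only controls where libiomp5.so is found (next to libmxnet.so); it does not prevent a second OpenMP runtime from already having been loaded by numpy, which is the point made above.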

@rsdubtso

MKL dev here. Can somebody please verify that the pip packages are not linked to OpenMP (Intel's libiomp5) statically? If any one of them is, then this would be a problem. If every single package links to OpenMP dynamically, this should probably work (disclaimer: this needs to be verified; the libmkl_rt.so on Linux loads libiomp5.so via dlopen and I'm not 100% sure what would happen if the library is loaded multiple times for example using RTLD_LOCAL...)
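One way to check this (a sketch; adjust the paths to your install): a dynamic dependency shows up in ldd, whereas a statically linked OpenMP runtime would instead appear as KMP/OpenMP symbols defined inside the library itself.

ldd /opt/conda/lib/python3.6/site-packages/mxnet/libmxnet.so | grep -E 'iomp|gomp'
nm -D --defined-only /opt/conda/lib/python3.6/site-packages/mxnet/libmxnet.so | grep -E '__kmpc_|omp_get_num_threads'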

@szha
Member

szha commented Jan 11, 2018

@rsdubtso Hi. I can verify that mxnet pip package is linked dynamically to libiomp5.so. We include libiomp5.so in the pip package using the rpath trick described above.

@rsdubtso

Hm. It seems like the numpy package also links dynamically to libiomp5. Do you have a small repro that I could try? (Please do include all the necessary package installation steps -- I'm pretty new to all this stuff...)

PS. I also tried installing mxnet for Python 2.7 with conda and it seems to be missing libmxnet.so completely... should I be using conda for Python 3?

@ykim362
Contributor

ykim362 commented Jan 22, 2018

@fhieber Could you provide the conda version you used, or more details about the environment in which you hit this problem? mxnet-mkl==0.12.0 and 0.12.1 work fine with conda in my tests.

@fhieber
Contributor Author

fhieber commented Jan 23, 2018

@ykim362 I just tried this again in a Docker image with mxnet-cu90mkl==1.0.0.post4 and the following conda info output:

               platform : linux-64
          conda version : 4.3.30
       conda is private : False
      conda-env version : 4.3.30
    conda-build version : not installed
         python version : 3.6.3.final.0
       requests version : 2.18.4
       root environment : /opt/conda  (writable)
    default environment : /opt/conda
       envs directories : /opt/conda/envs
                          /root/.conda/envs
          package cache : /opt/conda/pkgs
                          /root/.conda/pkgs
           channel URLs : https://repo.continuum.io/pkgs/main/linux-64
                          https://repo.continuum.io/pkgs/main/noarch
                          https://repo.continuum.io/pkgs/free/linux-64
                          https://repo.continuum.io/pkgs/free/noarch
                          https://repo.continuum.io/pkgs/r/linux-64
                          https://repo.continuum.io/pkgs/r/noarch
                          https://repo.continuum.io/pkgs/pro/linux-64
                          https://repo.continuum.io/pkgs/pro/noarch
            config file : None
             netrc file : None
           offline mode : False
             user-agent : conda/4.3.30 requests/2.18.4 CPython/3.6.3 Linux/4.9.58-18.51.amzn1.x86_64 / glibc/2.17
                UID:GID : 0:0

I still observe the same crash:

> OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
> OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

@pengzhao-intel
Contributor

pengzhao-intel commented Mar 14, 2018

@fhieber, @rsdubtso is correct. It is caused by multiple OpenMP libraries being loaded. The workaround is to set KMP_DUPLICATE_LIB_OK=TRUE.
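Concretely, the workaround amounts to the following (a sketch; the training command stands in for whatever MXNet entry point you run). Note the caveat from the OMP message itself: this is an unsafe, unsupported setting that may silently produce incorrect results.

export KMP_DUPLICATE_LIB_OK=TRUE
python -m sockeye.train ...   # same kind of invocation as in the logs below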

I am considering the final solution now. Will come back soon :)

Error log:

[INFO:sockeye.training] Using bucketing. Default max_seq_len=(60, 60)
[INFO:main] Optimizer: adam
[INFO:main] Optimizer Parameters: {'wd': 0.0, 'learning_rate': 0.0003, 'lr_scheduler': LearningRateSchedulerPlateauReduce(reduce_factor=0.50, reduce_num_not_improved=0), 'clip_gradient': 1.0, 'rescale_grad': 1.0}
[INFO:main] kvstore: device
[INFO:main] Gradient Compression: None
[INFO:main] Decode and Evaluate Device(s): cpu(0)
[INFO:sockeye.model] Saved config to "/home/chenxiny/newstest/wmt_model/config"
OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Aborted

After setting the environment, the problem is gone.
New log:

[INFO:sockeye.utils] Sockeye version 1.16.2 commit 762ce78e4e49b9ba5d14eb0a48d97f19c8807707
[INFO:sockeye.utils] MXNet version 1.1.0
[INFO:sockeye.utils] Command: /home/chenxiny/anaconda3/lib/python3.6/site-packages/sockeye/train.py -s corpus.tc.BPE.de -t corpus.tc.BPE.en -vs newstest2016.tc.BPE.de -vt newstest2016.tc.BPE.en --num-embed 256 --rnn-num-hidden 512 --rnn-attention-type dot --max-seq-len 60 --decode-and-evaluate 500 -o wmt_model
[INFO:sockeye.utils] Arguments: Namespace(allow_missing_params=False, batch_size=64, batch_type='sentence', bucket_width=10, checkpoint_frequency=1000, cnn_activation_type='glu', cnn_hidden_dropout=0.0, cnn_kernel_width=(3, 5), cnn_num_hidden=512, cnn_positional_embedding_type='learned', cnn_project_qkv=False, conv_embed_add_positional_encodings=False, conv_embed_dropout=0.0, conv_embed_max_filter_width=8, conv_embed_num_filters=(200, 200, 250, 250, 300, 300, 300, 300), conv_embed_num_highway_layers=4, conv_embed_output_dim=None, conv_embed_pool_stride=5, decode_and_evaluate=500, decode_and_evaluate_device_id=None, decode_and_evaluate_use_cpu=False, decoder='rnn', device_ids=[-1], disable_device_locking=False, embed_dropout=(0.0, 0.0), embed_weight_init='default', encoder='rnn', fill_up='replicate', gradient_clipping_threshold=1.0, gradient_clipping_type='abs', gradient_compression_threshold=0.5, gradient_compression_type=None, initial_learning_rate=0.0003, keep_last_params=-1, kvstore='device', label_smoothing=0.0, layer_normalization=False, learning_rate_decay_optimizer_states_reset='off', learning_rate_decay_param_reset=False, learning_rate_half_life=10, learning_rate_reduce_factor=0.5, learning_rate_reduce_num_not_improved=3, learning_rate_schedule=None, learning_rate_scheduler_type='plateau-reduce', learning_rate_warmup=0, lock_dir='/tmp', loss='cross-entropy', loss_normalization_type='valid', max_num_checkpoint_not_improved=8, max_num_epochs=None, max_seq_len=(60, 60), max_updates=None, metrics=['perplexity'], min_num_epochs=None, momentum=None, monitor_pattern=None, monitor_stat_func='mx_default', no_bucketing=False, num_embed=(256, 256), num_layers=(1, 1), num_words=(50000, 50000), optimized_metric='perplexity', optimizer='adam', optimizer_params=None, output='wmt_model', overwrite_output=False, params=None, prepared_data=None, quiet=False, rnn_attention_coverage_num_hidden=1, rnn_attention_coverage_type='count', rnn_attention_in_upper_layers=False, rnn_attention_mhdot_heads=None, rnn_attention_num_hidden=None, rnn_attention_type='dot', rnn_attention_use_prev_word=False, rnn_cell_type='lstm', rnn_context_gating=False, rnn_decoder_hidden_dropout=0.0, rnn_decoder_state_init='last', rnn_dropout_inputs=(0.0, 0.0), rnn_dropout_recurrent=(0.0, 0.0), rnn_dropout_states=(0.0, 0.0), rnn_encoder_reverse_input=False, rnn_first_residual_layer=2, rnn_forget_bias=0.0, rnn_h2h_init='orthogonal', rnn_num_hidden=512, rnn_residual_connections=False, seed=13, shared_vocab=False, source='corpus.tc.BPE.de', source_vocab=None, target='corpus.tc.BPE.en', target_vocab=None, transformer_activation_type='relu', transformer_attention_heads=8, transformer_dropout_act=0.0, transformer_dropout_attention=0.0, transformer_dropout_prepost=0.0, transformer_feed_forward_num_hidden=2048, transformer_model_size=512, transformer_positional_embedding_type='fixed', transformer_postprocess=('drn', 'drn'), transformer_preprocess=('', ''), use_cpu=False, use_tensorboard=False, validation_source='newstest2016.tc.BPE.de', validation_target='newstest2016.tc.BPE.en', weight_decay=0.0, weight_init='xavier', weight_init_scale=2.34, weight_init_xavier_factor_type='in', weight_init_xavier_rand_type='uniform', weight_normalization=False, weight_tying=False, weight_tying_type='trg_softmax', word_min_count=(1, 1))
[INFO:sockeye.utils] Attempting to acquire 1 GPUs of 1 GPUs. The requested devices are: [-1]
[INFO:sockeye.utils] Acquired GPU 0.
[INFO:main] Training Device(s): GPU [0]
[INFO:sockeye.vocab] Building vocabulary from dataset(s): ['corpus.tc.BPE.de']
[INFO:sockeye.vocab] Vocabulary: types: 22309/22309/22309/22313 (initial/min_pruned/max_pruned/+special) [min_frequency=1, max_num_types=50000]
[INFO:sockeye.vocab] Building vocabulary from dataset(s): ['corpus.tc.BPE.en']
[INFO:sockeye.vocab] Vocabulary: types: 16757/16757/16757/16761 (initial/min_pruned/max_pruned/+special) [min_frequency=1, max_num_types=50000]
[INFO:sockeye.data_io] ===============================
[INFO:sockeye.data_io] Creating training data iterator
[INFO:sockeye.data_io] ===============================
[INFO:sockeye.data_io] 193369 sequences of maximum length (60, 60) in '/home/chenxiny/newstest/corpus.tc.BPE.de' and '/home/chenxiny/newstest/corpus.tc.BPE.en'.
[INFO:sockeye.data_io] Mean training target/source length ratio: 0.99 (+-0.45)
[INFO:sockeye.data_io] Tokens: source 4563838 target 4466824
[INFO:sockeye.data_io] Vocabulary coverage: source 100% target 100%
[INFO:sockeye.data_io] 193369 sequences across 6 buckets
[INFO:sockeye.data_io] 6631 sequences did not fit into buckets and were discarded
[INFO:sockeye.data_io] Bucket (10, 10): 32318 samples in 505 batches of 64, ~362.2 tokens/batch.
[INFO:sockeye.data_io] Bucket (20, 20): 46870 samples in 733 batches of 64, ~949.5 tokens/batch.
[INFO:sockeye.data_io] Bucket (30, 30): 49430 samples in 773 batches of 64, ~1511.9 tokens/batch.
[INFO:sockeye.data_io] Bucket (40, 40): 35885 samples in 561 batches of 64, ~2070.4 tokens/batch.
[INFO:sockeye.data_io] Bucket (50, 50): 19636 samples in 307 batches of 64, ~2618.3 tokens/batch.

@szha
Member

szha commented Apr 6, 2018

@pengzhao-intel any update?

@szha szha added Installation and removed Bug labels Apr 6, 2018
@szha
Member

szha commented Apr 6, 2018

Unless Intel provides a static library for libiomp5 built with the -fPIC flag, or drops the dependency on it, I don't see anything we can do from the mxnet side. Suggestions are welcome.

@pengzhao-intel
Contributor

pengzhao-intel commented Apr 6, 2018

@szha, @ashokei is talking with the MKL team about using libgomp in mxnet so it can avoid the dependency on libiomp.

@ashokei any update? Please correct me if my understanding is not correct.

@szha
Member

szha commented Apr 6, 2018

@pengzhao-intel thanks for the update. @ashokei let us know how things progress and how this change impacts performance.

@ashokei
Contributor

ashokei commented Apr 10, 2018

@zheng-da is your OpenMP linking issue related to this in any way? Does your solution/workaround fix this?

@zheng-da
Contributor

No, I don't have a workaround so far.

@tdomhan
Contributor

tdomhan commented Jun 26, 2018

any updates on this?

@pengzhao-intel
Contributor

@tdomhan after several investigations, I realized it's not easy to resolve this issue under the current setup; we would have to change some of the build logic.
Do you still encounter this error? Is the workaround KMP_DUPLICATE_LIB_OK=TRUE acceptable for you?
I will raise the discussion internally with the MKL team and report back soon.

@tdomhan
Contributor

tdomhan commented Jun 27, 2018

"may cause crashes or silently produce incorrect results" in the message stated by Felix above made me not so eager to try this out, rather than using the non-MKL version of MXNet.

@pengzhao-intel
Contributor

Understood. Even though we haven't encountered crashes or incorrect results with the workaround, I am working on this item now and hope we can resolve this issue cleanly :)

@tdomhan
Contributor

tdomhan commented Jun 27, 2018

thanks a lot for looking into this! :)

@pengzhao-intel
Contributor

The numpy from conda includes the MKL package, which conflicts with MXNet.
The good news is that the latest MKL package, mkl-2018.0.3, resolves this issue.

Thus, updating numpy in conda to 1.14 (which pulls in mkl-2018.0.3, as shown below) will resolve this issue.

@tdomhan @fhieber please try again in your environments.
Feel free to let me know if you encounter any other issues.

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    mkl-2018.0.3               |                1       198.7 MB

The following NEW packages will be INSTALLED:

    blas:            1.0-mkl
    mkl_fft:         1.0.1-py36h3010b51_0
    mkl_random:      1.0.1-py36h629b387_0
    numpy-base:      1.14.5-py36hdbf6ddf_0

The following packages will be UPDATED:

    ca-certificates: 2017.08.26-h1d4fec5_0      --> 2018.03.07-0
    certifi:         2017.7.27.1-py36h8b7b77e_0 --> 2018.4.16-py36_0
    conda:           4.4.11-py36_0              --> 4.5.4-py36_0
    llvmlite:        0.20.0-py36_0              --> 0.23.2-py36hdbcaa40_0
    mkl:             2018.0.0-hb491cac_4        --> 2018.0.3-1
    numba:           0.35.0-np113py36_10        --> 0.38.1-py36h04863e7_0
    numpy:           1.13.3-py36ha12f23b_0      --> 1.14.5-py36hcd700cb_0
    openssl:         1.0.2l-h077ae2c_5          --> 1.0.2o-h20670df_0
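For reference, the update above can be triggered with something like the following (a sketch; conda may resolve slightly different builds in your environment):

conda update conda
conda install "mkl>=2018.0.3" "numpy>=1.14"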

@pengzhao-intel
Contributor

@fhieber @tdomhan could you help confirm the fix?
If the problem is solved, we can close this bug finally ;)

@fhieber
Contributor Author

fhieber commented Sep 28, 2018

I finally got back to this after a while. I no longer observe the libiomp-related error mentioned in the original issue, but I am observing process deadlocks with the following numpy/mxnet configuration in Sockeye:

conda list | grep numpy
numpy                     1.15.1           py36h6a91979_0
numpy-base                1.15.1           py36h8a80b8c_0
conda list | grep mkl
blas                      1.0                         mkl
mkl                       2019.0                      118
mkl_fft                   1.0.4            py36h5d10147_1
mkl_random                1.0.1            py36h5d10147_1
mxnet-mkl                 1.3.0.post0               <pip>

If MKL-optimized numpy is installed via Anaconda (as shown above) and mxnet-mkl==1.3.0.post0 is used on a Mac laptop, the Sockeye subprocess spawned at a checkpoint (to decode the validation data set) fails to start and the main process deterministically hangs. When debugging, it appears that spawning the subprocess never completes.
However, when using either mxnet==1.3.0.post0 (no MKL) or pip-installed numpy (no MKL), everything works just fine.

@pengzhao-intel
Contributor

@fhieber thanks for the feedback.
Could you give me a test case to reproduce this issue?

@fhieber
Contributor Author

fhieber commented Sep 28, 2018

Thanks @pengzhao-intel, here is a minimal example to reproduce the issue.
You can run this:

conda install mkl
conda install numpy
pip install mxnet-mkl
git clone https://github.com/awslabs/sockeye.git
cd sockeye
python -m sockeye.train --num-layers 1 -s setup.py -t setup.py -vs setup.py -vt setup.py -o test_model --batch-size 5 --batch-type sentence --num-embed 8 --transformer-model-size 8 --overwrite-output --use-cpu --transformer-attention-heads 1 --checkpoint-frequency 100 --decode-and-evaluate 2

(This will train a tiny model on the setup.py file, but will hang once it reaches 100 updates and spawns a CheckpointDecoder subprocess to decode 2 sentences of the validation data.)
It will hang with the last log line being:

[INFO:sockeye.training] Starting process: Decoder-1

If you set --decode-and-evaluate 0, no decoder subprocess will be started at each checkpoint, and training runs fine.

If you run

conda uninstall mkl
conda uninstall numpy
pip install numpy

and run the same training with --decode-and-evaluate > 0, no hanging will occur.

Likewise, if you replace mxnet-mkl with mxnet:

pip uninstall mxnet-mkl
pip install mxnet

and run the same training, no hanging will occur.

@pengzhao-intel
Contributor

@fhieber thanks for the detailed information.

Because this is a new issue, do you mind closing this one and opening a new one?
BTW, since the issue was first observed in Sockeye, I suggest we create an issue in the Sockeye repo first and come back to the MXNet repo once we figure out the specific bug.

@fhieber
Contributor Author

fhieber commented Oct 1, 2018

@pengzhao-intel this is not a Sockeye issue. I posted a minimal reproducible example in #12710

@lebeg
Contributor

lebeg commented Nov 22, 2018

Possibly related #12160

artemsok pushed a commit to artemsok/sockeye that referenced this issue Jan 21, 2019
@pengzhao-intel
Contributor

The problem has been fixed with the latest numpy in conda.

Closing, feel free to re-open if there is any other issue.
