Switch to MKL versions of MXNet #442

Merged · 5 commits · Jun 20, 2018
Conversation

mjdenkowski
Contributor

This update switches our requirements over to MKL-DNN versions of MXNet (mxnet-mkl, mxnet-cu91mkl, etc.).

Using MKL dramatically improves performance on CPUs. Starting with version 1.2.0, MXNet also makes significantly better use of the open source version, MKL-DNN. This brings performance improvements to the pip-installable MXNet versions that are on par with building MXNet from source with the full version of MKL. Newer MKL versions of MXNet are also more compatible with various setups, such as installing Python via Conda, allowing us to more easily use them with Sockeye.
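For illustration, the switch amounts to installing the MKL-DNN builds of MXNet instead of the plain builds (the exact version pins live in the requirements files and may differ):

    # CPU-only build with MKL-DNN support
    pip install mxnet-mkl==1.2.0
    # CUDA 9.1 build with MKL-DNN support
    pip install mxnet-cu91mkl==1.2.0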

Pull Request Checklist

  • Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]'
    until you can check this box).
  • Unit tests pass (pytest)
  • System tests pass (pytest test/system)
  • Passed code style checking (./style-check.sh)
  • Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
  • Updated CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

- pip install -r requirements.docs.txt
- pip install -r requirements/requirements.txt
- pip install -r requirements/requirements.dev.txt
- pip install -r requirements/requirements.docs.txt

Contributor

Since the files are now in a directory, consider renaming them to {core,dev,docs}.txt.

@tdomhan
Contributor

tdomhan commented Jun 27, 2018

It seems like apache/mxnet#8532 is still not fixed, and we still see crashes in environments where numpy uses MKL (such as any conda/anaconda setup). I would actually vote for reverting to the non-MKL version by default and adding a section to the tutorials on speedups with MKL, with the explicit hint that one needs a numpy built without MKL. What do you think?

@pengzhao-intel

pengzhao-intel commented Jun 27, 2018

Yes, the new MKL-DNN backend has accelerated Sockeye a lot. We have also recently added more optimizations for Sockeye (GNMT, transformer, etc.), such as mshadow/342 and mxnet/11335, and other PRs (topk, copy2copy, etc.) are on the way :)

For example, CPU inference performance is 8X faster with MKL-DNN, as shown in the table below:

    OP             sockeye 1.18.10, GNMT, 8 layers, bs=32, beam size=10
    baseline       2.8558
    +topk          14.0956
    +softmax       15.4043
    +batch_gemm    16.1896
    +copy2copy     17.6731

Thus, enabling the MKL-DNN backend is the right direction eventually.
I agree that #8532 is a blocking issue, and we will try to resolve it soon.

Alternatively, how about enabling MKL-DNN and explicitly documenting this known issue?

@tdomhan
Contributor

tdomhan commented Jun 27, 2018

Thanks for those numbers, that's great to hear!

@mjdenkowski
Contributor Author

Do we have a log from a specific test that's failing with a specific setup? I just re-ran the system tests with both PyPI and Conda NumPy (pip install numpy vs conda install numpy) and everything looks good. The Conda version is definitely using MKL as well:

>>> np.__config__.show()
mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ec2-user/miniconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/miniconda3/include']
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ec2-user/miniconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/miniconda3/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ec2-user/miniconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/miniconda3/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ec2-user/miniconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/miniconda3/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ec2-user/miniconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/miniconda3/include']

>>> help(mxnet)
Help on package mxnet:
...
FILE
    /home/ec2-user/miniconda3/lib/python3.6/site-packages/mxnet_mkl-1.2.0-py3.6-linux-x86_64.egg/mxnet/__init__.py

For the unit and integration tests, I am seeing a weird issue where test_convolutional_embedding_encoder crashes with all versions of Python and NumPy. Commenting out that test makes everything pass, so this might be separate from the MKL issue. I'm going to take a closer look.
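For reference, one illustrative way to deselect just that test while investigating (not a command from the original report) is pytest's -k filter:

    pytest -k "not test_convolutional_embedding_encoder"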

@mjdenkowski
Contributor Author

I'm now getting the following for both PyPI and Conda NumPy and both Conda and system (Amazon Linux) Python:

test/unit/test_encoder.py::test_convolutional_embedding_encoder[config0-out_data_shape0-out_data_length0-3] zsh: abort      pytest

Narrowing it down to this line, I get:

[20:41:33] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 1024 bytes with malloc directly
terminate called after throwing an instance of 'dmlc::Error'
  what():  [20:41:33] src/engine/./threaded_engine.h:379: std::exception
A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 8 entries:
[bt] (0) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x17ec9d) [0x7f624ea1ac9d]
[bt] (1) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x17f068) [0x7f624ea1b068]
[bt] (2) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x27abe26) [0x7f6251047e26]
[bt] (3) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x27af461) [0x7f625104b461]
[bt] (4) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x27ac01b) [0x7f625104801b]
[bt] (5) /usr/lib64/libstdc++.so.6(+0xbb890) [0x7f624463b890]
[bt] (6) /lib64/libpthread.so.0(+0x7de5) [0x7f62bb34cde5]
[bt] (7) /lib64/libc.so.6(clone+0x6d) [0x7f62ba97030d]

Using KMP_DUPLICATE_LIB_OK=TRUE has no effect. This looks like a separate issue from #8532. The good news is that it should be easy enough to replicate in any setup by just running pytest.
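Following the hint in the error message, a synchronous run to get a usable backtrace might look like this (the test selection flag is illustrative):

    MXNET_ENGINE_TYPE=NaiveEngine pytest test/unit/test_encoder.py -k test_convolutional_embedding_encoder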

Any ideas?

@mjdenkowski
Contributor Author

It looks like the issue may be with a call to mxnet.symbol.concat, specifically this line in encoder.py:

        # (batch_size, total_num_filters, seq_len, num_scores=1)
        conv_concat = mx.sym.concat(*conv_outputs, dim=1)

If we manually specify the elements to concatenate, everything works when concatenating multiple copies of the same element but crashes otherwise. Replacing the line above gives the following results:

All tests pass:

        conv_concat = mx.sym.concat(conv_outputs[0], conv_outputs[0], dim=1)
        total_num_filters = self.num_filters[0] + self.num_filters[0]

All tests pass:

        conv_concat = mx.sym.concat(conv_outputs[1], conv_outputs[1], dim=1)
        total_num_filters = self.num_filters[1] + self.num_filters[1]

Crashes, test aborts:

        conv_concat = mx.sym.concat(conv_outputs[0], conv_outputs[1], dim=1)
        total_num_filters = self.num_filters[0] + self.num_filters[1]
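
A minimal standalone sketch of this pattern (hypothetical shapes, kernel sizes, and filter counts, not the Sockeye encoder's actual configuration) that should exercise the same concat of differently-sized convolution outputs:

    import mxnet as mx
    import numpy as np

    # Two convolutions over the same input with different filter counts,
    # padded so their outputs differ only along the channel axis (dim=1).
    data = mx.sym.Variable('data')
    conv_a = mx.sym.Convolution(data, kernel=(3, 1), pad=(1, 0), num_filter=8, name='conv_a')
    conv_b = mx.sym.Convolution(data, kernel=(5, 1), pad=(2, 0), num_filter=16, name='conv_b')
    conv_concat = mx.sym.concat(conv_a, conv_b, dim=1)

    # (batch_size, channels=1, seq_len, 1), mirroring the encoder's layout
    exe = conv_concat.simple_bind(mx.cpu(), data=(4, 1, 20, 1))
    for name, arr in exe.arg_dict.items():
        if name != 'data':
            arr[:] = 0.1  # initialize weights/biases deterministically
    exe.arg_dict['data'][:] = np.random.rand(4, 1, 20, 1)
    exe.forward(is_train=False)
    print(exe.outputs[0].shape)  # expected: (4, 24, 20, 1)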

Any ideas?

@xinyu-intel
Contributor

@mjdenkowski Hi, I cannot reproduce your error on my CPU/GPU machines with a conda environment.

(base) $ nosetests test_encoder.py:test_get_convolutional_encoder
.
----------------------------------------------------------------------
Ran 1 test in 0.001s

OK

@tdomhan
Contributor

tdomhan commented Jun 28, 2018

I will try to find a minimal example that reproduces the issue.

@tdomhan
Contributor

tdomhan commented Jun 28, 2018

I was indeed not able to reproduce this with mxnet-mkl and an up-to-date conda installation on Amazon Linux. The last time I observed this was with mxnet-cu91mkl. I will try that next.

@xinyu-intel
Contributor

@tdomhan I updated numpy, and the duplicated iomp5.so bug can no longer be reproduced.

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    mkl-2018.0.3               |                1       198.7 MB

The following NEW packages will be INSTALLED:

    blas:            1.0-mkl                   
    mkl_fft:         1.0.1-py36h3010b51_0      
    mkl_random:      1.0.1-py36h629b387_0      
    numpy-base:      1.14.5-py36hdbf6ddf_0     

The following packages will be UPDATED:

    ca-certificates: 2017.08.26-h1d4fec5_0      --> 2018.03.07-0         
    certifi:         2017.7.27.1-py36h8b7b77e_0 --> 2018.4.16-py36_0     
    conda:           4.4.11-py36_0              --> 4.5.4-py36_0         
    llvmlite:        0.20.0-py36_0              --> 0.23.2-py36hdbcaa40_0
    mkl:             2018.0.0-hb491cac_4        --> 2018.0.3-1           
    numba:           0.35.0-np113py36_10        --> 0.38.1-py36h04863e7_0
    numpy:           1.13.3-py36ha12f23b_0      --> 1.14.5-py36hcd700cb_0
    openssl:         1.0.2l-h077ae2c_5          --> 1.0.2o-h20670df_0    
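
The exact command is not shown above, but an update along these lines would produce a package plan like the one listed:

    conda update numpy mkl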

@pengzhao-intel

pengzhao-intel commented Jul 19, 2018

This issue has been fixed in the latest MKL package (2018 Update 3), and the new numpy 1.14 works fine.

So, is it possible to update the numpy dependency in the README? @tdomhan @mjdenkowski

Dependencies
Sockeye requires:

- Python 3
- MXNet 1.2.0
- numpy (>=1.14)
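
An illustrative requirements pin reflecting those dependencies (the actual requirements files differ per platform and CPU/GPU variant):

    mxnet-mkl==1.2.0
    numpy>=1.14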
