Switch to MKL versions of MXNet #442

Merged · 5 commits · Jun 20, 2018
Conversation

mjdenkowski
Contributor

This update switches our requirements over to MKL-DNN versions of MXNet (mxnet-mkl, mxnet-cu91mkl, etc.).

Using MKL dramatically improves performance on CPUs. Starting with version 1.2.0, MXNet also makes significantly better use of the open source version, MKL-DNN. This brings performance improvements to the pip-installable MXNet versions that are on par with building MXNet from source with the full version of MKL. Newer MKL versions of MXNet are also more compatible with various setups, such as installing Python via Conda, allowing us to more easily use them with Sockeye.
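For illustration, the switch amounts to installing the MKL-DNN builds of MXNet instead of the plain builds (the exact version pins live in the requirements files and may differ):

    # CPU-only build with MKL-DNN support
    pip install mxnet-mkl==1.2.0
    # CUDA 9.1 build with MKL-DNN support
    pip install mxnet-cu91mkl==1.2.0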

Pull Request Checklist

  • Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]'
    until you can check this box).
  • Unit tests pass (pytest)
  • System tests pass (pytest test/system)
  • Passed code style checking (./style-check.sh)
  • Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
  • Updated CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

- pip install -r requirements.docs.txt
- pip install -r requirements/requirements.txt
- pip install -r requirements/requirements.dev.txt
- pip install -r requirements/requirements.docs.txt

Contributor

Since the files are now in a directory, consider renaming them to {core,dev,docs}.txt.

@tdomhan
Contributor

tdomhan commented Jun 27, 2018

It seems like apache/mxnet#8532 is still not fixed, and we still see crashes in environments where numpy uses MKL (such as any conda/anaconda setup). I would actually vote for reverting to the non-MKL version by default and adding a section to the tutorials on speedups with MKL, with the explicit hint that one needs a numpy built without MKL. What do you think?

@pengzhao-intel

pengzhao-intel commented Jun 27, 2018

Yes, the new MKL-DNN backend has accelerated Sockeye a lot. We have also recently added more optimizations for Sockeye (GNMT, transformer, etc.), such as mshadow/342 and mxnet/11335, and other PRs (topk, copy2copy, etc.) are on the way :)

For example, CPU inference performance is 8X faster with MKL-DNN, as shown in the table below:

    OP             sockeye 1.18.10, GNMT, 8 layers, bs=32, beam size=10
    baseline       2.8558
    +topk          14.0956
    +softmax       15.4043
    +batch_gemm    16.1896
    +copy2copy     17.6731

Thus, enabling the MKL-DNN backend is the right direction eventually.
I agree that #8532 is a blocking issue, and we will try to resolve it soon.

Alternatively, how about enabling MKL-DNN and explicitly documenting this known issue?

@tdomhan
Contributor

tdomhan commented Jun 27, 2018

Thanks for those numbers, that's great to hear!

@mjdenkowski
Contributor Author

Do we have a log from a specific test that's failing with a specific setup? I just re-ran the system tests with both PyPI and Conda NumPy (pip install numpy vs conda install numpy) and everything looks good. The Conda version is definitely using MKL as well:

>>> np.__config__.show()
mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ec2-user/miniconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/miniconda3/include']
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ec2-user/miniconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/miniconda3/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ec2-user/miniconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/miniconda3/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ec2-user/miniconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/miniconda3/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ec2-user/miniconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/miniconda3/include']

>>> help(mxnet)
Help on package mxnet:
...
FILE
    /home/ec2-user/miniconda3/lib/python3.6/site-packages/mxnet_mkl-1.2.0-py3.6-linux-x86_64.egg/mxnet/__init__.py

For the unit and integration tests, I am seeing a weird issue where test_convolutional_embedding_encoder crashes with all versions of Python and NumPy. Commenting out that test makes everything pass, so this might be separate from the MKL issue. I'm going to take a closer look.
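For reference, one illustrative way to deselect just that test while investigating (not a command from the original report) is pytest's -k filter:

    pytest -k "not test_convolutional_embedding_encoder"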

@mjdenkowski
Contributor Author

I'm now getting the following for both PyPI and Conda NumPy and both Conda and system (Amazon Linux) Python:

test/unit/test_encoder.py::test_convolutional_embedding_encoder[config0-out_data_shape0-out_data_length0-3] zsh: abort      pytest

Narrowing it down to this line, I get:

[20:41:33] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 1024 bytes with malloc directly
terminate called after throwing an instance of 'dmlc::Error'
  what():  [20:41:33] src/engine/./threaded_engine.h:379: std::exception
A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 8 entries:
[bt] (0) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x17ec9d) [0x7f624ea1ac9d]
[bt] (1) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x17f068) [0x7f624ea1b068]
[bt] (2) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x27abe26) [0x7f6251047e26]
[bt] (3) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x27af461) [0x7f625104b461]
[bt] (4) /usr/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x27ac01b) [0x7f625104801b]
[bt] (5) /usr/lib64/libstdc++.so.6(+0xbb890) [0x7f624463b890]
[bt] (6) /lib64/libpthread.so.0(+0x7de5) [0x7f62bb34cde5]
[bt] (7) /lib64/libc.so.6(clone+0x6d) [0x7f62ba97030d]

Using KMP_DUPLICATE_LIB_OK=TRUE has no effect. This looks like a separate issue from #8532. The good news is that it should be easy enough to replicate in any setup by just running pytest.
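Following the hint in the error message, a synchronous run to get a usable backtrace might look like this (the test selection flag is illustrative):

    MXNET_ENGINE_TYPE=NaiveEngine pytest test/unit/test_encoder.py -k test_convolutional_embedding_encoder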

Any ideas?

@mjdenkowski
Contributor Author

It looks like the issue may be with a call to mxnet.symbol.concat, specifically this line in encoder.py:

        # (batch_size, total_num_filters, seq_len, num_scores=1)
        conv_concat = mx.sym.concat(*conv_outputs, dim=1)

If we manually specify the elements to concatenate, everything works when concatenating multiple copies of the same element but crashes otherwise. Replacing the line above gives the following results:

All tests pass:

        conv_concat = mx.sym.concat(conv_outputs[0], conv_outputs[0], dim=1)
        total_num_filters = self.num_filters[0] + self.num_filters[0]

All tests pass:

        conv_concat = mx.sym.concat(conv_outputs[1], conv_outputs[1], dim=1)
        total_num_filters = self.num_filters[1] + self.num_filters[1]

Crashes, test aborts:

        conv_concat = mx.sym.concat(conv_outputs[0], conv_outputs[1], dim=1)
        total_num_filters = self.num_filters[0] + self.num_filters[1]
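
A minimal standalone sketch of this pattern (hypothetical shapes, kernel sizes, and filter counts, not the Sockeye encoder's actual configuration) that should exercise the same concat of differently-sized convolution outputs:

    import mxnet as mx
    import numpy as np

    # Two convolutions over the same input with different filter counts,
    # padded so their outputs differ only along the channel axis (dim=1).
    data = mx.sym.Variable('data')
    conv_a = mx.sym.Convolution(data, kernel=(3, 1), pad=(1, 0), num_filter=8, name='conv_a')
    conv_b = mx.sym.Convolution(data, kernel=(5, 1), pad=(2, 0), num_filter=16, name='conv_b')
    conv_concat = mx.sym.concat(conv_a, conv_b, dim=1)

    # (batch_size, channels=1, seq_len, 1), mirroring the encoder's layout
    exe = conv_concat.simple_bind(mx.cpu(), data=(4, 1, 20, 1))
    for name, arr in exe.arg_dict.items():
        if name != 'data':
            arr[:] = 0.1  # initialize weights/biases deterministically
    exe.arg_dict['data'][:] = np.random.rand(4, 1, 20, 1)
    exe.forward(is_train=False)
    print(exe.outputs[0].shape)  # expected: (4, 24, 20, 1)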

Any ideas?

@xinyu-intel
Contributor

@mjdenkowski Hi, I cannot reproduce your error on my CPU/GPU machines with a conda environment.

(base) $ nosetests test_encoder.py:test_get_convolutional_encoder
.
----------------------------------------------------------------------
Ran 1 test in 0.001s

OK

@tdomhan
Contributor

tdomhan commented Jun 28, 2018

I will try to find a minimal example that reproduces the issue.

@tdomhan
Contributor

tdomhan commented Jun 28, 2018

I was indeed not able to reproduce this with mxnet-mkl and an up-to-date conda installation on Amazon Linux. The last time I observed this was with mxnet-cu91mkl. I will try that next.

@xinyu-intel
Contributor

@tdomhan I updated numpy, and the duplicated iomp5.so bug can no longer be reproduced.

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    mkl-2018.0.3               |                1       198.7 MB

The following NEW packages will be INSTALLED:

    blas:            1.0-mkl                   
    mkl_fft:         1.0.1-py36h3010b51_0      
    mkl_random:      1.0.1-py36h629b387_0      
    numpy-base:      1.14.5-py36hdbf6ddf_0     

The following packages will be UPDATED:

    ca-certificates: 2017.08.26-h1d4fec5_0      --> 2018.03.07-0         
    certifi:         2017.7.27.1-py36h8b7b77e_0 --> 2018.4.16-py36_0     
    conda:           4.4.11-py36_0              --> 4.5.4-py36_0         
    llvmlite:        0.20.0-py36_0              --> 0.23.2-py36hdbcaa40_0
    mkl:             2018.0.0-hb491cac_4        --> 2018.0.3-1           
    numba:           0.35.0-np113py36_10        --> 0.38.1-py36h04863e7_0
    numpy:           1.13.3-py36ha12f23b_0      --> 1.14.5-py36hcd700cb_0
    openssl:         1.0.2l-h077ae2c_5          --> 1.0.2o-h20670df_0    
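
The exact command is not shown above, but an update along these lines would produce a package plan like the one listed:

    conda update numpy mkl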

@pengzhao-intel

pengzhao-intel commented Jul 19, 2018

This issue has been fixed in the latest MKL package (2018 Update 3), and the new numpy 1.14 works fine.

So, is it possible to update the numpy dependency in the README? @tdomhan @mjdenkowski

Dependencies
Sockeye requires:

- Python 3
- MXNet 1.2.0
- numpy (>=1.14)
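
An illustrative requirements pin reflecting those dependencies (the actual requirements files differ per platform and CPU/GPU variant):

    mxnet-mkl==1.2.0
    numpy>=1.14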
