NumPy BLAS Clashing with MXNet BLAS #18855
The problem here is that we can never know what BLAS libraries are linked into other Python packages that our users happen to use. NumPy may be a popular one, but other packages may also link different BLAS implementations. The same issue applies to the OpenMP dependency. Did you statically link the BLAS implementation? You can refer to #17751. For dynamic linking, the loader will load the implementation referenced by the first package that is loaded and will not load other BLAS implementations if the BLAS symbols are already available. Maybe importing mxnet first triggers importing numpy, causing the numpy BLAS implementation to be loaded?
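A quick Linux-only diagnostic sketch (assuming both packages are installed) to see which BLAS implementations actually got mapped into the process:

```python
# Diagnostic sketch (Linux-only; assumes numpy and mxnet are installed).
# Import order decides which BLAS is loaded into the process first.
import numpy   # loads NumPy's BLAS (e.g. MKL) first
import mxnet   # may now resolve shared BLAS symbols against what is already loaded

# List every BLAS/MKL shared object actually mapped into this process:
with open("/proc/self/maps") as f:
    libs = {line.rsplit("/", 1)[-1].strip()
            for line in f
            if "blas" in line.lower() or "mkl" in line.lower()}
print(libs)
```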
Is this from the pip package or building from source? For pip, we statically link blas/lapacke and strip all symbols that are not meant to be exposed. This means that the blas/lapacke functions are not in the symbol table.
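As a sketch of how to verify this (the library path is an assumption; adjust it for your install):

```python
# Inspect the dynamic symbol table of libmxnet.so with nm.
import subprocess

out = subprocess.run(
    ["nm", "-D", "--defined-only", "libmxnet.so"],
    capture_output=True, text=True, check=True,
)
exported = [line for line in out.stdout.splitlines()
            if "cblas_" in line or "LAPACKE_" in line]
# For a static pip build with stripped symbols this list should be empty.
print(exported or "no blas/lapacke symbols exported")
```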
This is for building from source. Are we planning to support large tensors by default in the future? Currently it's controlled by a flag, and I think the distributed wheels don't have it turned on.
Right, it was a dynamic link. Is there a way to link it statically when building from source?
@Zha0q1 When building the pypi wheel, openBLAS is linked statically and is inside libmxnet. We can change the build-from-source instructions for mxnet 2.0 to always build openBLAS from source and statically link it. @leezu @szha what do you think?
Alright, I tried the static build scripts under /tools. It looks like libmxnet is still dependent on libopenblas. Am I doing this wrong, or should I configure the script in some way? @leezu
As discussed offline, we should avoid requiring all users to build openblas from source, especially as distributions start to provide 64-bit suffixed openblas packages.

To avoid symbol version clashes as reported in these issues, we must require the suffixed symbol names or static linking. We can work with cmake upstream on a standard approach to detecting 64-bit suffixed openblas as part of https://cmake.org/cmake/help/latest/module/FindBLAS.html and backport the upstream recommended approach to our https://github.com/apache/incubator-mxnet/tree/master/cmake/upstream to support older distributions that do not ship with a recent enough cmake.

Static linking is a great solution for the pip distribution build, but I don't think we should settle on a solution that requires static linking, because there is already a better solution established as best practice (i.e. symbol suffixes), and requiring static linking would restrict the versatility of the MXNet source build and make it harder to package MXNet as part of Debian or other distributions.
@leezu Thanks for initiating the discussion on debian science. My solution was to have build instructions for everyone to build openblas from source (static linking would avoid issues for cloud users on EC2 using DLAMI, where the openBLAS installation causes problems). That way the build instructions are identical regardless of the distro selected. Most customers use pip install, so for them there isn't a difference in experience. Since distros will start to provide _64 versions of the libraries, we won't need to worry about those then.
It's great to have those instructions where required. But we should also look at the trend and build a solution that aligns with the work of others.
We may need to add support for those; it's not automatic. For example, https://cmake.org/cmake/help/latest/module/FindBLAS.html does not currently distinguish the 64-bit and 32-bit versions, as the maintainers may not be aware of the use-case. This doesn't all have to happen at once. We just need to ensure that our approach remains compatible with what we eventually would like to achieve.
Makes sense.
@Zha0q1 you may want to check whether these steps are still present in the new build scripts and add them as necessary.
If you want to make this work via static linking, you may need to use `-Bsymbolic-functions`. See also https://stackoverflow.com/questions/7216973/is-there-a-downside-to-using-bsymbolic-functions

But instead we can just work around the issue by adopting the 64_ suffix convention.
Discussed offline: we need to re-enable the symbol whitelisting in cmake builds. It previously existed in the make-based builds:
Thanks! @szha @leezu I just looked into adding a suffix to the cblas/lapack calls within our code base. There are ~50 places in a handful of different files that would all need to be changed, which makes me wonder if this change is too heavy. Also: 1. openblas is used in TVM, so they would need to make the same change to stay consistent with us. 2. As @leezu mentioned, finding 64-bit openblas is not well supported, and distros don't have a unified solution for it either. For our purpose of supporting large tensors in 2.0, if we could link 64-bit openblas statically for our soon-to-come release, I would still think that's the best solution.
BTW, int32 BLAS will work for tensors with total size > INT_MAX (2**31 - 1); it's when a single dim is > INT_MAX that we must use int64 BLAS, because the function declarations use int32 for the strides.
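A small sketch of the distinction, using plain NumPy so nothing huge is actually allocated:

```python
import numpy as np

INT_MAX = 2**31 - 1  # largest value a 32-bit int BLAS argument can hold

# Total element count may exceed INT_MAX: BLAS only ever receives
# per-dimension sizes and strides, never the total size.
shape_ok = (2**16, 2**17)  # 2**33 elements in total, each dim fits in int32
assert np.prod(shape_ok, dtype=np.int64) > INT_MAX
assert all(d <= INT_MAX for d in shape_ok)

# A single dimension beyond INT_MAX would overflow the int32 m/n/k/lda
# arguments of an int32 BLAS, so an ILP64 build (INTERFACE64=1) is needed.
shape_bad = (2**31, 2)
assert any(d > INT_MAX for d in shape_bad)
```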
Why not follow the approach in numpy and define a macro at a central place? https://github.com/numpy/numpy/pull/15069/files#diff-4538717a0246e7d9363e76a2e3fc835e
You can edit https://github.com/apache/incubator-mxnet/blob/master/cmake/Modules/FindOpenBLAS.cmake. You will need to edit this file in any case, even if you choose not to rely on symbol suffixes. That's because ILP64 openblas would typically be named differently (e.g. `libopenblas64_`) rather than plain `libopenblas`.
There are two different cases: one is the staticbuild for pip, where static linkage is preferred. For the general CMakeLists.txt, why should we restrict our users to static linkage?

How do you ensure this consistency without the symbol suffix? Does TVM support ILP64? If TVM expects standard 32-bit blas but you link ILP64 blas with the same symbol names, wouldn't there be issues?
I created a simple PR. I am trying to learn about the scope of this change as I go.
@sandeep-krishnamurthy would you take a look?
For reference, here are the things I tried:
One more thing: currently the static build script defaults to dynamically linking openblas. We would need to make it a static link; I can share my changes (build openblas 32-bit => 64-bit; dynamic link => static link) once we have a final consensus on how to support openblas 64 going forward. Also, opencv would fail to link with this error:

For my proof-of-concept build I had to turn off opencv for the MXNet build.
There's a step that disables/removes the shared objects used for dynamic linking.
Yes, I did that on my machine and was able to build the wheel. I think we should add that back too.
This change enables symbol exclusion of statically linked MKL libraries to avoid the name clashing issue.
Fixes MKL symbol name clashing issue (#18855).
Both NumPy and MXNet depend on BLAS. When they are linked against different BLAS libraries, there is a name clashing issue: effectively, only functions from NumPy's BLAS will be used by both NumPy and MXNet.
According to https://stackoverflow.com/questions/47891872/how-to-use-non-mkl-numpy-under-anaconda, anaconda will by default ship MKL-dependent NumPy. This is also the case on DLAMI 30:
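To check which BLAS a given NumPy install was built against, for instance:

```python
import numpy as np

# Prints the BLAS/LAPACK configuration this NumPy build is linked against
# (on an MKL-backed install the output lists mkl_rt or similar).
np.__config__.show()
```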
I first ran into this issue while working on adding large tensor support to the linalg operators, where I used a manually built int64 version of Open BLAS. I used this simple test script:
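A minimal sketch of such a script (the original isn't shown here, so the shapes are placeholders) that exercises the same `linalg.syrk` path:

```python
import mxnet as mx

# Placeholder shapes; the BLAS path is hit regardless of size.
a = mx.nd.ones((3, 4))
# syrk computes a * a^T, which lowers to cblas_ssyrk for float32 input.
b = mx.nd.linalg.syrk(a)
print(b.asnumpy())
```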
On my machine (DLAMI 30, Ubuntu 18) Open BLAS is built with `DYNAMIC_ARCH=1 DYNAMIC_OLDER=1 USE_OPENMP=1 INTERFACE64=1 BINARY=64 NO_SHARED=0 NO_LAPACK=0` and MXNet is built with `USE_BLAS="open" USE_INT64_TENSOR_SIZE=1`. NumPy is pre-installed with MKL optimization.

Ideally, `linalg.syrk` would invoke Open BLAS `cblas_ssyrk` (my build, 64-bit int), but in reality, because of the name clashing, MKL `cblas_ssyrk` (32-bit int) is called instead. This will lead to:

Using GDB we can see we are indeed calling into MKL `cblas_ssyrk`:

Reinstalling NumPy and linking it to my Open BLAS build resolved the issue for me.
So the problem with this name clashing issue is that regardless of what BLAS we build MXNet with, we are stuck with the BLAS that NumPy is configured to use. While in most cases, such as supporting large tensors (i.e. 64-bit indexing), it's fine to configure them to use the same BLAS lib (int64 Open BLAS in my case), I wonder if there is a special use case where we actually want different BLAS libraries for NumPy and MXNet.

My guess would be "no", but we should still be aware of this issue, as well as the extra step of reconfiguring NumPy and MXNet to the correct BLAS, and we probably need to note this in our build tutorial.
This same issue is also noted on NumPy's build-from-source page: https://numpy.org/devdocs/user/building.html. Open BLAS supports building with function prefixes and suffixes, and NumPy can recognize suffixes like "64_" when built with 64-bit int support. We could potentially do something similar, adding a suffix/prefix to BLAS functions and using those names in MXNet, but again it's much easier to link NumPy and MXNet to the same BLAS.
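For instance, the suffix convention can be observed with ctypes (the soname here is an assumption and varies by distro):

```python
import ctypes

# Assumed name: distros shipping ILP64 OpenBLAS with the "64_" symbol
# suffix typically install it as libopenblas64_.so.
lib = ctypes.CDLL("libopenblas64_.so")
print(hasattr(lib, "cblas_ssyrk64_"))  # True: suffixed symbol is exported
print(hasattr(lib, "cblas_ssyrk"))     # False: plain name absent, so no clash
```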