OpenBLAS crash from GSL (only with very large matrix) #2387
In the particular case I'm debugging now, the crash occurred in thread 29, with the backtrace:
#0 0x00007ffff6b6eae0 in dscal_kernel_8_zero () from /usr/lib64/libopenblas.so.0
Probably #1698 - meaning you would need to increase the BUFFER_SIZE in common_x86_64.h.
OK. I can test this now. I also saw some mention of increasing the stack limit in #1698. Should I do that as well? ulimit -a tells me my stack size is currently 8192 Kbytes. Any guidance on how much I should increase the buffer size, to be safe?
Informational note: increasing the stack size from 8 MB to 64 MB did not fix the crash. Next, I'll rebuild the library with an aggressive increase in the buffer size.
Are you using single-threaded OpenBLAS from multiple threads, or how?
I'm using multi-threaded OpenBLAS, which is being invoked from a single thread of my application.
I'm still debugging the crash. When I changed the BUFFER_SIZE from "32 << 20" to "1024UL << 20", that did, in fact, stop the crash. But, after the change, it seemed as if the code was going to run indefinitely, so I terminated the program. (The run time was extraordinarily long, even compared to runs on other matrices of comparable size.) I'll look deeper into what's happening today.
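For reference, the edit being described is the compile-time scratch-buffer size in common_x86_64.h. A sketch of the change (the exact placement and surrounding context may differ between OpenBLAS versions):

```c
/* common_x86_64.h (OpenBLAS source tree) - sketch of the edit described above.
 * The stock x86_64 default is
 *
 *     #define BUFFER_SIZE  (32 << 20)          // 32 MiB scratch buffer
 *
 * which is what overflows here; enlarging it avoids the crash: */
#define BUFFER_SIZE  (1024UL << 20)             /* 1 GiB scratch buffer */
```

The library then needs a full rebuild for the new value to take effect.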
Yep, the whole thing is fishy, which is why #1698 is still open. (Though I did not see excessive run times with that test case.) With 128 threads and a huge matrix it could be that you hit some other bug as well - e.g. there may be a race condition in level3_thread.c (#2363, helgrind warnings on code protected by memory barriers only).
Update: I swapped OpenBLAS out for the Intel MKL, and this also avoids the crash. With the MKL, the runtime is also longer than I expected (it appears to be the same as with the hacked OpenBLAS I built). I'm investigating the runtime in depth now, but the preliminary evidence suggests that increasing the buffer size did fix the problem for OpenBLAS.
Can you produce a profiler report,
After further investigation, there's no problem at all with the speed. The computation took longer than I expected because it hit a corner case which required the computation of all the eigenvalues of a symmetric matrix (which I've now improved to compute only a single eigenvalue).
The conclusion is the following. There's a bug in OpenBLAS which causes a segmentation fault, but the crash can be avoided by increasing the BUFFER_SIZE, as suggested above. Alternatively, the crash can also be avoided by swapping OpenBLAS out for another BLAS library, such as the Intel MKL.
I understand how incredibly difficult it is to maintain any software, and much more so when the code is very highly optimized, portable, and extremely low level. (And, on top of it all, to do it essentially for free!) But if OpenBLAS has not been thoroughly tested on architectures with more than 32 cores (as I saw in a recent post here), and many people have reported problems in that regime, then this caveat should be made much clearer to prospective users on the OpenBLAS website. Computing on machines with many CPU cores is now very common (anyone can get access to such machines via AWS EC2), and the CPU limitations of the library will be an important consideration for many scientists and engineers trying to decide which open-source library to use.

Even though the change in the BUFFER_SIZE has fixed the crash for me, I worry about whether I should continue using the library for very large computations, or whether I should just switch over to the Intel MKL, which is now available for free as well. What do you, as the maintainers of OpenBLAS, recommend?
I am not into marketing - use whatever you like, but be sure to check results (for plausibility at least) wherever you can. Do not expect MKL (or anything else) to be bug-free just because it is supplied by a big corporation. Contributors to free software may be amateurs when it comes to that particular kind of task, but more often than not they will be professionals in a related discipline.

All I can say is that I cannot vouch for OpenBLAS performance on big systems at the moment, and a few issues have been reported lately (which are out in the open here, though not all may turn out to be actual OpenBLAS bugs). (But if you read the MKL end user license, I bet it will be fit for decorative purpose only as well 😄 ) OpenBLAS is now used by very many programs (knowingly or unknowingly), so it could perhaps even be claimed that it must be perfect if there are only around 130 bugs reported against it. (Most of them involve specific, even outdated, hardware, or build problems with a particular compiler or operating system.)

I may not be up to date on the licensing terms of MKL (and the other bits related to the Intel compilers) - at least until a few years ago, it used to be non-redistributable, and any hint of grant money or other payments received in relation to one's work disallowed use of the free version.
Could you elaborate on
You still did not elaborate on the size of the input matrices. Those pesky debug symbols, you know.
@brada4 a matrix size of 42k by 42k was mentioned from the very start; in #1698 it was a 3 by 100,000,000 array (more manageable in terms of memory, but still too big for the default BUFFER_SIZE). As it seems there are no other side effects, this setting could be exposed in Makefile.rule (and CMakeLists.txt) - I guess not everybody would require (nor want to malloc) a gigabyte-sized buffer, although "common" memory sizes on x86_64 have certainly improved since the days of GotoBLAS. (The x86_64 default was copied from GotoBLAS2 of 2009.)
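If the setting were exposed through the build system as suggested, the header default could be guarded so that a value passed in from Makefile.rule or CMakeLists.txt takes precedence. A hypothetical sketch (not the current OpenBLAS code):

```c
/* Hypothetical: let the build system override the scratch-buffer size, e.g.
 * via CFLAGS += -DBUFFER_SIZE="(256UL << 20)" in Makefile.rule, while keeping
 * the historical 32 MiB GotoBLAS default otherwise. */
#ifndef BUFFER_SIZE
#define BUFFER_SIZE  (32 << 20)
#endif
```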
There is one STACK_ALLOC instance in interface/gemv.c, which carries a note about being too fragile for optimisations.
Lowercase "buffer_size" in that file is something else entirely, and is only on the order of the sum of the matrix dimensions (and won't actually go on the stack if it gets too big for that). |
I have a technical question now about the patched build of the library. I built the library with the increased BUFFER_SIZE described above. In the makefile for my application, I have -lblas on the link line, so I also made a symlink from libblas to the patched library. I noticed that, if I remove the libopenblas symlinks, my application fails at startup because it can't find the libopenblas .so files. Any idea why that is? The program should be looking up libblas.so, not libopenblas.so. I'm running on RHEL Fedora, which doesn't have an alternatives mechanism like Ubuntu (as far as I know). I then checked the older version of libopenblas that I got from my package manager: in /usr/lib64 there are many shared objects and symlinks for it. I'm a little confused at this point. I'm asking this question because I want to ensure I haven't made some mistake with my custom build of the library.
Check with ldd.
OK, I see from ldd that my application was built against libopenblas.so.0, not libblas.so, which explains that observation.
Aha, now I see what happened. The library I built, libopenblasp-r0.3.8.dev.so, has its DT_SONAME field set to "libopenblas.so.0". (This must be specified somewhere in the OpenBLAS installation logic.) Whenever a program is linked against a shared object, the linker records the DT_SONAME of that .so file, and at run time the loader looks for a library with that name, NOT the name that was specified on the link line! So I linked my application against -lblas (/usr/lib64/libblas), which is a symlink to /usr/lib64/libopenblasp-r0.3.8.dev.so, but when I run my application, the loader actually looks for /usr/lib64/libopenblas.so.0, because that's the DT_SONAME in /usr/lib64/libopenblasp-r0.3.8.dev.so. This means the libopenblas.so.0 file must always be present, otherwise the loader will fail.
Yes, sorry for not being more specific earlier - this naming is by convention, to include the exact major version of the library required (in case of ABI-breaking changes in later library versions). This is set in exports/Makefile.
One last question before I happily close this issue, with thanks for all the help and insight I've received. I have other applications which rely on OpenBLAS, besides the one discussed here. In those applications, different application threads (C++ threads, using std::thread) independently call routines which invoke OpenBLAS methods. In these cases, I set the environment variable OPENBLAS_NUM_THREADS to a number (still greater than one) to avoid thread contention between my application threads and the threads needed by OpenBLAS. Is this thread-safe, in both versions 0.2.19 and 0.3.8 of the library? (The separate threads are all operating on different variables with non-overlapping memory addresses - they're not reading or writing the same data.) I checked that the library was thread-safe in this way before I developed those applications, but now I just want to double-check with the experts themselves.
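For concreteness, the usage pattern being described looks roughly like the following minimal sketch - pthreads and cblas_dgemm stand in for the actual std::thread/GSL code, each worker owns its buffers, and OPENBLAS_NUM_THREADS is assumed to be set in the environment before the program starts.

```c
/* Minimal sketch: several application threads each call a BLAS routine on
 * their own private, non-overlapping buffers, while OPENBLAS_NUM_THREADS
 * (set in the environment) limits how many threads OpenBLAS spawns per call.
 * Build with something like: cc demo.c -lopenblas -lpthread */
#include <cblas.h>
#include <pthread.h>
#include <stdlib.h>

enum { DIM = 512, APP_THREADS = 4 };     /* illustrative sizes */

static void *worker(void *arg) {
    (void)arg;
    /* Each worker allocates and uses only its own matrices. */
    double *a = calloc((size_t)DIM * DIM, sizeof(double));
    double *b = calloc((size_t)DIM * DIM, sizeof(double));
    double *c = calloc((size_t)DIM * DIM, sizeof(double));
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                DIM, DIM, DIM, 1.0, a, DIM, b, DIM, 0.0, c, DIM);
    free(a); free(b); free(c);
    return NULL;
}

int main(void) {
    pthread_t tid[APP_THREADS];
    for (int i = 0; i < APP_THREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < APP_THREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```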
In that case it is advisable to build 0.3.8 with USE_LOCKING=1 - any version before 0.2.20 is unlikely to have been thread safe at all. (Finding out about this and trying to fix it was what got me involved in OpenBLAS development.)
I thought the USE_LOCKING option was only needed if one builds the single-threaded version of OpenBLAS. But I always build/use the multi-threaded version. Should I still build 0.3.8 with USE_LOCKING if I'm only building multi-threaded OpenBLAS?
Meh - I misread, sorry about that. The comment about 0.2.19 is valid however.
OK. Got it. I'm closing this issue now, with many thanks for your help and support.
I've encountered a crash in OpenBLAS, in both version 0.2.19 (from the RHEL Fedora EPEL repo) and version 0.3.8 (which I built myself). The crash occurs on an AWS x1e.32xlarge instance, running the RHEL Fedora OS that ships with the Amazon Linux AMI. The instance has about 4 TB of RAM and 128 CPUs.
The stack trace is:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6b6eae0 in dscal_kernel_8_zero () from /usr/lib64/libopenblas.so.0
#0 0x00007ffff6b6eae0 in dscal_kernel_8_zero () from /usr/lib64/libopenblas.so.0
#1 0x00007ffff6b6ed34 in dscal_k_HASWELL () from /usr/lib64/libopenblas.so.0
#2 0x00007ffff5e17bff in symv_kernel () from /usr/lib64/libopenblas.so.0
#3 0x00007ffff5ff3549 in exec_blas () from /usr/lib64/libopenblas.so.0
#4 0x00007ffff5e17eb6 in dsymv_thread_U () from /usr/lib64/libopenblas.so.0
#5 0x00007ffff5dd4a46 in cblas_dsymv () from /usr/lib64/libopenblas.so.0
#6 0x00007ffff79b7d80 in gsl_blas_dsymv (Uplo=Uplo@entry=CblasLower, alpha=alpha@entry=1.1044668107764084, A=A@entry=0x7fffffffc800, X=X@entry=0x7fffffffc7a0,
beta=beta@entry=0, Y=Y@entry=0x7fffffffc7d0) at blas.c:782
#7 0x00007ffff7a1d732 in gsl_linalg_symmtd_decomp (A=A@entry=0x6b1eac50, tau=tau@entry=0x7fffffffc970) at symmtd.c:97
#8 0x00007ffff79c07e9 in gsl_eigen_symm (A=A@entry=0x6b1eac50, eval=eval@entry=0x6b1f5cf0, w=w@entry=0x6b12f280) at symm.c:126
You can see that the crash occurs during a call to the GSL function gsl_eigen_symm. The input to gsl_eigen_symm, in this case, is a 42898x42898 symmetric matrix.
At the time of the crash, there are 128 threads, most of them in either dscal_kernel_8_zero or dsymv_U_HASWELL.
I saw some old posts about a crash with a similar stack trace, but it wasn't clear to me whether the underlying issue was fixed. Is this a known issue with a simple workaround?
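For reference, the shape of the failing computation is roughly the following reduced sketch: the matrix contents here are arbitrary, and only the dimension (42898, as in the crashing run) matters. At this size the matrix alone needs roughly 15 GB of RAM.

```c
/* Reduced sketch of the crashing call path: build an n x n symmetric matrix
 * and ask GSL for its eigenvalues.  Internally this goes through
 * gsl_linalg_symmtd_decomp -> gsl_blas_dsymv -> OpenBLAS dsymv/dscal kernels,
 * which is where the SIGSEGV was observed with 128 OpenBLAS threads.
 * Build with something like: cc repro.c -lgsl -lopenblas -lm */
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_eigen.h>

int main(void) {
    const size_t n = 42898;                    /* size used in the crashing run */
    gsl_matrix *A = gsl_matrix_alloc(n, n);
    gsl_vector *eval = gsl_vector_alloc(n);
    gsl_eigen_symm_workspace *w = gsl_eigen_symm_alloc(n);

    /* Fill A with arbitrary symmetric values. */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j <= i; j++) {
            double v = 1.0 / (double)(i + j + 1);
            gsl_matrix_set(A, i, j, v);
            gsl_matrix_set(A, j, i, v);
        }

    gsl_eigen_symm(A, eval, w);                /* crash occurs inside this call */

    gsl_eigen_symm_free(w);
    gsl_vector_free(eval);
    gsl_matrix_free(A);
    return 0;
}
```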