
'BLAS : Program is Terminated...' Even When Not Cross-Compiled #938

Closed
u55 opened this issue Aug 2, 2016 · 5 comments

Comments

@u55

u55 commented Aug 2, 2016

Hi OpenBLAS developers,

I recently came across a test failure in python's scipy package, scipy/scipy#6422, when scipy is linked against an openblas built with the configuration:

"FC=gfortran USE_OPENMP=0 USE_THREAD=1 MAJOR_VERSION=3 NO_LAPACK=0 BUILD_LAPACK_DEPRECATED=1"

The scipy test runs several matrix math operations in 20 python threads (within a single process controlled by python's Global Interpreter Lock) and produces the error message
'BLAS : Program is Terminated. Because you tried to allocate too many memory regions' multiple times and then segfaults.

Having read the FAQ for openblas, I recompiled openblas with the additional configuration option 'NUM_THREADS=64', which seems to fix the problem. However, this seems to me to be an openblas bug, based on a comment from @jeromerobert in issue #889, who said this about 'NUM_THREADS':

... It should be automatically detected by the build system. The only reason to manually set NUM_THREADS is to build OpenBLAS for an other machine which have more physical cores than the current machine.

Since I am compiling openblas myself on the same machine that is using scipy, this suggests to me that openblas is not detecting the right value of NUM_THREADS automatically for my CPU during the build process. However, I find

...
CORE=SANDYBRIDGE
LIBCORE=sandybridge
NUM_CORES=8
...

written in Makefile.conf, which suggests that openblas is able to detect the correct number of logical cores of my CPU. So I don't understand why the 'NUM_THREADS=64' configuration is necessary. Can you please confirm if this is an openblas bug, or if it is a "feature"?

Thanks.

@martin-frbg
Collaborator

As I read #889, jeromerobert (and theoractice later in the thread) was suggesting that running such a large number of threads is not expected to bring any performance benefit, hence OpenBLAS' automatic setup uses a much smaller limit.
(I do note that the #define NUM_BUFFERS (MAX_CPU_NUMBER * 2) is unchanged from Kazushige Goto's work on the original GotoBLAS of ~10 years ago, so some benchmarks with varying thread counts may provide arguments for changing this default on current hardware. On the other hand, Sandybridge is not brand-new either, and I suspect one of the developers may already have done just that for this architecture.)
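A toy model of how 20 simultaneous callers can exhaust a fixed table of memory regions. It assumes MAX_CPU_NUMBER follows NUM_THREADS, which defaults to the detected NUM_CORES (8 here), giving NUM_BUFFERS = 8 * 2 = 16, while NUM_THREADS=64 gives 128. `BufferTable`, `num_buffers` and `simulate` are hypothetical names; this is not OpenBLAS's actual allocator, and it assumes one region per calling thread, whereas a threaded build may also hold regions for its own worker threads, so real usage can be higher.

```python
import threading


def num_buffers(max_cpu_number):
    # Same arithmetic as the quoted rule: NUM_BUFFERS = MAX_CPU_NUMBER * 2
    return max_cpu_number * 2


class BufferTable:
    """Toy fixed-size table of memory regions (hypothetical, not OpenBLAS code)."""

    def __init__(self, size):
        self._in_use = [False] * size
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            for i, used in enumerate(self._in_use):
                if not used:
                    self._in_use[i] = True
                    return i
        # The real library prints its "too many memory regions" message here and terminates.
        raise RuntimeError("too many memory regions")

    def release(self, slot):
        with self._lock:
            self._in_use[slot] = False


def simulate(n_threads, max_cpu_number):
    """Count callers that find no free region when n_threads all hold one at once."""
    table = BufferTable(num_buffers(max_cpu_number))
    barrier = threading.Barrier(n_threads)
    failures = 0
    count_lock = threading.Lock()

    def caller():
        nonlocal failures
        try:
            slot = table.acquire()
        except RuntimeError:
            with count_lock:
                failures += 1
            barrier.wait()
            return
        barrier.wait()  # every successful caller holds its region at this point
        table.release(slot)

    threads = [threading.Thread(target=caller) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failures


print(simulate(20, 8))   # 4: 20 callers, only 16 regions
print(simulate(20, 64))  # 0: 20 callers, 128 regions
```

Under these assumptions the default table is simply smaller than the number of concurrent callers, which is consistent with NUM_THREADS=64 making the error disappear even though NUM_CORES was detected correctly.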

@brada4
Contributor

brada4 commented Aug 4, 2016

If python creates 20 threads, it exceeds all the CPU cores you have (four).
Pthread openblas is not thread safe. You should use the openmp version to stay away from crashes, or the single-thread version if you want to control threading on the python side.

@u55
Author

u55 commented Aug 4, 2016

@brada4 Are you sure that the pthread build of openblas is unsafe to use with python threads (which are NOT the same as multiple processes)? According to the comments in scipy's site.cfg.example file, the pthread build of openblas should be safe to use with python threading but not with python multiprocessing.

Also according to the comments in scipy's site.cfg.example file, openblas does not work with GNU openmp, as of gcc-4.9. Has this been fixed in recent gcc versions?

@martin-frbg
Collaborator

martin-frbg commented Aug 4, 2016

Hmm. That comment in site.cfg.example was added to numpy's version of the file two years ago, apparently in response to their ticket numpy/numpy#654 which makes reference to a mailing list discussion from three years ago. (And the discussion was not much of a discussion, more like "i have this problem" - "yeah i know, just run with one thread only").
I see no background on which platforms, and which versions of openblas, python and their own code, they might have tested in conjunction with gcc-4.9, so it is probably worth trying for yourself. Personally I have not experienced problems with openblas+openmp, but I do not use numpy/scipy.

Edit: it could be that the numpy comment is related to #85; the same user took part there and in the aforementioned discussion, though the timeframe is not quite right.

@brada4
Contributor

brada4 commented Aug 16, 2016

As if your current numpy worked so well that I am suggesting some weird hack to break it.


3 participants