segfault in sgemm_kernel_direct on x86_64: Illegal instruction #2526

Closed
mattip opened this issue Mar 22, 2020 · 16 comments

Comments

@mattip
Contributor

mattip commented Mar 22, 2020

Over in NumPy, we are getting a segfault using commit ddcbed6. xref numpy/numpy#15796. I can reliably trigger this in numpy on my x86_64 laptop via

git checkout master
bash tools/pypy_test.sh

It does require sudo to install some helper packages, then downloads a pre-built OpenBLAS, and runs NumPy tests. I don't think this is a threading problem, since I don't see the [New Thread 0x7fffee89b700 (LWP 14552)] message around this call like I do when threads are active. Could there be a buffer overrun somewhere?

I am not sure it is an OpenBLAS error, since it only happens on PyPy. Any hints to debug would be welcome.

@martin-frbg
Collaborator

I don't think any non-Haswell instructions could have crept into that file recently. Could it be an alignment issue?

@wjc404
Contributor

wjc404 commented Mar 22, 2020

Maybe this is caused by a DYNAMIC_ARCH=1 build that was compiled on an SKX CPU and then run on an HSW (or older) CPU. The /interface/gemm.c file was compiled with USE_SGEMM_KERNEL_DIRECT=1 and is therefore linked to the sgemm_kernel_direct() function, which can only run on CPUs supporting AVX512.

@martin-frbg
Collaborator

Ah yes, you are right. Then this must have been lurking for a while... Technically the interface code should not be assuming anything about the CPU, I think. Now how to resolve this, apart from disabling it in the DYNAMIC_ARCH case as a stopgap measure? Perhaps that sgemm_direct_performant function can be lifted out of the skylakex kernel and/or overridden in driver/others/dynamic.c.

@wjc404
Contributor

wjc404 commented Mar 22, 2020

I guess the simplest way, when DYNAMIC_ARCH is enabled, is to compare the "gotoblas" struct pointer to the address of gotoblas_SKYLAKEX before calling sgemm_kernel_direct().
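
To make the suggestion concrete, here is a minimal, self-contained sketch of such a pointer-comparison guard. It is not the OpenBLAS source: `core_table_t`, `table_haswell`, `table_skylakex`, `choose_sgemm_path` and the `small_enough` flag are hypothetical stand-ins for the real `gotoblas` struct, the per-core tables such as `gotoblas_SKYLAKEX`, and whatever size check `sgemm_direct_performant` performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-core dispatch table of a
 * DYNAMIC_ARCH build (the real struct holds kernel function pointers). */
typedef struct {
    const char *name;
} core_table_t;

/* One table per supported core; only two are modelled here. */
const core_table_t table_haswell  = { "HASWELL" };
const core_table_t table_skylakex = { "SKYLAKEX" };

enum gemm_path { PATH_GENERIC = 0, PATH_DIRECT = 1 };

/* Decide which SGEMM path to take.
 * `active` is the table chosen by runtime CPU detection.
 * `small_enough` is an assumed flag for "the direct kernel pays off here".
 * The AVX512-only direct path is eligible only when the active table IS
 * the SkylakeX table, so a Haswell host never reaches AVX512 code. */
enum gemm_path choose_sgemm_path(const core_table_t *active, int small_enough)
{
    assert(active != NULL);
    if (active == &table_skylakex && small_enough)
        return PATH_DIRECT;
    return PATH_GENERIC;
}
```

The point of comparing addresses rather than relying on a compile-time macro is that the decision is made from what was detected on the machine running the code, not from the machine that built the library.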

@mattip
Contributor Author

mattip commented Mar 22, 2020

Going with that theory: this would explain why the Azure CI is showing these failures while the Travis CI is not. The original OpenBLAS libraries are compiled on Travis machines, which I think have AVX512 CPUs.

@martin-frbg
Collaborator

@wjc404 thanks for your suggestion, glad this is what I arrived at just a little later (now hope I did it correctly)

@martin-frbg
Collaborator

Hmm. Wouldn't that fail only if the DYNAMIC_ARCH build was done on an AVX512 host without setting TARGET to something less powerful? (In which case the build would be hosed anyway, as the compiler could have generated AVX512 instructions anywhere in the common code.)

@isuruf
Contributor

isuruf commented Mar 22, 2020

I guess this is consistent with the following from the README.

The TARGET option can be used in conjunction with DYNAMIC_ARCH=1 to specify which cpu model should be assumed for all the common code in the library, usually you will want to set this to the oldest model you expect to encounter.

Maybe TARGET should be PRESCOTT or something like that if not specified in a DYNAMIC_ARCH x86 build?

@martin-frbg
Collaborator

There probably will be "legitimate" cases where the default TARGET can be the build host. (And eventually there will be something newer than SKYLAKEX, so even defaulting to something non-AVX512 when building on SKX may not be a good idea in the future)

@mattip
Contributor Author

mattip commented Mar 23, 2020

The theory that it is due to running on Azure machines when OpenBLAS is compiled on Travis CI was debunked in numpy/numpy#15809. Forcing the buffers to be misaligned does not seem to trigger an "illegal instruction". It may be some kind of race condition. Are there environment variables I can use to help debug this from a release build of OpenBLAS?

@martin-frbg
Collaborator

Not sure how that was debunked, but the sgemm_kernel_direct function should only ever get called when the build was done on an AVX512-capable machine without TARGET being explicitly set to some older cpu.

@martin-frbg
Collaborator

To actually answer your question (sorry), you can set OPENBLAS_CORETYPE at runtime to override automatic cpu detection (e.g. export OPENBLAS_CORETYPE=NEHALEM), and OPENBLAS_NUM_THREADS (or OMP_NUM_THREADS if using OpenMP) to limit the number of threads to less than the available (semi)cores in the system. As noted above I believe neither is likely to have any influence on this issue.
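
As a rough illustration of what "override automatic cpu detection" means, the sketch below shows an environment override taking precedence over a detected core name. It is only a model of the idea, not the OpenBLAS implementation: `pick_core_name` and `core_from_environment` are hypothetical helpers, and only the variable name `OPENBLAS_CORETYPE` comes from the comment above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: return the core name to use.
 * A non-empty override wins; otherwise the detected name is kept. */
const char *pick_core_name(const char *override_name, const char *detected)
{
    assert(detected != NULL);
    if (override_name != NULL && override_name[0] != '\0')
        return override_name;
    return detected;
}

/* Hypothetical wrapper that reads the override from the environment,
 * e.g. after `export OPENBLAS_CORETYPE=NEHALEM` in the shell. */
const char *core_from_environment(const char *detected)
{
    return pick_core_name(getenv("OPENBLAS_CORETYPE"), detected);
}
```

An override of this kind only changes which core is selected at runtime. It cannot help if code outside the per-core kernels calls an AVX512 routine unconditionally, which matches the expectation stated above that it is unlikely to influence this issue.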

@martin-frbg
Collaborator

Merged #1527 now

@martin-frbg
Collaborator

Assumed fixed by #2533 (though it is probably still not a good idea to build DYNAMIC_ARCH on SKX without specifying a lower-capability TARGET for the common codes)

@mattip
Contributor Author

mattip commented Apr 1, 2020

Maybe TARGET should be PRESCOTT or something like that if not specified in a DYNAMIC_ARCH x86 build

@isuruf this makes sense. Is PRESCOTT going to be the lowest current CPU NumPy is likely to encounter?

@isuruf
Contributor

isuruf commented Apr 1, 2020

Is PRESCOTT going to be the lowest current CPU NumPy is likely to encounter?

On x86_64, yes.
