PyPy + NumPy + SVX causes a segfault #2705

mattip · 2020-07-03T13:41:08Z

Not sure what is going on; maybe someone here has an idea. I don't have access to x86_64 Skylake-X (SKX) hardware to try this out. Over at NumPy, using OpenBLAS 0.3.10, we are seeing intermittent segfaults when testing with PyPy 3.6-v7.3.1 (xref numpy/numpy#16737). Any thoughts? We recently improved the CPU detection done in NumPy in preparation for refactoring our SIMD code, which is why we now know the CPU architecture of the machine that segfaults.

martin-frbg · 2020-07-03T14:20:23Z

That matmul is basically DGEMM, right? If so, the change in #2646 could play a role (though I wonder why our tests would not catch this: if this is actually OpenBLAS's fault, the build-time test/dblat3 should have segfaulted as well, as it runs DGEMM for various matrix sizes including 63x63).
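If matmul on float64 matrices does end up in DGEMM, a slow pure-Python reference of the operation DGEMM defines, C := alpha*op(A)*op(B) + beta*C, is handy for cross-checking a suspicious result, in the spirit of what test/dblat3 does. This is a sketch, not OpenBLAS or NumPy code; `ref_dgemm` and its keyword names are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a row-major list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


def ref_dgemm(alpha: float, a: Matrix, b: Matrix, beta: float = 0.0,
              c: Optional[Matrix] = None, trans_a: bool = False,
              trans_b: bool = False) -> Matrix:
    """Reference C := alpha*op(A)*op(B) + beta*C (BLAS ?GEMM semantics)."""
    opa = transpose(a) if trans_a else a
    opb = transpose(b) if trans_b else b
    m, k = len(opa), len(opa[0])
    if len(opb) != k:
        raise ValueError("inner dimensions of op(A) and op(B) differ")
    n = len(opb[0])
    if c is None:
        c = [[0.0] * n for _ in range(m)]
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = sum(opa[i][p] * opb[p][j] for p in range(k))
            # As in reference BLAS: when beta == 0, C is not read at all,
            # so NaN/garbage in C cannot leak into the result.
            old = beta * c[i][j] if beta != 0.0 else 0.0
            row.append(alpha * acc + old)
        out.append(row)
    return out
```

For the reproducer in numpy/numpy#16737, `np.matmul(a.T, a)` corresponds to `trans_a=True` with `b = a`, and the result must be symmetric, which gives a cheap sanity check on whatever the optimized kernel returns.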

mattip · 2020-07-03T14:41:26Z

The strange thing is that it only seems to happen on PyPy (which has a JIT). I wonder if some registers or floating-point flags are somehow getting messed up.

martin-frbg · 2020-07-03T14:55:46Z

I can't really help you with PyPy or its JIT, but did you only start seeing this with 0.3.10 (which would suggest a connection with recent changes in OpenBLAS)?

mattip · 2020-07-03T15:20:22Z

We had segfaults with PyPy, reported in #2526, that disappeared after ~~#1527~~ #2527. Then, after merging 0.3.10, they started up again.

Edit: wrong PR.

mattip · 2020-07-03T15:22:24Z

BTW, #2526 mentioned that NumPy should be setting TARGET, which it does not do.

martin-frbg · 2020-07-03T16:02:56Z

Without TARGET, OpenBLAS gets built for whatever the build host is, which can mess up DYNAMIC_ARCH builds by adding AVX512 instructions to the common code. 0.3.9 (when compiled on SKX) had an additional problem: it was missing a runtime check for AVX512 capability in the SGEMM kernel. That should be fixed by #2527 in 0.3.10; I do not think a later change could have overwritten this fix, but I will check.

mattip · 2020-07-03T16:10:45Z

Since the machine that segfaults reports that it is SKX-capable, I don't think TARGET is the problem.

martin-frbg · 2020-07-03T21:28:43Z

So what exactly do I need to (install and) do to reproduce this ?

mattip · 2020-07-04T05:41:43Z

After you git clone numpy, this is the script run by CI. You may want to comment out the sudo lines near the top. The script downloads OpenBLAS and puts it into a tmpdir, adjusts the build to use it, then builds and tests numpy via ./runtests.py. After it has run once, you can rerun a single test with

pypy3/bin/pypy3 runtests.py -- path/to/file -k testname

which passes the arguments after the double dash on to pytest.

wjc404 · 2020-07-04T13:44:11Z

I've just cloned numpy and followed the steps shown by Prof. Matti Picus to build it with OpenBLAS-0.3.10 in a temporary directory.

wang@wang-Z390-M-GAMING:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Core(TM) i7-9800X CPU @ 3.80GHz
Stepping: 4
CPU MHz: 1200.240
CPU max MHz: 3801.0000
CPU min MHz: 1200.0000
BogoMIPS: 7599.80
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 16896K
NUMA node0 CPU(s): 0-15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d

wang@wang-Z390-M-GAMING:~/numpy$ pypy3/bin/pypy3 tools/openblas_support.py --check_version
OpenBLAS get_config returned b'OpenBLAS 0.3.10 DYNAMIC_ARCH NO_AFFINITY SkylakeX MAX_THREADS=64'
b'OpenBLAS 0.3.10'

Then I ran the code from numpy/numpy#16737 without observing anything getting stuck.

wang@wang-Z390-M-GAMING:~$ numpy/pypy3/bin/pypy3
Python 3.6.9 (2ad108f17bdb, Apr 07 2020, 02:59:05)
[PyPy 7.3.1 with GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>> import numpy as np
>>>> shape = (50, 50)
>>>> a = np.random.randn(*shape)
>>>> a = np.matmul(a.T, a)
>>>> exit()

wang@wang-Z390-M-GAMING:~/numpy$ pypy3/bin/pypy3 runtests.py
Building, see build.log...
Build OK
NumPy version 1.20.0.dev0+9298eeb
NumPy relaxed strides checking option: True
NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F* AVX512CD* AVX512_KNL? AVX512_SKX* AVX512_CNL?
"A list of '.', 'x' and 's', not shown here"
[100%]
11449 passed, 160 skipped, 111 deselected, 77 xfailed, 11 xpassed in 273.63s (0:04:33)

wang@wang-Z390-M-GAMING:~/numpy$ pypy3/bin/pypy3 numpy/linalg/tests/test_linalg.py
wang@wang-Z390-M-GAMING:~/numpy$

martin-frbg · 2020-07-04T15:57:59Z

No hang or other "unexpected" failure observed on my system either (Xeon W-2123, gcc 9.3, Python 3.6.9; script summary "11460 passed, 149 skipped, 111 deselected, 77 xfailed, 11 xpassed").

mattip · 2020-07-04T19:51:10Z

@wjc404 (I am not a professor, but thanks), @martin-frbg thanks for checking. The failing machine is reporting

NumPy version 1.20.0.dev0+4fd295c
NumPy relaxed strides checking option: True
NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F* AVX512CD* AVX512_KNL? AVX512_KNM? AVX512_SKX* AVX512_CNL?

Compared to @wjc404's machine there is an additional AVX512_KNM?. A question mark means "Dispatched features that are not supported by the running machine end with ?", while an asterisk means "Supported dispatched features by the running machine end with *". I am not sure what to make of that, but could the CI machine that segfaults be subtly different from these machines?
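Comparing two of these feature lines by hand is error-prone, so here is a small Python sketch that groups the tokens by suffix and diffs two machines. The "*" and "?" meanings are the ones quoted above; treating unsuffixed names (SSE, SSE2, SSE3) as the build's baseline is an assumption, and the helper names are hypothetical.

```python
from typing import Dict, Set, Tuple


def parse_cpu_features(line: str) -> Dict[str, Set[str]]:
    """Split a 'NumPy CPU features:' line into groups by suffix.

    '*'       -> dispatched and supported by the running machine
    '?'       -> dispatched but not supported by the running machine
    no suffix -> assumed to be the build's baseline features
    """
    if ":" in line:
        line = line.split(":", 1)[1]
    groups = {"baseline": set(), "supported": set(), "unsupported": set()}
    # Stray backslashes are line-continuation residue from pasted CI logs.
    for token in line.replace("\\", " ").split():
        if token.endswith("*"):
            groups["supported"].add(token[:-1])
        elif token.endswith("?"):
            groups["unsupported"].add(token[:-1])
        else:
            groups["baseline"].add(token)
    return groups


def diff_features(a: str, b: str) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """For each group, return (names only in a, names only in b)."""
    fa, fb = parse_cpu_features(a), parse_cpu_features(b)
    return {group: (fa[group] - fb[group], fb[group] - fa[group]) for group in fa}
```

Applied to the failing CI line and @wjc404's line, the only difference is AVX512_KNM in the "unsupported" group; the "supported" sets are identical.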

martin-frbg · 2020-07-04T21:30:32Z

I'd read that as "the software knows how to pass along the specific AVX512 feature sets of Knights Landing, Knights Mill and Cannon Lake, but this particular hardware does not support them"? For the record, mine reports

NumPy version 1.20.0.dev0+9298eeb
NumPy relaxed strides checking option: True
NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F* AVX512CD* AVX512_KNL? AVX512_KNM? AVX512_SKX* AVX512_CLX? AVX512_CNL? AVX512_ICL?

so the ones with a question mark are at least not relevant for proper functioning. (I think AVX512CD is currently the bare minimum the OpenBLAS SKX kernels require, and an unsupported AVX512 instruction would probably cause SIGILL rather than SIGSEGV.)
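To confirm that a given machine (such as the failing CI runner) reports the AVX-512 subsets in question, one can inspect the CPU flags the Linux kernel exposes, as in the lscpu output earlier in this thread. A Python sketch: the required set {avx512f, avx512cd} is an assumption taken from the comment above, and the helper names are hypothetical. OpenBLAS's DYNAMIC_ARCH dispatch queries the CPU directly (CPUID) rather than reading /proc/cpuinfo, so this is only a diagnostic aid.

```python
from typing import Iterable, Set

# Assumption from the comment above: AVX512F + AVX512CD as the minimum
# for the OpenBLAS SKX kernels. Adjust if that turns out to be wrong.
REQUIRED_AVX512 = frozenset({"avx512f", "avx512cd"})


def avx512_flags(flags_line: str) -> Set[str]:
    """AVX-512 flags from an lscpu 'Flags:' or /proc/cpuinfo 'flags' line."""
    if ":" in flags_line:
        flags_line = flags_line.split(":", 1)[1]
    return {flag for flag in flags_line.split() if flag.startswith("avx512")}


def missing_avx512(flags_line: str,
                   required: Iterable[str] = REQUIRED_AVX512) -> Set[str]:
    """Flags in `required` that the CPU does not report."""
    return set(required) - avx512_flags(flags_line)


def read_cpuinfo_flags(path: str = "/proc/cpuinfo") -> str:
    """First 'flags' line of /proc/cpuinfo (Linux only), or '' if unavailable."""
    try:
        with open(path) as fh:
            for line in fh:
                if line.startswith("flags"):
                    return line
    except OSError:
        pass
    return ""
```

On the i7-9800X flags line quoted above, `missing_avx512` returns an empty set; on a live Linux system, `missing_avx512(read_cpuinfo_flags())` performs the same check.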

brada4 · 2020-07-05T14:14:20Z

In a linked thread I see that a * a' was failing while a * a was not.

brada4 · 2020-07-05T16:18:06Z

Does the sample fail only in the long test batch, i.e. because of some corruption accumulated earlier, or does a simple standalone script (set up an array and multiply) also fail?
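One way to answer this is to run the standalone reproducer many times, each in a fresh interpreter, so nothing can accumulate from earlier tests. On POSIX, a segfault then shows up as the child being killed by SIGSEGV (a negative return code) instead of taking the test runner down. A Python sketch under those assumptions; `run_isolated` and `stress` are hypothetical helpers, and the child code is the snippet from numpy/numpy#16737.

```python
import signal
import subprocess
import sys

# Standalone reproducer from numpy/numpy#16737, as run in the REPL above.
REPRO = """
import numpy as np
shape = (50, 50)
a = np.random.randn(*shape)
a = np.matmul(a.T, a)
"""


def run_isolated(code: str, timeout: float = 60.0) -> str:
    """Run `code` in a fresh interpreter and classify how it ended."""
    try:
        # PIPE instead of capture_output keeps this working on Python 3.6.
        proc = subprocess.run([sys.executable, "-c", code],
                              stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout"  # a hang rather than a crash
    if proc.returncode == 0:
        return "ok"
    if proc.returncode == -signal.SIGSEGV:
        return "segfault"
    if proc.returncode < 0:
        return "signal {}".format(-proc.returncode)  # e.g. SIGILL
    return "exit {}".format(proc.returncode)


def stress(code: str = REPRO, runs: int = 100) -> dict:
    """Count outcomes over `runs` independent executions of `code`."""
    counts = {}
    for _ in range(runs):
        outcome = run_isolated(code)
        counts[outcome] = counts.get(outcome, 0) + 1
    return counts
```

Launched with pypy3/bin/pypy3, `sys.executable` points at PyPy, so `print(stress(runs=200))` exercises the same interpreter as the CI job. A nonzero "segfault" count would point at the standalone case; a clean run would point back at state accumulated during the long test batch.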

mattip · 2020-07-09T05:41:20Z

Closing as "can't reproduce". Thanks all for digging into this, sorry for the time invested with no conclusion.

mattip closed this as completed Jul 9, 2020
