The perfect storm happens:

- `CFLAGS="-g -O0"`
- `haswell` configuration

What occurs in my case is that `bli_saxpyv_zen_int10` is compiled with flags for AVX, and GCC assumes that the incoming stack pointer is 32-byte aligned. Then, because of the debug flags, horrible code is emitted which spills vector registers (`__m256`) to the stack. This spilling uses aligned move instructions, which require 32-byte alignment. GCC dutifully 32-byte aligns all offsets from the stack pointer to satisfy this constraint. Here is the catch: if the incoming stack pointer is NOT actually 32-byte aligned (as it isn't in BLIS, since the calling code is not compiled for AVX but for general x86_64), then we encounter a segfault.

It has proven quite difficult to actually force GCC to align the stack using e.g. `andq $-32, %rsp`. The only thing that seems to work is `-mincoming-stack-boundary=3` (not 4!). I will add this to the default `haswell` flags since so far I can only confirm the problem on a Mac. I'll see about Linux shortly.
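To make the alignment arithmetic above concrete, here is a minimal sketch in plain C. It is illustration only and not BLIS code: the helper names (`boundary_bytes`, `is_aligned`, `align_down_32`) are hypothetical. It relies on the GCC convention that `-mincoming-stack-boundary=N` means an assumed alignment of 2^N bytes, so `3` is 8 bytes and `4` is 16 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: byte alignment implied by -mincoming-stack-boundary=n,
 * which GCC interprets as 2^n bytes (n=3 -> 8, n=4 -> 16). */
static inline uintptr_t boundary_bytes(unsigned n)
{
    return (uintptr_t)1 << n;
}

/* Hypothetical helper: true if addr is a multiple of align.
 * align must be a power of two. */
static inline bool is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Hypothetical helper: the C equivalent of `andq $-32, %rsp`.
 * Clears the low five bits, rounding addr down to a 32-byte boundary. */
static inline uintptr_t align_down_32(uintptr_t addr)
{
    return addr & ~(uintptr_t)31;
}
```

For example, a made-up stack pointer value of `0x7ffc1010` passes `is_aligned(sp, 16)` but fails `is_aligned(sp, 32)`, which is the situation in which an aligned 32-byte spill faults. `align_down_32(0x7ffc1010)` gives `0x7ffc1000`, the adjustment that `andq $-32, %rsp` would make in the prologue.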