If `-ffast-math` is included on the gcc compile line, the resulting code segfaults. This option is desirable because it might improve mkFit's timing performance by enabling a higher degree of vectorization. (On the other hand, comparisons of icc-vectorized with icc-unvectorized and gcc builds indicate that increased vectorization might be affecting physics performance negatively; see the comments by @mmasciov on issue #346.)
Background: following a months-old suggestion from @makortel, I used godbolt.org to experiment with a very simple program consisting of a single loop that calls `sin()`. I found, in agreement with him, that the `-ffast-math` option is sufficient to induce gcc to insert calls to the vectorized `sin()` function from libmvec, given a recent enough glibc. The necessary suboptions (found by @dan131riley and confirmed by me) are `-fno-math-errno -ffinite-math-only -fno-rounding-math -funsafe-math-optimizations`. Without these 4 suboptions, gcc is constrained not to vectorize this loop. Here is a link where anyone can reproduce those experiments: https://godbolt.org/z/qT9j7bvx1
The specific suboption of `-ffast-math` that breaks mkFit seems to be `-funsafe-math-optimizations`.
A further clue is that if intrinsics are disabled and auto-vectorization is limited by `-DMPT_SIZE=1`, the resulting code does NOT segfault with `-ffast-math`. The specific make and run commands that were successful on lnx7188, after doing `source xeon_scripts/init-env.sh`, were
We have known for some time that vectorization alters aspects of math operations, which may affect the precision of intermediate results. This is likely the reason for past variations in numbers of builtcands (issue #267), and may be at work again here.
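As a small illustration of why vectorization can change intermediate results, the sketch below sums the same four floats in two different orders: strictly left to right, and as two interleaved partial sums, which mimics how a 2-wide vector reduction would accumulate. The function names and the input values are invented for this example and have nothing to do with mkFit's code; it assumes IEEE single precision and a build without `-ffast-math`.

```cpp
#include <cstddef>
#include <vector>

// Scalar reference: add the elements strictly in order.
float sum_sequential(const std::vector<float>& v) {
  float s = 0.0f;
  for (float x : v) s += x;
  return s;
}

// Mimics a 2-lane vectorized reduction: even and odd elements are
// accumulated separately, and the two lanes are combined at the end.
float sum_two_lanes(const std::vector<float>& v) {
  float lane[2] = {0.0f, 0.0f};
  for (std::size_t i = 0; i < v.size(); ++i) lane[i % 2] += v[i];
  return lane[0] + lane[1];
}

// For the input {1e8f, 1.0f, -1e8f, 1.0f}:
//   sum_sequential returns 1.0f (the first 1.0f is absorbed by 1e8f),
//   sum_two_lanes  returns 2.0f (1e8f and -1e8f cancel exactly in lane 0).
```

Both answers are valid floating-point results for the order in which they were computed; the point is only that reordering, which `-funsafe-math-optimizations` permits, is enough to change them.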
UPDATE: even though `-ffast-math` enables some types of vectorized math, it is NOT gcc's vectorization-related optimizations that break the code when `-ffast-math` is specified. If I eliminate all vectorization by (1) changing Makefile.config to default to `-fno-tree-vectorize -fopt-info-vec`, (2) removing `-ftree-vectorize` and `-fopenmp-simd` from CXXFLAGS in Makefile.config, and (3) defining `USE_INTRINSICS:="-DMPT_SIZE=8"` for Matriplex (so there aren't even any vector intrinsics!), the resulting code still segfaults with `CPPUSERFLAGS=-ffast-math`, or even just `CPPUSERFLAGS=-funsafe-math-optimizations`.
In other words, the key to preventing segfaults seems to be just `USE_INTRINSICS:="-DMPT_SIZE=1"`. Such a definition reduces Matriplex to a bunch of pointless copying without conveying any vector-size advantages. However, this definition by itself does nothing to prevent (or enable) vectorization globally. It appears that nontrivial Matriplex types (`MPT_SIZE>1`) are somehow incompatible with `-ffast-math`, and more specifically with `-funsafe-math-optimizations` (since `MPT_SIZE=4` and `MPT_SIZE=2` don't work either), and that this is independent of whether or not there are vector instructions in the code.