You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
$ cat .git/refs/heads/master
e7726a4c165cc45ac117e9eabd8761013a26640e
$ uname -a
Linux DESKTOP-E3P7TC0 3.10.0-1160.62.1.el7.x86_64 #1 SMP Wed Mar 23 09:04:02 UTC 2022 x86_64 GNU/Linux
$ cat /etc/system-release
Red Hat Enterprise Linux Workstation release 7.9 (Maipo)
Diagnostic Information
$ gdb cpp/bolt-build/bolt
(gdb) run
Starting program: /shared/src/bolt/cpp/bolt-build/bolt
brute force sgemm test, n = 1...
brute force sgemm test, n = 2...
brute force sgemm test, n = 5...
brute force sgemm test, n = 14...
Program received signal SIGSEGV, Segmentation fault.
0x00000000005a1af9 in _mm256_load_ps (__P=0x886cf8) at /usr/local/lib/gcc/x86_64-pc-linux-gnu/9.4.1/include/avxintrin.h:874
874 return *(__m256 *)__P;
(gdb) up
#1 (anonymous namespace)::sgemm_colmajor_narrow_padded<1, 2> (A=0x8878e0, B=0x875d80, N=14, D=1, M=2, out=0x886cc0, add_to_output=false, A_col_stride=14, B_col_stride=1,
out_col_stride=14, nrows_per_chunk=512) at /home/user/src/bolt/cpp/src/utils/avx_utils.hpp:394
394 sums[mm] = _mm256_load_ps(out_ptr);
(gdb) p out_ptr
$1 = (float *) 0x886cf8
(gdb) bt
#0 0x00000000005a1af9 in _mm256_load_ps (__P=0x886cf8) at /usr/local/lib/gcc/x86_64-pc-linux-gnu/9.4.1/include/avxintrin.h:874
#1 (anonymous namespace)::sgemm_colmajor_narrow_padded<1, 2> (A=0x8878e0, B=0x875d80, N=14, D=1, M=2, out=0x886cc0, add_to_output=false, A_col_stride=14, B_col_stride=1,
out_col_stride=14, nrows_per_chunk=512) at /home/user/src/bolt/cpp/src/utils/avx_utils.hpp:394
#2 0x000000000059ff9e in sgemm_colmajor (A=0x8878e0, B=0x875d80, N=14, D=1, M=2, out=0x886cc0) at /home/user/src/bolt/cpp/src/utils/avx_utils.cpp:18
#3 0x00000000005ee949 in _test_sgemm_colmajor<-1, -1> (N=14, D=1, M=2, simple_entries=false) at /home/user/src/bolt/cpp/test/test_avx_utils.cpp:54
#4 0x00000000005ec3f5 in ____C_A_T_C_H____T_E_S_T____100 () at /home/user/src/bolt/cpp/test/test_avx_utils.cpp:155
#5 0x00000000005bb56e in Catch::FreeFunctionTestCase::invoke (this=0x86ef90) at /home/user/src/bolt/cpp/test/external/catch.hpp:5507
#6 0x00000000005aa337 in Catch::TestCase::invoke (this=0x889280) at /home/user/src/bolt/cpp/test/external/catch.hpp:6389
#7 0x00000000005b972b in Catch::RunContext::runCurrentTest (this=0x7fffffffd560, redirectedCout="", redirectedCerr="") at /home/user/src/bolt/cpp/test/external/catch.hpp:5131
#8 0x00000000005b8737 in Catch::RunContext::runTest (this=0x7fffffffd560, testCase=...) at /home/user/src/bolt/cpp/test/external/catch.hpp:5001
#9 0x00000000005ba095 in Catch::Runner::runTests (this=0x7fffffffd810) at /home/user/src/bolt/cpp/test/external/catch.hpp:5275
#10 0x00000000005bae1b in Catch::Session::run (this=0x7fffffffdb10) at /home/user/src/bolt/cpp/test/external/catch.hpp:5395
#11 0x00000000005bace8 in Catch::Session::run (this=0x7fffffffdb10, argc=1, argv=0x7fffffffdce8) at /home/user/src/bolt/cpp/test/external/catch.hpp:5378
#12 0x00000000005ae8bb in main (argc=1, argv=0x7fffffffdce8) at /home/user/src/bolt/cpp/test/main.cpp:22
$ valgrind cpp/bolt-build/bolt
==5087== Memcheck, a memory error detector
==5087== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==5087== Using Valgrind-3.18.0.GIT and LibVEX; rerun with -h for copyright info
==5087== Command: cpp/bolt-build/bolt
==5087==
brute force sgemm test, n = 1...
brute force sgemm test, n = 2...
brute force sgemm test, n = 5...
brute force sgemm test, n = 14...
==5087==
==5087== Process terminating with default action of signal 11 (SIGSEGV)
==5087== General Protection Fault
==5087== at 0x5A1AF9: _mm256_load_ps (avxintrin.h:874)
==5087== by 0x5A1AF9: void (anonymous namespace)::sgemm_colmajor_narrow_padded<1, 2>(float const*, float const*, int, int, int, float*, bool, int, int, int, int) (avx_utils.hpp:394)
==5087== by 0x59FF9D: sgemm_colmajor(float const*, float const*, int, int, int, float*) (avx_utils.cpp:18)
==5087== by 0x5EE948: void _test_sgemm_colmajor<-1, -1>(int, int, int, bool) (test_avx_utils.cpp:54)
==5087== by 0x5EC3F4: ____C_A_T_C_H____T_E_S_T____100() (test_avx_utils.cpp:155)
==5087== by 0x5BB56D: Catch::FreeFunctionTestCase::invoke() const (catch.hpp:5507)
==5087== by 0x5AA336: Catch::TestCase::invoke() const (catch.hpp:6389)
==5087== by 0x5B972A: Catch::RunContext::runCurrentTest(std::string&, std::string&) (catch.hpp:5131)
==5087== by 0x5B8736: Catch::RunContext::runTest(Catch::TestCase const&) (catch.hpp:5001)
==5087== by 0x5BA094: Catch::Runner::runTests() (catch.hpp:5275)
==5087== by 0x5BAE1A: Catch::Session::run() (catch.hpp:5395)
==5087== by 0x5BACE7: Catch::Session::run(int, char* const*) (catch.hpp:5378)
==5087== by 0x5AE8BA: main (main.cpp:22)
Work to Resolve
I pursued this just a little bit before changing tasks. Following the control flow, it looked to me like the unaligned memory arose from passing a matrix with a nonaligned stride greater than 8, to sgemm_colmajor. I've come up with the below so far, but have not yet tested and debugged it and make frequent mistakes so likely something is wrong. The patch is pasted here from a tmux pane and then hand edited, so may need manual application.
diff --git a/cpp/test/test_avx_utils.cpp b/cpp/test/test_avx_utils.cpp
index 4ec4e00..34c318a 100644
--- a/cpp/test/test_avx_utils.cpp
+++ b/cpp/test/test_avx_utils.cpp
@@ -44,6 +44,8 @@ void _test_sgemm_colmajor(int N, int D, int M, bool simple_entries=false) {
B.setRandom();
}
C = (C.array() + -999).matrix(); // value we won't accidentally get
+ int aligned_rows = C.rows() - (C.rows() % (-32 / int(sizeof(float))));
+ C.resize(aligned_rows, C.cols());
The text was updated successfully, but these errors were encountered:
Summary
The avx sgemm test fails at n=14 on my system, apparently due to a general protection fault reading unaligned memory.
Problem Information
_mm256_load_ps
raises a general protection fault if an address is passed to it that is not aligned to 32 bytes.Documentation: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_load_ps&ig_expand=4279
System information
Diagnostic Information
Work to Resolve
I pursued this just a little bit before changing tasks. Following the control flow, it looked to me like the unaligned memory arose from passing a matrix with a nonaligned stride greater than 8, to
sgemm_colmajor
. I've come up with the below so far, but have not yet tested and debugged it and make frequent mistakes so likely something is wrong. The patch is pasted here from a tmux pane and then hand edited, so may need manual application.The text was updated successfully, but these errors were encountered: