Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Old AVX test fails due to GPF/sigsegv #35

Open
xloem opened this issue Jun 26, 2022 · 0 comments
Open

Old AVX test fails due to GPF/sigsegv #35

xloem opened this issue Jun 26, 2022 · 0 comments

Comments

@xloem
Copy link

xloem commented Jun 26, 2022

Summary

The avx sgemm test fails at n=14 on my system, apparently due to a general protection fault reading unaligned memory.

Problem Information

_mm256_load_ps raises a general protection fault if an address is passed to it that is not aligned to 32 bytes.
Documentation: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_load_ps&ig_expand=4279

System information

$ cat .git/refs/heads/master
e7726a4c165cc45ac117e9eabd8761013a26640e
$ uname -a
Linux DESKTOP-E3P7TC0 3.10.0-1160.62.1.el7.x86_64 #1 SMP Wed Mar 23 09:04:02 UTC 2022 x86_64 GNU/Linux
$ cat /etc/system-release
Red Hat Enterprise Linux Workstation release 7.9 (Maipo)

Diagnostic Information

$ gdb cpp/bolt-build/bolt
(gdb) run
Starting program: /shared/src/bolt/cpp/bolt-build/bolt 
brute force sgemm test, n = 1...
brute force sgemm test, n = 2...
brute force sgemm test, n = 5...
brute force sgemm test, n = 14...

Program received signal SIGSEGV, Segmentation fault.
0x00000000005a1af9 in _mm256_load_ps (__P=0x886cf8) at /usr/local/lib/gcc/x86_64-pc-linux-gnu/9.4.1/include/avxintrin.h:874
874       return *(__m256 *)__P;
(gdb) up
#1  (anonymous namespace)::sgemm_colmajor_narrow_padded<1, 2> (A=0x8878e0, B=0x875d80, N=14, D=1, M=2, out=0x886cc0, add_to_output=false, A_col_stride=14, B_col_stride=1, 
    out_col_stride=14, nrows_per_chunk=512) at /home/user/src/bolt/cpp/src/utils/avx_utils.hpp:394
394                             sums[mm] = _mm256_load_ps(out_ptr);
(gdb) p out_ptr
$1 = (float *) 0x886cf8
(gdb) bt
#0  0x00000000005a1af9 in _mm256_load_ps (__P=0x886cf8) at /usr/local/lib/gcc/x86_64-pc-linux-gnu/9.4.1/include/avxintrin.h:874
#1  (anonymous namespace)::sgemm_colmajor_narrow_padded<1, 2> (A=0x8878e0, B=0x875d80, N=14, D=1, M=2, out=0x886cc0, add_to_output=false, A_col_stride=14, B_col_stride=1, 
    out_col_stride=14, nrows_per_chunk=512) at /home/user/src/bolt/cpp/src/utils/avx_utils.hpp:394
#2  0x000000000059ff9e in sgemm_colmajor (A=0x8878e0, B=0x875d80, N=14, D=1, M=2, out=0x886cc0) at /home/user/src/bolt/cpp/src/utils/avx_utils.cpp:18
#3  0x00000000005ee949 in _test_sgemm_colmajor<-1, -1> (N=14, D=1, M=2, simple_entries=false) at /home/user/src/bolt/cpp/test/test_avx_utils.cpp:54
#4  0x00000000005ec3f5 in ____C_A_T_C_H____T_E_S_T____100 () at /home/user/src/bolt/cpp/test/test_avx_utils.cpp:155
#5  0x00000000005bb56e in Catch::FreeFunctionTestCase::invoke (this=0x86ef90) at /home/user/src/bolt/cpp/test/external/catch.hpp:5507
#6  0x00000000005aa337 in Catch::TestCase::invoke (this=0x889280) at /home/user/src/bolt/cpp/test/external/catch.hpp:6389
#7  0x00000000005b972b in Catch::RunContext::runCurrentTest (this=0x7fffffffd560, redirectedCout="", redirectedCerr="") at /home/user/src/bolt/cpp/test/external/catch.hpp:5131
#8  0x00000000005b8737 in Catch::RunContext::runTest (this=0x7fffffffd560, testCase=...) at /home/user/src/bolt/cpp/test/external/catch.hpp:5001
#9  0x00000000005ba095 in Catch::Runner::runTests (this=0x7fffffffd810) at /home/user/src/bolt/cpp/test/external/catch.hpp:5275
#10 0x00000000005bae1b in Catch::Session::run (this=0x7fffffffdb10) at /home/user/src/bolt/cpp/test/external/catch.hpp:5395
#11 0x00000000005bace8 in Catch::Session::run (this=0x7fffffffdb10, argc=1, argv=0x7fffffffdce8) at /home/user/src/bolt/cpp/test/external/catch.hpp:5378
#12 0x00000000005ae8bb in main (argc=1, argv=0x7fffffffdce8) at /home/user/src/bolt/cpp/test/main.cpp:22
$ valgrind cpp/bolt-build/bolt
==5087== Memcheck, a memory error detector                                                     
==5087== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.                       
==5087== Using Valgrind-3.18.0.GIT and LibVEX; rerun with -h for copyright info                
==5087== Command: cpp/bolt-build/bolt                                                          
==5087==                                                                                       
brute force sgemm test, n = 1...                                                               
brute force sgemm test, n = 2...                                                               
brute force sgemm test, n = 5...                                                                                                                                                              
brute force sgemm test, n = 14...                                                                                                                                                             
==5087==                                                                                                                                                                                      
==5087== Process terminating with default action of signal 11 (SIGSEGV)                                                                                                                       
==5087==  General Protection Fault                                                                                                                                                            
==5087==    at 0x5A1AF9: _mm256_load_ps (avxintrin.h:874)                                                                                                                                     
==5087==    by 0x5A1AF9: void (anonymous namespace)::sgemm_colmajor_narrow_padded<1, 2>(float const*, float const*, int, int, int, float*, bool, int, int, int, int) (avx_utils.hpp:394)      
==5087==    by 0x59FF9D: sgemm_colmajor(float const*, float const*, int, int, int, float*) (avx_utils.cpp:18)                                                                                 
==5087==    by 0x5EE948: void _test_sgemm_colmajor<-1, -1>(int, int, int, bool) (test_avx_utils.cpp:54)                                                                                       
==5087==    by 0x5EC3F4: ____C_A_T_C_H____T_E_S_T____100() (test_avx_utils.cpp:155)                                                                                                           
==5087==    by 0x5BB56D: Catch::FreeFunctionTestCase::invoke() const (catch.hpp:5507)                                                                                                         
==5087==    by 0x5AA336: Catch::TestCase::invoke() const (catch.hpp:6389)                                                                                                                     
==5087==    by 0x5B972A: Catch::RunContext::runCurrentTest(std::string&, std::string&) (catch.hpp:5131)
==5087==    by 0x5B8736: Catch::RunContext::runTest(Catch::TestCase const&) (catch.hpp:5001)
==5087==    by 0x5BA094: Catch::Runner::runTests() (catch.hpp:5275)
==5087==    by 0x5BAE1A: Catch::Session::run() (catch.hpp:5395)
==5087==    by 0x5BACE7: Catch::Session::run(int, char* const*) (catch.hpp:5378)
==5087==    by 0x5AE8BA: main (main.cpp:22)

Work to Resolve

I pursued this just a little bit before changing tasks. Following the control flow, it looked to me like the unaligned memory arose from passing a matrix with a nonaligned stride greater than 8, to sgemm_colmajor. I've come up with the below so far, but have not yet tested and debugged it and make frequent mistakes so likely something is wrong. The patch is pasted here from a tmux pane and then hand edited, so may need manual application.

diff --git a/cpp/test/test_avx_utils.cpp b/cpp/test/test_avx_utils.cpp
index 4ec4e00..34c318a 100644
--- a/cpp/test/test_avx_utils.cpp
+++ b/cpp/test/test_avx_utils.cpp
@@ -44,6 +44,8 @@ void _test_sgemm_colmajor(int N, int D, int M, bool simple_entries=false) {
         B.setRandom();
     }
     C = (C.array() + -999).matrix();  // value we won't accidentally get
+    int aligned_rows = C.rows() - (C.rows() % (-32 / int(sizeof(float))));
+    C.resize(aligned_rows, C.cols());
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant