You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
It might not be a bug, but it's better to be a warning for user.
Description
g++-9 with -O2 above and avx512 enabled will compile the argsort_n_vec(of x86-simd-sort) with some mmx instructions (which use registers MM0~MM7 are ususally alias to x87 ST(0) ~ ST(7)), but won't clear x87 fpu state.
So operations on long double, which used x87 instructions default, will perform different result (NAN as mainly).
The worse thing is that long double is used by prime_hashtable_policy which is the default backend of unorderd_map in g++-9.
If we call unordered_map::rehash after argsort/argselect(of x86-simd-sort), rehash(size_t(-1)) was performed, and std::bad_alloc was raised.
Unfortunately g++-9 is end of life. But if someone can upgrade g++ to newer, this problem will disappered.
Workaround
Any one below can work, but also with their own side effect.
compile with -mno-mmx
make long double not to use x87 with -mlong-double-64 or -mlong-double-128
POC
Environment
os: ubuntu 18.04/20.04 with gcc-9.4 or ubuntu 22.04/24.04 with gcc-9.5 or gcc 9.5 docker offical image (it shall not be an os case.)
cpu: Intel 13700H with sde, Intel 11700B, AMD 7950x, Intel Gold Xeon 6145. ( it shall not be a hardware case.)
for (int i = numVecs / 2, j = 0; i < numVecs; i++, j++) {
uint64_t num_to_read
= std::min((uint64_t)std::max(0, N - i * keyType::numlanes),
(uint64_t)keyType::numlanes);
mmx instructions g++-9 generated.
1bad: 0f ef ed pxor %mm5,%mm5
1bc9: 0f ef f6 pxor %mm6,%mm6
1c09: 0f ef e4 pxor %mm4,%mm4
1c41: 0f ef db pxor %mm3,%mm3
1c71: 0f ef ff pxor %mm7,%mm7
1c85: 0f ef c9 pxor %mm1,%mm1
1c98: 0f ef c0 pxor %mm0,%mm0
1ccf: 0f ef d2 pxor %mm2,%mm2
db88: 0f ef c0 pxor %mm0,%mm0
db9b: 0f ef db pxor %mm3,%mm3
dbae: 0f ef d2 pxor %mm2,%mm2
dbc0: 0f ef c9 pxor %mm1,%mm1
Another poc without unordered_map
#include<cstdio>
#include"x86-simd-sort/src/x86simdsort-static-incl.h"doublecalc(size_t size) {
auto ptr = newdouble[size]{1};
auto arg = x86simdsortStatic::argsort(ptr, size, false, true);
#ifdef PATCH
asmvolatile ("FNINIT");
#endifauto ret = ptr[arg[0]];
delete[] ptr;
return ret;
}
intmain() {
double ret = calc(65); // [65, 256]size_t a = 1ull;
double ret_1 = a / ret;
longdouble lret_1 = a / (longdouble)ret;
printf("ret=%lf, 1/ret=%lf, 1/(long double)ret=%Lf\n", ret, ret_1, lret_1);
returnstd::isnan(lret_1);
}
screenshot
screenshot after patch
The code to check that g++-10 is not safe either.
compile g++-10 -std=c++17 -Wall -O2 -march=skylake-avx512 -o poc poc.cc and run ./poc 10
#include <cstdio>#include "x86-simd-sort/src/x86simdsort-static-incl.h"
double calc(size_t size) {
auto ptr = new double[size]{1};
auto arg = x86simdsortStatic::argsort(ptr, size, false, true);#ifdef PATCH
// asm volatile ("FNINIT");
asm volatile ("EMMS");#endif
auto ret = ptr[arg[0]];
delete[] ptr;return ret;
}
int main(int argc, const char * argv[]) {
double ret = calc(256); // [65, 256]
size_t a = std::atoll(argv[1]);
double ret_a = a / ret;
long double lret_a = a / (long double)ret;
printf("ret=%lf, %lu/ret=%lf, %lu/(long double)ret=%Lf\n", ret, a, ret_a, a, lret_a);return std::isnan(lret_a);
}
The text was updated successfully, but these errors were encountered:
cmsxbc
changed the title
compile and use argsort/argselect with avx512 by g++9 is not safe.
compile and use argsort/argselect with avx512 by g++9/10 is not safe.
May 27, 2024
thanks for letting us know. Probably good to a add note in the README section to build using -mno-mmx flags when using g++ v9. Would you like to add a PR with that note?
It might not be a bug, but it's better to be a warning for user.
Description
g++-9
with-O2
above and avx512 enabled will compile theargsort_n_vec
(of x86-simd-sort) with some mmx instructions (which use registersMM0~MM7
are ususally alias to x87 ST(0) ~ ST(7)), but won't clear x87 fpu state.So operations on
long double
, which used x87 instructions default, will perform different result (NAN as mainly).The worse thing is that
long double
is used byprime_hashtable_policy
which is the default backend ofunorderd_map
ing++-9
.If we call
unordered_map::rehash
afterargsort/argselect
(ofx86-simd-sort
),rehash(size_t(-1))
was performed, andstd::bad_alloc
was raised.Unfortunately g++-9 is end of life. But if someone can upgrade g++ to newer, this problem will disappered.
Workaround
Any one below can work, but also with their own side effect.
-mno-mmx
-mlong-double-64
or-mlong-double-128
POC
Environment
Code
compile below code with
screenshot
simple patch
compile the code with below, and no std::bad_alloc was thrown.
gdb script to check x87 fpu is used and the state is not cleared
the
ST(0)
value should be1
.screenshot without patch
the
ST(0)
value is-NAN
which means there something wrong with x87 fpu.screenshot after patch
the
ST(0)
value is1
as it expected to be.Further reading, what code is genered by g++9, and where is it in argsort.
source code in argsort_n_vec
x86-simd-sort/src/xss-network-keyvaluesort.hpp
Lines 501 to 506 in 2315766
mmx instructions g++-9 generated.
Another poc without unordered_map
screenshot
screenshot after patch
The code to check that g++-10 is not safe either.
compile
g++-10 -std=c++17 -Wall -O2 -march=skylake-avx512 -o poc poc.cc
and run./poc 10
The text was updated successfully, but these errors were encountered: