-
-
Notifications
You must be signed in to change notification settings - Fork 69
Description
It might not be a bug, but it's better to be a warning for user.
Description
g++-9 with -O2 above and avx512 enabled will compile the argsort_n_vec(of x86-simd-sort) with some mmx instructions (which use registers MM0~MM7 are ususally alias to x87 ST(0) ~ ST(7)), but won't clear x87 fpu state.
So operations on long double, which used x87 instructions default, will perform different result (NAN as mainly).
The worse thing is that long double is used by prime_hashtable_policy which is the default backend of unorderd_map in g++-9.
If we call unordered_map::rehash after argsort/argselect(of x86-simd-sort), rehash(size_t(-1)) was performed, and std::bad_alloc was raised.
Unfortunately g++-9 is end of life. But if someone can upgrade g++ to newer, this problem will disappered.
Workaround
Any one below can work, but also with their own side effect.
- compile with
-mno-mmx - make long double not to use x87 with
-mlong-double-64or-mlong-double-128
POC
Environment
- os: ubuntu 18.04/20.04 with gcc-9.4 or ubuntu 22.04/24.04 with gcc-9.5 or gcc 9.5 docker offical image (it shall not be an os case.)
- cpu: Intel 13700H with sde, Intel 11700B, AMD 7950x, Intel Gold Xeon 6145. ( it shall not be a hardware case.)
Code
compile below code with
g++-9 -std=c++17 -Wall -O2 -march=skylake-avx512 -o poc poc.cc// filename: poc.cc
#include <unordered_map>
#include "x86-simd-sort/src/x86simdsort-static-incl.h"
double calc(size_t size) {
auto ptr = new double[size]{0};
auto arg = x86simdsortStatic::argselect(ptr, 0, size);
#ifdef PATCH
asm volatile ("FNINIT"); // clear x87 fpu state.
#endif
auto ret = ptr[arg[0]];
delete[] ptr;
return ret;
}
int main() {
std::unordered_map<size_t, size_t> amap;
amap.clear();
calc(65); // [65, 256]
amap.rehash(1); // <- crash here
return 0;
}screenshot
simple patch
compile the code with below, and no std::bad_alloc was thrown.
g++-9 -std=c++17 -Wall -O2 -DPATCH -march=skylake-avx512 -o poc poc.ccgdb script to check x87 fpu is used and the state is not cleared
the ST(0) value should be 1.
gdb \
-ex "b *$(objdump -d --prefix-address poc | grep fildll | awk '{$2 = substr($2, 2, length($2)-2); print $2}')" \
-ex 'r' \
-ex 'print "ST(0) before do size_t to long double convert" ' \
-ex 'print $st0' \
-ex 'print "The size_t value to be converted" ' \
-ex 'print *(size_t *)($rsp+0x8)' \
-ex 'print "ST(0) after do size_t to long double convert" ' \
-ex 'si' \
-ex 'print $st0' \
--batch --silent ./poc | grep '\$'screenshot without patch
the ST(0) value is -NAN which means there something wrong with x87 fpu.
screenshot after patch
the ST(0) value is 1 as it expected to be.
Further reading, what code is genered by g++9, and where is it in argsort.
source code in argsort_n_vec
mmx instructions g++-9 generated.
1bad: 0f ef ed pxor %mm5,%mm5
1bc9: 0f ef f6 pxor %mm6,%mm6
1c09: 0f ef e4 pxor %mm4,%mm4
1c41: 0f ef db pxor %mm3,%mm3
1c71: 0f ef ff pxor %mm7,%mm7
1c85: 0f ef c9 pxor %mm1,%mm1
1c98: 0f ef c0 pxor %mm0,%mm0
1ccf: 0f ef d2 pxor %mm2,%mm2
db88: 0f ef c0 pxor %mm0,%mm0
db9b: 0f ef db pxor %mm3,%mm3
dbae: 0f ef d2 pxor %mm2,%mm2
dbc0: 0f ef c9 pxor %mm1,%mm1
Another poc without unordered_map
#include <cstdio>
#include "x86-simd-sort/src/x86simdsort-static-incl.h"
double calc(size_t size) {
auto ptr = new double[size]{1};
auto arg = x86simdsortStatic::argsort(ptr, size, false, true);
#ifdef PATCH
asm volatile ("FNINIT");
#endif
auto ret = ptr[arg[0]];
delete[] ptr;
return ret;
}
int main() {
double ret = calc(65); // [65, 256]
size_t a = 1ull;
double ret_1 = a / ret;
long double lret_1 = a / (long double)ret;
printf("ret=%lf, 1/ret=%lf, 1/(long double)ret=%Lf\n", ret, ret_1, lret_1);
return std::isnan(lret_1);
}screenshot
screenshot after patch
The code to check that g++-10 is not safe either.
compile g++-10 -std=c++17 -Wall -O2 -march=skylake-avx512 -o poc poc.cc and run ./poc 10
#include <cstdio>
#include "x86-simd-sort/src/x86simdsort-static-incl.h"
double calc(size_t size) {
auto ptr = new double[size]{1};
auto arg = x86simdsortStatic::argsort(ptr, size, false, true);
#ifdef PATCH
// asm volatile ("FNINIT");
asm volatile ("EMMS");
#endif
auto ret = ptr[arg[0]];
delete[] ptr;
return ret;
}
int main(int argc, const char * argv[]) {
double ret = calc(256); // [65, 256]
size_t a = std::atoll(argv[1]);
double ret_a = a / ret;
long double lret_a = a / (long double)ret;
printf("ret=%lf, %lu/ret=%lf, %lu/(long double)ret=%Lf\n", ret, a, ret_a, a, lret_a);
return std::isnan(lret_a);
}



