IndexFlatL2 multithread is slower than single thread #3570
Replies: 5 comments
-
Please install Faiss with conda to make sure that the proper MKL version is installed.
-
I tried it out, and nthread = nb cores / 2 works well for me on another server that has 16 AMD processors (both training and query). Thank you so much, and I wonder why the performance is bad with nthread = nb cores :-)
-
@RongchunYao the performance is likely bad because of hyper-threading. As you know, hyper-threading typically means two virtual CPU cores share the compute resources of a single physical core, and such sharing is not efficient for the linear-algebra ops inside Faiss. So by specifying nthread = nb cores / 2 you make sure that two virtual CPU cores never compete for the same physical core.
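For reference, a minimal sketch of capping the thread count from Python with the OpenMP controls that Faiss re-exports; the assumption that half of the reported threads are physical cores only holds on a hyper-threaded machine and may not match every setup:

```python
import faiss

# Assumption: hyper-threading is enabled, so half of the threads reported
# by OpenMP correspond to physical cores.
physical_cores = max(1, faiss.omp_get_max_threads() // 2)
faiss.omp_set_num_threads(physical_cores)  # one thread per physical core
print("using", physical_cores, "threads")
```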
-
Thank you!
-
Hi, I recently ran Faiss with OpenBLAS compiled with OpenMP, and I set the OMP thread count to 32. I run the jobs in batches on a computing platform; most machines get a great speedup, but some machines run very slowly (each machine has a similar configuration). I wonder about the potential reasons: could tasks submitted to the same machine by other users be a major factor?
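One thing worth checking on a shared machine is thread oversubscription: if other jobs already occupy the cores, or if OpenMP and OpenBLAS each spawn their own pool, threads end up competing for CPU time. A minimal sketch of pinning both pools (the value 32 is just the thread count mentioned above; the environment variables must be set before the libraries are loaded):

```python
import os

# Cap both the OpenMP and the OpenBLAS thread pools before they are initialized.
# Assumption: 32 is the intended thread count from the comment above.
os.environ["OMP_NUM_THREADS"] = "32"
os.environ["OPENBLAS_NUM_THREADS"] = "32"

import faiss  # imported after the variables are set so the runtime picks them up

print("OpenMP max threads:", faiss.omp_get_max_threads())
```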
-
Python faiss-cpu 1.7.4, installed with pip (Python 3.x).
Multithreaded performance is poor on my 32-processor machine.
model name : Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz
************ nthread= 1
*********** nq= 100
========== d= 16
dataset in dimension 16, with metric L2, size: Q 100 B 10000 T 0
search k= 1 t=1.393 ms (± 0.1564)
search k= 10 t=2.679 ms (± 0.0422)
search k=100 t=6.473 ms (± 0.4788)
========== d= 32
dataset in dimension 32, with metric L2, size: Q 100 B 10000 T 0
search k= 1 t=11.656 ms (± 23.1539)
search k= 10 t=3.664 ms (± 0.4651)
search k=100 t=6.653 ms (± 0.6943)
========== d= 64
dataset in dimension 64, with metric L2, size: Q 100 B 10000 T 0
search k= 1 t=4.447 ms (± 0.4957)
search k= 10 t=4.460 ms (± 0.0903)
search k=100 t=8.210 ms (± 0.8620)
========== d= 128
dataset in dimension 128, with metric L2, size: Q 100 B 10000 T 0
search k= 1 t=7.682 ms (± 1.1851)
search k= 10 t=8.133 ms (± 1.1031)
search k=100 t=10.987 ms (± 1.5985)
restab= (rows: d = 16, 32, 64, 128; columns: k = 1, 10, 100; same layout in the tables below)
1.39302 2.67902 6.4728
11.6563 3.66396 6.65313
4.44698 4.45956 8.20962
7.68209 8.13305 10.9866
*********** nq= 10000
========== d= 16
dataset in dimension 16, with metric L2, size: Q 10000 B 10000 T 0
search k= 1 t=0.080 s (± 0.0044)
search k= 10 t=0.257 s (± 0.0085)
search k=100 t=0.564 s (± 0.0193)
========== d= 32
dataset in dimension 32, with metric L2, size: Q 10000 B 10000 T 0
search k= 1 t=0.259 s (± 0.0097)
search k= 10 t=0.321 s (± 0.0092)
search k=100 t=0.635 s (± 0.0237)
========== d= 64
dataset in dimension 64, with metric L2, size: Q 10000 B 10000 T 0
search k= 1 t=0.368 s (± 0.0306)
search k= 10 t=0.410 s (± 0.0379)
search k=100 t=0.681 s (± 0.0412)
========== d= 128
dataset in dimension 128, with metric L2, size: Q 10000 B 10000 T 0
search k= 1 t=0.599 s (± 0.0144)
search k= 10 t=0.645 s (± 0.0107)
search k=100 t=0.921 s (± 0.0569)
restab=
0.0801447 0.257458 0.56392
0.259316 0.321337 0.635152
0.368472 0.410237 0.680965
0.599093 0.644711 0.921228
************ nthread= 32
*********** nq= 100
========== d= 16
dataset in dimension 16, with metric L2, size: Q 100 B 10000 T 0
search k= 1 t=12.850 ms (± 7.3587)
search k= 10 t=326.201 ms (± 9.8362)
search k=100 t=331.151 ms (± 16.7528)
========== d= 32
dataset in dimension 32, with metric L2, size: Q 100 B 10000 T 0
search k= 1 t=181.012 ms (± 20.5017)
search k= 10 t=325.893 ms (± 12.7326)
search k=100 t=325.874 ms (± 24.1845)
========== d= 64
dataset in dimension 64, with metric L2, size: Q 100 B 10000 T 0
search k= 1 t=181.696 ms (± 14.6625)
search k= 10 t=329.945 ms (± 17.0235)
search k=100 t=329.392 ms (± 14.8352)
========== d= 128
dataset in dimension 128, with metric L2, size: Q 100 B 10000 T 0
search k= 1 t=176.828 ms (± 9.2367)
search k= 10 t=326.336 ms (± 16.2117)
search k=100 t=325.248 ms (± 13.9408)
restab=
12.8498 326.201 331.151
181.012 325.893 325.874
181.696 329.945 329.392
176.828 326.336 325.248
*********** nq= 10000
========== d= 16
dataset in dimension 16, with metric L2, size: Q 10000 B 10000 T 0
search k= 1 t=0.027 s (± 0.0119)
search k= 10 t=0.980 s (± 0.0149)
search k=100 t=1.029 s (± 0.0168)
========== d= 32
dataset in dimension 32, with metric L2, size: Q 10000 B 10000 T 0
search k= 1 t=0.524 s (± 0.0138)
search k= 10 t=0.986 s (± 0.0122)
search k=100 t=1.066 s (± 0.0379)
========== d= 64
dataset in dimension 64, with metric L2, size: Q 10000 B 10000 T 0
search k= 1 t=0.572 s (± 0.0328)
search k= 10 t=0.999 s (± 0.0171)
search k=100 t=1.090 s (± 0.0780)
========== d= 128
dataset in dimension 128, with metric L2, size: Q 10000 B 10000 T 0
search k= 1 t=0.721 s (± 0.0103)
search k= 10 t=1.059 s (± 0.0262)
search k=100 t=1.147 s (± 0.0235)
restab=
0.0267251 0.979833 1.02869
0.523988 0.985733 1.0658
0.571997 0.999151 1.09039
0.721175 1.05897 1.14676
Reproduction instructions
bench_index_flat.py
I modified faiss.cvar.distance_compute_min_k_reservoir from 5 to 100
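For anyone who wants to reproduce the comparison without the full benchmark script, a minimal sketch with random data and sizes matching the run above (timings are single-shot rather than averaged like in bench_index_flat.py):

```python
import time
import numpy as np
import faiss

d, nb, nq, k = 128, 10000, 10000, 100
xb = np.random.rand(nb, d).astype('float32')  # database vectors
xq = np.random.rand(nq, d).astype('float32')  # query vectors

index = faiss.IndexFlatL2(d)
index.add(xb)

# Same global that the comment above changes.
faiss.cvar.distance_compute_min_k_reservoir = 100

for nthread in (1, 32):
    faiss.omp_set_num_threads(nthread)
    t0 = time.time()
    index.search(xq, k)
    print(f"nthread={nthread}: search took {time.time() - t0:.3f} s")
```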