Add AVX-512 implementation for the distance and scalar quantizer functions. #3853

mulugetam · 2024-09-12T20:21:53Z

The distance and scalar quantizer functions currently have AVX2 implementations. This patch adds the AVX-512 equivalents for each of the AVX2 implementations.

While preparing to push this PR, I realized that you have already implemented the AVX-512 equivalent for HNSW::MinimaxHeap::pop_min, which is great.

Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>

akashsha1 · 2024-09-13T18:30:28Z

@mnorris11, @junjieqi, @naveentatikonda: could you help add the necessary reviewers to this PR? Since this is our first contribution to faiss, we're not sure who the right reviewers are for this change. Thanks.

mengdilin · 2024-09-13T19:07:07Z

Curious: have you benchmarked this change? If so, do you have any performance numbers?

It looks like there are errors in the AVX-512 scalar quantizer test suite: https://github.com/facebookresearch/faiss/actions/runs/10838170232/job/30084638332?pr=3853 and the change broke aarch64 compilation: https://github.com/facebookresearch/faiss/actions/runs/10838170232/job/30084638104?pr=3853

Can you try fixing those?

alexanderguzhva · 2024-09-15T13:42:39Z

faiss/impl/ScalarQuantizer.cpp

+        __m512 half = _mm512_set1_ps(0.5f);
+        f16 = _mm512_add_ps(f16, half);
+        __m512 one_15 = _mm512_set1_ps(1.f / 15.f);
+        return _mm512_mul_ps(f16, one_15);


use _mm512_fmadd_ps() instead
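The suggestion works because (x + 0.5) * (1/15) equals x * (1/15) + 0.5/15, which is a single fused multiply-add. A minimal sketch of that rewrite follows; the helper name `scale_16_with_fma` and the scalar fallback are illustrative assumptions, not code from the PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper (not from the PR): out[i] = (in[i] + 0.5) / 15 for 16 floats.
inline void scale_16_with_fma(const float* in, float* out) {
    assert(in != nullptr && out != nullptr);
#if defined(__AVX512F__)
    const __m512 x = _mm512_loadu_ps(in);
    const __m512 one_15 = _mm512_set1_ps(1.f / 15.f);
    const __m512 half_one_15 = _mm512_set1_ps(0.5f / 15.f);
    // x * (1/15) + 0.5/15 replaces the separate add and multiply.
    _mm512_storeu_ps(out, _mm512_fmadd_ps(x, one_15, half_one_15));
#else
    // Portable fallback so the sketch also builds without AVX-512.
    for (std::size_t i = 0; i < 16; ++i) {
        out[i] = std::fma(in[i], 1.f / 15.f, 0.5f / 15.f);
    }
#endif
}
```

The two constants are folded at compile time, so the fused form saves one vector instruction per call and rounds once instead of twice.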

alexanderguzhva · 2024-09-15T13:43:19Z

faiss/impl/ScalarQuantizer.cpp

+#ifdef __AVX512F__
+    static FAISS_ALWAYS_INLINE __m512
+    decode_16_components(const uint8_t* code, int i) {
+        __m256 v0 = decode_8_components(code, i);


Please implement 16 components instead of 2x8.
Alternatively, please refer to https://github.com/zilliztech/knowhere/blob/main/thirdparty/faiss/faiss/impl/ScalarQuantizerCodec_avx512.h , which has it all implemented
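For illustration, a native 16-wide decode avoids gluing two 256-bit halves together. The sketch below assumes an 8-bit code where each component decodes to (c + 0.5) / 255; the function name, the output-pointer signature, and the scalar fallback are assumptions made for this example rather than the code that was merged.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical sketch: decode 16 consecutive 8-bit codes starting at code[i].
inline void decode_16_components_8bit(const uint8_t* code, int i, float* out) {
    assert(code != nullptr && out != nullptr);
#if defined(__AVX512F__)
    // Load 16 bytes, widen each to 32 bits, convert to float in one pass.
    const __m128i c16 =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(code + i));
    const __m512i i32 = _mm512_cvtepu8_epi32(c16);
    const __m512 f16 = _mm512_cvtepi32_ps(i32);
    const __m512 one_255 = _mm512_set1_ps(1.f / 255.f);
    const __m512 half_one_255 = _mm512_set1_ps(0.5f / 255.f);
    // (c + 0.5) / 255 expressed as a single fused multiply-add.
    _mm512_storeu_ps(out, _mm512_fmadd_ps(f16, one_255, half_one_255));
#else
    // Portable fallback so the sketch also builds without AVX-512.
    for (int j = 0; j < 16; ++j) {
        out[j] = (code[i + j] + 0.5f) / 255.f;
    }
#endif
}
```

Compared with calling an 8-wide decoder twice and combining the results, this form needs no cross-lane insert and keeps the whole computation in 512-bit registers.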

facebook-github-bot · 2024-09-18T21:57:40Z

@mengdilin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

mengdilin

Thanks for working on this! Need two minor changes to fix Meta's compilation error from unused variables.

mengdilin · 2024-09-18T22:53:27Z

faiss/impl/ScalarQuantizer.cpp

+};
+
+#endif
+
 #ifdef __AVX2__


When AVX-512 is enabled, the compiler also defines __AVX2__, so both blocks get compiled and the AVX2 one is never used. This leads to unused variable errors in Meta's CI. Can you change this to

#elif defined(__AVX2__)

to silence the error? This should mirror the if avx512 elif avx2 logic in get_distance_computer
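A minimal sketch of the requested guard structure, assuming a hypothetical `simd_width()` function standing in for the real codec structs: with an `#if` / `#elif` / `#else` chain, exactly one definition is compiled, so nothing is left unused when both macros are defined.

```cpp
// Hypothetical stand-in for the per-ISA codec definitions in the PR.
#if defined(__AVX512F__)
// Compiled only when AVX-512F is enabled.
constexpr int simd_width() { return 16; }
#elif defined(__AVX2__)
// Compiled only when AVX2 is enabled and AVX-512F is not.
constexpr int simd_width() { return 8; }
#else
// Scalar fallback for every other target, including aarch64.
constexpr int simd_width() { return 1; }
#endif
```

With two independent `#ifdef` blocks instead, an AVX-512 build would compile both the 16-wide and the 8-wide variants, which is the situation the review comment describes.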

mengdilin · 2024-09-18T22:54:03Z

faiss/impl/ScalarQuantizer.cpp

+};
+
+#endif
+
 #ifdef __AVX2__


Need to do the same #elif defined(__AVX2__) here

mengdilin · 2024-09-19T00:32:51Z

Running microbenchmarks for the distance computation across different SQ types (dimension 128, single-threaded, input size 2000), we see an average 40% CPU time improvement :D The exception is QT_8bit_direct, where we see a 60% CPU regression on AVX-512, most likely due to how we dispatch between AVX-512 and AVX2: if the dimension is a multiple of 16 but not of 32, we fall back to the slow path in AVX-512 instead of dispatching to AVX2. That issue is not blocking; we can patch it up later on our end.
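The dispatch concern can be sketched as a small selection function. Everything here is hypothetical: the `Kernel` enum, the `pick_kernel` name, and the flags are invented for illustration, and the thresholds (32 for the AVX-512 path, 16 for the AVX2 path) are taken only from the comment above, not from the faiss source.

```cpp
#include <cstddef>

// Hypothetical kernel choices for a QT_8bit_direct-style distance computer.
enum class Kernel { Avx512, Avx2, Generic };

// Prefer the widest kernel whose dimension requirement is met, then fall
// back to the next one instead of jumping straight to the generic path.
constexpr Kernel pick_kernel(std::size_t d, bool has_avx512, bool has_avx2) {
    return (has_avx512 && d % 32 == 0) ? Kernel::Avx512
         : (has_avx2 && d % 16 == 0)   ? Kernel::Avx2
                                       : Kernel::Generic;
}
```

Under these assumptions a dimension such as 48 would be served by the AVX2 kernel on an AVX-512 machine rather than by the slow path.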

facebook-github-bot · 2024-09-19T15:53:00Z

@mengdilin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

mengdilin · 2024-09-19T18:29:51Z

@alexanderguzhva let me know if there are any other concerns with the PR. I cross-referenced the changes here with https://github.com/zilliztech/knowhere/blob/main/thirdparty/faiss/faiss/impl/ScalarQuantizerCodec_avx512.h and they match. The changes in faiss/utils/distances_simd.cpp are covered by existing tests, e.g. test_ivfpq_indexing and test_distances_simd. If there are no further concerns, I will merge it.

facebook-github-bot · 2024-09-20T21:30:14Z

@mengdilin merged this pull request in 4eecd91.

Summary: #3870 conflicted with changes in #3853
Rebasing D62989543 for PR 3853 internally did not catch the breakage since we don't have avx512 coverage internally unfortunately :(

=== Test Plan ===
Tested on a local machine and compilation and C++ tests worked
CI for AVX512 and conda build should succeed

Pull Request resolved: #3880
Reviewed By: junjieqi
Differential Revision: D63156374
Pulled By: mengdilin
fbshipit-source-id: 4bf51b2e7795bb55d388a31c79bded742f87d6e9

…tions. (facebookresearch#3853)

Summary: The distance and scalar quantizer functions currently have AVX2 implementations. This patch adds the AVX-512 equivalents for each of the AVX2 implementations.

While preparing to push this PR, I realized that you have already implemented the AVX-512 equivalent for [HNSW::MinimaxHeap::pop_min](https://github.com/facebookresearch/faiss/blob/a166e13a25b2a5fe46adce4d7d06677d5199e598/faiss/impl/HNSW.cpp#L1176-L1265), which is great.

Pull Request resolved: facebookresearch#3853
Test Plan: Imported from GitHub, without a `Test Plan:` line. Top of the stack D62993711 is green
Reviewed By: asadoughi
Differential Revision: D62989543
Pulled By: mengdilin
fbshipit-source-id: 913403fadbfc512d195fe3411ee761d8ad025245

Add AVX-512 implementation for distance and scalar quantizer.

56bb1bb

Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>

facebook-github-bot added the CLA Signed label Sep 12, 2024

mnorris11 added the Platform label Sep 12, 2024

junjieqi added Performance and removed Platform labels Sep 12, 2024

mnorris11 self-requested a review September 13, 2024 18:33

mengdilin self-requested a review September 13, 2024 19:02

Fix bugs in scalar quantizer and aarch64 arch compilation.

653e9cc

Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>

alexanderguzhva reviewed Sep 15, 2024

Use FMA and pure avx-512 implementation for decode_16_components.

466db5d

Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>

naveentatikonda mentioned this pull request Sep 18, 2024

Add changes for AVX-512 support in k-NN. opensearch-project/k-NN#2110

Merged

5 tasks

mengdilin requested changes Sep 18, 2024

Remove unused avx-2 functions when compiled for avx-512.

1abfb8d

Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>

facebook-github-bot closed this in 4eecd91 Sep 20, 2024

facebook-github-bot added the Merged label Sep 20, 2024

mengdilin mentioned this pull request Sep 20, 2024

[faiss] Fix CI 2.0: Compile SQ for avx512 #3880

Closed

mulugetam mentioned this pull request Nov 7, 2024

Use _mm512_popcnt_epi64 to speedup hamming distance evaluation. #4020

Open
