-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add AVX-512 implementation for the distance and scalar quantizer functions. #3853
Conversation
Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
@mnorris11 , @junjieqi , @naveentatikonda - could you help add necessary reviewers to this PR? Since this is our first contribution to faiss, we're not sure who are the right set of reviewers for this change. thanks. |
Curious have you benchmarked this change? If so, do you have any performance numbers? It looks like there are errors for avx512's scalar quantizer test suite: https://github.com/facebookresearch/faiss/actions/runs/10838170232/job/30084638332?pr=3853 and the change broke aarch64 compilation in https://github.com/facebookresearch/faiss/actions/runs/10838170232/job/30084638104?pr=3853 Can you try fixing those? |
Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
faiss/impl/ScalarQuantizer.cpp
Outdated
__m512 half = _mm512_set1_ps(0.5f); | ||
f16 = _mm512_add_ps(f16, half); | ||
__m512 one_15 = _mm512_set1_ps(1.f / 15.f); | ||
return _mm512_mul_ps(f16, one_15); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use _mm512_fmadd_ps()
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
faiss/impl/ScalarQuantizer.cpp
Outdated
#ifdef __AVX512F__ | ||
static FAISS_ALWAYS_INLINE __m512 | ||
decode_16_components(const uint8_t* code, int i) { | ||
__m256 v0 = decode_8_components(code, i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please implement 16 components instead of 2x8.
Alternatively, please refer to https://github.com/zilliztech/knowhere/blob/main/thirdparty/faiss/faiss/impl/ScalarQuantizerCodec_avx512.h , which has it all implemented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
@mengdilin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! Need two minor changes to fix Meta's compilation error from unused variables.
faiss/impl/ScalarQuantizer.cpp
Outdated
}; | ||
|
||
#endif | ||
|
||
#ifdef __AVX2__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when AVX512 is defined, AVX2 is also present in the compiler macro. This leads to unused variable errors for Meta's CI. Can you change this to
#elif defined(__AVX2__)
to silence the error? This should mirror the if avx512 elif avx2
logic in get_distance_computer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
faiss/impl/ScalarQuantizer.cpp
Outdated
}; | ||
|
||
#endif | ||
|
||
#ifdef __AVX2__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to do the same #elif defined(__AVX2__)
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Running microbenchmarking for the distance computation across different SQ types for dimension 128 in a single-threaded environment with input size of 2000, we see a average of 40% cpu time improvement :D with the exception of |
Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
@mengdilin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@alexanderguzhva let me know if there is any other concern with the PR. I cross-referenced the changes here with https://github.com/zilliztech/knowhere/blob/main/thirdparty/faiss/faiss/impl/ScalarQuantizerCodec_avx512.h and they match. The changes in |
@mengdilin merged this pull request in 4eecd91. |
Summary: #3870 conflicted with changes in #3853 Rebasing D62989543 for PR 3853 internally did not catch the breakage since we don't have avx512 coverage internally unfortunately :( === Test Plan === Tested on a local machine and compilation and C++ tests worked CI for AVX512 and conda build should succeed Pull Request resolved: #3880 Reviewed By: junjieqi Differential Revision: D63156374 Pulled By: mengdilin fbshipit-source-id: 4bf51b2e7795bb55d388a31c79bded742f87d6e9
…tions. (facebookresearch#3853) Summary: The distance and scalar quantizer functions currently have AVX2 implementations. This patch adds the AVX-512 equivalents for each of the AVX2 implementations. While preparing to push this PR, I realized that you have already implemented the AVX-512 equivalent for [HNSW::MinimaxHeap::pop_min](https://github.com/facebookresearch/faiss/blob/a166e13a25b2a5fe46adce4d7d06677d5199e598/faiss/impl/HNSW.cpp#L1176-L1265), which is great. Pull Request resolved: facebookresearch#3853 Test Plan: Imported from GitHub, without a `Test Plan:` line. Top of the stack D62993711 is green Reviewed By: asadoughi Differential Revision: D62989543 Pulled By: mengdilin fbshipit-source-id: 913403fadbfc512d195fe3411ee761d8ad025245
Summary: facebookresearch#3870 conflicted with changes in facebookresearch#3853 Rebasing D62989543 for PR 3853 internally did not catch the breakage since we don't have avx512 coverage internally unfortunately :( === Test Plan === Tested on a local machine and compilation and C++ tests worked CI for AVX512 and conda build should succeed Pull Request resolved: facebookresearch#3880 Reviewed By: junjieqi Differential Revision: D63156374 Pulled By: mengdilin fbshipit-source-id: 4bf51b2e7795bb55d388a31c79bded742f87d6e9
The distance and scalar quantizer functions currently have AVX2 implementations. This patch adds the AVX-512 equivalents for each of the AVX2 implementations.
While preparing to push this PR, I realized that you have already implemented the AVX-512 equivalent for HNSW::MinimaxHeap::pop_min, which is great.