Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AVX-512 implementation for the distance and scalar quantizer functions. #3853

Closed
wants to merge 4 commits into from

Conversation

mulugetam
Copy link
Contributor

The distance and scalar quantizer functions currently have AVX2 implementations. This patch adds the AVX-512 equivalents for each of the AVX2 implementations.

While preparing to push this PR, I realized that you have already implemented the AVX-512 equivalent for HNSW::MinimaxHeap::pop_min, which is great.

Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
@akashsha1
Copy link

@mnorris11 , @junjieqi , @naveentatikonda - could you help add necessary reviewers to this PR? Since this is our first contribution to faiss, we're not sure who are the right set of reviewers for this change. thanks.

@mnorris11 mnorris11 self-requested a review September 13, 2024 18:33
@mengdilin mengdilin self-requested a review September 13, 2024 19:02
@mengdilin
Copy link
Contributor

Curious have you benchmarked this change? If so, do you have any performance numbers?

It looks like there are errors for avx512's scalar quantizer test suite: https://github.com/facebookresearch/faiss/actions/runs/10838170232/job/30084638332?pr=3853 and the change broke aarch64 compilation in https://github.com/facebookresearch/faiss/actions/runs/10838170232/job/30084638104?pr=3853

Can you try fixing those?

Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
__m512 half = _mm512_set1_ps(0.5f);
f16 = _mm512_add_ps(f16, half);
__m512 one_15 = _mm512_set1_ps(1.f / 15.f);
return _mm512_mul_ps(f16, one_15);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use _mm512_fmadd_ps() instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

#ifdef __AVX512F__
static FAISS_ALWAYS_INLINE __m512
decode_16_components(const uint8_t* code, int i) {
__m256 v0 = decode_8_components(code, i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please implement 16 components instead of 2x8.
Alternatively, please refer to https://github.com/zilliztech/knowhere/blob/main/thirdparty/faiss/faiss/impl/ScalarQuantizerCodec_avx512.h , which has it all implemented

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
@facebook-github-bot
Copy link
Contributor

@mengdilin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@mengdilin mengdilin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! Need two minor changes to fix Meta's compilation error from unused variables.

};

#endif

#ifdef __AVX2__
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when AVX512 is defined, AVX2 is also present in the compiler macro. This leads to unused variable errors for Meta's CI. Can you change this to

#elif defined(__AVX2__)

to silence the error? This should mirror the if avx512 elif avx2 logic in get_distance_computer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

};

#endif

#ifdef __AVX2__
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to do the same #elif defined(__AVX2__) here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@mengdilin
Copy link
Contributor

mengdilin commented Sep 19, 2024

Running microbenchmarking for the distance computation across different SQ types for dimension 128 in a single-threaded environment with input size of 2000, we see a average of 40% cpu time improvement :D with the exception of QT_8bit_direct where we see a 60% cpu regression on avx512 most likely due to how we dispatch between avx512 and avx2 (if dimension is not a multiple of 32 but a multiple of 16, we fall back to the slow path in avx512 instead of dispatching to avx2). That issue is not blocking, we can patch this up later on our end

Signed-off-by: Mulugeta Mammo <mulugeta.mammo@intel.com>
@facebook-github-bot
Copy link
Contributor

@mengdilin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mengdilin
Copy link
Contributor

@alexanderguzhva let me know if there is any other concern with the PR. I cross-referenced the changes here with https://github.com/zilliztech/knowhere/blob/main/thirdparty/faiss/faiss/impl/ScalarQuantizerCodec_avx512.h and they match. The changes in faiss/utils/distances_simd.cpp have existing tests e.g. test_ivfpq_indexing and test_distances_simd. If not, I will merge it

@facebook-github-bot
Copy link
Contributor

@mengdilin merged this pull request in 4eecd91.

facebook-github-bot pushed a commit that referenced this pull request Sep 21, 2024
Summary:
#3870 conflicted with changes in #3853 Rebasing D62989543 for PR 3853 internally did not catch the breakage since we don't have avx512 coverage internally unfortunately :(

=== Test Plan ===
Tested on a local machine and compilation and C++ tests worked
CI for AVX512 and conda build should succeed

Pull Request resolved: #3880

Reviewed By: junjieqi

Differential Revision: D63156374

Pulled By: mengdilin

fbshipit-source-id: 4bf51b2e7795bb55d388a31c79bded742f87d6e9
aalekhpatel07 pushed a commit to aalekhpatel07/faiss that referenced this pull request Oct 17, 2024
…tions. (facebookresearch#3853)

Summary:
The distance and scalar quantizer functions currently have AVX2 implementations. This patch adds the AVX-512 equivalents for each of the AVX2 implementations.

While preparing to push this PR, I realized that you have already implemented the AVX-512 equivalent for [HNSW::MinimaxHeap::pop_min](https://github.com/facebookresearch/faiss/blob/a166e13a25b2a5fe46adce4d7d06677d5199e598/faiss/impl/HNSW.cpp#L1176-L1265), which is great.

Pull Request resolved: facebookresearch#3853

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Top of the stack D62993711 is green

Reviewed By: asadoughi

Differential Revision: D62989543

Pulled By: mengdilin

fbshipit-source-id: 913403fadbfc512d195fe3411ee761d8ad025245
aalekhpatel07 pushed a commit to aalekhpatel07/faiss that referenced this pull request Oct 17, 2024
Summary:
facebookresearch#3870 conflicted with changes in facebookresearch#3853 Rebasing D62989543 for PR 3853 internally did not catch the breakage since we don't have avx512 coverage internally unfortunately :(

=== Test Plan ===
Tested on a local machine and compilation and C++ tests worked
CI for AVX512 and conda build should succeed

Pull Request resolved: facebookresearch#3880

Reviewed By: junjieqi

Differential Revision: D63156374

Pulled By: mengdilin

fbshipit-source-id: 4bf51b2e7795bb55d388a31c79bded742f87d6e9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants