Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use matrix::select_k in brute_force::knn call #1463

Merged
merged 3 commits into from
May 17, 2023

Conversation

benfred
Copy link
Member

@benfred benfred commented Apr 25, 2023

No description provided.

@benfred benfred added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 25, 2023
@benfred benfred requested a review from a team as a code owner April 25, 2023 23:51
@benfred benfred marked this pull request as draft April 25, 2023 23:51
@github-actions github-actions bot added the cpp label Apr 25, 2023
@benfred
Copy link
Member Author

benfred commented Apr 26, 2023

While times for small batches improves with this change, overall the performance seems to be quite a bit worse with this change - and this PR shouldn't be merged until after #1455.

Running the neighbours benchmarks (modifying K from 32->128 to avoid hitting the fusedl2 path):

branch-23.06 (uses the faiss blockselect code)

KNN/float/int64_t/brute_force_knn/0/0/0/manual_time        75.2 ms          212 ms            9 2000000#128#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/0/1/0/manual_time        1992 ms         2091 ms            1 2000000#128#1000#128#MAP_PINNED#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/0/2/0/manual_time         273 ms         1260 ms            3 2000000#128#1000#128#MANAGED#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/1/0/0/manual_time         388 ms         1028 ms            2 10000000#128#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/1/1/0/manual_time        9959 ms        10425 ms            1 10000000#128#1000#128#MAP_PINNED#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/1/2/0/manual_time        1362 ms         6266 ms            1 10000000#128#1000#128#MANAGED#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/2/0/0/manual_time        14.0 ms         58.0 ms           50 10000#8192#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/2/1/0/manual_time         628 ms          663 ms            1 10000#8192#1000#128#MAP_PINNED#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/2/2/0/manual_time        47.6 ms          365 ms           15 10000#8192#1000#128#MANAGED#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/0/0/0/manual_time       75.0 ms          213 ms            9 2000000#128#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/0/1/0/manual_time       1991 ms         2090 ms            1 2000000#128#1000#128#MAP_PINNED#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/0/2/0/manual_time        253 ms         1242 ms            2 2000000#128#1000#128#MANAGED#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/1/0/0/manual_time        387 ms         1013 ms            2 10000000#128#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/1/1/0/manual_time       9954 ms        10419 ms            1 10000000#128#1000#128#MAP_PINNED#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/1/2/0/manual_time       1269 ms         6186 ms            1 10000000#128#1000#128#MANAGED#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/2/0/0/manual_time       14.2 ms         58.6 ms           49 10000#8192#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/2/1/0/manual_time        628 ms          663 ms            1 10000#8192#1000#128#MAP_PINNED#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/2/2/0/manual_time       48.1 ms          366 ms           15 10000#8192#1000#128#MANAGED#BUILD_SEARCH

this branch (using matrix::select_k)

-----------------------------------------------------------------------------------------------
Benchmark                                                     Time             CPU   Iterations
-----------------------------------------------------------------------------------------------
KNN/float/int64_t/brute_force_knn/0/0/0/manual_time         457 ms          587 ms            2 2000000#128#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/0/1/0/manual_time        2379 ms         2481 ms            1 2000000#128#1000#128#MAP_PINNED#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/0/2/0/manual_time         660 ms         1651 ms            1 2000000#128#1000#128#MANAGED#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/1/0/0/manual_time        2317 ms         2908 ms            1 10000000#128#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/1/1/0/manual_time       11917 ms        12391 ms            1 10000000#128#1000#128#MAP_PINNED#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/1/2/0/manual_time        3309 ms         8239 ms            1 10000000#128#1000#128#MANAGED#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/2/0/0/manual_time        14.1 ms         58.4 ms           50 10000#8192#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/2/1/0/manual_time         628 ms          663 ms            1 10000#8192#1000#128#MAP_PINNED#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/2/2/0/manual_time        48.2 ms          367 ms           14 10000#8192#1000#128#MANAGED#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/0/0/0/manual_time       72.6 ms          212 ms           10 2000000#128#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/0/1/0/manual_time       1990 ms         2092 ms            1 2000000#128#1000#128#MAP_PINNED#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/0/2/0/manual_time        252 ms         1237 ms            3 2000000#128#1000#128#MANAGED#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/1/0/0/manual_time        373 ms         1011 ms            2 10000000#128#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/1/1/0/manual_time       9947 ms        10419 ms            1 10000000#128#1000#128#MAP_PINNED#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/1/2/0/manual_time       1258 ms         6188 ms            1 10000000#128#1000#128#MANAGED#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/2/0/0/manual_time       14.2 ms         58.6 ms           49 10000#8192#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/2/1/0/manual_time        628 ms          665 ms            1 10000#8192#1000#128#MAP_PINNED#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/2/2/0/manual_time       48.3 ms          367 ms           14 10000#8192#1000#128#MANAGED#BUILD_SEARCH

@cjnolet cjnolet added the 5 - DO NOT MERGE Hold off on merging; see PR for details label Apr 27, 2023
@benfred
Copy link
Member Author

benfred commented Apr 27, 2023

It does seem like the matrix::select_k code might have some performance issues with the int64_t IndexType - which isn't seen with the uint32_t IndexType. As an example, looking at just a single benchmark:

23.06 (faiss block select)

KNN/float/int64_t/brute_force_knn/0/0/0/manual_time        75.2 ms          212 ms            9 2000000#128#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/0/0/0/manual_time       75.0 ms          213 ms            9 2000000#128#1000#128#NO_COPY#BUILD_SEARCH

this branch (matrix::select_k:

KNN/float/int64_t/brute_force_knn/0/0/0/manual_time         457 ms          587 ms            2 2000000#128#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/0/0/0/manual_time       72.6 ms          212 ms           10 2000000#128#1000#128#NO_COPY#BUILD_SEARCH

With the faiss block select code - there isn't much difference between using a int64_t or uint32_t IndexType, but with the matrix::select_k code there is.

@benfred
Copy link
Member Author

benfred commented May 17, 2023

Using the new select_k code from #1523 removes the perf regression we've been seeing with using matrix::select_k - and leads to a speedup in the bfknn code instead:

-----------------------------------------------------------------------------------------------
Benchmark                                                     Time             CPU   Iterations
-----------------------------------------------------------------------------------------------
KNN/float/int64_t/brute_force_knn/0/0/0/manual_time        65.9 ms          177 ms           11 2000000#128#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/0/1/0/manual_time        1982 ms         2058 ms            1 2000000#128#1000#128#MAP_PINNED#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/0/2/0/manual_time         254 ms         1222 ms            3 2000000#128#1000#128#MANAGED#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/1/0/0/manual_time         340 ms          894 ms            2 10000000#128#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/1/1/0/manual_time        9905 ms        10266 ms            1 10000000#128#1000#128#MAP_PINNED#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/1/2/0/manual_time        1223 ms         6068 ms            1 10000000#128#1000#128#MANAGED#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/2/0/0/manual_time        13.8 ms         49.3 ms           51 10000#8192#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/2/1/0/manual_time         628 ms          652 ms            1 10000#8192#1000#128#MAP_PINNED#BUILD_SEARCH
KNN/float/int64_t/brute_force_knn/2/2/0/manual_time        57.3 ms          368 ms           12 10000#8192#1000#128#MANAGED#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/0/0/0/manual_time       64.6 ms          176 ms           11 2000000#128#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/0/1/0/manual_time       1981 ms         2057 ms            1 2000000#128#1000#128#MAP_PINNED#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/0/2/0/manual_time        253 ms         1223 ms            3 2000000#128#1000#128#MANAGED#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/1/0/0/manual_time        336 ms          892 ms            2 10000000#128#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/1/1/0/manual_time       9903 ms        10264 ms            1 10000000#128#1000#128#MAP_PINNED#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/1/2/0/manual_time       1221 ms         6066 ms            1 10000000#128#1000#128#MANAGED#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/2/0/0/manual_time       13.9 ms         49.4 ms           50 10000#8192#1000#128#NO_COPY#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/2/1/0/manual_time        628 ms          653 ms            1 10000#8192#1000#128#MAP_PINNED#BUILD_SEARCH
KNN/float/uint32_t/brute_force_knn/2/2/0/manual_time       57.7 ms          368 ms           12 10000#8192#1000#128#MANAGED#BUILD_SEARCH

Note: the speedup is even more pronounced with a smaller batch size (which isn't being benchmarked here).

@cjnolet cjnolet removed the 5 - DO NOT MERGE Hold off on merging; see PR for details label May 17, 2023
@github-actions github-actions bot added the CMake label May 17, 2023
@benfred benfred marked this pull request as ready for review May 17, 2023 19:42
@benfred benfred requested a review from a team as a code owner May 17, 2023 19:42
@benfred
Copy link
Member Author

benfred commented May 17, 2023

/merge

@rapids-bot rapids-bot bot merged commit 6fdb041 into rapidsai:branch-23.06 May 17, 2023
@benfred benfred deleted the bfknn_select_k branch May 18, 2023 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
Development

Successfully merging this pull request may close these issues.

2 participants