[gpuCI] Forward-merge branch-21.10 to branch-21.12 [skip gpuci] #333

Merged 5 commits into branch-21.12 on Sep 22, 2021

Conversation

GPUtester
Contributor

Forward-merge triggered by push to branch-21.10 that creates a PR to keep branch-21.12 up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge.

trxcllnt and others added 5 commits, starting August 25, 2021 14:15
Removes `-g` from the compile commands generated by distutils to compile Cython files.

This will make our container images, conda packages, and Python wheels smaller.
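
A minimal sketch of one way to drop `-g` from a distutils/setuptools Cython build is below; the class name, package name, and the exact hook used here are illustrative assumptions, not necessarily what this commit does.

```python
# Illustrative sketch: a custom build_ext that filters `-g` out of the
# compiler command lines distutils inherits from the Python build, so the
# compiled Cython extension objects carry no debug symbols.
from setuptools import setup, Extension
from setuptools.command.build_ext import build_ext
from Cython.Build import cythonize


class build_ext_no_debug(build_ext):  # hypothetical helper name
    def build_extensions(self):
        # UnixCCompiler keeps its base command lines in these list attributes;
        # remove the `-g` token from each before compiling.
        for attr in ("compiler", "compiler_so", "compiler_cxx", "linker_so"):
            cmd = getattr(self.compiler, attr, None)
            if cmd:
                setattr(self.compiler, attr, [flag for flag in cmd if flag != "-g"])
        super().build_extensions()


setup(
    name="example_pkg",  # placeholder project name
    ext_modules=cythonize([Extension("example_pkg._ext", ["example_pkg/_ext.pyx"])]),
    cmdclass={"build_ext": build_ext_no_debug},
)
```

Stripping the debug symbols from the generated objects is what shrinks the container images, conda packages, and wheels mentioned above.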
…y gmem to store intermediate distances (#324)

Benchmarked with the cuML Python kNN interface on generated datasets (claimed to follow a Gaussian distribution), up to 200k x 128 database/query vectors; a rough timing sketch of this setup appears after the results below. Behavior differs between a small GP107 GPU and a GA102 (Tesla A40):

- On the GP107, fused L2 kNN is slower on larger datasets.
- On the GA102, fused L2 kNN is always faster, roughly **1.15x-1.5x**, on all datasets tried (except 200k x 128).

An L2-expanded version of fused L2 kNN will follow in a separate PR, so that the distance computation in fused L2 kNN does not become a bottleneck at larger dimensions (e.g. > 128).

There is still room to optimize the distance computation in fused L2 kNN, as it currently makes no use of vectorized LDG/STS.

Overall, fused L2 kNN looks better on GPUs with decent compute power, but not on small, older GPUs like the GP107.

Benchmarking with the cuML C++ kNN regression tests on an A30 (GA100), with a resulting 1M x 1M distance matrix:

- NN == 64: fused L2 kNN = 11550 ms, FAISS kNN = 23933 ms (**overall 2.07x faster**)
- NN == 32: fused L2 kNN = 11198 ms, FAISS kNN = 16124 ms (**1.43x faster**)
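
The Python-side numbers above come from cuML's kNN interface; a rough sketch of that kind of timing harness is shown below. The dataset shapes, `n_neighbors`, and random data are illustrative assumptions, and this is not the exact benchmark script used for the figures above.

```python
# Rough timing harness for brute-force kNN through the cuML Python API.
# Shapes, k, and the random inputs below are illustrative only.
import time

import cupy as cp
from cuml.neighbors import NearestNeighbors

n_db, n_query, dim, k = 200_000, 200_000, 128, 32

database = cp.random.random((n_db, dim), dtype=cp.float32)
queries = cp.random.random((n_query, dim), dtype=cp.float32)

nn = NearestNeighbors(n_neighbors=k, algorithm="brute", metric="euclidean")
nn.fit(database)

cp.cuda.runtime.deviceSynchronize()
start = time.perf_counter()
distances, indices = nn.kneighbors(queries)
cp.cuda.runtime.deviceSynchronize()
print(f"kNN query time: {time.perf_counter() - start:.3f} s")
```

Synchronizing the device before and after the query keeps the wall-clock measurement from being distorted by asynchronous kernel launches.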

Authors:
  - Mahesh Doijade (https://github.com/mdoijade)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #324
When we make a new RAFT version, we need to bump the rapids-cmake version at the same time; otherwise we pick up the previous release's dependencies by mistake.

Authors:
  - Robert Maynard (https://github.com/robertmaynard)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #331
@GPUtester requested review from a team as code owners on September 22, 2021 19:14
@GPUtester merged commit eed1a1b into branch-21.12 on Sep 22, 2021
@GPUtester
Contributor Author

SUCCESS - forward-merge complete.
