Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Approximate Nearest Neighbors #2780

Merged

Conversation

viclafargue
Copy link
Contributor

This PR will enable the usage of multiple KNN strategies as alternatives to the current default bruteforce method. See #574

@viclafargue viclafargue requested review from a team as code owners September 1, 2020 17:25
@GPUtester
Copy link
Contributor

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm super excited to start supporting approximate algorithms in cuml. Just took a quick look and added some initial feedback

cpp/include/cuml/neighbors/knn.hpp Show resolved Hide resolved
cpp/src_prims/selection/knn.cuh Outdated Show resolved Hide resolved
cpp/src_prims/selection/knn.cuh Outdated Show resolved Hide resolved
cpp/include/cuml/neighbors/knn.hpp Outdated Show resolved Hide resolved
python/cuml/neighbors/nearest_neighbors.pyx Outdated Show resolved Hide resolved
python/cuml/neighbors/nearest_neighbors.pyx Outdated Show resolved Hide resolved
@viclafargue viclafargue force-pushed the fea-multiple-knn-strategies branch from 22d9ffa to 132acab Compare September 3, 2020 15:38
@cjnolet
Copy link
Member

cjnolet commented Sep 14, 2020

@WXBN have you collected any timings on these indices vs the brute force indices?

@JohnZed JohnZed changed the title [WIP] Multiple KNN strategies [REVIEW] Multiple KNN strategies Sep 14, 2020
@JohnZed JohnZed added the 3 - Ready for Review Ready for review by team label Sep 14, 2020
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good so far. I did a first-pass through the code.

python/cuml/neighbors/nearest_neighbors.pyx Outdated Show resolved Hide resolved
python/cuml/neighbors/nearest_neighbors.pyx Outdated Show resolved Hide resolved
python/cuml/neighbors/__init__.py Show resolved Hide resolved
python/cuml/neighbors/nearest_neighbors.pyx Outdated Show resolved Hide resolved
python/cuml/neighbors/nearest_neighbors.pyx Outdated Show resolved Hide resolved
python/cuml/neighbors/nearest_neighbors.pyx Outdated Show resolved Hide resolved
python/cuml/test/test_nearest_neighbors.py Outdated Show resolved Hide resolved
python/cuml/test/test_nearest_neighbors.py Outdated Show resolved Hide resolved
python/cuml/test/test_nearest_neighbors.py Outdated Show resolved Hide resolved
@@ -36,36 +38,87 @@ enum MetricType {
METRIC_Correlation
};

struct knnIndex {
faiss::gpu::StandardGpuResources *gpu_res;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either before or shortly after this PR is merged, we need to update FAISS in cuML and use their new pluggable memory manager feature (facebookresearch/faiss#1203). While the brute-force computation uses only a very small workspace, the approximate Index variants put FAISS in complete control of the memory space of the index (through the StandardGPUResources).

@dantegd dantegd added 4 - Waiting on Author Waiting for author to respond to review and removed 3 - Ready for Review Ready for review by team labels Sep 16, 2020
@viclafargue
Copy link
Contributor Author

Sorry, I forgot about this one. Sure it's probably preferable to defer it to 0.18. Also, I'll have to check again the automatic choice of parameters and maybe move it to Python code.

@viclafargue viclafargue changed the base branch from branch-0.17 to branch-0.18 December 8, 2020 10:28
@viclafargue viclafargue added breaking Breaking change feature request New feature or request labels Dec 8, 2020
@viclafargue viclafargue changed the title [REVIEW] Multiple KNN strategies [REVIEW] Approximate Nearest Neighbors Dec 8, 2020
@viclafargue
Copy link
Contributor Author

The code was updated to current branch-0.18. Automated parameters determinations are now done from the Python layer of the code. It's easier for maintainability and testing. I also modified the automatic determination of ivfpq parameters to prioritize speed over accuracy. It's still passing tests and much faster.

@cjnolet
Copy link
Member

cjnolet commented Dec 8, 2020

@viclafargue, it looks like CI ran out of space in the tests. The tests also look pretty exhautive in terms of parameters. Any opportunities to prune the parameter grid?

14:19:05 cuml/test/test_nearest_neighbors.py::test_ivfpq_pred[8-512-10000-False-4-32-8] Faiss assertion 'err == cudaSuccess' failed in void faiss::gpu::freeMemorySpace(faiss::gpu::MemorySpace, void*) at gpu/utils/MemorySpace.cpp:72; details: Failed to cudaFree pointer 0x7f41829fe200 (error 700 an illegal memory access was encountered)

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cjnolet cjnolet added 4 - Waiting on Author Waiting for author to respond to review and removed 4 - Waiting on Author Waiting for author to respond to review labels Dec 11, 2020
@cjnolet cjnolet changed the title [REVIEW] Approximate Nearest Neighbors Approximate Nearest Neighbors Dec 11, 2020
@cjnolet
Copy link
Member

cjnolet commented Dec 11, 2020

@viclafargue, I think the changelog is automatic now. Can you remove the changelog update?

@cjnolet
Copy link
Member

cjnolet commented Dec 11, 2020

Talked to Victor and offered to make the changelog update so we can get this in.

@cjnolet cjnolet added 5 - Ready to Merge Testing and reviews complete, ready to merge 6 - Okay to Auto-Merge and removed 4 - Waiting on Author Waiting for author to respond to review 5 - Ready to Merge Testing and reviews complete, ready to merge labels Dec 11, 2020
@ajschmidt8
Copy link
Member

rerun tests

@rapids-bot rapids-bot bot merged commit bd43f32 into rapidsai:branch-0.18 Dec 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Breaking change CUDA / C++ CUDA issue feature request New feature or request New Algorithm For tracking new algorithms that will be added to our existing collection New Prim For tracking new prims that will be added to our existing collection
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants