HNSW indexing much slower on GPU than CPU #1348
Comments
FYI @thomwolf I ran into this behavior when trying the FAISS indexing of embeddings in an nlp.Dataset.
Actually I just noticed that
So I did
I re-ran the CPU and GPU tests above and observed the following:
HNSW is not implemented on GPU, so
Ha! That would explain it :) As you say, I see a cloning call. I am not sure if I am reading the execution flow correctly, but wouldn't the ensuing cloning op exhibit unexpected recursion?
No, the storage object is a separate sub-index that actually contains the vector data (e.g., an IndexFlat).
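For context, a minimal sketch (the dimension and HNSW parameters are my own assumptions, not from the report) showing that an IndexHNSWFlat keeps its raw vectors in a separate storage sub-index:

```python
import faiss

d = 768                               # assumed embedding dimension
index = faiss.IndexHNSWFlat(d, 32)    # 32 = number of graph neighbors per node

# The graph structure lives in index.hnsw; the vectors themselves live in a
# separate sub-index referenced by index.storage (an IndexFlat for HNSWFlat).
storage = faiss.downcast_index(index.storage)
print(type(storage))                  # expected: an IndexFlat subclass
```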
Summary
Indexing a numpy array in a vanilla HNSWFlat index scales linearly on CPU (about 1,000 vectors/second), but the same procedure degrades polynomially on GPU (over 20 minutes to index 3,000 vectors).
Platform
OS: Linux x86_64
Faiss version: 1.6.3
Cuda version: 10.1
Pytorch version: 1.6.0-cuda101
Faiss compilation options: unknown, binary distribution
Running on: CPU and GPU
Interface: Python
Reproduction instructions
4 GPUs available (P100 on GCP Compute VM)
CPU test
On CPU all is well; indexing scales linearly with the number of vectors to index.
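A minimal sketch of this kind of CPU timing test (dimension, batch sizes, and HNSW parameters are assumptions, not the exact script used):

```python
import time
import numpy as np
import faiss

d = 768                                   # assumed embedding dimension

for n in (1000, 2000, 3000):
    xb = np.random.rand(n, d).astype('float32')
    index = faiss.IndexHNSWFlat(d, 32)    # fresh index for each batch size
    t0 = time.time()
    index.add(xb)
    print(f"{n} vectors indexed in {time.time() - t0:.1f} s")
```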
Single-GPU test
On a single GPU, indexing is intractable: time scales polynomially with the number of vectors to index.
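A sketch of the single-GPU variant, assuming the index was moved to the GPU with index_cpu_to_gpu before adding vectors (which, per the comments above, clones an index that has no real GPU implementation):

```python
import time
import numpy as np
import faiss

d = 768                                   # assumed embedding dimension
res = faiss.StandardGpuResources()

for n in (1000, 2000, 3000):
    xb = np.random.rand(n, d).astype('float32')
    cpu_index = faiss.IndexHNSWFlat(d, 32)
    gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)   # device 0
    t0 = time.time()
    gpu_index.add(xb)                     # dramatically slower than the CPU run
    print(f"{n} vectors indexed in {time.time() - t0:.1f} s")
```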
Multi-GPU test
Performance degrades in roughly the same way when spreading the index over multiple GPUs.
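The multi-GPU run was presumably built along the same lines, e.g. with index_cpu_to_all_gpus (again a sketch under assumed parameters):

```python
import time
import numpy as np
import faiss

d = 768                                   # assumed embedding dimension

for n in (1000, 2000, 3000):
    xb = np.random.rand(n, d).astype('float32')
    cpu_index = faiss.IndexHNSWFlat(d, 32)
    gpu_index = faiss.index_cpu_to_all_gpus(cpu_index)      # all 4 GPUs
    t0 = time.time()
    gpu_index.add(xb)                     # shows the same degradation
    print(f"{n} vectors indexed in {time.time() - t0:.1f} s")
```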