Reconstruct batch of non-sequential IDs #1163
Comments
Should be simple to implement. |
I think this would be quite useful.
import faiss
import numpy as np
import time

faiss.omp_set_num_threads(1)
nb_vectors = 100000
dimension = 8
vectors = np.random.rand(nb_vectors, dimension).astype('float32')
flat_index = faiss.IndexFlatIP(dimension)
flat_index.add(vectors)
N = 10000

# Time N single-vector reconstruct calls issued from a Python loop
start_time = time.perf_counter()
for i in range(N):
    flat_index.reconstruct(i)
end_time = time.perf_counter()
elapsed_time = end_time - start_time
print(f"-> flat reconstruct in {elapsed_time*1000} ms")

# Time one reconstruct_n call covering the same N sequential IDs
start_time = time.perf_counter()
flat_index.reconstruct_n(0, N)
end_time = time.perf_counter()
elapsed_time = end_time - start_time
print(f"-> flat reconstruct_n in {elapsed_time*1000} ms")
Result:
Reconstructing non-sequential IDs might be a bit slower than reconstruct_n for a flat index because the memory is not contiguous, but I think it would still be much faster than a loop of reconstruct calls in Python. |
Hello! Any news on this feature request? Having this method would most likely speed up the reconstruction of n non-contiguous embeddings. |
Just found out there is a method, search_and_reconstruct, which searches and reconstructs the result vectors in one call. This is much faster than first searching for nearest neighbors and then calling reconstruct N times. |
Hey, Thanks! |
Summary: As requested in facebookresearch#1163 add `reconstruct_batch` that calls `reconstruct` in a C++ for loop. Differential Revision: D37717342 fbshipit-source-id: eabd62f7d65590fce9d3397708290e7bdc5a400e
Summary: Pull Request resolved: facebookresearch#2379 As requested in facebookresearch#1163 add `reconstruct_batch` that calls `reconstruct` in a C++ for loop. Reviewed By: alexanderguzhva Differential Revision: D37717342 fbshipit-source-id: 768e94c9304c09d9ae8fb8361a0602c6e2c992dc
Thanks a lot @mdouze ! Much appreciated. Should we expect a release soon, or should we build from sources to use this? |
we plan to release 1.7.3 in sept |
Please, could you add this functionality (batch reconstruct and search_and_reconstruct) with binary indexes too? |
please open a new issue for this, or better implement it as a PR yourself |
I was browsing through the closed PR and thought it had been closed without merging. It turns out that the functionality was merged, at least for PQ+IVF indices (different PR?). See: index.reconstruct_batch(ids) |
Platform
Running on:
Interface:
Feature Request
The Index class contains methods for reconstructing a single observation and for reconstructing a sequential range of IDs (e.g. IDs 101-200). However, there's no method for batch retrieving non-sequential IDs.
This would be a great addition. Right now we have to write a for-loop in Python, making many requests from Python to C++. Simply making a reconstruct method that uses a for-loop in C++ would be a big improvement. Later on, index-specific methods could be implemented to improve performance further if needed.