Reconstruct batch of non-sequential IDs #1163

Closed
2 of 4 tasks
bfelbo opened this issue Mar 27, 2020 · 10 comments

Comments

@bfelbo
bfelbo commented Mar 27, 2020

Platform

Running on:

  • CPU
  • GPU

Interface:

  • C++
  • Python

Feature Request

The Index class contains methods for reconstructing a single observation and for reconstructing a sequential range of IDs (e.g. IDs 101-200). However, there's no method for batch retrieving non-sequential IDs.

This would be a great addition. Right now we have to write a for-loop in Python, making many requests from Python to C++. Simply making a reconstruct method that uses a for-loop in C++ would be a big improvement. Later on, index-specific methods could be implemented to improve performance further if needed.
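
The workaround described above can be sketched roughly as follows. This is a minimal illustration, assuming an index object that exposes `d` (the dimension) and `reconstruct(id)` as Faiss indexes do; the helper name `reconstruct_loop` is hypothetical and not part of Faiss.

```python
import numpy as np


def reconstruct_loop(index, ids):
    """Reconstruct arbitrary IDs with one Python-to-C++ call per ID.

    `reconstruct_loop` is a hypothetical helper; `index` is assumed to
    expose `d` and `reconstruct(id)`.
    """
    ids = np.asarray(ids, dtype="int64")
    out = np.empty((len(ids), index.d), dtype="float32")
    for row, vector_id in enumerate(ids):
        # Each iteration crosses the Python/C++ boundary once.
        out[row] = index.reconstruct(int(vector_id))
    return out
```

For example, `reconstruct_loop(index, [5, 17, 3])` would return a `(3, d)` array whose rows follow the order of the requested IDs.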

@mdouze
Contributor

mdouze commented Mar 30, 2020

Should be simple to implement.

@rom1504
rom1504 commented Feb 2, 2021

I think this would be quite useful.
Here's a benchmark showing that reconstruct_n is much faster than calling reconstruct in a loop: https://colab.research.google.com/drive/1EpJmlrY2i6DngHc4Ok2jhb4oNZEdavcE?usp=sharing

import faiss
import numpy as np
import time

# Single-threaded so the comparison is not skewed by OpenMP parallelism.
faiss.omp_set_num_threads(1)
nb_vectors = 100000
dimension = 8
vectors = np.random.rand(nb_vectors, dimension).astype('float32')

flat_index = faiss.IndexFlatIP(dimension)
flat_index.add(vectors)

N = 10000

# Baseline: one Python-to-C++ call per vector.
start_time = time.perf_counter()
for i in range(N):
    flat_index.reconstruct(i)
end_time = time.perf_counter()
elapsed_time = end_time - start_time

print(f"-> flat reconstruct in {elapsed_time*1000} ms")

# Single call covering the sequential ID range [0, N).
start_time = time.perf_counter()
flat_index.reconstruct_n(0, N)
end_time = time.perf_counter()
elapsed_time = end_time - start_time

print(f"-> flat reconstruct_n in {elapsed_time*1000} ms")

Result:

-> flat reconstruct in 25.576860000001034 ms
-> flat reconstruct_n in 0.5635439999878145 ms

Non-sequential IDs might be a bit slower than reconstruct_n for a flat index because the memory is not contiguous, but I think it would still be much faster than a loop of reconstruct calls in Python.
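
The contiguity point can be illustrated with plain NumPy. This is only an analogy under the assumption that a flat index stores its vectors as one dense row-major array; it does not use or describe Faiss internals, and the array and ID values are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.random((100_000, 8), dtype=np.float32)

# Sequential IDs: a slice is a view onto one contiguous block of memory.
sequential = vectors[100:200]

# Non-sequential IDs: fancy indexing gathers scattered rows into a new
# array, but still in a single vectorised call.
ids = np.array([42, 7, 99_999, 1_234])
gathered = vectors[ids]

# The same rows fetched with a Python-level loop, one row per iteration.
looped = np.stack([vectors[i] for i in ids])
```

The gather has to copy rows from scattered locations, so it cannot be as cheap as the slice, yet it avoids the per-row interpreter overhead of the loop.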

@nateagr
nateagr commented Jun 2, 2022

Hello!

Any news on this feature request? Such a method would most likely speed up the reconstruction of n non-contiguous embeddings.

@nateagr
nateagr commented Jun 2, 2022

Just found out there is a method, search_and_reconstruct, that searches and reconstructs vectors in a single call. It is much faster than first searching for nearest neighbors and then calling reconstruct N times.
A quick comparison, with a simple IVF Flat index, searching and reconstructing the 200k nearest neighbors:

  • Calling search and then calling reconstruct 200,000 times takes 45 s
  • Calling search_and_reconstruct takes 1.5 s
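
The two approaches compared above might look roughly like this. It is a sketch, assuming an index that exposes `d`, `search`, `reconstruct` and `search_and_reconstruct` as Faiss indexes do; the two wrapper function names are hypothetical, and this is not the code behind the timings quoted above.

```python
import numpy as np


def two_step_search_and_reconstruct(index, queries, k):
    """Search first, then call reconstruct once per returned neighbor."""
    distances, labels = index.search(queries, k)
    vectors = np.zeros((labels.shape[0], k, index.d), dtype="float32")
    for q in range(labels.shape[0]):
        for j in range(k):
            # A label of -1 marks a missing neighbor; leave that row as zeros.
            if labels[q, j] >= 0:
                vectors[q, j] = index.reconstruct(int(labels[q, j]))
    return distances, labels, vectors


def fused_search_and_reconstruct(index, queries, k):
    """Single call returning distances, labels and reconstructed vectors."""
    return index.search_and_reconstruct(queries, k)
```

Note that on IVF indexes, the per-ID `reconstruct` call in the two-step variant typically requires `index.make_direct_map()` to have been called beforehand.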

@urialon
urialon commented Jul 1, 2022

Hey,
Any news regarding this feature? A batch_reconstruct would really help me as well, to speed up the implementation of our ICML paper: https://arxiv.org/pdf/2201.12431

Thanks!

mdouze added a commit to mdouze/faiss that referenced this issue Jul 8, 2022
Summary:
As requested in facebookresearch#1163
add `reconstruct_batch` that calls `reconstruct` in a C++ for loop.

Differential Revision: D37717342

fbshipit-source-id: eabd62f7d65590fce9d3397708290e7bdc5a400e
mdouze added a commit to mdouze/faiss that referenced this issue Jul 18, 2022
Summary:
Pull Request resolved: facebookresearch#2379

As requested in facebookresearch#1163
add `reconstruct_batch` that calls `reconstruct` in a C++ for loop.

Reviewed By: alexanderguzhva

Differential Revision: D37717342

fbshipit-source-id: 768e94c9304c09d9ae8fb8361a0602c6e2c992dc
facebook-github-bot pushed a commit that referenced this issue Jul 18, 2022
Summary:
Pull Request resolved: #2379

As requested in #1163
add `reconstruct_batch` that calls `reconstruct` in a C++ for loop.

Reviewed By: alexanderguzhva

Differential Revision: D37717342

fbshipit-source-id: 87680e71113d5f23235e7eae8cf65ee363134580
@mdouze mdouze closed this as completed Aug 31, 2022
@urialon
urialon commented Aug 31, 2022

Thanks a lot @mdouze ! Much appreciated.

Should we expect a release soon, or should we build from sources to use this?

@mdouze
Contributor

mdouze commented Sep 1, 2022

we plan to release 1.7.3 in sept

@mireklzicar

Please, could you add this functionality (batch reconstruct and search_and_reconstruct) for binary indexes too?

@mdouze
Contributor

mdouze commented Jul 21, 2023

please open a new issue for this, or better implement it as a PR yourself

@guillaumeguy
guillaumeguy commented Aug 11, 2023

I was browsing through the closed PR and thought it was closed without merging. It turns out that the functionality was merged, at least for PQ+IVF indices (different PR?). See:

index.reconstruct_batch(ids)
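
A slightly fuller sketch of how that call might be wrapped is shown below. It assumes the index exposes `reconstruct_batch(ids)` taking an int64 array, and falls back to a per-ID loop on builds that lack it; the helper name `reconstruct_ids` and the fallback are illustrative assumptions, not part of Faiss.

```python
import numpy as np


def reconstruct_ids(index, ids):
    """Reconstruct non-sequential IDs, preferring reconstruct_batch.

    `reconstruct_ids` is a hypothetical helper.
    """
    ids = np.ascontiguousarray(ids, dtype="int64")
    if hasattr(index, "reconstruct_batch"):
        # One call; the loop over IDs runs on the C++ side.
        return index.reconstruct_batch(ids)
    # Assumed fallback for builds without reconstruct_batch.
    out = np.empty((len(ids), index.d), dtype="float32")
    for row, vector_id in enumerate(ids):
        out[row] = index.reconstruct(int(vector_id))
    return out
```

The result is expected to be an `(n, d)` array with rows in the same order as `ids`.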
