Question on ProductQuantizer code sizes #2467

Frederick369 · 2022-09-13T21:52:16Z

Hi,
I had a question regarding how I should be interpreting vector codes for the ProductQuantizer.

While experimenting with faiss.ProductQuantizer as a means of compressing data, I noticed that no matter what the values of M and nbits are, when I compute the codes of a given vector, I always get centroid ids in the range 0-255.

For example, if I do
pq = faiss.ProductQuantizer(512,128,4)
pq.train(embeddings)
vec_codes = pq.compute_codes(vector)

I would expect vec_codes to have shape (1,128) and every value of vec_codes to lie in 0-15. However, what I get instead is shape (1,64), with values in 0-255. It seems that the shape of the codes changes rather than the value range, and I have no idea why.

By directly accessing the centroids via centroids = faiss.vector_to_array(pq.centroids).reshape(pq.M, pq.ksub, pq.dsub), I can see that the decoded vector does have segments that are constructed from the centroids, but I can't find a clear correspondence between the centroid indices and the codes, except for the first one. (i.e. for vec_reconstructed = pq.decode(vec_codes), I would expect vec_reconstructed[4i:4i+4] == centroids[vec_codes[i]%64][i], but this only seems to hold for the first segment.)

I've tested this with varying values of M and nbits, and I see this behavior in all of them.

Am I just gravely misunderstanding something here?

As a second possibly related question, are there restrictions on what values I can use for nbits? For instance, in the example here: https://github.com/facebookresearch/faiss/wiki/Faiss-building-blocks:-clustering,-PCA,-quantization
the value is set to exactly 8 and is not parameterized. From other sources I've looked at, values higher than 8 are not supported for indices like IVFPQ, and on GPUs this value is also restricted to certain values. Are there restrictions in the plain PQ case that may be causing the above behavior?

OS: Ubuntu 20.04.2
Faiss version: 1.7.2
Running on:

CPU
GPU

Interface:

C++
Python


mdouze · 2022-09-14T07:55:21Z

PQ codes with nbits != 8 are packed into a bit string of ceil(nbits * M / 8) bytes, and compute_codes returns this packed representation; see issue #2285.

For PQ and IVFPQ, any nbits <= 16 is supported. On GPU, only nbits = 8 is supported.
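
To make the packing concrete: with M = 128 and nbits = 4, ceil(4 * 128 / 8) = 64, which matches the (1, 64) shape reported in the question. Below is a minimal pure-Python sketch of how one packed row could be split back into M centroid ids. The helper names unpack_pq_codes and pack_pq_codes are hypothetical (they are not part of Faiss), and the sketch assumes a little-endian bit layout in which sub-quantizer 0 occupies the lowest nbits bits of the first byte. That layout is an assumption and should be checked against the Faiss version in use.

```python
def unpack_pq_codes(packed_row, M, nbits):
    """Split one packed PQ code (a sequence of byte values) into M centroid ids.

    Hypothetical helper, not a Faiss function. Assumes a little-endian bit
    layout: sub-quantizer i occupies bits [i * nbits, (i + 1) * nbits).
    """
    # Treat the bytes as one big integer, first byte least significant.
    value = int.from_bytes(bytes(int(b) for b in packed_row), "little")
    mask = (1 << nbits) - 1
    return [(value >> (i * nbits)) & mask for i in range(M)]


def pack_pq_codes(ids, nbits):
    """Inverse of unpack_pq_codes; used here only to build example inputs."""
    value = 0
    for i, c in enumerate(ids):
        value |= c << (i * nbits)
    n_bytes = (len(ids) * nbits + 7) // 8  # ceil(nbits * M / 8)
    return list(value.to_bytes(n_bytes, "little"))


# Four 4-bit ids fit in two bytes; unpacking recovers them.
example_ids = [3, 15, 0, 9]
example_packed = pack_pq_codes(example_ids, 4)
recovered = unpack_pq_codes(example_packed, M=4, nbits=4)
```

Under the same assumption, calling unpack_pq_codes(vec_codes[0], pq.M, pq.nbits) on the output of pq.compute_codes should yield 128 ids in 0-15 for the example in the question, and with the reshape used there, segment i of the decoded vector would be expected to match centroids[i][ids[i]].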

Frederick369 · 2022-09-14T19:04:09Z

thanks so much!

mdouze added duplicate question labels Sep 14, 2022

Frederick369 closed this as completed Sep 14, 2022

gitgithan mentioned this issue Feb 16, 2023: Wrong variable description in wiki Faiss building blocks: clustering, PCA, quantization? #2709 (Open)
