Hi,
I had a question regarding how I should be interpreting vector codes for the ProductQuantizer.
While experimenting with faiss.ProductQuantizer as a means of compressing data, I noticed that no matter what values I use for M and nbits, when I compute the codes of a given vector I always get centroid ids in the range 0-255.
For example, if I do

    pq = faiss.ProductQuantizer(512, 128, 4)
    pq.train(embeddings)
    vec_codes = pq.compute_codes(vector)

I would expect vec_codes to have shape (1,128) and every value in vec_codes to lie in 0-15. However, what I get instead is shape (1,64), with values in 0-255. It seems that the shape of the codes changes rather than the range of their values, and I have no idea why.
By directly accessing the centroids via centroids = faiss.vector_to_array(pq.centroids).reshape(pq.M, pq.ksub, pq.dsub), I can see that the decoded vector does have segments that are constructed from the centroids, but I can't find a clear correspondence between the centroid indices and the codes, except for the first one. (That is, for vec_reconstructed = pq.decode(vec_codes), I would expect vec_reconstructed[4*i:4*i+4] == centroids[vec_codes[i]%64][i], but this only seems to hold for the first segment.)
I've tested this with varying values of M and nbits, and I see this behavior for all of them.
Am I just gravely misunderstanding something here?
As a second, possibly related question: are there restrictions on what values I can use for nbits? For instance, in the example here: https://github.com/facebookresearch/faiss/wiki/Faiss-building-blocks:-clustering,-PCA,-quantization
the value is set to exactly 8 and is not parameterized. From other sources I've looked at, values higher than 8 are not supported for indices like IVFPQ, and on GPUs this value is also only allowed to take certain values. Are there restrictions for the plain PQ case that may be causing the behavior above?
OS: Ubuntu 20.04.2
Faiss version: 1.7.2
Running on:
CPU
GPU
Interface:
C++
Python
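PQ codes with a number of bits != 8 are packed into a bit string of ceil(nbits * M / 8) bytes, and compute_codes returns this packed representation; see issue #2285. With M = 128 and nbits = 4 that is ceil(4 * 128 / 8) = 64 bytes, which is why the codes have shape (1,64) with byte values in 0-255.
For PQ and IVFPQ, any nbits <= 16 is supported. On GPU, only 8 bits is supported.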
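To recover the per-sub-quantizer centroid indices from the packed bytes, something like the following sketch may help. It assumes the bit string is little-endian in bit order, meaning the index of the first sub-quantizer sits in the lowest bits of the first byte; that is my understanding of the packed layout, but please check it against your Faiss version. The helper names `packed_code_size` and `unpack_pq_codes` are made up for illustration and are not part of Faiss.

```python
def packed_code_size(M, nbits):
    """Number of bytes per packed code: ceil(nbits * M / 8)."""
    return (M * nbits + 7) // 8


def unpack_pq_codes(packed_row, M, nbits):
    """Unpack one packed PQ code (a sequence of byte values) into M indices.

    Assumption: bits are packed least-significant first, so sub-quantizer 0
    occupies the lowest `nbits` bits of the bit string.
    """
    # Treat the whole byte string as one little-endian integer.
    bits = int.from_bytes(bytes(int(b) for b in packed_row), "little")
    mask = (1 << nbits) - 1
    # Slice out nbits at a time, starting from the low end.
    return [(bits >> (m * nbits)) & mask for m in range(M)]


# Two bytes holding four 4-bit indices: 0x21 -> (1, 2), 0x43 -> (3, 4).
example = unpack_pq_codes([0x21, 0x43], M=4, nbits=4)

# Hypothetical use with the objects from the question (not run here):
#   idx = unpack_pq_codes(vec_codes[0], pq.M, pq.nbits)
#   # idx[i] should then select the centroid of sub-quantizer i, e.g.
#   # centroids[i, idx[i]] for centroids reshaped to (pq.M, pq.ksub, pq.dsub)
```

Under that assumption, each of the 64 bytes in the question's example carries two 4-bit indices, so the unpacked list has 128 entries in 0-15, as originally expected.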