Skip to content

Faiss indexes

Matthijs Douze edited this page Apr 7, 2019 · 26 revisions

Summary of methods

The basic indexes are given hereafter:

Method Class name index_factory Main parameters Bytes/vector Exhaustive Comments
Exact Search for L2 IndexFlatL2 "Flat" d 4*d yes brute-force
Exact Search for Inner Product IndexFlatIP "Flat" d 4*d yes also for cosine (normalize vectors beforehand)
Hierarchical Navigable Small World graph exploration IndexHNSWFlat 'HNSWx,Flat` d, M 4*d + 8 * M no
Inverted file with exact post-verification IndexIVFFlat "IVFx,Flat" quantizer, d, nlists, metric 4*d no Take another index to assign vectors to inverted lists
Locality-Sensitive Hashing (binary flat index) IndexLSH - d, nbits nbits/8 yes optimized by using random rotation instead of random projections
Scalar quantizer (SQ) in flat mode IndexScalarQuantizer "SQ8" d d yes 4 bit per component is also implemented, but the impact on accuracy may be inacceptable
Product quantizer (PQ) in flat mode IndexPQ "PQx" d, M, nbits M (if nbits=8) yes
IVF and scalar quantizer IndexIVFScalarQuantizer "IVFx,SQ4" "IVFx,SQ8" quantizer, d, nlists, qtype SQfp16: 2 * d, SQ8: d or SQ4: d/2 no there are 2 encodings: 4 bit per dimension and 8 bit per dimension
IVFADC (coarse quantizer+PQ on residuals) IndexIVFPQ "IVFx,PQy" quantizer, d, nlists, M, nbits M+4 or M+8 no the memory cost depends on the data type used to represent ids (int or long), currently supports only nbits <= 8
IVFADC+R (same as IVFADC with re-ranking based on codes) IndexIVFPQR "IVFx,PQy+z" quantizer, d, nlists, M, nbits, M_refine, nbits_refine M+M_refine+4 or M+M_refine+8 no

The index can be constructed explicitly with the class constructor, or by using index_factory.

Cell-probe methods (IndexIVF* indexes)

A typical way to speed-up the process at the cost of loosing the guarantee to find the nearest neighbor is to employ a partitioning technique such as k-means. The corresponding algorithms are sometimes referred to as cell-probe methods.

We use a partition-based method based on Multi-probing (a reminiscent variant of best-bin KD-tree).

  • The feature space is partitioned into ncells cells.
  • The database vectors are assigned to one of these cells thanks to a hashing function (in the case of k-means, the assignment to the centroid closest to the query), and stored in an inverted file structure formed of ncells inverted lists.
  • At query time, a set of nprobe inverted lists is selected
  • The query is compared to each of the database vector assigned to these lists.

Doing so, only a fraction of the database is compared to the query: as a first approximation, this fraction is nprobe/ncells, but this approximation is usually under-estimated because the inverted lists have not equal lengths. The failure case appears when the cell of the nearest neighbor of a given query is not selected.

The constructor takes an index as a parameter (the quantizer or coarse quantizer), which is used to do the assignment to the inverted lists. The query is searched in this index, and the returned vector id(s) are the inverted list(s) that should be visited.

Cell probe method with a flat index as coarse quantizer

Typically, one would use a Flat index as coarse quantizer. The train method of the IndexIVF adds the centroids to the flat index. The nprobe is specified at query time (useful for measuring trade-offs between speed and accuracy).

NOTE: As a rule of thumb, denoting by n the number of points to be indexed, a typical way to select the number of centroids is to aim at balancing the cost of the assignment to the centroids (ncentroids * d for a plain k-means) with the number of distance computations performed when parsing the inverted lists (in the order of kprobe / ncells * n * C, where the constant accounts for the uneven distribution of the list and the fact that a single vector comparison is done more efficiently when done by batch with centroids, say C=10 to give an idea). This leads to a number of centroids of the form ncentroids = C * sqrt (n).

WARNING: partitioning methods are prone to suffer the curse of dimensionality. For truly high-dimensional data, achieving good recall requires to have a very large number of probes.

Encoding of vectors in an IndexIVF

The elements of inverted lists are encoded vectors (+ the corresponding vector id). The encoding is mainly to make the vectors more compact. Those elements are just scanned sequentially, and the search function returns the top-k smallest distances seen so far.

The available encodings are (from least to strongest compression):

  • no encoding at all (IndexIVFFlat): the vectors are stored without compression;
  • 16-bit float encoding (IndexIVFScalarQuantizer with QT_fp16): the vectors are compressed to 16-bit floats, which may cause some loss of precision;
  • 8-bit integer encoding (IndexIVFScalarQuantizer with QT_8bit): the vector elements are quantized uniformly in 256 levels;
  • 4-bit integer encoding (IndexIVFScalarQuantizer with QT_4bit): vectors quantized to 16 levels;
  • PQ encoding (IndexIVFPQ): vectors are split into sub-vectors that are each quantized to a 8-bit integer. See the example below.

Relationship with LSH

The most popular cell-probe method is probably the original Locality Sensitive Hashing method referred to as [E2LSH] (http://www.mit.edu/~andoni/LSH/). However this method and its derivatives suffer from two drawbacks:

  • They require a lot of hash functions (=partitions) to achieve acceptable results, leading to a lot of extra memory. Memory is not cheap.
  • The hash function are not adapted to the input data. This is good for proofs but leads to suboptimal choice results in practice.

Binary codes

In C++, a LSH index (binary vector mode, See Charikar STOC'2002) is declared as follows:

  IndexLSH * index = new faiss::IndexLSH (d, nbits);

where d is the input vector dimensionality and nbits the number of bits use per stored vector.

In Python, the (improved) LSH index is constructed and search as follows

n_bits = 2 * d
lsh = faiss.IndexLSH (d, n_bits)
lsh.train (x_train)
lsh.add (x_base)
D, I = lsh.search (x_query, k)

NOTE: The algorithm is not vanilla-LSH, but a better choice. Instead of set of orthogonal projectors is used if n_bits <= d, or a tight frame if n_bits > d.

Quantization-based approaches

In C++, the indexes based on product quantization are identified by the keyword PQ. For instance, the most common indexes based on product quantization are declared as follows:

  #include <faiss/IndexPQ.h>
  #include <faiss/IndexIVFPQ.h>

  // Define a product quantizer for vectors of dimensionality d=128,
  // with 8 bits per subquantizer and M=16 distinct subquantizer
  size_t d = 128;
  int M = 16;
  int nbits = 8;
  faiss:IndexPQ * index_pq = new faiss::IndexPQ (d, M, nbits);

  // Define an index using both PQ and an inverted file with nlists to avoid exhaustive search
  // The index 'quantizer' must be already declared
  faiss::IndexIVFPQ * ivfpq = new faiss::IndexIVFPQ (quantizer, d, nlists, M, nbits);

  // Same but with another level of refinement
  faiss::IndexIVFPQR * ivfpqr = new faiss::IndexIVFPQR (quantizer, d, nclust, M, nbits, M_refine, nbits);

In Python, a product quantizer is defined by:

m = 16                                   # number of subquantizers
n_bits = 8                               # bits allocated per subquantizer
pq = faiss.IndexPQ (d, m, n_bits)        # Create the index
pq.train (x_train)                       # Training
pq.add (x_base)                          # Populate the index
D, I = pq.search (x_query, k)            # Perform a search

The number of bits n_bits must be equal to 8, 12 or 16. The dimension d should be a multiple of m

Inverted file with PQ refinement

The IndexIVFPQ is probably the most useful indexing structure for large-scale search. It is used like

coarse_quantizer = faiss.IndexFlatL2 (d)
index = faiss.IndexIVFPQ (coarse_quantizer, d,
                          ncentroids, code_size, 8)
index.nprobe = 5

See the chapter about IndexIVFFlat for the setting of ncentroids. The code_size is typically a power of two between 4 and 64. Like for IndexPQ, d should be a multiple of m.

Clone this wiki locally