Add a Better Binary Quantizer format for dense vectors #13651

Draft
benwtrent wants to merge 163 commits into main from benwtrent:feature/adv-binarization-format
Conversation

@benwtrent (Member) commented Aug 13, 2024

High-level design

RaBitQ is essentially a better binary quantization, and it works across all the models we have tested against. Like PQ, it does require coarse-grained clustering to be effective at higher vector densities (effective being defined as requiring only 5x or lower oversampling for recall > 95%). But in our testing, the number of vectors a single cluster can serve before more clusters are needed is exceptionally large (tens to hundreds of millions).

The Euclidean vectors as stored in the index:

| quantized vector | distance_to_centroid | vector magnitude |
| --- | --- | --- |
| (vector_dimension / 8) bytes | float | float |

For dot-product vectors:

| quantized vector | dot-product with binarized self | vector magnitude | centroid dot-product |
| --- | --- | --- | --- |
| (vector_dimension / 8) bytes | float | float | float |
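For illustration, these per-vector layouts could be modeled as plain records like the following; the type and field names are hypothetical, not the PR's actual classes:

```java
// Illustrative only: hypothetical records mirroring the on-disk layouts above.
record EuclideanEntry(
    byte[] quantized,          // ceil(dims / 8) bytes, one bit per dimension
    float distanceToCentroid,  // corrective term: distance from the raw vector to its centroid
    float magnitude) {}        // corrective term: magnitude of the raw vector

record DotProductEntry(
    byte[] quantized,            // ceil(dims / 8) bytes, one bit per dimension
    float selfDotProduct,        // dot product of the raw vector with its binarized self
    float magnitude,
    float centroidDotProduct) {} // dot product of the raw vector with the centroid
```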

The vector metadata contains all the regular things (similarity, encoding, sparse vector DISI, etc.) plus the quantization state, such as the cluster centroid.

For indexing into HNSW we actually have a multi-step process. Better binary encodes the query vectors differently than the index vectors. Consequently, during segment merging and HNSW building, another temporary file is written containing the query-quantized vectors over the configured centroids. One downside is that this temporary file will actually be larger than the regular vector index, because we use asymmetric quantization to keep good information around. But once the merge is complete, this file is deleted.

We then read from the query temporary file when adding a vector to the graph, and while exploring HNSW we score against the indexed quantized values.
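To make the asymmetric part concrete, here is a minimal, self-contained sketch of the general technique assumed above (the exact bit widths, packing, and corrective terms in this PR may differ): the indexed vectors keep 1 bit per dimension while the query encoding keeps several bits per dimension stored as bit-planes, so the quantized inner product reduces to an AND plus popcount per plane.

```java
// A hedged sketch of asymmetric bit scoring, not the PR's actual scorer.
final class AsymmetricBitScorer {
  /**
   * docBits: 1 bit per dimension, packed 8 dims per byte.
   * queryPlanes[p]: bit p of the multi-bit query code, packed the same way.
   */
  static long quantizedDotProduct(byte[] docBits, byte[][] queryPlanes) {
    long total = 0;
    for (int p = 0; p < queryPlanes.length; p++) {
      long planeCount = 0;
      for (int i = 0; i < docBits.length; i++) {
        planeCount += Integer.bitCount((docBits[i] & queryPlanes[p][i]) & 0xFF);
      }
      total += planeCount << p; // bit-plane p contributes with weight 2^p
    }
    return total;
  }
}
```

This also shows why the query-side temporary file is the larger one: it holds multiple bit-planes per vector instead of a single one.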

closes: #13650

@ChrisHegarty ChrisHegarty requested review from ChrisHegarty and removed request for ChrisHegarty August 14, 2024 13:43
@mayya-sharipova (Contributor) commented Aug 21, 2024

@benwtrent

> possibly switch to LongValues for storing vectorOrd -> centroidOrd mapping

I was thinking about adding the centroid mappings as LongValues at the end of the meta file, but this could potentially make the meta file quite large (for 100M docs, we would need an extra 100 MB). We really try to keep meta files small, so I would prefer either:

  • keeping the current approach (add a byte at the end of each vector in the vectors file). Indeed, this may throw off the paging size, but maybe the effect on memory-mapped files is not big?
  • adding an extra file for the centroid mapping. It can be accessed through a memory-mapped file, or loaded directly into memory on first use.

What do you think?

For now, we are keeping the 1st (current) approach.

@benwtrent (Member, Author)

100MB assumes that, even when compressed, it's a single byte per centroid. 100M vectors might only have 2 centroids and thus only need two bits to store.

Also, I would expect the centroids to be at the end of the "veb" file, not in the metadata, like we already do for the sparse vector ord-to-doc resolution.

But, either solution needs testing for sure.
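For a rough sense of scale under a packed encoding (my arithmetic, not measured from this PR): the mapping needs about `numVectors * ceil(log2(numCentroids)) / 8` bytes. With 100M vectors and, say, 4 centroids, that is 10^8 * 2 / 8 ≈ 25 MB, versus ~100 MB at a full byte per vector, so packing pays off as long as the centroid count stays small.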

@john-wagster left a comment

LGTM

@benwtrent (Member, Author)

Here is some Lucene Util benchmarking. Some of these numbers actually contradict some of my previous benchmarking for int4, which is frustrating; I wonder what I did wrong then or now. Or maybe float32 got faster between then and now :)

Regardless, this shows that bit quantization is generally as fast as int4 search or faster, and you can get good recall with some oversampling. Combined with the 32x reduction in space, it's pretty nice.

The oversampling rates were [1, 1.5, 2, 3, 4, 5]. HNSW params: m=16, efSearch=100. Recall@100.

Cohere v2 1M

| quantization | Index Time | Force Merge time | Mem Required |
| --- | --- | --- | --- |
| 1 bit | 395.18 | 411.67 | 175.9MB |
| 4 bit (compress) | 1877.47 | 491.13 | 439.7MB |
| 7 bit | 500.59 | 820.53 | 833.9MB |
| raw | 493.44 | 792.04 | 3132.8MB |

[chart: cohere-v2-bit-1M]

Cohere v3 1M (1024 dims)

| quantization | Index Time | Force Merge time | Mem Required |
| --- | --- | --- | --- |
| 1 bit | 338.97 | 342.61 | 208MB |
| 4 bit (compress) | 1113.06 | 5490.36 | 578MB |
| 7 bit | 437.63 | 744.12 | 1094MB |
| raw | 408.75 | 798.11 | 4162MB |

[chart: cohere-v3-bit-1M]

e5Small

| quantization | Index Time | Force Merge time | Mem Required |
| --- | --- | --- | --- |
| 1 bit | 161.84 | 42.37 | 57.6MB |
| 4 bit (compress) | 665.54 | 660.33 | 123.2MB |
| 7 bit | 267.13 | 89.99 | 219.6MB |
| raw | 249.26 | 77.81 | 793.5MB |

[chart: e5small-bit-500k]

@ChrisHegarty (Contributor) left a comment

LGTM

* <li><b>vint</b> the vector dimensions
* <li><b>vlong</b> the offset to the vector data in the .veb file
* <li><b>vlong</b> the length of the vector data in the .veb file
* <li><b>vint</b> the number of vectors
@mayya-sharipova (Contributor) commented Oct 30, 2024

Also:

 <li><b>[float]</b> clusterCenter
 <li><b>int</b> dotProduct of clusterCenter with itself
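For illustration only, here is a sketch of reading fields like the ones listed above with Lucene's DataInput primitives; the field order and the assumption that the final value is stored as raw float bits are mine, not necessarily the PR's actual layout:

```java
import java.io.IOException;
import org.apache.lucene.store.DataInput;

// Hypothetical reader for the per-field meta entry sketched in the javadoc above.
final class MetaEntrySketch {
  int dims;
  long vectorDataOffset;
  long vectorDataLength;
  int numVectors;
  float[] clusterCenter;
  float centerSelfDotProduct;

  void read(DataInput in) throws IOException {
    dims = in.readVInt();              // vint: the vector dimensions
    vectorDataOffset = in.readVLong(); // vlong: offset of the vector data in the .veb file
    vectorDataLength = in.readVLong(); // vlong: length of the vector data in the .veb file
    numVectors = in.readVInt();        // vint: the number of vectors
    clusterCenter = new float[dims];   // [float]: clusterCenter
    for (int i = 0; i < dims; i++) {
      clusterCenter[i] = Float.intBitsToFloat(in.readInt());
    }
    // int: dot product of clusterCenter with itself (assumed stored as raw float bits)
    centerSelfDotProduct = Float.intBitsToFloat(in.readInt());
  }
}
```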

@mayya-sharipova (Contributor) left a comment

Amazing work! Thanks Ben and the team!

@benwtrent benwtrent marked this pull request as draft November 1, 2024 15:06
@benwtrent (Member, Author)

Hey @ShashwatShivam, mikemccand/luceneutil@main...benwtrent:luceneutil:bbq is the testing script I use.

But if Lucene has since been updated with a 101 codec, I would need to update this branch.

@ShashwatShivam

@benwtrent thanks for giving the link to the testing script, it works! One question: the index size it reports is larger than the HNSW index size. For example, I was working with a Cohere 768-dim dataset with 500k docs, and the index sizes were 1488.83 MB and 1544.79 MB for HNSW and RaBitQ (Lucene101HnswBinaryQuantizedVectorsFormat) respectively, which seems incorrect. Could you please tell me why this discrepancy occurs, if you've seen this issue before?

@benwtrent (Member, Author)

@ShashwatShivam why do you think the index size (total size of all the files) should be smaller?

We store both the binary quantized vectors and the floating-point vectors. So I would expect about a 5% increase in disk size from the vectors alone.

I have also noticed that the HNSW graph itself ends up being more densely connected, but this is only a marginal increase in disk space as well.
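As a rough back-of-the-envelope check (my arithmetic, not numbers from the PR): at 768 dims, each raw float32 vector is 768 * 4 = 3072 bytes, while the 1-bit code plus a few corrective floats is roughly 96 + 12 ≈ 108 bytes, i.e. about 3.5% extra on the vector data alone. That lines up with the ~3.8% difference reported above (1544.79 MB vs. 1488.83 MB), with the denser graph accounting for part of the remainder.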

@ShashwatShivam

@benwtrent makes sense, I wasn't accounting for the fact that the floating-point vectors are stored too. I guess I should have instead asked how to reproduce the 'memory required' column, which shows a marked reduction for 1-bit quantization vs. raw?

@benwtrent (Member, Author)

@ShashwatShivam I don't think there is a "memory column" provided anywhere. I simply looked at the individual file sizes (veb, vex) and summed their sizes together.
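If it helps, here is a hedged sketch of reproducing that kind of number by summing the .veb and .vex file sizes in an index directory (the directory path is an argument; which extensions to include is up to you):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public final class QuantizedSizeSum {
  public static void main(String[] args) throws IOException {
    Path indexDir = Path.of(args[0]); // e.g. the Lucene index directory used by the benchmark
    try (Stream<Path> files = Files.list(indexDir)) {
      long bytes = files
          .filter(p -> p.toString().endsWith(".veb") || p.toString().endsWith(".vex"))
          .mapToLong(p -> {
            try {
              return Files.size(p);
            } catch (IOException e) {
              return 0L; // skip unreadable files in this sketch
            }
          })
          .sum();
      System.out.printf("quantized vectors + graph: %.1f MB%n", bytes / (1024.0 * 1024.0));
    }
  }
}
```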

@benwtrent benwtrent changed the title Add a Better Binary Quantizer (RaBitQ) format for dense vectors Add a Better Binary Quantizer format for dense vectors Nov 8, 2024
@ShashwatShivam

Hey @benwtrent,
Thank you for all your help so far! I have a question about the oversampling used to increase recall. From what I understand, it scales up the top-k and fanout values by the oversampling factor. In the final match set, do we return only the best top-k documents (not scaled up, but the original value)? I couldn't locate the code where the reranking or selection of the best k results from the expanded match set happens. Could you please help me find that part?
Thanks again!

@mikemccand (Member)

> @ShashwatShivam I don't think there is a "memory column" provided anywhere. I simply looked at the individual file sizes (veb, vex) and summed their sizes together.

Once this cool change is merged let's fix luceneutil's KNN benchy tooling (knnPerfTest.py, KnnGraphTester.java) to compute/report the "memory column" ("hot RAM", "searchable RAM", something)? Basically everything except the original (float32 or byte) vectors. I'll open an upstream luceneutil issue...

@benwtrent (Member, Author)

Quick update: we have been bothered by some of the numbers (for example, models like "gist" perform poorly), and we have some improvements to make before flipping this back to "ready for review".

@mikemccand YES! That would be great! "Memory required" would be the quantized file size + hnsw graph file size (if the graph exists).

@ShashwatShivam

Sorry for the late reply. There are no "out of the box" rescoring actions directly in Lucene, mainly because the individual tools are (mostly) already available to you. You can ask for more vectors overall with one query, and then rescore the individual documents according to the raw vector comparisons. I admit, this requires some Lucene API know-how.

It would be good for a "vector scorer" to indicate whether it's an estimation or not, to allow for smarter actions in the kNN doc collector...
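To make that concrete, here is a hedged sketch of the "ask for more, then rescore with the raw vectors" pattern, written against the Lucene 9.x style FloatVectorValues iterator API (on current main the values are ord-based, so the read step differs); the field name "vector" and the choice of similarity function are assumptions:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import org.apache.lucene.index.FloatVectorValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.ReaderUtil;
import org.apache.lucene.index.VectorSimilarityFunction;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.KnnFloatVectorQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

final class OversampleAndRescore {
  static ScoreDoc[] search(IndexSearcher searcher, float[] query, int k, float oversample)
      throws IOException {
    int expandedK = Math.round(k * oversample);
    // 1. Approximate search over the quantized vectors, collecting k * oversample candidates.
    TopDocs approx =
        searcher.search(new KnnFloatVectorQuery("vector", query, expandedK), expandedK);
    // 2. Rescore every candidate against the raw float vectors.
    List<LeafReaderContext> leaves = searcher.getIndexReader().leaves();
    for (ScoreDoc hit : approx.scoreDocs) {
      LeafReaderContext ctx = leaves.get(ReaderUtil.subIndex(hit.doc, leaves));
      FloatVectorValues raw = ctx.reader().getFloatVectorValues("vector");
      int leafDoc = hit.doc - ctx.docBase;
      if (raw != null && raw.advance(leafDoc) == leafDoc) {
        hit.score = VectorSimilarityFunction.DOT_PRODUCT.compare(query, raw.vectorValue());
      }
    }
    // 3. Keep only the best k after rescoring.
    ScoreDoc[] rescored = approx.scoreDocs.clone();
    Arrays.sort(rescored, (a, b) -> Float.compare(b.score, a.score));
    return Arrays.copyOf(rescored, Math.min(k, rescored.length));
  }
}
```

The KnnFloatVectorQuery here collects k * oversample candidates from the quantized index; the loop then replaces each estimated score with an exact one before truncating back to the original k.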

@ShashwatShivam

I conducted a benchmark using Cohere's 768-dimensional data. Here are the steps I followed for reproducibility:

  1. Set up the luceneutil repository following the installation instructions provided.

  2. Switch branches to this specific branch since the latest mainline branch is not compatible with the feature needed for this experiment.

  3. Change the branch of lucene_candidate to benwtrent:feature/adv-binarization-format to incorporate advanced binarization formats.

  4. Run knnPerfTest.py after specifying the document and query file paths to the stored Cohere data files. The runtime parameters were set as follows:

    • nDoc = 500,000
    • topk = 10
    • fanout = 100
    • maxConn = 32
    • beamWidth = 100
    • oversample values tested: {1, 1.5, 2, 3, 4, 5}

    I used quantizeBits = 1 for RaBitQ+HNSW and quantizeBits = 32 for regular HNSW.

A comparison was performed between HNSW and RaBitQ, and I observed the recall-latency tradeoff, which is shown in the attached image:
[attached chart: recall vs. latency for HNSW and RaBitQ]

@tanyaroosta

@gaoj0017

Thanks, Tanya @tanyaroosta, for sharing our blog about RaBitQ in this thread. I am the first author of the RaBitQ paper. I am glad to know that our RaBitQ method has been discussed in the threads here. Regarding the BBQ (Better Binary Quantization) method mentioned in these threads, my understanding is that it largely follows the framework of RaBitQ and makes some minor modifications for practical performance considerations. The claimed key features of BBQ as described in a blog from Elastic, "Better Binary Quantization (BBQ) in Lucene and Elasticsearch", e.g., normalization around a centroid, multiple error correction values, asymmetric quantization, and bit-wise operations, all originate from our RaBitQ paper.

We note that the industry quite often customizes methods from academia to better suit its applications, but it rarely gives the variant a new name and claims it as a new method. For example, the PQ and HNSW methods came from academia and have been widely adopted in the industry with some modifications, but the industry still respects their original names. We believe the same practice should be followed for RaBitQ.

In addition, we would like to share that we have extended RaBitQ to support quantization beyond 1 bit per dimension (e.g., 2-bit, 3-bit, ...). The paper on the extended RaBitQ was made available in September 2024. It achieves this by constructing a larger codebook than that of RaBitQ and can be equivalently understood as an optimized scalar quantization method. For details, please refer to the paper and also a blog that we have recently posted.


Successfully merging this pull request may close these issues.

Add higher quantization level for kNN vector search
10 participants