@kaivalnp
Contributor

Spinoff from apache/lucene#14758

Add an option for index-time filtering of vectors to knnPerfTest.py

Index-time filtering means creating a separate HNSW graph for filters known at index time, by passing the same vector to Lucene under a different vector field name. This type of filtering may benefit users who are willing to move the cost of filtering upfront to indexing (both time and storage).

Right now, Lucene stores a separate on-disk copy of a duplicated vector for each field it is indexed under -- and we're aiming to reduce this duplication in the linked Lucene issue (feedback welcome)!
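To make the idea concrete, here is a minimal in-memory sketch in Java that does not use Lucene's API: every document's vector goes into the main field, and documents that pass a filter known at index time also get the same vector under a second field, named knn-filtered after the field KnnGraphTester adds. The class, the record, and the every-10th-doc filter are hypothetical; in real Lucene code the second field would be another vector field added to the same document under a different name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical in-memory model of index-time filtering (not Lucene's API).
public class IndexTimeFilterSketch {
  // One (docId, vector) entry in a simulated vector field.
  record Entry(int docId, float[] vector) {}

  // Main field: every document's vector.
  final List<Entry> knn = new ArrayList<>();
  // Second field, named after "knn-filtered": only documents that pass the index-time filter.
  final List<Entry> knnFiltered = new ArrayList<>();

  void addDocument(int docId, float[] vector, IntPredicate filter) {
    knn.add(new Entry(docId, vector));
    if (filter.test(docId)) {
      // Same vector, second field. This model shares the array in memory;
      // Lucene writes a separate on-disk copy per field.
      knnFiltered.add(new Entry(docId, vector));
    }
  }

  public static void main(String[] args) {
    IndexTimeFilterSketch index = new IndexTimeFilterSketch();
    // Assumed filter: every 10th document matches, i.e. a selectivity of 0.10.
    IntPredicate filter = docId -> docId % 10 == 0;
    for (int docId = 0; docId < 1000; docId++) {
      index.addDocument(docId, new float[] {docId, 1f}, filter);
    }
    System.out.println(index.knn.size() + " " + index.knnFiltered.size()); // prints "1000 100"
  }
}
```

A filtered search then only has to explore the smaller field (100 entries here instead of 1000); the extra entries are the indexing time and storage cost described above.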

KnnGraphTester already had -filterSelectivity and -prefilter options to simulate pre- and post-filtering -- I've changed them to: -filterStrategy (one of query-time-pre-filter, query-time-post-filter, index-time-filter) and -filterSelectivity (between 0 and 1, exclusive)

KnnGraphTester now expects either both or neither of the above parameters, and does not perform any filtering if they are not specified. When index-time-filter is used, it adds an additional vector field knn-filtered to the index and uses this smaller field at search time. The other two options correspond to the -prefilter boolean flag being present (or absent) before this PR.
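The "both or neither" rule and the selectivity bounds can be sketched as a small standalone validator. The enum constants mirror the strategy strings above, but FilterArgsSketch, validate, and the main field name knn are hypothetical or assumed; the actual KnnGraphTester code may be structured differently.

```java
// Hypothetical sketch; names follow the PR description, not the actual KnnGraphTester code.
public class FilterArgsSketch {
  enum FilterStrategy { QUERY_TIME_PRE_FILTER, QUERY_TIME_POST_FILTER, INDEX_TIME_FILTER }

  record FilterArgs(FilterStrategy strategy, float selectivity) {
    // Before this PR, pre- vs post-filtering was chosen with the -prefilter boolean flag.
    boolean prefilter() {
      return strategy == FilterStrategy.QUERY_TIME_PRE_FILTER;
    }

    // Index-time filtering searches the smaller extra field; "knn" as the main field is assumed.
    String searchField() {
      return strategy == FilterStrategy.INDEX_TIME_FILTER ? "knn-filtered" : "knn";
    }
  }

  // Returns null when neither option was passed, meaning "no filtering".
  static FilterArgs validate(FilterStrategy strategy, Float selectivity) {
    if (strategy == null && selectivity == null) {
      return null;
    }
    if (strategy == null || selectivity == null) {
      throw new IllegalArgumentException(
          "-filterStrategy and -filterSelectivity must be passed together (or not at all)");
    }
    // Exclusive bounds, written so that NaN is rejected as well.
    if (!(selectivity > 0 && selectivity < 1)) {
      throw new IllegalArgumentException(
          "-filterSelectivity must be between 0 and 1 (exclusive), got " + selectivity);
    }
    return new FilterArgs(strategy, selectivity);
  }

  public static void main(String[] args) {
    FilterArgs a = validate(FilterStrategy.INDEX_TIME_FILTER, 0.1f);
    System.out.println(a.searchField() + " " + a.prefilter()); // prints "knn-filtered false"
    System.out.println(validate(null, null)); // prints "null"
  }
}
```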

I'll add some benchmarks soon!

@kaivalnp
Contributor Author

Benchmarks

Cohere vectors, 768d, MAXIMUM_INNER_PRODUCT
The 5 rows correspond to filterSelectivity = [0.90, 0.50, 0.20, 0.10, 0.01]
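The recall column in the tables below compares each filtered search against an exact top-K. A minimal sketch of how a filtered ground truth and recall could be computed, assuming the usual recall@k definition and a brute-force scan restricted to documents that pass the filter (not necessarily how KnnGraphTester does it). It ranks by raw dot product, which we assume orders results the same way as Lucene's MAXIMUM_INNER_PRODUCT score; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

public class FilteredRecallSketch {
  static double dot(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Brute-force ground truth: the k highest dot products among docs that pass the filter.
  static List<Integer> exactTopK(float[][] docs, float[] query, IntPredicate filter, int k) {
    List<Integer> candidates = new ArrayList<>();
    for (int doc = 0; doc < docs.length; doc++) {
      if (filter.test(doc)) {
        candidates.add(doc);
      }
    }
    candidates.sort(Comparator.comparingDouble((Integer doc) -> dot(docs[doc], query)).reversed());
    return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
  }

  // Fraction of the exact top-k that the approximate search also returned.
  static double recall(List<Integer> approximate, List<Integer> exact) {
    Set<Integer> truth = new HashSet<>(exact);
    long hits = approximate.stream().filter(truth::contains).count();
    return (double) hits / exact.size();
  }

  public static void main(String[] args) {
    float[][] docs = {{1f, 0f}, {0.9f, 0f}, {0.8f, 0f}, {0.7f, 0f}, {0.6f, 0f}};
    float[] query = {1f, 0f};
    // Filter: only even doc ids match.
    List<Integer> exact = exactTopK(docs, query, doc -> doc % 2 == 0, 2);
    System.out.println(exact); // prints "[0, 2]"
    System.out.println(recall(List.of(0, 4), exact)); // prints "0.5"
  }
}
```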

100K docs, query-time-pre-filter

recall  latency(ms)    netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)         filterStrategy  filterSelectivity  vec_disk(MB)  vec_RAM(MB)  indexType
 0.924        1.186     1.184        0.999  100000   100      50       32        200         no     4373      6.21      16113.44            4.76             1          300.10  query-time-pre-filter               0.90       292.969      292.969       HNSW
 0.908        3.360   -25.343       -7.542  100000   100      50       32        200         no     7009      0.00      Infinity            0.12             1          300.10  query-time-pre-filter               0.50       292.969      292.969       HNSW
 0.917        3.111   -10.914       -3.508  100000   100      50       32        200         no     5735      0.00      Infinity            0.13             1          300.10  query-time-pre-filter               0.20       292.969      292.969       HNSW
 0.899        2.181     2.180        0.999  100000   100      50       32        200         no     3580      0.00      Infinity            0.12             1          300.10  query-time-pre-filter               0.10       292.969      292.969       HNSW
 1.000        0.274     0.273        0.997  100000   100      50       32        200         no     1023      0.00      Infinity            0.12             1          300.10  query-time-pre-filter               0.01       292.969      292.969       HNSW

100K docs, index-time-filter

recall  latency(ms)    netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)         filterStrategy  filterSelectivity  vec_disk(MB)  vec_RAM(MB)  indexType
 0.922        1.085     1.084        0.999  100000   100      50       32        200         no     3990      9.57      10444.96            9.97             1          569.50      index-time-filter               0.90       292.969      292.969       HNSW
 0.942        1.028     1.027        0.999  100000   100      50       32        200         no     3924      9.56      10460.25            9.47             1          449.50      index-time-filter               0.50       292.969      292.969       HNSW
 0.964        0.865     0.864        0.999  100000   100      50       32        200         no     3593      7.72      12956.72            7.74             1          359.58      index-time-filter               0.20       292.969      292.969       HNSW
 0.978        0.629     0.628        0.999  100000   100      50       32        200         no     3125      7.35      13609.15            6.65             1          329.62      index-time-filter               0.10       292.969      292.969       HNSW
 1.000        0.142     0.141        0.994  100000   100      50       32        200         no     1023      6.92      14450.87            4.81             1          303.15      index-time-filter               0.01       292.969      292.969       HNSW

200K docs, query-time-pre-filter

recall  latency(ms)    netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)         filterStrategy  filterSelectivity  vec_disk(MB)  vec_RAM(MB)  indexType
 0.916        1.336     1.335        0.999  200000   100      50       32        200         no     4845     11.34      17642.91           13.33             1          600.85  query-time-pre-filter               0.90       585.938      585.938       HNSW
 0.889        4.007   -49.696      -12.401  200000   100      50       32        200         no     7793      0.00      Infinity            0.12             1          600.85  query-time-pre-filter               0.50       585.938      585.938       HNSW
 0.894        3.854   -23.024       -5.974  200000   100      50       32        200         no     6875      0.00      Infinity            0.12             1          600.85  query-time-pre-filter               0.20       585.938      585.938       HNSW
 0.885        3.069   -11.337       -3.694  200000   100      50       32        200         no     4703      0.00      Infinity            0.12             1          600.85  query-time-pre-filter               0.10       585.938      585.938       HNSW
 0.860        4.733     2.393        0.506  200000   100      50       32        200         no     1502      0.00      Infinity            0.13             1          600.85  query-time-pre-filter               0.01       585.938      585.938       HNSW

200K docs, index-time-filter

recall  latency(ms)    netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)         filterStrategy  filterSelectivity  vec_disk(MB)  vec_RAM(MB)  indexType
 0.912        1.234     1.233        0.999  200000   100      50       32        200         no     4413     18.78      10650.19           25.17             1         1140.75      index-time-filter               0.90       585.938      585.938       HNSW
 0.929        1.167     1.166        0.999  200000   100      50       32        200         no     4328     14.89      13430.93           18.79             1          900.93      index-time-filter               0.50       585.938      585.938       HNSW
 0.954        1.028     1.027        0.999  200000   100      50       32        200         no     4068     14.04      14249.07           17.77             1          720.34      index-time-filter               0.20       585.938      585.938       HNSW
 0.968        0.878     0.877        0.999  200000   100      50       32        200         no     3732     13.41      14912.02           16.43             1          660.26      index-time-filter               0.10       585.938      585.938       HNSW
 0.997        0.338     0.337        0.997  200000   100      50       32        200         no     1600     11.95      16732.20           13.63             1          606.77      index-time-filter               0.01       585.938      585.938       HNSW

500K docs, query-time-pre-filter

recall  latency(ms)    netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)         filterStrategy  filterSelectivity  vec_disk(MB)  vec_RAM(MB)  indexType
 0.898        1.716     1.715        0.999  500000   100      50       32        200         no     5262     32.26      15497.15           27.63             1         1503.01  query-time-pre-filter               0.90      1464.844     1464.844       HNSW
 0.860        4.477  -122.190      -27.294  500000   100      50       32        200         no     8331      0.00      Infinity            0.12             1         1503.01  query-time-pre-filter               0.50      1464.844     1464.844       HNSW
 0.859        4.633   -58.779      -12.686  500000   100      50       32        200         no     7601      0.00      Infinity            0.12             1         1503.01  query-time-pre-filter               0.20      1464.844     1464.844       HNSW
 0.855        4.038   -30.514       -7.557  500000   100      50       32        200         no     5665      0.00      Infinity            0.12             1         1503.01  query-time-pre-filter               0.10      1464.844     1464.844       HNSW
 0.728        4.507     0.098        0.022  500000   100      50       32        200         no     1644      0.00      Infinity            0.12             1         1503.01  query-time-pre-filter               0.01      1464.844     1464.844       HNSW

500K docs, index-time-filter

recall  latency(ms)    netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)         filterStrategy  filterSelectivity  vec_disk(MB)  vec_RAM(MB)  indexType
 0.896        1.615     1.613        0.999  500000   100      50       32        200         no     4815     52.87       9456.98           69.18             1         2854.58      index-time-filter               0.90      1464.844     1464.844       HNSW
 0.914        1.516     1.515        0.999  500000   100      50       32        200         no     4774     42.99      11629.80           56.55             1         2254.30      index-time-filter               0.50      1464.844     1464.844       HNSW
 0.941        1.219     1.218        0.999  500000   100      50       32        200         no     4611     36.78      13595.82           38.12             1         1803.51      index-time-filter               0.20      1464.844     1464.844       HNSW
 0.955        1.122     1.120        0.998  500000   100      50       32        200         no     4317     33.42      14962.89           36.61             1         1652.25      index-time-filter               0.10      1464.844     1464.844       HNSW
 0.991        0.516     0.515        0.998  500000   100      50       32        200         no     2643     38.44      13007.28           29.34             1         1517.73      index-time-filter               0.01      1464.844     1464.844       HNSW

Couple of things to note:

  • index(s) is 0 in subsequent runs of query-time-pre-filter because the same index can be reused, but index-time-filter needs a separate index for each filterSelectivity
  • vec_disk(MB) and vec_RAM(MB) are misleading because they only account for the main vector field (not the new "filtered" vector field) -- please look at index_size(MB) instead
  • netCPU and avgCpuCount are unreliable for query-time-pre-filter because it computes exact KNN using multiple threads (index-time-filter reuses those same exact results, so its values stay close to 1) -- please look at latency(ms) instead
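Following the second note, index_size(MB) is the column to compare for storage. As a quick check using only the 100K-doc numbers from the tables above, the extra size of each index-time-filter index relative to the 300.10 MB query-time index appears to track filterSelectivity closely, which is what you would expect if the knn-filtered field holds a copy of roughly that fraction of the vectors. The class and method names are illustrative only.

```java
import java.util.Locale;

public class IndexSizeOverhead {
  // Extra index size from the filtered field, as a fraction of the unfiltered index size.
  static double extraFraction(double baselineMb, double withFilteredFieldMb) {
    return (withFilteredFieldMb - baselineMb) / baselineMb;
  }

  public static void main(String[] args) {
    // index_size(MB) values for 100K docs, copied from the tables above.
    double baselineMb = 300.10; // query-time-pre-filter (main vector field only)
    double[] selectivities = {0.90, 0.50, 0.20, 0.10, 0.01};
    double[] indexTimeFilterMb = {569.50, 449.50, 359.58, 329.62, 303.15};
    for (int i = 0; i < selectivities.length; i++) {
      System.out.printf(Locale.ROOT, "filterSelectivity=%.2f  extra=%.0f%%%n",
          selectivities[i], 100 * extraFraction(baselineMb, indexTimeFilterMb[i]));
    }
    // prints extra=90%, 50%, 20%, 10% and 1% for the five selectivities
  }
}
```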

The benefits of index-time filtering seem to grow with the number of docs and with more selective filters (lower filterSelectivity)
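The latency(ms) columns can be compared directly as a ratio of query-time-pre-filter latency to index-time-filter latency. This small program only recomputes those ratios from the tables above; it adds no new measurements, and its names are illustrative.

```java
import java.util.Locale;

public class LatencySpeedup {
  static double speedup(double preFilterMs, double indexTimeMs) {
    return preFilterMs / indexTimeMs;
  }

  public static void main(String[] args) {
    double[] selectivities = {0.90, 0.50, 0.20, 0.10, 0.01};
    String[] sizes = {"100K", "200K", "500K"};
    // latency(ms) copied from the tables above, indexed [doc count][selectivity row].
    double[][] preFilterMs = {
      {1.186, 3.360, 3.111, 2.181, 0.274},
      {1.336, 4.007, 3.854, 3.069, 4.733},
      {1.716, 4.477, 4.633, 4.038, 4.507},
    };
    double[][] indexTimeMs = {
      {1.085, 1.028, 0.865, 0.629, 0.142},
      {1.234, 1.167, 1.028, 0.878, 0.338},
      {1.615, 1.516, 1.219, 1.122, 0.516},
    };
    for (int s = 0; s < sizes.length; s++) {
      StringBuilder line = new StringBuilder(sizes[s] + ":");
      for (int i = 0; i < selectivities.length; i++) {
        line.append(String.format(Locale.ROOT, "  %.2f->%.1fx",
            selectivities[i], speedup(preFilterMs[s][i], indexTimeMs[s][i])));
      }
      System.out.println(line);
    }
  }
}
```

At filterSelectivity 0.01 the ratio is about 1.9x at 100K docs, 14.0x at 200K and 8.7x at 500K, so the trend is noisy rather than strictly monotonic; at the same selectivity, recall drops from 1.000 to 0.728 for query-time-pre-filter but stays at or above 0.991 for index-time-filter.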

Owner

@mikemccand mikemccand left a comment

This looks awesome -- I left some minor polishing type comments!

FLAT
}

enum FilterStrategy {
Owner

Ooooh ... love it. I wish Lucene had this enum / query+filter optimizer.

break;
case "-filterStrategy":
if (iarg == args.length - 1) {
throw new IllegalArgumentException("-filterStrategy requires a following pathname");
Owner

pathname -> strategy string (query-time-pre-filter, query-time-post-filter, or index-time-filter)?

Contributor Author

@kaivalnp kaivalnp Sep 30, 2025

Edit: Oops sorry, you meant the error message -- done!

case "query-time-pre-filter" -> FilterStrategy.QUERY_TIME_PRE_FILTER;
case "query-time-post-filter" -> FilterStrategy.QUERY_TIME_POST_FILTER;
case "index-time-filter" -> FilterStrategy.INDEX_TIME_FILTER;
default -> throw new IllegalArgumentException("-filterStrategy can be 'query-time-pre-filter' or 'query-time-post-filter' or 'index-time-filter' only");
Owner

Instead of 'can be ... only', say 'must be one of'?

And can you include in the error message which (invalid) strategy string the user passed?

Contributor Author

Makes sense, added!
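Putting both review suggestions together ("strategy string" in the missing-argument message, "must be one of", and echoing the invalid value), the parsing could look roughly like this. It is a standalone sketch: the class, the method name parseFilterStrategy, and the exact message text are hypothetical, and the merged KnnGraphTester code may differ.

```java
// Hypothetical standalone version of the -filterStrategy parsing discussed above.
public class FilterStrategyParsing {
  enum FilterStrategy { QUERY_TIME_PRE_FILTER, QUERY_TIME_POST_FILTER, INDEX_TIME_FILTER }

  // iarg is the position of "-filterStrategy" in args; its value is the next argument.
  static FilterStrategy parseFilterStrategy(String[] args, int iarg) {
    if (iarg == args.length - 1) {
      throw new IllegalArgumentException("-filterStrategy requires a following strategy string");
    }
    String value = args[iarg + 1];
    return switch (value) {
      case "query-time-pre-filter" -> FilterStrategy.QUERY_TIME_PRE_FILTER;
      case "query-time-post-filter" -> FilterStrategy.QUERY_TIME_POST_FILTER;
      case "index-time-filter" -> FilterStrategy.INDEX_TIME_FILTER;
      // Name the accepted values and echo back what the user actually passed.
      default -> throw new IllegalArgumentException(
          "-filterStrategy must be one of 'query-time-pre-filter', 'query-time-post-filter'"
              + " or 'index-time-filter'; got '" + value + "'");
    };
  }

  public static void main(String[] args) {
    String[] cli = {"-filterStrategy", "index-time-filter", "-filterSelectivity", "0.1"};
    System.out.println(parseFilterStrategy(cli, 0)); // prints "INDEX_TIME_FILTER"
  }
}
```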

Owner

@mikemccand mikemccand left a comment

Awesome, thank you! We somehow need nightly benchy to test filtered KNN with these three strategies... maybe open a spinoff issue? Thanks @kaivalnp!

@mikemccand mikemccand merged commit 26a05a2 into mikemccand:main Oct 1, 2025
1 check passed
@kaivalnp kaivalnp deleted the index-time-filtering branch October 1, 2025 17:09
@kaivalnp
Contributor Author

kaivalnp commented Oct 1, 2025

Thanks @mikemccand, I opened #473

@mikemccand
Owner

Hmm nightly benchy is angry:

Exception in thread "main" java.lang.NullPointerException: Cannot invoke "knn.KnnGraphTester$FilterStrategy.toString()" because "this.filterStrategy" is null
        at knn.KnnGraphTester.testSearch(KnnGraphTester.java:1013)
        at knn.KnnGraphTester.run(KnnGraphTester.java:568)
        at knn.KnnGraphTester.runWithCleanUp(KnnGraphTester.java:238)
        at knn.KnnGraphTester.main(KnnGraphTester.java:233)

I think this happens if you specify no filter strategy? I'll make a fix -- I think we just need a null check when we print the summary.
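A minimal sketch of that null check, assuming (hypothetically) that the enum carries its CLI spelling so toString() prints the values seen in the filterStrategy column, and that unfiltered runs print a placeholder. "N/A" and the method name filterColumns are made up for illustration, not taken from the actual fix.

```java
import java.util.Locale;

public class SummarySketch {
  // Hypothetical enum that carries its CLI spelling, so toString() matches the summary tables.
  enum FilterStrategy {
    QUERY_TIME_PRE_FILTER("query-time-pre-filter"),
    QUERY_TIME_POST_FILTER("query-time-post-filter"),
    INDEX_TIME_FILTER("index-time-filter");

    private final String cliName;

    FilterStrategy(String cliName) {
      this.cliName = cliName;
    }

    @Override
    public String toString() {
      return cliName;
    }
  }

  // Formats the filterStrategy and filterSelectivity summary columns. Without -filterStrategy
  // the strategy is null, so check for that instead of calling toString() on it.
  static String filterColumns(FilterStrategy filterStrategy, float filterSelectivity) {
    if (filterStrategy == null) {
      return "N/A\tN/A"; // hypothetical placeholder for unfiltered runs
    }
    return filterStrategy + "\t" + String.format(Locale.ROOT, "%.2f", filterSelectivity);
  }

  public static void main(String[] args) {
    System.out.println(filterColumns(null, 0f));
    System.out.println(filterColumns(FilterStrategy.INDEX_TIME_FILTER, 0.1f));
  }
}
```

String.valueOf(filterStrategy) would also avoid the exception, but it would print the literal text null in the column.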

Also probably need to fix runNightlyKnn.py to handle the newly inserted column... I'll give that a shot!
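runNightlyKnn.py is Python, but the general idea can be sketched in Java to match the other examples: look columns up by header name instead of by position, so an inserted column such as filterStrategy does not shift everything after it. This is one possible approach, not necessarily what the fix does; ResultColumns and byHeader are hypothetical names, and the sample header and row are copied from the first 100K table above.

```java
import java.util.HashMap;
import java.util.Map;

public class ResultColumns {
  // Maps each header name to its value in one whitespace-separated results row.
  static Map<String, String> byHeader(String headerLine, String rowLine) {
    String[] names = headerLine.trim().split("\\s+");
    String[] values = rowLine.trim().split("\\s+");
    if (names.length != values.length) {
      throw new IllegalArgumentException(
          "header has " + names.length + " columns but row has " + values.length);
    }
    Map<String, String> row = new HashMap<>();
    for (int i = 0; i < names.length; i++) {
      row.put(names[i], values[i]);
    }
    return row;
  }

  public static void main(String[] args) {
    String header = "recall latency(ms) netCPU avgCpuCount nDoc topK fanout maxConn beamWidth"
        + " quantized visited index(s) index_docs/s force_merge(s) num_segments index_size(MB)"
        + " filterStrategy filterSelectivity vec_disk(MB) vec_RAM(MB) indexType";
    String row = "0.924 1.186 1.184 0.999 100000 100 50 32 200 no 4373 6.21 16113.44 4.76 1"
        + " 300.10 query-time-pre-filter 0.90 292.969 292.969 HNSW";
    Map<String, String> result = byHeader(header, row);
    // Lookups by name keep working even if new columns are inserted in the middle.
    System.out.println(result.get("latency(ms)") + " " + result.get("filterStrategy"));
  }
}
```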
