Run filtered disjunctions with MaxScoreBulkScorer. #14014

jpountz · 2024-11-22T21:13:21Z

Running filtered disjunctions with a specialized bulk scorer seems to yield a good speedup. For what it's worth, I also tried implementing a MAXSCORE-based scorer to find out whether the speedup came from the `BulkScorer` specialization or from the algorithm, but it didn't help.
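To make the bulk-scoring idea concrete, here is a minimal sketch of window-based scoring for a filtered disjunction over hypothetical in-memory postings. `WindowedFilteredDisjunction`, `Postings` and `Hit` are illustrative names, not Lucene classes. Every clause adds its matches for a whole window of doc IDs into a score buffer, and the filter is then checked once per matching doc. It deliberately leaves out MAXSCORE pruning and makes no claim about how `MaxScoreBulkScorer` actually handles the filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of window-based ("bulk") scoring of a filtered disjunction.
// Not Lucene's MaxScoreBulkScorer: no MAXSCORE pruning, in-memory postings only.
final class WindowedFilteredDisjunction {

  // Hypothetical postings list: doc IDs in increasing order, one score per doc.
  record Postings(int[] docs, float[] scores) {}

  // A collected (doc, score) pair.
  record Hit(int doc, float score) {}

  static List<Hit> score(List<Postings> clauses, int[] filterDocs, int maxDoc, int windowSize) {
    // Turn the filter into a per-doc lookup table once, up front.
    boolean[] filter = new boolean[maxDoc];
    for (int doc : filterDocs) {
      filter[doc] = true;
    }

    int[] cursors = new int[clauses.size()];      // next unread position in each clause
    float[] buffer = new float[windowSize];       // summed score per doc of the window
    boolean[] matched = new boolean[windowSize];  // whether any clause matched the doc
    List<Hit> hits = new ArrayList<>();

    for (int windowMin = 0; windowMin < maxDoc; windowMin += windowSize) {
      int windowMax = Math.min(windowMin + windowSize, maxDoc);
      Arrays.fill(buffer, 0f);
      Arrays.fill(matched, false);

      // Phase 1: each clause pushes all its matches in [windowMin, windowMax) into the buffer.
      for (int c = 0; c < clauses.size(); c++) {
        Postings p = clauses.get(c);
        int i = cursors[c];
        while (i < p.docs().length && p.docs()[i] < windowMax) {
          int slot = p.docs()[i] - windowMin;
          buffer[slot] += p.scores()[i];
          matched[slot] = true;
          i++;
        }
        cursors[c] = i;
      }

      // Phase 2: scan the window once and keep matching docs that pass the filter.
      for (int slot = 0; slot < windowMax - windowMin; slot++) {
        int doc = windowMin + slot;
        if (matched[slot] && filter[doc]) {
          hits.add(new Hit(doc, buffer[slot]));
        }
      }
    }
    return hits;
  }
}
```

The appeal of this shape is presumably that each clause's postings are consumed in a tight loop per window, instead of all clauses being re-synchronized on every candidate document as a doc-at-a-time scorer would do.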

For this to work, I had to add a rewrite rule that inlines a disjunction nested in a MUST clause into the parent query.
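The sketch below illustrates the kind of rewrite described above on a simplified, hypothetical query model. `Occur`, `Term`, `Clause` and `Bool` are stand-ins rather than Lucene's `BooleanQuery` API, so the exact conditions of the rule added in this PR may differ. A query that has a single `MUST` clause holding a pure disjunction, no `SHOULD` clauses of its own and no minimum-should-match gets that disjunction's `SHOULD` clauses hoisted into it, with a minimum of at least one `SHOULD` match so that the set of matching documents stays the same.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified query model used only to illustrate inlining a disjunction
// that sits in a MUST clause. These types are stand-ins, not Lucene's BooleanQuery API.
final class InlineDisjunctionRewrite {

  enum Occur { MUST, SHOULD, FILTER, MUST_NOT }

  interface Query {}

  record Term(String field, String text) implements Query {}

  record Clause(Occur occur, Query query) {}

  // In this model a Bool's score is the sum of the scores of its matching MUST/SHOULD clauses.
  record Bool(List<Clause> clauses, int minShouldMatch) implements Query {}

  static Query rewrite(Query q) {
    if (!(q instanceof Bool parent) || parent.minShouldMatch() != 0) {
      return q;
    }
    Bool disjunction = null;
    for (Clause c : parent.clauses()) {
      switch (c.occur()) {
        case SHOULD -> {
          return q; // existing SHOULD clauses would interact with the new minimum
        }
        case MUST -> {
          if (disjunction != null) {
            return q; // more than one MUST clause: leave the query alone
          }
          if (!(c.query() instanceof Bool inner) || !isPureDisjunction(inner)) {
            return q;
          }
          disjunction = inner;
        }
        case FILTER, MUST_NOT -> {
          // Non-scoring clauses are carried over unchanged.
        }
      }
    }
    if (disjunction == null) {
      return q;
    }
    List<Clause> inlined = new ArrayList<>();
    for (Clause c : parent.clauses()) {
      if (c.occur() == Occur.MUST) {
        inlined.addAll(disjunction.clauses()); // hoist the SHOULD clauses in place
      } else {
        inlined.add(c);
      }
    }
    // The parent had no SHOULD clauses, so requiring max(1, inner minimum) of the hoisted
    // SHOULD clauses matches exactly the docs the nested disjunction used to match.
    return new Bool(List.copyOf(inlined), Math.max(1, disjunction.minShouldMatch()));
  }

  private static boolean isPureDisjunction(Bool b) {
    return !b.clauses().isEmpty()
        && b.clauses().stream().allMatch(c -> c.occur() == Occur.SHOULD);
  }
}
```

Refusing to rewrite when the parent already has `SHOULD` clauses keeps this sketch safe: those clauses would otherwise start counting toward the new minimum-should-match and change which documents match.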

As a next step, it would be interesting to see if we can further optimize this by loading the filter into a bitset and applying it like live docs.
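A minimal sketch of that idea, assuming the filter's matching doc IDs are available as an `int[]`: materialize them once into a `long[]` bitset, intersect it with live docs, and let the scoring loop do a single random-access check per candidate, much as it already does to skip deleted documents. `FilterAsLiveDocs` and its methods are hypothetical helpers, not Lucene API.

```java
import java.util.Arrays;

// Hypothetical helpers: materialize a filter into a bitset once, then consult it in the
// scoring loop the same way deleted documents are skipped through live docs.
final class FilterAsLiveDocs {

  // Packs doc IDs in [0, maxDoc) into a long[] bitset, 64 docs per word.
  static long[] toBitSet(int[] docs, int maxDoc) {
    long[] bits = new long[(maxDoc + 63) >>> 6];
    for (int doc : docs) {
      bits[doc >>> 6] |= 1L << doc; // Java only uses the low 6 bits of the shift count
    }
    return bits;
  }

  static boolean get(long[] bits, int doc) {
    return (bits[doc >>> 6] & (1L << doc)) != 0;
  }

  // ANDs live docs with the filter so that the loop below needs a single check per doc.
  static long[] intersect(long[] liveDocs, long[] filterBits) {
    long[] accept = filterBits.clone();
    for (int i = 0; i < accept.length; i++) {
      accept[i] &= liveDocs[i];
    }
    return accept;
  }

  // Keeps the disjunction's candidates that acceptDocs lets through; null accepts everything,
  // mirroring how a segment without deletions has no live docs bitset.
  static int[] collect(int[] candidateDocs, long[] acceptDocs) {
    int[] out = new int[candidateDocs.length];
    int n = 0;
    for (int doc : candidateDocs) {
      if (acceptDocs == null || get(acceptDocs, doc)) {
        out[n++] = doc;
      }
    }
    return Arrays.copyOf(out, n);
  }
}
```

The trade-off is a one-time pass over the filter plus roughly `maxDoc / 8` bytes of memory, which presumably pays off when the disjunction produces many candidates per segment.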

jpountz · 2024-11-26T16:00:12Z

| Task | QPS baseline | StdDev | QPS my_modified_version | StdDev | Pct diff (range) | p-value |
|---|---|---|---|---|---|---|
| FilteredOrStopWords | 49.95 | 3.1% | 44.73 | 1.9% | -10.5% (-14% to -5%) | 0.000 |
| CountTerm | 8973.48 | 4.5% | 8705.79 | 4.3% | -3.0% (-11% to 6%) | 0.032 |
| FilteredTerm | 158.70 | 2.4% | 156.76 | 2.1% | -1.2% (-5% to 3%) | 0.090 |
| CountAndHighMed | 170.30 | 1.4% | 168.95 | 1.3% | -0.8% (-3% to 1%) | 0.066 |
| OrHighHigh | 52.77 | 1.8% | 52.45 | 1.9% | -0.6% (-4% to 3%) | 0.306 |
| FilteredAnd2Terms2StopWords | 196.53 | 1.6% | 195.55 | 2.2% | -0.5% (-4% to 3%) | 0.416 |
| OrHighMed | 195.81 | 1.5% | 195.02 | 2.1% | -0.4% (-3% to 3%) | 0.475 |
| PKLookup | 277.48 | 1.5% | 276.36 | 2.2% | -0.4% (-4% to 3%) | 0.499 |
| FilteredAnd3Terms | 190.97 | 2.0% | 190.31 | 2.1% | -0.3% (-4% to 3%) | 0.591 |
| FilteredAndHighHigh | 62.42 | 2.1% | 62.22 | 1.9% | -0.3% (-4% to 3%) | 0.611 |
| CountAndHighHigh | 57.69 | 1.0% | 57.51 | 1.0% | -0.3% (-2% to 1%) | 0.291 |
| CountOrHighHigh | 75.30 | 1.2% | 75.07 | 1.1% | -0.3% (-2% to 2%) | 0.422 |
| Or2Terms2StopWords | 161.53 | 4.6% | 161.07 | 5.1% | -0.3% (-9% to 9%) | 0.851 |
| CombinedTerm | 34.23 | 1.1% | 34.14 | 1.7% | -0.3% (-2% to 2%) | 0.521 |
| Or3Terms | 169.51 | 4.9% | 169.11 | 4.7% | -0.2% (-9% to 9%) | 0.877 |
| CombinedAndHighHigh | 15.87 | 1.0% | 15.84 | 0.9% | -0.2% (-2% to 1%) | 0.449 |
| FilteredAndStopWords | 48.57 | 2.2% | 48.46 | 2.1% | -0.2% (-4% to 4%) | 0.742 |
| FilteredAndHighMed | 125.62 | 3.0% | 125.37 | 2.7% | -0.2% (-5% to 5%) | 0.825 |
| AndHighMed | 122.31 | 1.4% | 122.16 | 1.3% | -0.1% (-2% to 2%) | 0.774 |
| CombinedAndHighMed | 58.05 | 0.9% | 57.99 | 0.9% | -0.1% (-1% to 1%) | 0.725 |
| CombinedOrHighMed | 78.59 | 1.9% | 78.51 | 2.1% | -0.1% (-4% to 4%) | 0.881 |
| CombinedOrHighHigh | 20.79 | 1.8% | 20.78 | 2.3% | -0.0% (-4% to 4%) | 0.964 |
| CountOrHighMed | 142.41 | 1.5% | 142.43 | 1.3% | 0.0% (-2% to 2%) | 0.978 |
| OrStopWords | 32.66 | 7.6% | 32.69 | 7.7% | 0.1% (-14% to 16%) | 0.969 |
| AndHighHigh | 41.61 | 1.5% | 41.65 | 1.4% | 0.1% (-2% to 2%) | 0.825 |
| And3Terms | 168.26 | 4.1% | 168.44 | 4.2% | 0.1% (-7% to 8%) | 0.934 |
| AndStopWords | 29.79 | 6.1% | 29.83 | 6.1% | 0.1% (-11% to 13%) | 0.942 |
| And2Terms2StopWords | 158.60 | 3.9% | 159.00 | 4.1% | 0.3% (-7% to 8%) | 0.840 |
| OrMany | 19.30 | 5.3% | 19.37 | 5.6% | 0.4% (-9% to 11%) | 0.835 |
| FilteredPhrase | 25.46 | 2.6% | 25.55 | 2.5% | 0.4% (-4% to 5%) | 0.640 |
| OrHighRare | 278.01 | 4.2% | 279.32 | 5.3% | 0.5% (-8% to 10%) | 0.754 |
| FilteredOr2Terms2StopWords | 149.54 | 2.3% | 150.37 | 1.7% | 0.6% (-3% to 4%) | 0.380 |
| FilteredOrHighHigh | 64.54 | 3.3% | 66.22 | 1.7% | 2.6% (-2% to 7%) | 0.002 |
| CountPhrase | 4.30 | 4.6% | 4.43 | 2.4% | 3.0% (-3% to 10%) | 0.009 |
| FilteredOr3Terms | 151.25 | 2.8% | 168.59 | 1.6% | 11.5% (6% to 16%) | 0.000 |
| FilteredOrHighMed | 137.64 | 2.9% | 156.84 | 1.3% | 13.9% (9% to 18%) | 0.000 |
| FilteredOrMany | 12.50 | 2.5% | 16.94 | 3.9% | 35.6% (28% to 42%) | 0.000 |

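FilteredOrStopWords is slower (-10.5%), but most other filtered disjunctions are faster, most notably FilteredOrMany (+35.6%), FilteredOrHighMed (+13.9%) and FilteredOr3Terms (+11.5%). Most other tasks are within noise.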

jpountz added this to the 10.1.0 milestone Nov 22, 2024

jpountz marked this pull request as draft November 25, 2024 11:03

jpountz added 2 commits November 26, 2024 16:01

- `80903ed` Merge branch 'main' into filtered_maxscore
- `2bc0f20` tidy

jpountz marked this pull request as ready for review November 26, 2024 15:15

Commit `8394b52`: CHANGES

jpountz merged commit 98c59a7 into apache:main Nov 27, 2024 (3 checks passed)

jpountz deleted the filtered_maxscore branch November 27, 2024 20:56

ChrisHegarty mentioned this pull request Nov 29, 2024 in `TestBooleanRewrites.testRandom` fails #14026 (closed)
