Run filtered disjunctions with MaxScoreBulkScorer. #14014

Merged
merged 4 commits into from
Nov 27, 2024

Conversation

jpountz
Contributor

@jpountz jpountz commented Nov 22, 2024

Running filtered disjunctions with a specialized bulk scorer seems to yield a good speedup. For what it's worth, I also tried implementing a MAXSCORE-based scorer to see whether the speedup comes from the `BulkScorer` specialization or from the algorithm itself, but it didn't help.
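To make the idea concrete, here is a minimal sketch of MAXSCORE-style evaluation of a disjunction under a filter. It is a toy model, not the code in this PR: `FilteredMaxScoreSketch`, `Clause`, `firstEssential` and `search` are hypothetical names, every match of a clause is assumed to score exactly the clause's maximum score, and the minimum competitive score is held fixed. Windowing and re-partitioning as the threshold rises are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative model only: none of these names are Lucene classes.
final class FilteredMaxScoreSketch {

  /** One SHOULD clause: sorted doc IDs, each assumed to score exactly maxScore. */
  static final class Clause {
    final float maxScore;
    final int[] docs;

    Clause(float maxScore, int... docs) {
      this.maxScore = maxScore;
      this.docs = docs;
    }

    boolean matches(int doc) {
      return Arrays.binarySearch(docs, doc) >= 0;
    }
  }

  /**
   * With clauses sorted by ascending maxScore, returns how many leading clauses are
   * non-essential: a doc matching only those cannot reach minCompetitiveScore.
   */
  static int firstEssential(List<Clause> sorted, float minCompetitiveScore) {
    float sum = 0;
    int i = 0;
    while (i < sorted.size() && sum + sorted.get(i).maxScore < minCompetitiveScore) {
      sum += sorted.get(i).maxScore;
      i++;
    }
    return i;
  }

  /** Returns doc -> score for docs that pass the filter and reach minCompetitiveScore. */
  static TreeMap<Integer, Float> search(
      List<Clause> clauses, int[] filterDocs, float minCompetitiveScore) {
    List<Clause> sorted = new ArrayList<>(clauses);
    sorted.sort(Comparator.comparingDouble((Clause c) -> c.maxScore));
    int first = firstEssential(sorted, minCompetitiveScore);

    // Only essential clauses produce candidates.
    TreeSet<Integer> candidates = new TreeSet<>();
    for (Clause c : sorted.subList(first, sorted.size())) {
      for (int doc : c.docs) {
        candidates.add(doc);
      }
    }

    TreeMap<Integer, Float> hits = new TreeMap<>();
    int f = 0; // cursor into the sorted filter doc IDs
    for (int doc : candidates) {
      while (f < filterDocs.length && filterDocs[f] < doc) {
        f++; // advance the filter to the candidate
      }
      if (f == filterDocs.length) {
        break; // filter exhausted, nothing else can match
      }
      if (filterDocs[f] != doc) {
        continue; // rejected by the filter before any scoring work
      }
      float score = 0;
      for (Clause c : sorted) {
        if (c.matches(doc)) {
          score += c.maxScore;
        }
      }
      if (score >= minCompetitiveScore) {
        hits.put(doc, score);
      }
    }
    return hits;
  }
}
```

With clauses scoring 1, 2 and 3 and a minimum competitive score of 3.5, only the last clause is essential, so the two cheaper clauses are consulted only for candidates that already passed the filter.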

For this to work properly, I had to add a rewrite rule that inlines disjunctions in a MUST clause.
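A rough sketch of what such a rewrite could look like, on a toy query model rather than Lucene's `BooleanQuery`. The exact conditions are an assumption on my part: here the rule fires only when the outer query has no SHOULD clauses and exactly one MUST clause, and that clause is a pure disjunction. `+(a b) #f` then becomes `a b #f` with a minimum-should-match of 1, which matches and scores the same documents. All class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy query model for illustration; these are not Lucene's BooleanQuery classes.
final class InlineMustDisjunctionSketch {
  enum Occur { MUST, SHOULD, FILTER, MUST_NOT }

  interface Query {}

  static final class Term implements Query {
    final String text;
    Term(String text) { this.text = text; }
    @Override public String toString() { return text; }
  }

  static final class Clause {
    final Occur occur;
    final Query query;
    Clause(Occur occur, Query query) { this.occur = occur; this.query = query; }
  }

  static final class Bool implements Query {
    final int minShouldMatch;
    final List<Clause> clauses;
    Bool(int minShouldMatch, List<Clause> clauses) {
      this.minShouldMatch = minShouldMatch;
      this.clauses = clauses;
    }

    @Override public String toString() {
      StringBuilder sb = new StringBuilder();
      for (Clause c : clauses) {
        if (sb.length() > 0) sb.append(' ');
        switch (c.occur) {
          case MUST: sb.append('+'); break;
          case FILTER: sb.append('#'); break;
          case MUST_NOT: sb.append('-'); break;
          default: break; // SHOULD has no prefix
        }
        sb.append(c.query instanceof Bool ? "(" + c.query + ")" : c.query.toString());
      }
      return minShouldMatch > 0 ? sb + " ~" + minShouldMatch : sb.toString();
    }
  }

  /** Inlines the SHOULD clauses of the only MUST clause, or returns the query unchanged. */
  static Bool rewrite(Bool outer) {
    Bool inner = null;
    int must = 0;
    for (Clause c : outer.clauses) {
      if (c.occur == Occur.SHOULD) return outer; // outer SHOULD clauses would change semantics
      if (c.occur == Occur.MUST) {
        must++;
        if (c.query instanceof Bool) inner = (Bool) c.query;
      }
    }
    if (must != 1 || inner == null || inner.clauses.isEmpty()) return outer;
    for (Clause c : inner.clauses) {
      if (c.occur != Occur.SHOULD) return outer; // not a pure disjunction
    }
    List<Clause> merged = new ArrayList<>(inner.clauses);
    for (Clause c : outer.clauses) {
      if (c.occur != Occur.MUST) merged.add(c); // keep FILTER and MUST_NOT clauses
    }
    // At least one inlined clause must match, as the MUST clause required.
    return new Bool(Math.max(1, inner.minShouldMatch), merged);
  }
}
```

After the rewrite, the top-level query is itself a disjunction with a filter, which is the shape a specialized bulk scorer can recognize directly.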

As a next step, it would be interesting to see if we can further optimize this by loading the filter into a bitset and applying it like live docs.
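The follow-up idea could be sketched roughly as follows, using `java.util.BitSet` as a stand-in for whatever bit set the real implementation would use. This is speculative and not part of this PR: `FilterAsAcceptDocsSketch`, `acceptDocs` and `countAccepted` are made-up names, and the window bounds are arbitrary. The filter's matches for a window of doc IDs are materialized into bits, intersected with live docs, and the result is applied to the disjunction in one pass.

```java
import java.util.BitSet;

// Illustration of the idea only; Lucene's own bit set and iterator types are not used here.
final class FilterAsAcceptDocsSketch {

  /** Sets one bit per filter match in [windowMin, windowMax), then clears deleted docs. */
  static BitSet acceptDocs(int[] filterDocs, BitSet liveDocs, int windowMin, int windowMax) {
    BitSet accept = new BitSet(windowMax - windowMin);
    for (int doc : filterDocs) {
      if (doc >= windowMin && doc < windowMax) {
        accept.set(doc - windowMin);
      }
    }
    if (liveDocs != null) { // null is taken to mean "no deletions"
      accept.and(liveDocs.get(windowMin, windowMax));
    }
    return accept;
  }

  /** Counts docs in the window matched by at least one clause and accepted by the bit set. */
  static int countAccepted(int[][] clauseDocs, BitSet accept, int windowMin, int windowMax) {
    BitSet matched = new BitSet(windowMax - windowMin);
    for (int[] docs : clauseDocs) {
      for (int doc : docs) {
        if (doc >= windowMin && doc < windowMax) {
          matched.set(doc - windowMin);
        }
      }
    }
    matched.and(accept); // one word-wise AND instead of advancing the filter per candidate
    return matched.cardinality();
  }
}
```

The potential win is that the filter is consulted with a bit test, the same way deleted documents are skipped, instead of being advanced for every candidate. Whether that pays off would depend on how dense the filter is.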

@jpountz jpountz added this to the 10.1.0 milestone Nov 22, 2024
@jpountz jpountz marked this pull request as draft November 25, 2024 11:03
@jpountz jpountz marked this pull request as ready for review November 26, 2024 15:15
@jpountz
Contributor Author

jpountz commented Nov 26, 2024

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff  p-value
             FilteredOrStopWords       49.95      (3.1%)       44.73      (1.9%)  -10.5% ( -14% -   -5%) 0.000
                       CountTerm     8973.48      (4.5%)     8705.79      (4.3%)   -3.0% ( -11% -    6%) 0.032
                    FilteredTerm      158.70      (2.4%)      156.76      (2.1%)   -1.2% (  -5% -    3%) 0.090
                 CountAndHighMed      170.30      (1.4%)      168.95      (1.3%)   -0.8% (  -3% -    1%) 0.066
                      OrHighHigh       52.77      (1.8%)       52.45      (1.9%)   -0.6% (  -4% -    3%) 0.306
     FilteredAnd2Terms2StopWords      196.53      (1.6%)      195.55      (2.2%)   -0.5% (  -4% -    3%) 0.416
                       OrHighMed      195.81      (1.5%)      195.02      (2.1%)   -0.4% (  -3% -    3%) 0.475
                        PKLookup      277.48      (1.5%)      276.36      (2.2%)   -0.4% (  -4% -    3%) 0.499
               FilteredAnd3Terms      190.97      (2.0%)      190.31      (2.1%)   -0.3% (  -4% -    3%) 0.591
             FilteredAndHighHigh       62.42      (2.1%)       62.22      (1.9%)   -0.3% (  -4% -    3%) 0.611
                CountAndHighHigh       57.69      (1.0%)       57.51      (1.0%)   -0.3% (  -2% -    1%) 0.291
                 CountOrHighHigh       75.30      (1.2%)       75.07      (1.1%)   -0.3% (  -2% -    2%) 0.422
              Or2Terms2StopWords      161.53      (4.6%)      161.07      (5.1%)   -0.3% (  -9% -    9%) 0.851
                    CombinedTerm       34.23      (1.1%)       34.14      (1.7%)   -0.3% (  -2% -    2%) 0.521
                        Or3Terms      169.51      (4.9%)      169.11      (4.7%)   -0.2% (  -9% -    9%) 0.877
             CombinedAndHighHigh       15.87      (1.0%)       15.84      (0.9%)   -0.2% (  -2% -    1%) 0.449
            FilteredAndStopWords       48.57      (2.2%)       48.46      (2.1%)   -0.2% (  -4% -    4%) 0.742
              FilteredAndHighMed      125.62      (3.0%)      125.37      (2.7%)   -0.2% (  -5% -    5%) 0.825
                      AndHighMed      122.31      (1.4%)      122.16      (1.3%)   -0.1% (  -2% -    2%) 0.774
              CombinedAndHighMed       58.05      (0.9%)       57.99      (0.9%)   -0.1% (  -1% -    1%) 0.725
               CombinedOrHighMed       78.59      (1.9%)       78.51      (2.1%)   -0.1% (  -4% -    4%) 0.881
              CombinedOrHighHigh       20.79      (1.8%)       20.78      (2.3%)   -0.0% (  -4% -    4%) 0.964
                  CountOrHighMed      142.41      (1.5%)      142.43      (1.3%)    0.0% (  -2% -    2%) 0.978
                     OrStopWords       32.66      (7.6%)       32.69      (7.7%)    0.1% ( -14% -   16%) 0.969
                     AndHighHigh       41.61      (1.5%)       41.65      (1.4%)    0.1% (  -2% -    2%) 0.825
                       And3Terms      168.26      (4.1%)      168.44      (4.2%)    0.1% (  -7% -    8%) 0.934
                    AndStopWords       29.79      (6.1%)       29.83      (6.1%)    0.1% ( -11% -   13%) 0.942
             And2Terms2StopWords      158.60      (3.9%)      159.00      (4.1%)    0.3% (  -7% -    8%) 0.840
                          OrMany       19.30      (5.3%)       19.37      (5.6%)    0.4% (  -9% -   11%) 0.835
                  FilteredPhrase       25.46      (2.6%)       25.55      (2.5%)    0.4% (  -4% -    5%) 0.640
                      OrHighRare      278.01      (4.2%)      279.32      (5.3%)    0.5% (  -8% -   10%) 0.754
      FilteredOr2Terms2StopWords      149.54      (2.3%)      150.37      (1.7%)    0.6% (  -3% -    4%) 0.380
              FilteredOrHighHigh       64.54      (3.3%)       66.22      (1.7%)    2.6% (  -2% -    7%) 0.002
                     CountPhrase        4.30      (4.6%)        4.43      (2.4%)    3.0% (  -3% -   10%) 0.009
                FilteredOr3Terms      151.25      (2.8%)      168.59      (1.6%)   11.5% (   6% -   16%) 0.000
               FilteredOrHighMed      137.64      (2.9%)      156.84      (1.3%)   13.9% (   9% -   18%) 0.000
                  FilteredOrMany       12.50      (2.5%)       16.94      (3.9%)   35.6% (  28% -   42%) 0.000

`FilteredOrStopWords` is about 10% slower, but most other filtered disjunctions are faster, by up to 35.6% for `FilteredOrMany`.
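For readers checking the table, the `Pct diff` column can be approximated from the two QPS columns as a relative change against the baseline. The helper below is a hypothetical one-liner, not the benchmark tool's code, and because the table prints rounded means the result can differ from the printed value in the last digit.

```java
// Hypothetical helper for sanity-checking the table; not taken from the benchmark tool.
final class PctDiffSketch {
  /** Relative QPS change of the candidate versus the baseline, in percent. */
  static double pctDiff(double baselineQps, double candidateQps) {
    return 100.0 * (candidateQps - baselineQps) / baselineQps;
  }
}
```

For example, `pctDiff(49.95, 44.73)` lands within 0.1 of the -10.5% shown for `FilteredOrStopWords`, and `pctDiff(12.50, 16.94)` within 0.1 of the 35.6% shown for `FilteredOrMany`.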

@jpountz jpountz merged commit 98c59a7 into apache:main Nov 27, 2024
3 checks passed
@jpountz jpountz deleted the filtered_maxscore branch November 27, 2024 20:56
jpountz added a commit that referenced this pull request Nov 27, 2024
benchaplin pushed a commit to benchaplin/lucene that referenced this pull request Dec 31, 2024
1 participant