Introduce a BulkScorer for DisjunctionMaxQuery. #14040

Merged (3 commits) on Dec 6, 2024


jpountz

@jpountz jpountz commented Dec 4, 2024

This introduces a bulk scorer for `DisjunctionMaxQuery` that delegates to the bulk scorers of the query clauses. It improves the performance of top-level `DisjunctionMaxQuery`, especially when its clauses have optimized bulk scorers themselves (e.g. disjunctions).
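
To illustrate the delegation idea, here is a minimal, self-contained sketch. It is not the code merged in this PR and it does not use Lucene's API: `HitConsumer`, `ClauseBulkScorer`, `scoreDisMax` and the window size are hypothetical names and values chosen for illustration. It also assumes a tie-breaker of 0, so the score of a document is simply the maximum score across clauses. Each clause scores a window of doc IDs in bulk, the per-document maximum is kept in an array, and the window is then replayed to the consumer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DisMaxBulkSketch {

  /** Hypothetical stand-in for a collector: receives each matching doc once. */
  interface HitConsumer {
    void accept(int doc, float score);
  }

  /** Hypothetical stand-in for a clause's bulk scorer: scores all matches in [min, max). */
  interface ClauseBulkScorer {
    void score(HitConsumer consumer, int min, int max);
  }

  // Tiny window so the example exercises more than one window (an assumption).
  static final int WINDOW_SIZE = 4;

  /** Scores docs in [0, maxDoc) window by window, keeping the max clause score per doc. */
  static void scoreDisMax(List<ClauseBulkScorer> clauses, int maxDoc, HitConsumer out) {
    boolean[] matched = new boolean[WINDOW_SIZE];
    float[] best = new float[WINDOW_SIZE];
    for (int base = 0; base < maxDoc; base += WINDOW_SIZE) {
      final int windowBase = base;
      final int windowMax = Math.min(maxDoc, base + WINDOW_SIZE);
      Arrays.fill(matched, false);
      Arrays.fill(best, 0f);
      // Delegate to each clause's bulk scorer over the current window.
      for (ClauseBulkScorer clause : clauses) {
        clause.score((doc, score) -> {
          int i = doc - windowBase;
          best[i] = matched[i] ? Math.max(best[i], score) : score;
          matched[i] = true;
        }, windowBase, windowMax);
      }
      // Replay the window in doc ID order with the combined (max) scores.
      for (int i = 0; i < windowMax - windowBase; i++) {
        if (matched[i]) {
          out.accept(windowBase + i, best[i]);
        }
      }
    }
  }

  /** Toy clause backed by parallel arrays of doc IDs and scores. */
  static ClauseBulkScorer clause(int[] docs, float[] scores) {
    return (consumer, min, max) -> {
      for (int i = 0; i < docs.length; i++) {
        if (docs[i] >= min && docs[i] < max) {
          consumer.accept(docs[i], scores[i]);
        }
      }
    };
  }

  /** Runs two toy clauses over 8 docs and returns hits as "doc:score" strings. */
  static List<String> run() {
    List<ClauseBulkScorer> clauses = List.of(
        clause(new int[] {1, 5, 6}, new float[] {1.0f, 2.0f, 0.5f}),
        clause(new int[] {5, 7}, new float[] {3.0f, 1.5f}));
    List<String> hits = new ArrayList<>();
    scoreDisMax(clauses, 8, (doc, score) -> hits.add(doc + ":" + score));
    return hits;
  }

  public static void main(String[] args) {
    System.out.println(run()); // prints [1:1.0, 5:3.0, 6:0.5, 7:1.5]
  }
}
```

In this sketch doc 5 matches both clauses and is reported once with the higher score. The point of the pattern is that each clause is free to score its window with whatever optimized loop it has, instead of being advanced one document at a time.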

@jpountz jpountz added this to the 10.1.0 milestone Dec 4, 2024

jpountz commented Dec 4, 2024

Existing tests already cover this, in particular the QueryUtils checks, which compare the hits of the scorer with those of the bulk scorer. Here are benchmark results on wikibigall:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                          IntNRQ      111.27     (14.2%)      103.85      (3.0%)   -6.7% ( -20% -   12%) 0.262
                     OrStopWords       34.54      (5.6%)       33.26      (6.9%)   -3.7% ( -15% -    9%) 0.302
              Or2Terms2StopWords      164.60      (2.5%)      160.43      (4.7%)   -2.5% (  -9% -    4%) 0.245
                        Or3Terms      174.76      (3.0%)      170.69      (5.2%)   -2.3% ( -10% -    6%) 0.342
                    AndStopWords       32.23      (2.2%)       31.62      (5.5%)   -1.9% (  -9% -    5%) 0.437
                       OrHighMed      194.53      (4.6%)      191.03      (4.2%)   -1.8% ( -10% -    7%) 0.479
                 CountOrHighHigh       76.13      (2.6%)       74.80      (0.8%)   -1.7% (  -5% -    1%) 0.116
                          OrMany       20.03      (2.8%)       19.68      (3.4%)   -1.7% (  -7% -    4%) 0.331
                      OrHighHigh       53.25      (5.8%)       52.45      (5.1%)   -1.5% ( -11% -   10%) 0.634
              FilteredAndHighMed      131.39      (1.9%)      129.50      (3.4%)   -1.4% (  -6% -    3%) 0.364
                  CountOrHighMed      141.00      (2.0%)      139.22      (1.2%)   -1.3% (  -4% -    1%) 0.178
                       And3Terms      180.03      (1.9%)      177.88      (4.3%)   -1.2% (  -7% -    5%) 0.534
             FilteredOrStopWords       43.38      (3.2%)       42.89      (2.3%)   -1.1% (  -6% -    4%) 0.488
             And2Terms2StopWords      165.05      (2.0%)      163.33      (3.2%)   -1.0% (  -6% -    4%) 0.504
            FilteredAndStopWords       48.51      (1.0%)       48.02      (2.8%)   -1.0% (  -4% -    2%) 0.410
                          Fuzzy1       80.86      (1.8%)       80.04      (2.4%)   -1.0% (  -5% -    3%) 0.411
                      OrHighRare      254.81     (11.8%)      252.51      (7.3%)   -0.9% ( -17% -   20%) 0.874
                AndMedOrHighHigh       59.37      (1.1%)       58.90      (1.2%)   -0.8% (  -3% -    1%) 0.234
             FilteredAndHighHigh       63.22      (1.1%)       62.73      (2.4%)   -0.8% (  -4% -    2%) 0.476
                    FilteredTerm      154.17      (1.8%)      152.99      (1.8%)   -0.8% (  -4% -    2%) 0.462
                 CountAndHighMed      163.64      (1.3%)      162.54      (2.4%)   -0.7% (  -4% -    3%) 0.549
              FilteredOrHighHigh       64.21      (2.5%)       63.82      (1.6%)   -0.6% (  -4% -    3%) 0.620
      FilteredOr2Terms2StopWords      147.69      (1.6%)      146.84      (0.7%)   -0.6% (  -2% -    1%) 0.419
                FilteredOr3Terms      166.33      (0.9%)      165.48      (1.3%)   -0.5% (  -2% -    1%) 0.429
               FilteredAnd3Terms      194.98      (2.0%)      194.00      (2.5%)   -0.5% (  -4% -    3%) 0.696
                CountAndHighHigh       56.16      (1.3%)       55.90      (2.0%)   -0.5% (  -3% -    2%) 0.622
                          Fuzzy2       75.92      (1.4%)       75.60      (2.3%)   -0.4% (  -4% -    3%) 0.705
               FilteredOrHighMed      153.77      (0.9%)      153.24      (0.9%)   -0.3% (  -2% -    1%) 0.505
                  FilteredPhrase       30.45      (1.8%)       30.36      (1.7%)   -0.3% (  -3% -    3%) 0.765
                        Wildcard       73.69      (3.7%)       73.50      (2.5%)   -0.3% (  -6% -    6%) 0.886
                     AndHighHigh       45.11      (2.1%)       45.01      (1.8%)   -0.2% (  -4% -    3%) 0.855
     FilteredAnd2Terms2StopWords      197.92      (0.6%)      197.62      (1.8%)   -0.2% (  -2% -    2%) 0.844
                        PKLookup      277.76      (2.5%)      277.39      (2.6%)   -0.1% (  -5% -    5%) 0.930
                         Prefix3      129.81      (5.4%)      129.74      (5.7%)   -0.1% ( -10% -   11%) 0.987
                      AndHighMed      131.45      (1.5%)      131.38      (1.5%)   -0.1% (  -2% -    2%) 0.952
               TermDayOfYearSort      632.23      (4.4%)      633.20      (5.2%)    0.2% (  -9% -   10%) 0.956
                  FilteredOrMany       16.79      (5.1%)       16.84      (3.7%)    0.3% (  -8% -    9%) 0.896
                   TermMonthSort     3401.24      (2.0%)     3420.11      (1.9%)    0.6% (  -3% -    4%) 0.621
                    TermGroup100       24.50      (4.5%)       24.68      (3.9%)    0.7% (  -7% -    9%) 0.758
                  TermBGroup1M1P       37.98      (3.9%)       38.34      (2.9%)    1.0% (  -5% -    8%) 0.629
                    TermGroup10K       20.11      (3.8%)       20.31      (2.5%)    1.0% (  -5% -    7%) 0.580
                      TermDTSort      282.59      (4.3%)      285.66      (6.2%)    1.1% (  -9% -   12%) 0.724
                            Term      457.08      (6.7%)      462.07      (4.1%)    1.1% (  -9% -   12%) 0.734
                      DismaxTerm      588.60      (3.6%)      595.02      (2.0%)    1.1% (  -4% -    6%) 0.511
                     TermGroup1M       19.72      (3.7%)       19.98      (3.2%)    1.3% (  -5% -    8%) 0.514
                       CountTerm     8714.03      (1.8%)     8858.62      (4.3%)    1.7% (  -4% -    7%) 0.383
                    TermBGroup1M       24.78      (4.1%)       25.26      (3.6%)    2.0% (  -5% -   10%) 0.383
                     CountPhrase        4.29      (2.8%)        4.37      (2.4%)    2.1% (  -3% -    7%) 0.167
                   TermTitleSort      149.64      (6.7%)      152.89      (1.2%)    2.2% (  -5% -   10%) 0.436
              CombinedAndHighMed       54.55      (2.6%)       55.77      (1.8%)    2.2% (  -2% -    6%) 0.082
             CombinedAndHighHigh       15.02      (2.7%)       15.36      (2.2%)    2.3% (  -2% -    7%) 0.108
                 AndHighOrMedMed       44.43      (2.9%)       45.82      (1.4%)    3.1% (  -1% -    7%) 0.020
                    CombinedTerm       29.19      (2.4%)       30.45      (1.2%)    4.3% (   0% -    8%) 0.000
                          Phrase       15.34      (5.7%)       16.01      (3.6%)    4.4% (  -4% -   14%) 0.107
              CombinedOrHighHigh       18.48      (3.5%)       19.32      (2.0%)    4.6% (   0% -   10%) 0.006
               CombinedOrHighMed       70.19      (3.0%)       73.51      (1.9%)    4.7% (   0% -    9%) 0.001
                DismaxOrHighHigh       67.52      (6.0%)      114.35      (3.7%)   69.3% (  56% -   84%) 0.000
                 DismaxOrHighMed       84.17      (4.3%)      166.95      (4.5%)   98.4% (  85% -  111%) 0.000

@jpountz jpountz merged commit c88f933 into apache:main Dec 6, 2024
3 checks passed
@jpountz jpountz deleted the dismax_bulk_scorer branch December 6, 2024 10:01
jpountz added a commit that referenced this pull request Dec 9, 2024