Introduce a BulkScorer for DisjunctionMaxQuery. #14040

jpountz · 2024-12-04T17:35:41Z

This introduces a bulk scorer for `DisjunctionMaxQuery` that delegates to the bulk scorers of the query clauses. This improves the performance of top-level `DisjunctionMaxQuery`, especially when its clauses have optimized bulk scorers themselves (e.g. disjunctions).
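
To illustrate the delegation idea, here is a minimal, self-contained sketch. It is a toy model and not the code in this PR: `HitCollector`, `ClauseBulkScorer`, `scoreDismax` and the window size are hypothetical names and values chosen for illustration, none of them are Lucene's API, and the tie-breaker is assumed to be 0 so the score of a document is simply the maximum over its matching clauses.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a delegating dismax bulk scorer; none of these types are Lucene's. */
final class DismaxBulkSketch {

  /** Hypothetical sink that receives (doc, score) pairs. */
  interface HitCollector {
    void collect(int doc, float score);
  }

  /** Hypothetical stand-in for a clause's bulk scorer: scores every match in [min, max). */
  interface ClauseBulkScorer {
    void score(HitCollector collector, int min, int max);
  }

  /** Arbitrary window size for this sketch. */
  static final int WINDOW_SIZE = 4096;

  /** Scores docs in [0, maxDoc) window by window, delegating each window to every clause. */
  static void scoreDismax(List<ClauseBulkScorer> clauses, HitCollector out, int maxDoc) {
    boolean[] matched = new boolean[WINDOW_SIZE];
    float[] best = new float[WINDOW_SIZE];
    for (int base = 0; base < maxDoc; base += WINDOW_SIZE) {
      final int windowMin = base;
      int windowMax = Math.min(maxDoc, base + WINDOW_SIZE);
      // Let each clause bulk-score the window, keeping the max score per doc.
      for (ClauseBulkScorer clause : clauses) {
        clause.score((doc, score) -> {
          int i = doc - windowMin;
          best[i] = matched[i] ? Math.max(best[i], score) : score;
          matched[i] = true;
        }, windowMin, windowMax);
      }
      // Replay the window's matches in doc ID order, then reset the window state.
      for (int i = 0; i < windowMax - windowMin; i++) {
        if (matched[i]) {
          out.collect(windowMin + i, best[i]);
          matched[i] = false;
        }
      }
    }
  }

  /** Builds a toy clause from parallel arrays of doc IDs and scores. */
  static ClauseBulkScorer clause(int[] docs, float[] scores) {
    return (collector, min, max) -> {
      for (int i = 0; i < docs.length; i++) {
        if (docs[i] >= min && docs[i] < max) {
          collector.collect(docs[i], scores[i]);
        }
      }
    };
  }

  /** Runs two toy clauses that overlap on doc 1 and returns the hits as "doc:score" strings. */
  static List<String> run() {
    List<String> hits = new ArrayList<>();
    scoreDismax(
        List.of(
            clause(new int[] {1, 5000}, new float[] {1.0f, 2.0f}),
            clause(new int[] {1, 7}, new float[] {3.0f, 0.5f})),
        (doc, score) -> hits.add(doc + ":" + score),
        10_000);
    return hits;
  }

  public static void main(String[] args) {
    System.out.println(run()); // [1:3.0, 7:0.5, 5000:2.0]
  }
}
```

The point of the windowed design in this sketch is that each clause runs its own, possibly optimized, bulk loop over a range of doc IDs, and the per-document max is only combined afterwards, instead of advancing all clauses one document at a time.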

jpountz · 2024-12-04T17:37:32Z

This is already covered by existing tests, in particular the QueryUtils checks, which compare the hits of the scorer and the bulk scorer. Here are benchmark results on wikibigall:

                            Task QPS baseline  StdDev  QPS my_modified_version  StdDev                Pct diff p-value
                          IntNRQ      111.27     (14.2%)      103.85      (3.0%)   -6.7% ( -20% -   12%) 0.262
                     OrStopWords       34.54      (5.6%)       33.26      (6.9%)   -3.7% ( -15% -    9%) 0.302
              Or2Terms2StopWords      164.60      (2.5%)      160.43      (4.7%)   -2.5% (  -9% -    4%) 0.245
                        Or3Terms      174.76      (3.0%)      170.69      (5.2%)   -2.3% ( -10% -    6%) 0.342
                    AndStopWords       32.23      (2.2%)       31.62      (5.5%)   -1.9% (  -9% -    5%) 0.437
                       OrHighMed      194.53      (4.6%)      191.03      (4.2%)   -1.8% ( -10% -    7%) 0.479
                 CountOrHighHigh       76.13      (2.6%)       74.80      (0.8%)   -1.7% (  -5% -    1%) 0.116
                          OrMany       20.03      (2.8%)       19.68      (3.4%)   -1.7% (  -7% -    4%) 0.331
                      OrHighHigh       53.25      (5.8%)       52.45      (5.1%)   -1.5% ( -11% -   10%) 0.634
              FilteredAndHighMed      131.39      (1.9%)      129.50      (3.4%)   -1.4% (  -6% -    3%) 0.364
                  CountOrHighMed      141.00      (2.0%)      139.22      (1.2%)   -1.3% (  -4% -    1%) 0.178
                       And3Terms      180.03      (1.9%)      177.88      (4.3%)   -1.2% (  -7% -    5%) 0.534
             FilteredOrStopWords       43.38      (3.2%)       42.89      (2.3%)   -1.1% (  -6% -    4%) 0.488
             And2Terms2StopWords      165.05      (2.0%)      163.33      (3.2%)   -1.0% (  -6% -    4%) 0.504
            FilteredAndStopWords       48.51      (1.0%)       48.02      (2.8%)   -1.0% (  -4% -    2%) 0.410
                          Fuzzy1       80.86      (1.8%)       80.04      (2.4%)   -1.0% (  -5% -    3%) 0.411
                      OrHighRare      254.81     (11.8%)      252.51      (7.3%)   -0.9% ( -17% -   20%) 0.874
                AndMedOrHighHigh       59.37      (1.1%)       58.90      (1.2%)   -0.8% (  -3% -    1%) 0.234
             FilteredAndHighHigh       63.22      (1.1%)       62.73      (2.4%)   -0.8% (  -4% -    2%) 0.476
                    FilteredTerm      154.17      (1.8%)      152.99      (1.8%)   -0.8% (  -4% -    2%) 0.462
                 CountAndHighMed      163.64      (1.3%)      162.54      (2.4%)   -0.7% (  -4% -    3%) 0.549
              FilteredOrHighHigh       64.21      (2.5%)       63.82      (1.6%)   -0.6% (  -4% -    3%) 0.620
      FilteredOr2Terms2StopWords      147.69      (1.6%)      146.84      (0.7%)   -0.6% (  -2% -    1%) 0.419
                FilteredOr3Terms      166.33      (0.9%)      165.48      (1.3%)   -0.5% (  -2% -    1%) 0.429
               FilteredAnd3Terms      194.98      (2.0%)      194.00      (2.5%)   -0.5% (  -4% -    3%) 0.696
                CountAndHighHigh       56.16      (1.3%)       55.90      (2.0%)   -0.5% (  -3% -    2%) 0.622
                          Fuzzy2       75.92      (1.4%)       75.60      (2.3%)   -0.4% (  -4% -    3%) 0.705
               FilteredOrHighMed      153.77      (0.9%)      153.24      (0.9%)   -0.3% (  -2% -    1%) 0.505
                  FilteredPhrase       30.45      (1.8%)       30.36      (1.7%)   -0.3% (  -3% -    3%) 0.765
                        Wildcard       73.69      (3.7%)       73.50      (2.5%)   -0.3% (  -6% -    6%) 0.886
                     AndHighHigh       45.11      (2.1%)       45.01      (1.8%)   -0.2% (  -4% -    3%) 0.855
     FilteredAnd2Terms2StopWords      197.92      (0.6%)      197.62      (1.8%)   -0.2% (  -2% -    2%) 0.844
                        PKLookup      277.76      (2.5%)      277.39      (2.6%)   -0.1% (  -5% -    5%) 0.930
                         Prefix3      129.81      (5.4%)      129.74      (5.7%)   -0.1% ( -10% -   11%) 0.987
                      AndHighMed      131.45      (1.5%)      131.38      (1.5%)   -0.1% (  -2% -    2%) 0.952
               TermDayOfYearSort      632.23      (4.4%)      633.20      (5.2%)    0.2% (  -9% -   10%) 0.956
                  FilteredOrMany       16.79      (5.1%)       16.84      (3.7%)    0.3% (  -8% -    9%) 0.896
                   TermMonthSort     3401.24      (2.0%)     3420.11      (1.9%)    0.6% (  -3% -    4%) 0.621
                    TermGroup100       24.50      (4.5%)       24.68      (3.9%)    0.7% (  -7% -    9%) 0.758
                  TermBGroup1M1P       37.98      (3.9%)       38.34      (2.9%)    1.0% (  -5% -    8%) 0.629
                    TermGroup10K       20.11      (3.8%)       20.31      (2.5%)    1.0% (  -5% -    7%) 0.580
                      TermDTSort      282.59      (4.3%)      285.66      (6.2%)    1.1% (  -9% -   12%) 0.724
                            Term      457.08      (6.7%)      462.07      (4.1%)    1.1% (  -9% -   12%) 0.734
                      DismaxTerm      588.60      (3.6%)      595.02      (2.0%)    1.1% (  -4% -    6%) 0.511
                     TermGroup1M       19.72      (3.7%)       19.98      (3.2%)    1.3% (  -5% -    8%) 0.514
                       CountTerm     8714.03      (1.8%)     8858.62      (4.3%)    1.7% (  -4% -    7%) 0.383
                    TermBGroup1M       24.78      (4.1%)       25.26      (3.6%)    2.0% (  -5% -   10%) 0.383
                     CountPhrase        4.29      (2.8%)        4.37      (2.4%)    2.1% (  -3% -    7%) 0.167
                   TermTitleSort      149.64      (6.7%)      152.89      (1.2%)    2.2% (  -5% -   10%) 0.436
              CombinedAndHighMed       54.55      (2.6%)       55.77      (1.8%)    2.2% (  -2% -    6%) 0.082
             CombinedAndHighHigh       15.02      (2.7%)       15.36      (2.2%)    2.3% (  -2% -    7%) 0.108
                 AndHighOrMedMed       44.43      (2.9%)       45.82      (1.4%)    3.1% (  -1% -    7%) 0.020
                    CombinedTerm       29.19      (2.4%)       30.45      (1.2%)    4.3% (   0% -    8%) 0.000
                          Phrase       15.34      (5.7%)       16.01      (3.6%)    4.4% (  -4% -   14%) 0.107
              CombinedOrHighHigh       18.48      (3.5%)       19.32      (2.0%)    4.6% (   0% -   10%) 0.006
               CombinedOrHighMed       70.19      (3.0%)       73.51      (1.9%)    4.7% (   0% -    9%) 0.001
                DismaxOrHighHigh       67.52      (6.0%)      114.35      (3.7%)   69.3% (  56% -   84%) 0.000
                 DismaxOrHighMed       84.17      (4.3%)      166.95      (4.5%)   98.4% (  85% -  111%) 0.000

jpountz added this to the 10.1.0 milestone Dec 4, 2024

jpountz added 2 commits December 6, 2024 10:44

Merge branch 'main' into dismax_bulk_scorer

2a9753b

CHANGES

4df90a8

jpountz merged commit c88f933 into apache:main Dec 6, 2024
3 checks passed

jpountz deleted the dismax_bulk_scorer branch December 6, 2024 10:01
