Speed up PostingsEnum when reading positions. #14032

jpountz · 2024-12-02T21:42:14Z

This PR changes the following:

- As much work as possible is moved from `nextDoc()`/`advance()` to `nextPosition()`. This way, the overhead of reading positions is only paid when all query terms agree on a candidate.
- Frequencies are read lazily. This helps when a document in a block is needed but the clauses do not agree on a common candidate match: in that case, frequencies are never decoded.
- A few other minor optimizations.
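
The first two points can be illustrated with a toy iterator. This is a minimal sketch under stated assumptions, not Lucene's actual `PostingsEnum` code: the class `LazyPostingsSketch`, its fields, the `freqDecodes` counter, and the "frequency minus one" encoding are all hypothetical and exist only for illustration. It shows `nextDoc()`/`advance()` touching doc IDs only, while frequency decoding and position skipping are deferred until `freq()` or `nextPosition()` is called.

```java
/** Toy postings iterator over one block. NOT Lucene's PostingsEnum; all names are hypothetical. */
final class LazyPostingsSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] docs;         // doc IDs of the block, in increasing order
  private final int[] encodedFreqs; // stand-in for still-encoded frequencies (toy encoding: freq - 1)
  private final int[] positions;    // positions of all docs, concatenated in doc order

  private int[] freqs;              // stays null until frequencies are actually needed
  private int docIdx = -1;          // index of the current doc in the block
  private int posDocIdx = 0;        // index of the doc whose positions start at posStart
  private int posStart = 0;         // offset into positions for doc posDocIdx
  private int posRead = 0;          // positions already returned for the current doc
  int freqDecodes = 0;              // how many times the block's frequencies were decoded

  LazyPostingsSketch(int[] docs, int[] encodedFreqs, int[] positions) {
    this.docs = docs;
    this.encodedFreqs = encodedFreqs;
    this.positions = positions;
  }

  /** Moves to the next doc. Only doc IDs are touched here. */
  int nextDoc() {
    docIdx++;
    posRead = 0;
    return docIdx < docs.length ? docs[docIdx] : NO_MORE_DOCS;
  }

  /** Moves to the first following doc whose ID is >= target. Only doc IDs are touched here. */
  int advance(int target) {
    int doc;
    do {
      doc = nextDoc();
    } while (doc < target);
    return doc;
  }

  /** Decodes the block's frequencies on first use. */
  int freq() {
    decodeFreqsIfNeeded();
    return freqs[docIdx];
  }

  /** Does the deferred work: decodes frequencies, skips positions of passed-over docs, then reads. */
  int nextPosition() {
    decodeFreqsIfNeeded();
    while (posDocIdx < docIdx) {
      posStart += freqs[posDocIdx++]; // skip positions of docs that were never asked for positions
    }
    if (posRead >= freqs[docIdx]) {
      throw new IllegalStateException("no more positions for this doc");
    }
    return positions[posStart + posRead++];
  }

  private void decodeFreqsIfNeeded() {
    if (freqs == null) {
      freqs = new int[encodedFreqs.length];
      for (int i = 0; i < freqs.length; i++) {
        freqs[i] = encodedFreqs[i] + 1; // undo the toy "freq - 1" encoding
      }
      freqDecodes++;
    }
  }

  public static void main(String[] args) {
    LazyPostingsSketch p = new LazyPostingsSketch(
        new int[] {2, 5, 9, 14},            // doc IDs
        new int[] {1, 0, 2, 0},             // frequencies 2, 1, 3, 1 in the toy encoding
        new int[] {4, 7, 1, 0, 3, 8, 6});   // positions, grouped per doc
    int doc = p.advance(9);
    System.out.println("doc=" + doc + " freqDecodes=" + p.freqDecodes);
    System.out.println("first position=" + p.nextPosition() + " freqDecodes=" + p.freqDecodes);
  }
}
```

In this sketch, a caller that advances through a block and never finds a doc on which all clauses agree never triggers `decodeFreqsIfNeeded()`, which is the situation the second bullet describes.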

jpountz · 2024-12-02T21:42:40Z

                        Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                      IntNRQ      110.78     (11.4%)      108.66     (13.0%)   -1.9% ( -23% -   25%) 0.739
                 CountPhrase        4.37      (1.6%)        4.30      (1.2%)   -1.7% (  -4% -    1%) 0.009
                      OrMany       20.20      (3.6%)       19.88      (4.2%)   -1.6% (  -9% -    6%) 0.381
             DismaxOrHighMed       85.25      (1.7%)       84.24      (3.9%)   -1.2% (  -6% -    4%) 0.403
            DismaxOrHighHigh       69.09      (2.0%)       68.36      (3.6%)   -1.1% (  -6% -    4%) 0.435
                    SpanNear        2.17     (12.4%)        2.14     (12.0%)   -1.0% ( -22% -   26%) 0.855
                   CountTerm     8796.37      (5.0%)     8709.80      (2.8%)   -1.0% (  -8% -    7%) 0.605
                 OrStopWords       34.15      (8.2%)       33.89      (5.8%)   -0.8% ( -13% -   14%) 0.820
           FilteredAnd3Terms      196.11      (2.3%)      194.82      (1.9%)   -0.7% (  -4% -    3%) 0.503
                    Or3Terms      175.20      (5.2%)      174.24      (3.8%)   -0.6% (  -9% -    8%) 0.797
              FilteredOrMany       16.97      (5.4%)       16.88      (5.0%)   -0.5% ( -10% -   10%) 0.834
          Or2Terms2StopWords      165.63      (4.6%)      164.83      (3.5%)   -0.5% (  -8% -    7%) 0.801
                      Fuzzy1       81.21      (1.6%)       80.83      (1.5%)   -0.5% (  -3% -    2%) 0.529
                SloppyPhrase        1.78     (10.5%)        1.77      (3.6%)   -0.4% ( -13% -   15%) 0.907
 FilteredAnd2Terms2StopWords      201.36      (1.9%)      200.55      (1.3%)   -0.4% (  -3% -    2%) 0.599
          FilteredAndHighMed      130.30      (2.7%)      129.81      (2.6%)   -0.4% (  -5% -    5%) 0.764
                      Fuzzy2       76.58      (1.4%)       76.29      (1.2%)   -0.4% (  -2% -    2%) 0.536
                FilteredTerm      156.92      (2.5%)      156.39      (2.1%)   -0.3% (  -4% -    4%) 0.757
                   And3Terms      179.52      (3.2%)      179.00      (3.5%)   -0.3% (  -6% -    6%) 0.855
         FilteredAndHighHigh       63.65      (2.9%)       63.47      (2.0%)   -0.3% (  -5% -    4%) 0.813
        FilteredAndStopWords       48.99      (2.9%)       48.86      (1.9%)   -0.3% (  -4% -    4%) 0.828
         And2Terms2StopWords      167.29      (3.1%)      166.94      (3.3%)   -0.2% (  -6% -    6%) 0.887
           TermDayOfYearSort      631.10      (1.3%)      630.31      (1.2%)   -0.1% (  -2% -    2%) 0.835
            FilteredOr3Terms      168.75      (1.2%)      168.59      (1.4%)   -0.1% (  -2% -    2%) 0.879
             CountAndHighMed      169.49      (1.2%)      169.42      (1.3%)   -0.0% (  -2% -    2%) 0.948
  FilteredOr2Terms2StopWords      151.07      (1.0%)      151.02      (1.2%)   -0.0% (  -2% -    2%) 0.948
                AndStopWords       32.08      (5.1%)       32.07      (5.0%)   -0.0% (  -9% -   10%) 0.993
           FilteredOrHighMed      157.50      (1.1%)      157.49      (0.6%)   -0.0% (  -1% -    1%) 0.992
            CountAndHighHigh       57.71      (1.1%)       57.71      (0.8%)    0.0% (  -1% -    1%) 1.000
                    PKLookup      282.75      (1.9%)      282.86      (1.4%)    0.0% (  -3% -    3%) 0.958
                     Respell       54.63      (1.1%)       54.70      (1.5%)    0.1% (  -2% -    2%) 0.843
                TermGroup100       24.50      (4.5%)       24.54      (3.8%)    0.2% (  -7% -    8%) 0.929
              CountOrHighMed      141.72      (1.9%)      142.10      (1.1%)    0.3% (  -2% -    3%) 0.717
         FilteredOrStopWords       44.42      (1.9%)       44.56      (2.3%)    0.3% (  -3% -    4%) 0.759
             AndHighOrMedMed       43.17      (2.5%)       43.31      (0.7%)    0.3% (  -2% -    3%) 0.720
             CountOrHighHigh       74.88      (1.6%)       75.12      (1.3%)    0.3% (  -2% -    3%) 0.653
           CombinedOrHighMed       70.87      (5.8%)       71.13      (3.7%)    0.4% (  -8% -   10%) 0.873
               TermMonthSort     3404.71      (2.0%)     3418.15      (2.6%)    0.4% (  -4% -    5%) 0.718
          CombinedOrHighHigh       18.54      (5.9%)       18.64      (3.7%)    0.5% (  -8% -   10%) 0.821
               TermTitleSort      159.32      (3.5%)      160.30      (2.9%)    0.6% (  -5% -    7%) 0.686
                   OrHighMed      196.12      (5.6%)      197.40      (4.9%)    0.7% (  -9% -   11%) 0.794
            IntervalsOrdered        2.37      (2.7%)        2.38      (3.2%)    0.7% (  -5% -    6%) 0.627
          FilteredOrHighHigh       65.75      (1.7%)       66.20      (1.5%)    0.7% (  -2% -    3%) 0.365
                TermGroup10K       20.11      (3.6%)       20.25      (3.5%)    0.7% (  -6% -    8%) 0.682
                  DismaxTerm      605.77      (2.9%)      610.10      (2.8%)    0.7% (  -4% -    6%) 0.598
                 TermGroup1M       19.82      (3.1%)       19.97      (3.5%)    0.8% (  -5% -    7%) 0.616
            AndMedOrHighHigh       59.49      (1.9%)       59.96      (1.6%)    0.8% (  -2% -    4%) 0.351
                  TermDTSort      274.15      (1.8%)      276.92      (2.3%)    1.0% (  -3% -    5%) 0.309
                    Wildcard       73.73      (4.2%)       74.49      (3.0%)    1.0% (  -5% -    8%) 0.545
          CombinedAndHighMed       54.61      (4.7%)       55.18      (2.5%)    1.0% (  -5% -    8%) 0.558
                  AndHighMed      131.04      (2.8%)      132.48      (1.7%)    1.1% (  -3% -    5%) 0.308
                  OrHighHigh       52.99      (7.0%)       53.63      (6.2%)    1.2% ( -11% -   15%) 0.697
                        Term      470.85      (4.2%)      476.71      (4.3%)    1.2% (  -7% -   10%) 0.537
         CombinedAndHighHigh       14.94      (5.0%)       15.13      (2.5%)    1.2% (  -5% -    9%) 0.505
                  OrHighRare      271.47      (8.0%)      275.00      (7.1%)    1.3% ( -12% -   17%) 0.715
                CombinedTerm       29.64      (5.1%)       30.08      (3.8%)    1.5% (  -7% -   10%) 0.480
                TermBGroup1M       24.87      (3.9%)       25.26      (3.8%)    1.6% (  -5% -    9%) 0.386
                 AndHighHigh       44.95      (3.0%)       45.68      (1.7%)    1.6% (  -3% -    6%) 0.158
                     Prefix3      127.48      (6.7%)      129.68      (7.8%)    1.7% ( -11% -   17%) 0.615
              TermBGroup1M1P       36.78      (4.9%)       37.59      (5.4%)    2.2% (  -7% -   13%) 0.364
                      Phrase       13.78      (2.7%)       14.90      (2.8%)    8.1% (   2% -   13%) 0.000
              FilteredPhrase       26.01      (3.4%)       30.92      (1.6%)   18.9% (  13% -   24%) 0.000

rmuir

great to see the frequencies finally lazy-decoded

jpountz · 2024-12-03T13:30:29Z

FilteredPhrase had a good 25% speedup, Phrase had a good 11% speedup, but other phrase queries had a slowdown, e.g. SloppyPhrase (-8%) and CountPhrase (-7%).

jpountz added this to the 10.1.0 milestone Dec 2, 2024

rmuir approved these changes Dec 2, 2024

CHANGES

38c29ee

jpountz merged commit b2a10e3 into apache:main Dec 2, 2024
3 checks passed

jpountz deleted the speedup_positions branch December 2, 2024 22:26
