Speed up PostingsEnum when reading positions. #14032

Merged: 2 commits, Dec 2, 2024

jpountz (Contributor) commented Dec 2, 2024

This PR changes the following:

  • As much work as possible is moved from nextDoc()/advance() to nextPosition(). This way, the overhead of reading positions is only paid when all query terms agree on a candidate.
  • Frequencies are read lazily. Again, this helps when a block is decoded because one of its documents is needed, but the clauses do not agree on a common candidate match within it: the block's frequencies are then never decoded.
  • A few other minor optimizations.
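To make the first two bullets concrete, here is a minimal Java sketch under loud assumptions: `LazyPostingsBlock` and all of its members are hypothetical and do not reflect Lucene's actual postings reader; a single block of postings is modeled as plain `int` arrays, and "decoding" is only simulated. It shows `nextDoc()`/`advance()` doing nothing but moving through doc IDs, while decoding frequencies and skipping the positions of jumped-over documents are deferred until `freq()` or `nextPosition()` is called.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified single-block postings iterator (NOT Lucene's PostingsEnum).
 * Doc IDs are decoded eagerly, frequencies lazily, and skipping the positions of
 * jumped-over docs is deferred from nextDoc()/advance() to nextPosition().
 * "Encoded" data is plain int arrays; decoding is simulated by a prefix sum and a copy.
 */
final class LazyPostingsBlock {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] docs;         // decoded doc IDs, needed for any iteration
  private final int[] encodedFreqs; // stand-in for a compressed frequency block
  private final int[] positions;    // absolute positions of all docs, concatenated
  private int[] freqs;              // stays null until a frequency is actually needed
  private int index = -1;           // index of the current doc within this block

  private int posDocIndex = 0;      // doc whose positions start at posStart
  private int posStart = 0;         // offset of that doc's first position
  private int posRead = 0;          // positions already returned for the current doc
  private int positionedIndex = -1; // doc index the position stream is aligned with

  int freqBlockDecodes = 0;         // instrumentation for this example only

  LazyPostingsBlock(int[] docDeltas, int[] encodedFreqs, int[] positions) {
    docs = new int[docDeltas.length];
    int doc = 0;
    for (int i = 0; i < docDeltas.length; i++) {
      doc += docDeltas[i];
      docs[i] = doc;
    }
    this.encodedFreqs = encodedFreqs;
    this.positions = positions;
  }

  /** Moves to the next doc; touches neither frequencies nor positions. */
  int nextDoc() {
    if (index + 1 < docs.length) {
      return docs[++index];
    }
    index = docs.length;
    return NO_MORE_DOCS;
  }

  /** Moves to the first doc >= target; touches neither frequencies nor positions. */
  int advance(int target) {
    while (index + 1 < docs.length) {
      if (docs[++index] >= target) {
        return docs[index];
      }
    }
    index = docs.length;
    return NO_MORE_DOCS;
  }

  /** Decodes the whole frequency block the first time any frequency is requested. */
  int freq() {
    if (freqs == null) {
      freqs = Arrays.copyOf(encodedFreqs, encodedFreqs.length);
      freqBlockDecodes++;
    }
    return freqs[index];
  }

  /** Returns the next position of the current doc; call at most freq() times per doc. */
  int nextPosition() {
    if (positionedIndex != index) {
      // Deferred work: only now, on a doc the caller really wants positions for,
      // skip over the positions of every doc that nextDoc()/advance() jumped past.
      freq(); // makes sure frequencies are decoded
      while (posDocIndex < index) {
        posStart += freqs[posDocIndex++];
      }
      posRead = 0;
      positionedIndex = index;
    }
    return positions[posStart + posRead++];
  }
}
```

In this sketch, a conjunction that calls `advance()` on several such blocks and only reaches `nextPosition()` on documents where all clauses agree never decodes frequencies or skips positions for the documents it rejects.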

jpountz added this to the 10.1.0 milestone Dec 2, 2024

jpountz (Contributor Author) commented Dec 2, 2024

                        Task   QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                      IntNRQ      110.78     (11.4%)      108.66     (13.0%)   -1.9% ( -23% -   25%) 0.739
                 CountPhrase        4.37      (1.6%)        4.30      (1.2%)   -1.7% (  -4% -    1%) 0.009
                      OrMany       20.20      (3.6%)       19.88      (4.2%)   -1.6% (  -9% -    6%) 0.381
             DismaxOrHighMed       85.25      (1.7%)       84.24      (3.9%)   -1.2% (  -6% -    4%) 0.403
            DismaxOrHighHigh       69.09      (2.0%)       68.36      (3.6%)   -1.1% (  -6% -    4%) 0.435
                    SpanNear        2.17     (12.4%)        2.14     (12.0%)   -1.0% ( -22% -   26%) 0.855
                   CountTerm     8796.37      (5.0%)     8709.80      (2.8%)   -1.0% (  -8% -    7%) 0.605
                 OrStopWords       34.15      (8.2%)       33.89      (5.8%)   -0.8% ( -13% -   14%) 0.820
           FilteredAnd3Terms      196.11      (2.3%)      194.82      (1.9%)   -0.7% (  -4% -    3%) 0.503
                    Or3Terms      175.20      (5.2%)      174.24      (3.8%)   -0.6% (  -9% -    8%) 0.797
              FilteredOrMany       16.97      (5.4%)       16.88      (5.0%)   -0.5% ( -10% -   10%) 0.834
          Or2Terms2StopWords      165.63      (4.6%)      164.83      (3.5%)   -0.5% (  -8% -    7%) 0.801
                      Fuzzy1       81.21      (1.6%)       80.83      (1.5%)   -0.5% (  -3% -    2%) 0.529
                SloppyPhrase        1.78     (10.5%)        1.77      (3.6%)   -0.4% ( -13% -   15%) 0.907
 FilteredAnd2Terms2StopWords      201.36      (1.9%)      200.55      (1.3%)   -0.4% (  -3% -    2%) 0.599
          FilteredAndHighMed      130.30      (2.7%)      129.81      (2.6%)   -0.4% (  -5% -    5%) 0.764
                      Fuzzy2       76.58      (1.4%)       76.29      (1.2%)   -0.4% (  -2% -    2%) 0.536
                FilteredTerm      156.92      (2.5%)      156.39      (2.1%)   -0.3% (  -4% -    4%) 0.757
                   And3Terms      179.52      (3.2%)      179.00      (3.5%)   -0.3% (  -6% -    6%) 0.855
         FilteredAndHighHigh       63.65      (2.9%)       63.47      (2.0%)   -0.3% (  -5% -    4%) 0.813
        FilteredAndStopWords       48.99      (2.9%)       48.86      (1.9%)   -0.3% (  -4% -    4%) 0.828
         And2Terms2StopWords      167.29      (3.1%)      166.94      (3.3%)   -0.2% (  -6% -    6%) 0.887
           TermDayOfYearSort      631.10      (1.3%)      630.31      (1.2%)   -0.1% (  -2% -    2%) 0.835
            FilteredOr3Terms      168.75      (1.2%)      168.59      (1.4%)   -0.1% (  -2% -    2%) 0.879
             CountAndHighMed      169.49      (1.2%)      169.42      (1.3%)   -0.0% (  -2% -    2%) 0.948
  FilteredOr2Terms2StopWords      151.07      (1.0%)      151.02      (1.2%)   -0.0% (  -2% -    2%) 0.948
                AndStopWords       32.08      (5.1%)       32.07      (5.0%)   -0.0% (  -9% -   10%) 0.993
           FilteredOrHighMed      157.50      (1.1%)      157.49      (0.6%)   -0.0% (  -1% -    1%) 0.992
            CountAndHighHigh       57.71      (1.1%)       57.71      (0.8%)    0.0% (  -1% -    1%) 1.000
                    PKLookup      282.75      (1.9%)      282.86      (1.4%)    0.0% (  -3% -    3%) 0.958
                     Respell       54.63      (1.1%)       54.70      (1.5%)    0.1% (  -2% -    2%) 0.843
                TermGroup100       24.50      (4.5%)       24.54      (3.8%)    0.2% (  -7% -    8%) 0.929
              CountOrHighMed      141.72      (1.9%)      142.10      (1.1%)    0.3% (  -2% -    3%) 0.717
         FilteredOrStopWords       44.42      (1.9%)       44.56      (2.3%)    0.3% (  -3% -    4%) 0.759
             AndHighOrMedMed       43.17      (2.5%)       43.31      (0.7%)    0.3% (  -2% -    3%) 0.720
             CountOrHighHigh       74.88      (1.6%)       75.12      (1.3%)    0.3% (  -2% -    3%) 0.653
           CombinedOrHighMed       70.87      (5.8%)       71.13      (3.7%)    0.4% (  -8% -   10%) 0.873
               TermMonthSort     3404.71      (2.0%)     3418.15      (2.6%)    0.4% (  -4% -    5%) 0.718
          CombinedOrHighHigh       18.54      (5.9%)       18.64      (3.7%)    0.5% (  -8% -   10%) 0.821
               TermTitleSort      159.32      (3.5%)      160.30      (2.9%)    0.6% (  -5% -    7%) 0.686
                   OrHighMed      196.12      (5.6%)      197.40      (4.9%)    0.7% (  -9% -   11%) 0.794
            IntervalsOrdered        2.37      (2.7%)        2.38      (3.2%)    0.7% (  -5% -    6%) 0.627
          FilteredOrHighHigh       65.75      (1.7%)       66.20      (1.5%)    0.7% (  -2% -    3%) 0.365
                TermGroup10K       20.11      (3.6%)       20.25      (3.5%)    0.7% (  -6% -    8%) 0.682
                  DismaxTerm      605.77      (2.9%)      610.10      (2.8%)    0.7% (  -4% -    6%) 0.598
                 TermGroup1M       19.82      (3.1%)       19.97      (3.5%)    0.8% (  -5% -    7%) 0.616
            AndMedOrHighHigh       59.49      (1.9%)       59.96      (1.6%)    0.8% (  -2% -    4%) 0.351
                  TermDTSort      274.15      (1.8%)      276.92      (2.3%)    1.0% (  -3% -    5%) 0.309
                    Wildcard       73.73      (4.2%)       74.49      (3.0%)    1.0% (  -5% -    8%) 0.545
          CombinedAndHighMed       54.61      (4.7%)       55.18      (2.5%)    1.0% (  -5% -    8%) 0.558
                  AndHighMed      131.04      (2.8%)      132.48      (1.7%)    1.1% (  -3% -    5%) 0.308
                  OrHighHigh       52.99      (7.0%)       53.63      (6.2%)    1.2% ( -11% -   15%) 0.697
                        Term      470.85      (4.2%)      476.71      (4.3%)    1.2% (  -7% -   10%) 0.537
         CombinedAndHighHigh       14.94      (5.0%)       15.13      (2.5%)    1.2% (  -5% -    9%) 0.505
                  OrHighRare      271.47      (8.0%)      275.00      (7.1%)    1.3% ( -12% -   17%) 0.715
                CombinedTerm       29.64      (5.1%)       30.08      (3.8%)    1.5% (  -7% -   10%) 0.480
                TermBGroup1M       24.87      (3.9%)       25.26      (3.8%)    1.6% (  -5% -    9%) 0.386
                 AndHighHigh       44.95      (3.0%)       45.68      (1.7%)    1.6% (  -3% -    6%) 0.158
                     Prefix3      127.48      (6.7%)      129.68      (7.8%)    1.7% ( -11% -   17%) 0.615
              TermBGroup1M1P       36.78      (4.9%)       37.59      (5.4%)    2.2% (  -7% -   13%) 0.364
                      Phrase       13.78      (2.7%)       14.90      (2.8%)    8.1% (   2% -   13%) 0.000
              FilteredPhrase       26.01      (3.4%)       30.92      (1.6%)   18.9% (  13% -   24%) 0.000
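Reading the table: the Pct diff column appears to be the relative change in mean QPS between the two runs. The sketch below assumes that formula, (candidate - baseline) / baseline; `PctDiff` is a hypothetical helper, not part of luceneutil, and it ignores the confidence-interval and p-value columns. Because the table prints rounded QPS values, recomputing from them can be off by 0.1 point for some rows (CountPhrase gives -1.6% rather than -1.7%).

```java
import java.util.Locale;

/** Hypothetical helper recomputing the "Pct diff" column from the two mean QPS columns. */
final class PctDiff {
  /** Relative change of the candidate QPS versus the baseline QPS, in percent. */
  static double pctDiff(double baselineQps, double candidateQps) {
    return (candidateQps - baselineQps) / baselineQps * 100.0;
  }

  public static void main(String[] args) {
    // QPS values copied from the Phrase and FilteredPhrase rows of the table.
    System.out.printf(Locale.ROOT, "Phrase: %.1f%%%n", pctDiff(13.78, 14.90));         // Phrase: 8.1%
    System.out.printf(Locale.ROOT, "FilteredPhrase: %.1f%%%n", pctDiff(26.01, 30.92)); // FilteredPhrase: 18.9%
  }
}
```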

rmuir (Member) left a comment

great to see the frequencies finally lazy-decoded

jpountz merged commit b2a10e3 into apache:main Dec 2, 2024
3 checks passed
jpountz deleted the speedup_positions branch December 2, 2024 22:26
jpountz added a commit that referenced this pull request Dec 2, 2024

jpountz (Contributor Author) commented Dec 3, 2024

FilteredPhrase had a good 25% speedup, Phrase had a good 11% speedup, but other phrase queries had a slowdown, e.g. SloppyPhrase (-8%) and CountPhrase (-7%).

benchaplin pushed a commit to benchaplin/lucene that referenced this pull request Dec 31, 2024