LUCENE-10366: Override #readVInt and #readVLong for ByteBufferDataInput to avoid the abstraction confusion of #readByte. #592

Merged
merged 20 commits into from
Jan 16, 2024

Conversation


@gf2121 gf2121 commented Jan 10, 2022

https://issues.apache.org/jira/browse/LUCENE-10366
#11402

Today, we do not override #readVInt and #readVLong for ByteBufferIndexInput. By default, the logic calls #readByte several times, and each call checks whether the ByteBuffer is still valid. This may not be necessary, as we just need a final check.
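To make the cost being discussed concrete, here is a minimal sketch of a generic vInt decoder that goes through a checked readByte for every byte. The class name, the `closed` flag and the `validityChecks` counter are hypothetical stand-ins for illustration; this is not Lucene's ByteBufferIndexInput.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for a guarded input; not Lucene's ByteBufferIndexInput. */
final class CheckedVIntSketch {
  private final ByteBuffer buf;
  private boolean closed = false; // stands in for "is the buffer still valid?"
  int validityChecks = 0;         // counts how often the check runs

  CheckedVIntSketch(byte[] bytes) {
    this.buf = ByteBuffer.wrap(bytes);
  }

  /** Every single-byte read re-checks validity before touching the buffer. */
  byte readByte() {
    validityChecks++;
    if (closed) {
      throw new IllegalStateException("already closed");
    }
    return buf.get();
  }

  /** Generic vInt decoding: 7 payload bits per byte, high bit set means "more bytes follow". */
  int readVInt() {
    byte b = readByte();
    int value = b & 0x7F;
    for (int shift = 7; b < 0; shift += 7) {
      if (shift > 28) {
        throw new IllegalStateException("Invalid vInt (too many bytes)");
      }
      b = readByte();
      value |= (b & 0x7F) << shift;
    }
    return value;
  }

  public static void main(String[] args) {
    // 300 is encoded as the two bytes 0xAC 0x02.
    CheckedVIntSketch in = new CheckedVIntSketch(new byte[] {(byte) 0xAC, 0x02});
    System.out.println(in.readVInt() + " decoded with " + in.validityChecks + " validity checks");
  }
}
```

A two-byte vInt triggers two checks here, a five-byte one would trigger five. Whether removing those checks is what actually changes the benchmark numbers is a separate question that the sketch does not answer.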

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
       BrowseDayOfYearSSDVFacets       16.74     (17.3%)       15.91     (12.3%)   -5.0% ( -29% -   29%) 0.295
            MedTermDayTaxoFacets       27.01      (6.9%)       26.56      (5.9%)   -1.7% ( -13% -   11%) 0.402
                        Wildcard      111.55      (8.1%)      109.67      (7.6%)   -1.7% ( -16% -   15%) 0.499
                         Respell       58.06      (2.6%)       57.20      (2.6%)   -1.5% (  -6% -    3%) 0.074
          OrHighMedDayTaxoFacets        8.91      (4.7%)        8.81      (7.2%)   -1.1% ( -12% -   11%) 0.557
                          Fuzzy1      117.17      (3.8%)      116.14      (3.3%)   -0.9% (  -7% -    6%) 0.437
                          Fuzzy2      103.70      (3.2%)      102.82      (4.3%)   -0.9% (  -8% -    6%) 0.472
            HighIntervalsOrdered       10.11      (7.9%)       10.05      (7.4%)   -0.6% ( -14% -   15%) 0.797
           HighTermDayOfYearSort      183.18      (8.8%)      182.92     (10.8%)   -0.1% ( -18% -   21%) 0.964
        AndHighHighDayTaxoFacets       11.44      (3.8%)       11.43      (3.1%)   -0.1% (  -6% -    7%) 0.936
                         Prefix3      161.90     (13.5%)      161.80     (13.3%)   -0.1% ( -23% -   30%) 0.989
                    HighSpanNear       11.43      (4.8%)       11.45      (4.2%)    0.1% (  -8% -    9%) 0.928
                        PKLookup      220.15      (3.3%)      220.69      (6.2%)    0.2% (  -8% -   10%) 0.874
                     MedSpanNear       92.60      (4.0%)       93.11      (3.7%)    0.5% (  -6% -    8%) 0.656
                      TermDTSort      143.26      (9.0%)      144.14     (10.9%)    0.6% ( -17% -   22%) 0.847
             MedIntervalsOrdered       63.74      (6.6%)       64.21      (6.1%)    0.8% ( -11% -   14%) 0.707
            HighTermTitleBDVSort       99.61      (9.1%)      100.49     (12.4%)    0.9% ( -18% -   24%) 0.796
                     LowSpanNear      126.43      (3.6%)      127.61      (3.2%)    0.9% (  -5% -    8%) 0.383
             LowIntervalsOrdered       12.45      (5.4%)       12.58      (5.2%)    1.0% (  -9% -   12%) 0.535
                         LowTerm     1767.08      (3.7%)     1788.83      (3.1%)    1.2% (  -5% -    8%) 0.257
                HighSloppyPhrase       11.45      (7.0%)       11.61      (7.1%)    1.5% ( -11% -   16%) 0.515
         AndHighMedDayTaxoFacets       69.41      (3.7%)       70.46      (2.8%)    1.5% (  -4% -    8%) 0.147
     BrowseRandomLabelSSDVFacets       10.85      (6.1%)       11.04      (5.1%)    1.7% (  -9% -   13%) 0.342
                         MedTerm     2083.04      (5.3%)     2119.48      (5.7%)    1.7% (  -8% -   13%) 0.316
                 LowSloppyPhrase      148.79      (3.6%)      151.76      (3.2%)    2.0% (  -4% -    9%) 0.062
                      HighPhrase       98.67      (3.4%)      100.80      (3.5%)    2.2% (  -4% -    9%) 0.048
                    OrHighNotLow     1371.31      (7.1%)     1400.91      (7.9%)    2.2% ( -12% -   18%) 0.365
           BrowseMonthTaxoFacets       16.65     (11.6%)       17.03     (13.1%)    2.2% ( -20% -   30%) 0.565
                   OrHighNotHigh     1267.37      (6.8%)     1297.42      (8.9%)    2.4% ( -12% -   19%) 0.344
                 MedSloppyPhrase       39.35      (3.6%)       40.42      (4.2%)    2.7% (  -4% -   10%) 0.028
                   OrNotHighHigh     1190.01      (6.6%)     1224.72      (7.6%)    2.9% ( -10% -   18%) 0.194
                      OrHighHigh       37.72      (4.3%)       39.00      (3.4%)    3.4% (  -4% -   11%) 0.005
                     AndHighHigh       92.46      (4.5%)       95.76      (4.9%)    3.6% (  -5% -   13%) 0.017
                    OrHighNotMed     1231.31      (6.3%)     1275.65      (7.9%)    3.6% (  -9% -   18%) 0.109
                       OrHighMed      174.32      (3.8%)      181.43      (2.9%)    4.1% (  -2% -   11%) 0.000
                      AndHighLow     2761.91     (10.7%)     2885.28     (10.1%)    4.5% ( -14% -   28%) 0.175
                       MedPhrase      214.87      (4.9%)      224.55      (4.8%)    4.5% (  -4% -   14%) 0.003
                       LowPhrase      333.03      (3.8%)      348.43      (3.6%)    4.6% (  -2% -   12%) 0.000
               HighTermMonthSort      159.92      (9.8%)      167.50     (14.8%)    4.7% ( -18% -   32%) 0.232
                    OrNotHighMed      973.50      (6.0%)     1022.10      (6.0%)    5.0% (  -6% -   18%) 0.008
     BrowseRandomLabelTaxoFacets       13.14     (11.1%)       13.83     (14.0%)    5.3% ( -17% -   34%) 0.186
                        HighTerm     1682.54      (7.0%)     1786.66      (7.5%)    6.2% (  -7% -   22%) 0.007
                      AndHighMed      277.66      (3.6%)      295.75      (4.3%)    6.5% (  -1% -   14%) 0.000
            BrowseDateTaxoFacets       14.81     (12.9%)       15.78     (17.1%)    6.6% ( -20% -   42%) 0.170
       BrowseDayOfYearTaxoFacets       14.90     (13.1%)       15.93     (17.4%)    6.9% ( -20% -   43%) 0.158
                    OrNotHighLow     1255.52      (7.0%)     1344.26      (7.1%)    7.1% (  -6% -   22%) 0.002
                       OrHighLow      972.15      (5.3%)     1056.13      (4.8%)    8.6% (  -1% -   19%) 0.000
                          IntNRQ       99.91     (26.0%)      108.80     (19.2%)    8.9% ( -28% -   73%) 0.219
           BrowseMonthSSDVFacets       18.62     (20.7%)       20.45     (25.0%)    9.8% ( -29% -   70%) 0.175


@jpountz jpountz left a comment


This guard exists because getting bytes from a closed mapped ByteBuffer would cause a segv. So to keep the protection it offers, we should keep calling ensureValid before reading any byte from the ByteBuffer while your PR does it afterwards. I would also prefer keeping access to the ByteBuffers encapsulated into ByteBufferGuard.
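As a rough illustration of the "check before every read" rule, here is a simplified guard. It is only a sketch under stated assumptions: the class name, the `invalidated` flag and the use of IllegalStateException are hypothetical, and Lucene's real ByteBufferGuard differs in detail. A heap buffer is used, so nothing here can actually crash the JVM.

```java
import java.nio.ByteBuffer;

/** Simplified, hypothetical guard; Lucene's real ByteBufferGuard differs in detail. */
final class GuardSketch {
  // Set by the owner before the underlying mapping is released.
  private volatile boolean invalidated = false;

  void invalidate() {
    invalidated = true;
  }

  private void ensureValid() {
    if (invalidated) {
      // Stand-in exception type chosen for the sketch.
      throw new IllegalStateException("buffer already released");
    }
  }

  /** Check first, then touch the buffer; never the other way around. */
  byte getByte(ByteBuffer receiver) {
    ensureValid();
    return receiver.get();
  }

  public static void main(String[] args) {
    GuardSketch guard = new GuardSketch();
    ByteBuffer buf = ByteBuffer.wrap(new byte[] {7, 9});
    System.out.println(guard.getByte(buf)); // read while still valid
    guard.invalidate();
    try {
      guard.getByte(buf);
    } catch (IllegalStateException e) {
      System.out.println("rejected before the buffer was touched: " + e.getMessage());
    }
  }
}
```

The ordering is the point: because the check runs before `receiver.get()`, the buffer position is unchanged when the read is rejected.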

Separately, this makes me wonder if some of the tasks that get the best speedups shouldn't be using a different encoding than vints.


jpountz commented Jan 10, 2022

I think there is another issue with this PR, which is that if a vInt is written across two ByteBuffers, then your PR would only ensure the validity of one of these ByteBuffers, while I think it should check both for safety.
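The cross-buffer case can be sketched as follows. This is an illustrative reader over several ByteBuffers with hypothetical names (`SplitVIntSketch`, `touched`); it only shows that decoding one vInt may read from two buffers, so any validity check has to cover both.

```java
import java.nio.ByteBuffer;

/** Hypothetical multi-buffer reader; names are illustrative, not Lucene's. */
final class SplitVIntSketch {
  private final ByteBuffer[] buffers;
  private int current = 0;
  final boolean[] touched; // records which buffers were actually read from

  SplitVIntSketch(ByteBuffer... buffers) {
    this.buffers = buffers;
    this.touched = new boolean[buffers.length];
  }

  /** Reads one byte, moving on to the next buffer when the current one is exhausted. */
  byte readByte() {
    while (!buffers[current].hasRemaining()) {
      if (current + 1 >= buffers.length) {
        throw new IllegalStateException("read past end of input");
      }
      current++;
    }
    touched[current] = true;
    return buffers[current].get();
  }

  int readVInt() {
    byte b = readByte();
    int value = b & 0x7F;
    for (int shift = 7; b < 0; shift += 7) {
      if (shift > 28) {
        throw new IllegalStateException("Invalid vInt (too many bytes)");
      }
      b = readByte();
      value |= (b & 0x7F) << shift;
    }
    return value;
  }

  public static void main(String[] args) {
    // 300 is encoded as 0xAC 0x02; the two bytes land in different buffers.
    SplitVIntSketch in = new SplitVIntSketch(
        ByteBuffer.wrap(new byte[] {(byte) 0xAC}),
        ByteBuffer.wrap(new byte[] {0x02}));
    System.out.println(in.readVInt() + ", buffers touched: " + in.touched[0] + ", " + in.touched[1]);
  }
}
```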

@gf2121 gf2121 requested a review from jpountz January 10, 2022 18:30
@gf2121 gf2121 force-pushed the LUCENE-10366 branch 2 times, most recently from b06f078 to b792e94 Compare January 11, 2022 05:58

gf2121 commented Jan 11, 2022

Thanks @jpountz! I did not realize that reading an unmapped ByteBuffer would make the JVM crash, sorry!

we should keep calling ensureValid before reading any byte from the ByteBuffer while your PR does it afterwards. I would also prefer keeping access to the ByteBuffers encapsulated into ByteBufferGuard.

Sorry for my poor English, I'm not sure I've got your point. To confirm, which of these do you mean?

  1. This approach needs to change: implement a ByteBufferGuard#getVInt that calls ensureValid just once before reading the vInt (also handling the case where a vInt spans several ByteBuffers).

  2. We need to call ensureValid for each byte read, as we did before, so this approach needs to be given up and other options (like using an encoding other than vInts for some of the tasks) should be tried.

I'm asking because I don't understand how today's ensureValid can protect reads on the ByteBuffer: it seems there could be a case like (thread 1) ensureValid -> (thread 2) unmap -> (thread 1) read unmapped. Is the Thread.yield() helping to solve this case? Will this still work for a heavier operation like readVInt?


gf2121 commented Jan 11, 2022

I checked some other usages and saw that for operations like bytebuffer#readBytes we also check just once. If the length is <= 6, DirectByteBuffer reads byte by byte, so it should be similar to the readVInt logic. Based on this, I guess calling ensureValid once can work for vInt? I implemented this and the tests passed too.


jpountz commented Jan 11, 2022

Thanks @gf2121 the implementation looks correct to me now. Are you still seeing a good speedup with this change?


uschindler commented Jan 11, 2022

The problem here is that we are just reducing the number of guard checks and at the same time raising the risk of a segv: readVInt uses multiple atomic reads and all of those reads can segv.

Sorry to tell this: I would not change anything here, it is too risky. Let's wait for MemorySegmentIndexInput, which has JVM internal checks and a guard is not needed. see #518


@uschindler uschindler left a comment


We should not do this. The benchmark above shows no significant changes, and it looks like it was run with the wrong JVM options (the defaults by Mike were bad, fixed recently). It should be run with tiered compilation enabled; then the guard checks should go away.


gf2121 commented Jan 11, 2022

Thanks @jpountz @uschindler, this is the benchmark result based on the newest code:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
            HighTermTitleBDVSort      168.66     (13.3%)      163.17     (11.2%)   -3.3% ( -24% -   24%) 0.404
           BrowseMonthSSDVFacets       20.35     (25.8%)       19.83     (25.1%)   -2.6% ( -42% -   65%) 0.751
           BrowseMonthTaxoFacets       17.31     (14.9%)       16.96     (14.8%)   -2.1% ( -27% -   32%) 0.661
            MedTermDayTaxoFacets       48.47      (4.8%)       47.52      (4.1%)   -2.0% ( -10% -    7%) 0.159
       BrowseDayOfYearSSDVFacets       16.71     (14.5%)       16.41     (14.9%)   -1.8% ( -27% -   32%) 0.702
            HighIntervalsOrdered       25.70     (10.9%)       25.33     (11.0%)   -1.5% ( -21% -   22%) 0.674
                 MedSloppyPhrase       21.62      (4.8%)       21.38      (3.1%)   -1.1% (  -8% -    7%) 0.393
                      TermDTSort      238.94     (13.0%)      236.98     (12.8%)   -0.8% ( -23% -   28%) 0.841
                HighSloppyPhrase       39.84      (4.1%)       39.60      (2.9%)   -0.6% (  -7% -    6%) 0.588
       BrowseDayOfYearTaxoFacets       15.89     (15.4%)       15.83     (16.6%)   -0.4% ( -28% -   37%) 0.933
                         LowTerm     1698.75      (3.5%)     1691.63      (3.4%)   -0.4% (  -7% -    6%) 0.702
            BrowseDateTaxoFacets       15.72     (15.3%)       15.67     (16.5%)   -0.4% ( -27% -   37%) 0.942
          OrHighMedDayTaxoFacets       15.19      (4.3%)       15.14      (4.9%)   -0.3% (  -9% -    9%) 0.813
                         Respell       81.95      (2.6%)       81.68      (2.7%)   -0.3% (  -5% -    5%) 0.694
                        PKLookup      227.94      (4.3%)      227.37      (5.3%)   -0.2% (  -9% -    9%) 0.870
     BrowseRandomLabelSSDVFacets       10.93      (6.6%)       10.92      (5.8%)   -0.1% ( -11% -   13%) 0.955
                          IntNRQ      144.63     (22.4%)      144.48     (22.3%)   -0.1% ( -36% -   57%) 0.989
                          Fuzzy2       88.59      (2.3%)       88.51      (2.2%)   -0.1% (  -4% -    4%) 0.897
             LowIntervalsOrdered       93.64      (6.0%)       93.72      (5.9%)    0.1% ( -11% -   12%) 0.962
                          Fuzzy1      137.32      (3.1%)      137.58      (2.7%)    0.2% (  -5% -    6%) 0.833
             MedIntervalsOrdered       17.06      (6.3%)       17.09      (6.4%)    0.2% ( -11% -   13%) 0.922
           HighTermDayOfYearSort       82.27      (8.1%)       82.50     (11.8%)    0.3% ( -18% -   21%) 0.930
     BrowseRandomLabelTaxoFacets       13.63     (12.7%)       13.68     (13.3%)    0.3% ( -22% -   30%) 0.934
        AndHighHighDayTaxoFacets        6.39      (2.6%)        6.42      (2.9%)    0.5% (  -4% -    6%) 0.578
                 LowSloppyPhrase       55.91      (2.7%)       56.43      (1.8%)    0.9% (  -3% -    5%) 0.193
                         MedTerm     2031.88      (3.7%)     2054.31      (4.9%)    1.1% (  -7% -   10%) 0.419
                         Prefix3      129.38      (9.3%)      130.94      (8.8%)    1.2% ( -15% -   21%) 0.676
                     AndHighHigh       67.64      (4.4%)       68.46      (4.5%)    1.2% (  -7% -   10%) 0.390
                        Wildcard      105.85      (9.5%)      107.23      (8.8%)    1.3% ( -15% -   21%) 0.652
                    HighSpanNear       12.80      (3.8%)       13.01      (4.0%)    1.7% (  -5% -    9%) 0.177
               HighTermMonthSort      130.57      (9.9%)      132.98     (11.1%)    1.8% ( -17% -   25%) 0.579
                     MedSpanNear       73.79      (3.4%)       75.30      (2.7%)    2.0% (  -3% -    8%) 0.035
                      OrHighHigh       53.48      (5.1%)       54.77      (4.4%)    2.4% (  -6% -   12%) 0.106
                     LowSpanNear       16.05      (3.5%)       16.47      (3.6%)    2.6% (  -4% -   10%) 0.018
                   OrHighNotHigh     1162.70      (3.6%)     1194.55      (4.4%)    2.7% (  -5% -   11%) 0.030
                    OrNotHighMed     1428.26      (3.0%)     1468.20      (4.8%)    2.8% (  -4% -   10%) 0.027
                        HighTerm     2587.85      (4.5%)     2662.11      (5.2%)    2.9% (  -6% -   13%) 0.061
                       OrHighMed      182.60      (4.8%)      188.40      (4.3%)    3.2% (  -5% -   12%) 0.026
                   OrNotHighHigh     1079.06      (3.2%)     1114.06      (4.6%)    3.2% (  -4% -   11%) 0.010
                    OrHighNotMed     1278.47      (3.8%)     1320.86      (4.4%)    3.3% (  -4% -   12%) 0.011
                      AndHighMed      170.34      (2.9%)      176.45      (3.7%)    3.6% (  -2% -   10%) 0.001
                    OrHighNotLow      932.47      (3.6%)      976.66      (5.1%)    4.7% (  -3% -   14%) 0.001
                      HighPhrase      168.16      (3.0%)      176.27      (3.0%)    4.8% (  -1% -   11%) 0.000
                       LowPhrase       48.60      (2.6%)       50.98      (2.5%)    4.9% (   0% -   10%) 0.000
         AndHighMedDayTaxoFacets       58.86      (2.6%)       62.10      (2.6%)    5.5% (   0% -   11%) 0.000
                       MedPhrase      145.00      (3.1%)      153.94      (2.3%)    6.2% (   0% -   11%) 0.000
                       OrHighLow     1078.49      (3.5%)     1149.09      (3.1%)    6.5% (   0% -   13%) 0.000
                      AndHighLow     1149.40      (4.3%)     1275.69      (2.9%)   11.0% (   3% -   18%) 0.000
                    OrNotHighLow     1765.89      (6.2%)     2019.83      (4.4%)   14.4% (   3% -   26%) 0.000

Sorry to tell this: I would not change anything here, it is too risky. Let's wait for MemorySegmentIndexInput, which has JVM internal checks and a guard is not needed.

OK! I'll close this then.

The benchmark above shows no significant changes, and it looks like it was run with the wrong JVM options (the defaults by Mike were bad, fixed recently). It should be run with tiered compilation enabled; then the guard checks should go away.

Yes, I did not run with tiered compilation, but in fact I'm running with the newest luceneutil. It seems we used to add this param but removed it again a few days ago? See this commit: mikemccand/luceneutil@f48b538

@gf2121 gf2121 closed this Jan 11, 2022

uschindler commented Jan 11, 2022

The param must go away or change - to +. Mike's commit fixed the tiered compilation problem and I was not sure if you have used his latest commit.

To get more predictable results that are not so noisy (especially for such low-level changes), you should use a huge dataset (not only wikiMedium) and also run with more queries per run. What you see here is mostly noise.

@gf2121 gf2121 reopened this Jan 12, 2022

gf2121 commented Jan 12, 2022

@uschindler Thank you for the benchmark guidance!

I made a new change in this commit that ensures validity before each call of ByteBuffer#get. I ran a benchmark on wikimediumall (16GB) with 20 JVM rounds, each round repeating the tasks 200 times (each JVM lasted 10 min and the total run took 6h+). This benchmark should be similar to what we did in #518. Here is the result:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
           BrowseMonthTaxoFacets        5.80     (14.4%)        5.50     (13.6%)   -5.1% ( -29% -   26%) 0.246
            HighTermTitleBDVSort       35.69     (30.2%)       34.46     (18.5%)   -3.4% ( -40% -   64%) 0.664
                         Prefix3      194.68     (12.3%)      188.12     (13.2%)   -3.4% ( -25% -   25%) 0.403
               HighTermMonthSort       49.11     (28.0%)       47.66     (15.0%)   -2.9% ( -35% -   55%) 0.678
                      OrHighHigh       16.37      (4.5%)       15.99      (4.0%)   -2.4% ( -10% -    6%) 0.078
                      TermDTSort       22.81     (23.4%)       22.30     (12.8%)   -2.3% ( -31% -   44%) 0.704
       BrowseDayOfYearSSDVFacets        4.43     (10.4%)        4.34      (7.1%)   -2.0% ( -17% -   17%) 0.473
            HighIntervalsOrdered        3.92      (9.6%)        3.85      (8.4%)   -1.9% ( -18% -   17%) 0.514
                        Wildcard       95.75     (14.1%)       94.10     (14.6%)   -1.7% ( -26% -   31%) 0.704
     BrowseRandomLabelTaxoFacets        4.55     (12.8%)        4.50     (12.2%)   -1.1% ( -23% -   27%) 0.788
                       OrHighMed       47.41      (4.7%)       46.98      (4.1%)   -0.9% (  -9% -    8%) 0.516
            BrowseDateTaxoFacets        5.25     (16.3%)        5.21     (15.4%)   -0.8% ( -27% -   37%) 0.881
       BrowseDayOfYearTaxoFacets        5.31     (16.5%)        5.27     (15.6%)   -0.7% ( -28% -   37%) 0.893
          OrHighMedDayTaxoFacets        1.03      (5.0%)        1.02      (5.0%)   -0.6% ( -10% -    9%) 0.700
            MedTermDayTaxoFacets       15.74      (5.8%)       15.67      (3.8%)   -0.5% (  -9% -    9%) 0.772
                          Fuzzy1       76.87      (2.4%)       76.53      (2.1%)   -0.4% (  -4% -    4%) 0.542
                        PKLookup      204.50      (5.0%)      203.69      (4.8%)   -0.4% (  -9% -    9%) 0.798
                          Fuzzy2       71.89      (2.1%)       71.64      (2.0%)   -0.4% (  -4% -    3%) 0.586
                 LowSloppyPhrase        4.00      (4.2%)        3.99      (4.0%)   -0.3% (  -8% -    8%) 0.833
                         Respell       46.07      (1.8%)       45.96      (2.1%)   -0.2% (  -4% -    3%) 0.686
             LowIntervalsOrdered        7.73      (4.9%)        7.73      (3.7%)   -0.0% (  -8% -    9%) 0.988
           HighTermDayOfYearSort       43.42     (23.3%)       43.54     (12.8%)    0.3% ( -29% -   47%) 0.965
        AndHighHighDayTaxoFacets        6.99      (3.0%)        7.02      (2.6%)    0.5% (  -4% -    6%) 0.544
     BrowseRandomLabelSSDVFacets        3.18      (6.5%)        3.21      (7.6%)    1.1% ( -12% -   16%) 0.637
                    HighSpanNear        1.70      (5.8%)        1.72      (4.9%)    1.1% (  -9% -   12%) 0.524
                     LowSpanNear        5.93      (3.4%)        6.00      (3.4%)    1.1% (  -5% -    8%) 0.312
                 MedSloppyPhrase       45.85      (2.9%)       46.42      (2.7%)    1.2% (  -4% -    6%) 0.161
                     MedSpanNear       23.88      (4.0%)       24.26      (3.8%)    1.6% (  -5% -    9%) 0.198
                         LowTerm     1699.14      (4.1%)     1729.13      (2.9%)    1.8% (  -4% -    9%) 0.114
           BrowseMonthSSDVFacets        4.67      (4.1%)        4.76     (12.0%)    2.0% ( -13% -   18%) 0.483
                       MedPhrase       47.09      (2.2%)       48.24      (1.5%)    2.4% (  -1% -    6%) 0.000
                HighSloppyPhrase       14.85      (2.6%)       15.22      (2.2%)    2.5% (  -2% -    7%) 0.001
             MedIntervalsOrdered        7.07      (3.6%)        7.29      (2.5%)    3.2% (  -2% -    9%) 0.001
                       LowPhrase       12.80      (2.4%)       13.27      (1.7%)    3.6% (   0% -    7%) 0.000
                     AndHighHigh       35.29      (3.1%)       36.71      (5.0%)    4.0% (  -4% -   12%) 0.002
                      AndHighMed       87.69      (2.4%)       91.35      (3.9%)    4.2% (  -2% -   10%) 0.000
                   OrHighNotHigh     1514.29      (4.5%)     1578.30      (4.0%)    4.2% (  -4% -   13%) 0.002
                         MedTerm     1838.55      (4.7%)     1917.70      (4.1%)    4.3% (  -4% -   13%) 0.002
                    OrNotHighMed     1182.37      (3.8%)     1237.57      (2.7%)    4.7% (  -1% -   11%) 0.000
                    OrHighNotLow     1218.07      (6.1%)     1279.72      (4.8%)    5.1% (  -5% -   17%) 0.004
         AndHighMedDayTaxoFacets       20.90      (3.4%)       21.98      (2.0%)    5.2% (   0% -   11%) 0.000
                   OrNotHighHigh      943.89      (5.8%)      999.13      (4.8%)    5.9% (  -4% -   17%) 0.001
                        HighTerm     1349.44      (5.8%)     1431.38      (5.3%)    6.1% (  -4% -   18%) 0.001
                    OrHighNotMed      926.10      (5.8%)      989.92      (5.6%)    6.9% (  -4% -   19%) 0.000
                      HighPhrase      290.29      (6.0%)      318.57      (3.1%)    9.7% (   0% -   20%) 0.000
                          IntNRQ       32.61     (25.8%)       35.87     (16.2%)   10.0% ( -25% -   70%) 0.142
                       OrHighLow      547.59      (6.0%)      604.39      (3.2%)   10.4% (   1% -   20%) 0.000
                    OrNotHighLow      886.23      (5.1%)      980.83      (2.2%)   10.7% (   3% -   18%) 0.000
                      AndHighLow      608.29      (6.3%)      693.67      (3.5%)   14.0% (   4% -   25%) 0.000

So it seems that what brings the speedup is not the reduced number of validity checks but something else? Anyway, this approach no longer raises the risk of a segv and the benchmark result seems positive, so I reopened the PR to see if this makes sense to you now :)

@gf2121 gf2121 requested a review from uschindler January 12, 2022 04:16

uschindler commented Jan 12, 2022

Hi @gf2121,
thanks for taking the time to test more. It really looks like the guard checks are not affecting the whole thing, but as always we don't really know where the optimization effects are coming from. For such low-level changes it often also varies between different JVM versions (there are differences between JDK 11 and JDK 17). Actually the code SHOULD not make any difference, but it does.

I see that you also carefully save the position before the try block and restore it on a boundary-crossing exception. What I don't like: ByteBufferGuard is just a hack and should not contain any program code except guarding. So I am not happy about including readVInt code there.

Have you tried just moving the Guard#readVInt() code into ByteBufferIndexInput#readVInt(), inside its try block, calling guard.readByte()? All guard methods should be inlined by the JVM very early. This is where it belongs! Nevertheless I wonder why Hotspot is not able to apply the optimization. The only thing we spare is the try/catch, but maybe that's the issue here that prevents it from being inlined into readVInt.

If that helps we can also try the same on the MemorySegment one from #518.

@uschindler

I am thinking of something like this:

  @Override
  public final int readVInt() throws IOException {
    try {
      // using #position instead of #mark here as #position is a final method
      int pos = curBuf.position();
      try {
        byte b = guard.get(curBuf);
        if (b >= 0) return b;
        int i = b & 0x7F;
        b = guard.get(curBuf);
        i |= (b & 0x7F) << 7;
        if (b >= 0) return i;
        b = guard.get(curBuf);
        i |= (b & 0x7F) << 14;
        if (b >= 0) return i;
        b = guard.get(curBuf);
        i |= (b & 0x7F) << 21;
        if (b >= 0) return i;
        b = guard.get(curBuf);
        // Warning: the next ands use 0x0F / 0xF0 - beware copy/paste errors:
        i |= (b & 0x0F) << 28;
        if ((b & 0xF0) == 0) return i;
        throw new IOException("Invalid vInt detected (too many bits)");
      } catch (
          @SuppressWarnings("unused")
          BufferUnderflowException e) {
        curBuf.position(pos);
        return super.readVInt();
      }
    } catch (
        @SuppressWarnings("unused")
        NullPointerException npe) {
      throw new AlreadyClosedException("Already closed: " + this);
    }
  }
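The save-position / restore-on-underflow pattern in the snippet above can be tried in isolation. The following is a self-contained sketch on a plain ByteBuffer, with no guard and no Lucene classes; `tryReadVInt` and the OptionalInt return type are hypothetical choices made for the example, where the real code falls back to super.readVInt() instead.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.util.OptionalInt;

/** Illustrative fast path only; the caller is expected to provide the slow path. */
final class FastPathVIntSketch {

  /**
   * Tries to decode a vInt entirely from buf. If the vInt runs past the end of
   * buf, the position is restored and an empty result is returned so that the
   * caller can retry with a reader that is able to cross buffer boundaries.
   */
  static OptionalInt tryReadVInt(ByteBuffer buf) {
    final int pos = buf.position(); // remember where we started
    try {
      byte b = buf.get();
      int value = b & 0x7F;
      for (int shift = 7; b < 0; shift += 7) {
        if (shift > 28) {
          throw new IllegalStateException("Invalid vInt (too many bytes)");
        }
        b = buf.get(); // throws BufferUnderflowException at the buffer end
        value |= (b & 0x7F) << shift;
      }
      return OptionalInt.of(value);
    } catch (BufferUnderflowException e) {
      buf.position(pos); // undo the partial read
      return OptionalInt.empty();
    }
  }

  public static void main(String[] args) {
    ByteBuffer complete = ByteBuffer.wrap(new byte[] {(byte) 0xAC, 0x02});
    ByteBuffer truncated = ByteBuffer.wrap(new byte[] {(byte) 0xAC});
    System.out.println(tryReadVInt(complete));  // vInt fits in the buffer
    System.out.println(tryReadVInt(truncated)); // vInt continues in the next buffer
    System.out.println(truncated.position());   // position was restored
  }
}
```

Restoring the position matters because the fallback has to re-read the vInt from its first byte, not from wherever the failed attempt stopped.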


jpountz commented Jan 12, 2022

I wonder if one reason why this helps is because this method is so large that it is prevented from inlining sub function calls. The JVM bug that affected vint/vlong reads no longer affects any of the JVM versions we support to my knowledge, maybe we should consider rolling up these loops again (assuming it helps performance).

@uschindler

I wonder if one reason why this helps is because this method is so large that it is prevented from inlining sub function calls. The JVM bug that affected vint/vlong reads no longer affects any of the JVM versions we support to my knowledge, maybe we should consider rolling up these loops again (assuming it helps performance).

We only lose the last check on the last iteration (the "copypaste" statement).

@uschindler

We should maybe convert all of them back to loops in DataInput base class and remove the specializations and do more tests. I think the change here is then completely obsolete.
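For comparison, a loop form next to the unrolled form might look like the sketch below. It works on a plain ByteBuffer rather than DataInput, the class and method names are hypothetical, and it makes no claim about which variant is faster; it only shows that the two decode valid input identically and that the loop drops the 0x0F / 0xF0 check on the fifth byte.

```java
import java.nio.ByteBuffer;

/** Unrolled vs. loop vInt decoding on a plain ByteBuffer; illustrative only. */
final class VIntLoopSketch {

  /** Standard vInt encoding: low 7 bits first, high bit set while more bytes follow. */
  static void writeVInt(ByteBuffer out, int i) {
    while ((i & ~0x7F) != 0) {
      out.put((byte) ((i & 0x7F) | 0x80));
      i >>>= 7;
    }
    out.put((byte) i);
  }

  /** Unrolled form, mirroring the structure quoted earlier in the thread. */
  static int readVIntUnrolled(ByteBuffer in) {
    byte b = in.get();
    if (b >= 0) return b;
    int i = b & 0x7F;
    b = in.get();
    i |= (b & 0x7F) << 7;
    if (b >= 0) return i;
    b = in.get();
    i |= (b & 0x7F) << 14;
    if (b >= 0) return i;
    b = in.get();
    i |= (b & 0x7F) << 21;
    if (b >= 0) return i;
    b = in.get();
    i |= (b & 0x0F) << 28; // only 4 payload bits are legal in the fifth byte
    if ((b & 0xF0) == 0) return i;
    throw new IllegalStateException("Invalid vInt detected (too many bits)");
  }

  /** Loop form: same result for valid input, but no 0xF0 check on the fifth byte. */
  static int readVIntLoop(ByteBuffer in) {
    int i = 0;
    for (int shift = 0; shift <= 28; shift += 7) {
      byte b = in.get();
      i |= (b & 0x7F) << shift;
      if (b >= 0) return i;
    }
    throw new IllegalStateException("Invalid vInt detected (too many bytes)");
  }

  public static void main(String[] args) {
    for (int v : new int[] {0, 127, 128, 300, 1 << 21, Integer.MAX_VALUE, -1}) {
      ByteBuffer buf = ByteBuffer.allocate(5);
      writeVInt(buf, v);
      buf.flip();
      int unrolled = readVIntUnrolled(buf.duplicate());
      int looped = readVIntLoop(buf.duplicate());
      System.out.println(v + " -> unrolled=" + unrolled + ", loop=" + looped);
    }
  }
}
```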

@gf2121 gf2121 changed the title LUCENE-10366: Reduce the number of valid checks for ByteBufferIndexInput#readVInt LUCENE-10366: Override #readVInt and #readVLong for ByteBufferDataInput to avoid the abstraction confusion of #readByte. Jan 17, 2022
@uschindler uschindler self-assigned this Jan 17, 2022
@uschindler

Hi @jpountz,
if you agree I'd merge this in and backport!

Uwe


jpountz commented Jan 18, 2022

+1


jpountz commented Jan 18, 2022

@gf2121 Do you have an email I can use to contact you? Please send me an email at jpountz@gmail.com if you don't want to leave it on GitHub.


gf2121 commented Jan 19, 2022

Hi @jpountz. This is my email: guofeng.my@qq.com


gf2121 commented Jan 24, 2022

I found that, at a higher level, we are using MultiLevelSkipListReader#SkipBuffer for the top skip data and ByteBufferIndexInput for the low skip data, and this is the point that confuses the JVM. I tried removing MultiLevelSkipListReader#SkipBuffer and got a similar speedup:

Note: This speedup overlaps with the one from this PR, as both actually solve the same problem with different approaches.
(baseline is main and my_modified_version is main + remove skipbuffer)

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                          IntNRQ       32.23     (24.8%)       30.20     (28.2%)   -6.3% ( -47% -   62%) 0.452
               HighTermMonthSort       62.08     (24.3%)       60.47     (15.8%)   -2.6% ( -34% -   49%) 0.690
                    HighSpanNear        2.67      (6.1%)        2.62      (4.6%)   -1.7% ( -11% -    9%) 0.320
                        PKLookup      200.97      (3.8%)      198.22      (5.9%)   -1.4% ( -10% -    8%) 0.382
                          Fuzzy1       68.17      (2.1%)       67.41      (2.1%)   -1.1% (  -5% -    3%) 0.089
          OrHighMedDayTaxoFacets        0.76      (5.6%)        0.75      (8.4%)   -1.0% ( -14% -   13%) 0.658
                      TermDTSort       47.59     (15.1%)       47.14     (17.2%)   -0.9% ( -28% -   37%) 0.854
                         Respell       45.57      (2.1%)       45.17      (2.5%)   -0.9% (  -5% -    3%) 0.222
                          Fuzzy2       47.98      (2.0%)       47.57      (1.9%)   -0.8% (  -4% -    3%) 0.179
            MedTermDayTaxoFacets       15.33      (4.2%)       15.24      (5.1%)   -0.6% (  -9% -    9%) 0.673
                     MedSpanNear        4.55      (4.0%)        4.55      (2.9%)    0.1% (  -6% -    7%) 0.956
             LowIntervalsOrdered        4.51      (7.4%)        4.52      (7.8%)    0.1% ( -13% -   16%) 0.952
           BrowseMonthSSDVFacets        4.62      (5.1%)        4.63      (5.7%)    0.2% ( -10% -   11%) 0.918
            HighIntervalsOrdered        8.12      (7.9%)        8.15      (8.2%)    0.4% ( -14% -   17%) 0.887
                 MedSloppyPhrase        3.66      (5.7%)        3.67      (4.9%)    0.5% (  -9% -   11%) 0.777
     BrowseRandomLabelSSDVFacets        3.15      (5.4%)        3.17      (6.5%)    0.6% ( -10% -   13%) 0.735
       BrowseDayOfYearSSDVFacets        4.27      (6.4%)        4.31      (9.4%)    0.9% ( -13% -   17%) 0.727
             MedIntervalsOrdered        7.56      (6.0%)        7.62      (6.1%)    0.9% ( -10% -   13%) 0.642
        AndHighHighDayTaxoFacets        3.12      (3.5%)        3.16      (5.3%)    1.2% (  -7% -   10%) 0.391
                      OrHighHigh       17.23      (4.4%)       17.44      (3.9%)    1.2% (  -6% -    9%) 0.347
                     LowSpanNear       11.17      (3.5%)       11.32      (2.7%)    1.4% (  -4% -    7%) 0.159
                         LowTerm     1569.56      (2.7%)     1593.46      (3.3%)    1.5% (  -4% -    7%) 0.108
     BrowseRandomLabelTaxoFacets        4.40     (11.5%)        4.47     (12.7%)    1.6% ( -20% -   29%) 0.672
                         Prefix3      270.42     (18.3%)      275.42     (16.9%)    1.8% ( -28% -   45%) 0.740
           HighTermDayOfYearSort       63.17     (19.4%)       64.52     (18.0%)    2.1% ( -29% -   49%) 0.719
            HighTermTitleBDVSort       37.21     (17.0%)       38.01     (17.5%)    2.1% ( -27% -   44%) 0.695
            BrowseDateTaxoFacets        5.04     (14.6%)        5.17     (16.2%)    2.6% ( -24% -   39%) 0.598
       BrowseDayOfYearTaxoFacets        5.10     (14.9%)        5.23     (16.3%)    2.6% ( -24% -   39%) 0.596
                HighSloppyPhrase        5.80      (4.4%)        5.96      (4.9%)    2.8% (  -6% -   12%) 0.059
                       OrHighMed       48.70      (4.7%)       50.10      (3.4%)    2.9% (  -5% -   11%) 0.028
                        Wildcard       46.19      (9.7%)       47.52      (5.4%)    2.9% ( -11% -   19%) 0.247
                 LowSloppyPhrase        9.06      (3.0%)        9.33      (3.8%)    2.9% (  -3% -    9%) 0.006
         AndHighMedDayTaxoFacets       15.86      (2.3%)       16.44      (3.1%)    3.6% (  -1% -    9%) 0.000
                       MedPhrase       15.16      (2.3%)       15.72      (2.4%)    3.6% (  -1% -    8%) 0.000
                      AndHighMed       21.94      (3.9%)       22.76      (4.3%)    3.7% (  -4% -   12%) 0.004
                     AndHighHigh       48.32      (4.3%)       50.40      (4.4%)    4.3% (  -4% -   13%) 0.002
                       LowPhrase       55.55      (2.6%)       58.03      (2.8%)    4.5% (   0% -   10%) 0.000
                    OrHighNotMed     1192.57      (4.5%)     1251.05      (4.4%)    4.9% (  -3% -   14%) 0.001
                      HighPhrase      189.54      (3.2%)      199.56      (3.3%)    5.3% (  -1% -   12%) 0.000
                    OrHighNotLow      962.35      (4.3%)     1013.28      (4.6%)    5.3% (  -3% -   14%) 0.000
           BrowseMonthTaxoFacets        5.38     (13.8%)        5.68     (14.7%)    5.5% ( -20% -   39%) 0.219
                   OrNotHighHigh      871.79      (3.9%)      920.18      (3.7%)    5.5% (  -1% -   13%) 0.000
                   OrHighNotHigh      982.57      (4.2%)     1038.77      (4.2%)    5.7% (  -2% -   14%) 0.000
                    OrNotHighMed      945.32      (3.2%)     1001.55      (3.5%)    5.9% (   0% -   13%) 0.000
                        HighTerm     1238.69      (4.8%)     1317.72      (4.8%)    6.4% (  -3% -   16%) 0.000
                         MedTerm     1911.70      (4.7%)     2044.18      (4.2%)    6.9% (  -1% -   16%) 0.000
                       OrHighLow      574.17      (4.5%)      623.52      (3.1%)    8.6% (   0% -   16%) 0.000
                    OrNotHighLow      784.78      (5.0%)      875.15      (4.2%)   11.5% (   2% -   21%) 0.000
                      AndHighLow      463.10      (5.8%)      521.43      (4.5%)   12.6% (   2% -   24%) 0.000
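
For readers checking the table, the Pct diff column appears to be the relative QPS change of the modified version over the baseline. Here is a minimal sketch, assuming that definition; the class and method names are hypothetical and not part of luceneutil, and the parenthesized range next to each percentage is not reproduced here.

```java
import java.util.Locale;

public final class PctDiff {
    // Relative change of candidate QPS over baseline QPS, in percent.
    // Assumption: this matches how the "Pct diff" column is derived.
    static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }

    public static void main(String[] args) {
        // QPS values copied from the AndHighLow and LowTerm rows of the table.
        System.out.printf(Locale.ROOT, "AndHighLow: %.1f%%%n", pctDiff(463.10, 521.43));
        System.out.printf(Locale.ROOT, "LowTerm: %.1f%%%n", pctDiff(1569.56, 1593.46));
    }
}
```

Under that assumption, the two rows work out to roughly the 12.6% and 1.5% reported in the table.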

I'm not sure whether we should move forward with this approach to avoid potential confusion in environments more complex than the benchmark, but it sounds reasonable to remove MultiLevelSkipListReader#SkipBuffer, as it seems like unnecessary complexity?

I raised https://issues.apache.org/jira/browse/LUCENE-10388 (#620)

@uschindler
Contributor

Hi @gf2121,
In case you are wondering why I haven't merged this PR: I wanted to give you the honour of merging your pull request as a Lucene committer! Congrats, by the way! The same applies to the other PR.
Once you have set up your ASF account, linked it to your GitHub account, and enabled two-factor authentication, you're ready to go 👍.
Uwe


github-actions bot commented Jan 8, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jan 8, 2024
@mikemccand
Member

Hello @gf2121! Looks like @uschindler wants you to have the honor of merging this (now stale!) PR!

@uschindler
Contributor

Does this also affect MemorySegmentIndexInput?

@github-actions github-actions bot removed the Stale label Jan 9, 2024
@gf2121 gf2121 merged commit ed7c78c into apache:main Jan 16, 2024
4 checks passed
@uschindler
Contributor

Thanks!

asfgit pushed a commit that referenced this pull request Jan 16, 2024
…ut to avoid the abstraction confusion of #readByte. (#592)
slow-J pushed a commit to slow-J/lucene that referenced this pull request Jan 16, 2024
…ut to avoid the abstraction confusion of #readByte. (apache#592)
@jpountz
Contributor

jpountz commented Jan 31, 2024

I'm pushing an annotation, since this change triggered a speedup in PKLookup: http://people.apache.org/~mikemccand/lucenebench/PKLookup.html.


5 participants