Optimize readInts24 performance for DocIdsWriter #12527
Comments
I honestly don't know how big … can get. If this thing can be as large as ((Integer.MAX_VALUE - 1)/8)*3, it would be good to use some sort of statically limited buffer (or load 3 vectors at a time in each loop).
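A sketch of the bounded scratch-buffer idea, assuming Lucene's bulk DataInput#readLongs API is available; the names SCRATCH_LONGS and unpack8 are illustrative, not actual Lucene code:

```java
// Bulk-read a bounded chunk of longs with one readLongs() call, then
// unpack eight doc ids from every three longs on the heap.
private static final int SCRATCH_LONGS = 3 * 128; // fixed bound, reused across calls
private final long[] scratch = new long[SCRATCH_LONGS];

private void readInts24(IndexInput in, int count, int[] docIDs) throws IOException {
  int i = 0;
  while (count - i >= 8) {
    // each 3 longs hold 8 packed 24-bit values; never read past the scratch bound
    int longs = Math.min(((count - i) / 8) * 3, SCRATCH_LONGS);
    in.readLongs(scratch, 0, longs); // single bulk read per chunk
    for (int j = 0; j < longs; j += 3, i += 8) {
      unpack8(scratch, j, docIDs, i); // hypothetical helper: same shifts as the existing loop
    }
  }
  for (; i < count; i++) { // tail of fewer than 8 values, 3 bytes each
    docIDs[i] = (Short.toUnsignedInt(in.readShort()) << 8) | Byte.toUnsignedInt(in.readByte());
  }
}
```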
I ran the workload a few more times, and somehow the difference was not as much:

Without the patch:
Run 1:
Run 2:
Run 3:

With the change:
Run 1:
Run 2:
Run 3:
I like this idea of reducing possible IO overhead. But I tested it with …

It looks like …
@mikemccand - Thanks for sharing the numbers. This is a truly surprising result. Even though the impact of this small change is not positive, it is significant enough to explore areas of improvement here. I am thinking of trying out a couple of things below:
The main difference here is that by default Lucene uses MMapDirectory. Loading many small arrays instead of single longs is not a good idea, as copyMemory off-heap to heap is heavy due to JVM overhead. When using MMapDirectory, all the reads are plain memory accesses rather than syscalls.

I wonder why you see higher syscalls in OpenSearch; could it be because it uses hybridfs? Nowadays the use of hybridfs should be avoided, as the old "number of kernel mappings" limitations are no longer applicable since Java 19. So I disagree with changing this loop of reading longs; it would be better to figure out why it is slower for you.

Another issue: how did you measure the overhead of syscalls? Most profilers do it wrong on those lower levels. Never ever trust a profiler telling you that readLong() takes too long; it is in most cases a measurement error. Due to the use of variable handles in MMapDirectory, you easily see wrong results because the optimizer takes muuuuch longer to start inlining the reads.
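For example, to rule out hybridfs effects, one could force a plain MMapDirectory when opening the index. A minimal illustrative snippet (the index path is a placeholder, not from this thread):

```java
import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.store.MMapDirectory;

// Open the index with plain MMapDirectory so reads are memory-mapped
// accesses rather than read() syscalls ("/path/to/index" is a placeholder).
try (MMapDirectory dir = new MMapDirectory(Paths.get("/path/to/index"));
     DirectoryReader reader = DirectoryReader.open(dir)) {
  // run the range-query workload against `reader`
}
```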
@uschindler - Thanks for the explanation. That makes sense.
While the default is hybridfs in OpenSearch, it loads the compound files and point range files using MMapDirectory.

I will spend more time understanding this better, although I am still wondering why the …
Indeed the inconsistency between … Let's see what the silicon tells us -- I kicked off another run on …
OK I tested the "read into scratch array" approach from this comment:
It looks like … Maybe next we should try 4 …
Why 4? We can just do a single long for 2 ints. Although, I also doubt that we really need 32 bits to encode the docid deltas in a BKD leaf block in the benchmark. The following patch should work for achieving this:
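A minimal sketch of the two-ints-per-long idea, assuming a standard IndexInput and not necessarily the exact patch referenced above (the high/low halves shown assume a big-endian long layout and would swap for little-endian byte order):

```java
// Sketch only: decode two 32-bit doc ids per readLong() call.
static void readInts32(IndexInput in, int count, int[] docIDs) throws IOException {
  int i = 0;
  for (; i < count - 1; i += 2) {
    long l = in.readLong();
    docIDs[i] = (int) (l >>> 32); // high 32 bits
    docIDs[i + 1] = (int) l;      // low 32 bits
  }
  if (i < count) {
    docIDs[i] = in.readInt(); // odd tail
  }
}
```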
Description
While recently working on numeric range queries, I noticed readInts24 consuming significant CPU cycles. When I looked into the code, I noticed multiple consecutive invocations of readLong.
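For context, the hot loop reads three longs for every eight packed 24-bit doc ids, roughly like this (the exact bit layout may differ by Lucene version; shown here for illustration):

```java
// Shape of DocIdsWriter#readInts24's inner loop: three consecutive
// readLong() calls per eight 24-bit-packed doc ids.
for (i = 0; i < count - 7; i += 8) {
  long l1 = in.readLong();
  long l2 = in.readLong();
  long l3 = in.readLong();
  docIDs[i] = (int) (l1 >>> 40);
  docIDs[i + 1] = (int) (l1 >>> 16) & 0xffffff;
  docIDs[i + 2] = (int) (((l1 & 0xffff) << 8) | (l2 >>> 56));
  docIDs[i + 3] = (int) (l2 >>> 32) & 0xffffff;
  docIDs[i + 4] = (int) (l2 >>> 8) & 0xffffff;
  docIDs[i + 5] = (int) (((l2 & 0xff) << 16) | (l3 >>> 48));
  docIDs[i + 6] = (int) (l3 >>> 24) & 0xffffff;
  docIDs[i + 7] = (int) l3 & 0xffffff;
}
```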
Initially, it seemed that the overhead from multiple syscalls should not be much, but I tried a quick patch that reads all the longs together, and it seemed to help. Sharing the patch and numbers below (nyc_taxis range query):
Without this change:
With this change: