Break point estimate when threshold exceeded #13199

gf2121 · 2024-03-22T08:56:54Z

Typically, we estimate the point value count only to compare it against a threshold, so all we need is a boolean indicating whether the point count is greater than that threshold. This PR proposes passing the threshold into the intersect logic and breaking out of the recursion once the threshold is exceeded.
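A minimal sketch of the idea follows. It assumes a toy one-dimensional tree instead of Lucene's real `PointValues.PointTree` and `IntersectVisitor`, and the names `Cell`, `compare`, and `estimate` are hypothetical (Java 16+ for the record). The remaining budget `upperBound - cost` is passed down, and the sibling loop stops as soon as the running count reaches the bound.

```java
import java.util.List;

class BoundedEstimate {
  // Hypothetical stand-in for PointValues.Relation.
  enum Relation { CELL_INSIDE_QUERY, CELL_OUTSIDE_QUERY, CELL_CROSSES_QUERY }

  // Hypothetical 1-D tree cell covering [min, max] and holding `size` points.
  record Cell(int min, int max, long size, List<Cell> children) {
    static Cell leaf(int min, int max, long size) {
      return new Cell(min, max, size, List.of());
    }
  }

  // Number of cells compared so far; only used to show early termination.
  static int comparisons = 0;

  // Classifies a cell against the query range [lo, hi].
  static Relation compare(int lo, int hi, Cell c) {
    comparisons++;
    if (c.max() < lo || c.min() > hi) return Relation.CELL_OUTSIDE_QUERY;
    if (c.min() >= lo && c.max() <= hi) return Relation.CELL_INSIDE_QUERY;
    return Relation.CELL_CROSSES_QUERY;
  }

  // If the full estimate is below upperBound, returns it unchanged;
  // otherwise returns some value >= upperBound and may skip subtrees.
  static long estimate(int lo, int hi, Cell c, long upperBound) {
    Relation r = compare(lo, hi, c);
    if (r == Relation.CELL_OUTSIDE_QUERY) return 0;
    if (r == Relation.CELL_INSIDE_QUERY) return c.size();
    if (c.children().isEmpty()) return (c.size() + 1) / 2; // crossing leaf: assume half match
    long cost = 0;
    for (Cell child : c.children()) {
      cost += estimate(lo, hi, child, upperBound - cost); // pass down the remaining budget
      if (cost >= upperBound) break;                       // threshold reached: skip remaining siblings
    }
    return cost;
  }
}
```

Once the bound is hit the returned value is only a lower bound on the full estimate, so callers should compare it against the threshold rather than treat it as a count.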

Dynamic pruning relies heavily on point count estimation, so I ran luceneutil on it. Here are the benchmark results on wikimedium10m:

M2 chip:

                     Task  QPS baseline  StdDev  QPS my_modified_version  StdDev            Pct diff  p-value
    HighTermDayOfYearSort        374.33  (1.8%)                   429.74  (0.6%)  14.8% ( 12% - 17%)    0.000
               TermDTSort        388.72  (1.7%)                   474.43  (1.5%)  22.1% ( 18% - 25%)    0.000

Intel chip:

                     Task  QPS baseline  StdDev  QPS my_modified_version  StdDev            Pct diff  p-value
               TermDTSort        123.32  (2.6%)                   158.34  (2.5%)  28.4% ( 22% - 34%)    0.000
    HighTermDayOfYearSort        391.59  (1.8%)                   577.73  (2.7%)  47.5% ( 42% - 52%)    0.000

PS: While profiling, I noticed that PointTree construction was expensive, so I made the tree reusable; this optimization also contributed to the speedup.
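Reusing a tree cursor across calls is only safe if every estimation leaves it back at the root, including when the sibling loop exits early; that is presumably what the `assert !pointTree.moveToParent()` check in NumericComparator guards. A minimal sketch, assuming a hypothetical `Cursor` over a complete binary tree (it borrows the `moveToChild`/`moveToSibling`/`moveToParent` method names from `PointTree`, but it is not Lucene code) and a hypothetical `countLeaves` that treats each leaf as one point:

```java
// Hypothetical cursor over a complete binary tree of the given height.
class Cursor {
  private final int height;
  private final boolean[] onRight; // onRight[d]: is the node at depth d a right child?
  private int depth = 0;

  Cursor(int height) {
    this.height = height;
    this.onRight = new boolean[height + 1];
  }

  boolean moveToChild() {
    if (depth == height) return false; // already at a leaf
    depth++;
    onRight[depth] = false; // always descend to the left child first
    return true;
  }

  boolean moveToSibling() {
    if (depth == 0 || onRight[depth]) return false; // root or already on the right child
    onRight[depth] = true;
    return true;
  }

  boolean moveToParent() {
    if (depth == 0) return false; // already at the root
    depth--;
    return true;
  }

  int depth() {
    return depth;
  }
}

class ReuseDemo {
  // Counts leaves (one point each), stopping once upperBound is reached.
  static long countLeaves(Cursor c, long upperBound) {
    if (!c.moveToChild()) return 1; // leaf
    long cost = 0;
    do {
      cost += countLeaves(c, upperBound - cost);
    } while (cost < upperBound && c.moveToSibling());
    c.moveToParent(); // always climb back, even after an early exit
    return cost;
  }
}
```

Because `moveToParent()` runs after the loop regardless of why it stopped, the same cursor can be handed to the next estimation without being rebuilt.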

jpountz

This is a big speedup! I left some minor comments.

jpountz · 2024-03-26T10:21:34Z

lucene/core/src/java/org/apache/lucene/index/PointValues.java

+   *
+   * <p>TODO: Breadth-first will help estimation terminate earlier?
+   */
+  public static long estimatePointCount(


Nit: To make the contracts of these functions clearer, I'd rather make this function private, and then have a public isEstimatedPointCountGreaterThanOrEqualTo function (probably tagged with @lucene.internal so that we can evolve it as we want) that calls this private function?
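The contract suggested here could look roughly like the sketch below: the bounded estimator stays private because its return value is only meaningful relative to the bound, and the public boolean method is the only entry point. This is a hypothetical standalone illustration over an array of per-leaf estimates, not Lucene's `PointValues` code; the class name `EstimateContract`, the signatures, and the leaf-array model are assumptions, and only the method name comes from the comment above.

```java
class EstimateContract {
  // Private: may return a partial sum once upperBound is reached, so the
  // result must not be interpreted as the full estimate.
  private static long estimate(long[] leafEstimates, long upperBound) {
    long cost = 0;
    for (long leaf : leafEstimates) {
      cost += leaf;
      if (cost >= upperBound) break; // enough to answer the question; stop early
    }
    return cost;
  }

  // Public entry point (the review suggests tagging the real one @lucene.internal).
  // The boolean answer is correct whether or not the loop stopped early.
  static boolean isEstimatedPointCountGreaterThanOrEqualTo(long[] leafEstimates, long threshold) {
    return estimate(leafEstimates, threshold) >= threshold;
  }
}
```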

jpountz · 2024-03-26T10:22:57Z

lucene/core/src/java/org/apache/lucene/index/PointValues.java

-            cost += estimatePointCount(visitor, pointTree);
-          } while (pointTree.moveToSibling());
+            cost += estimatePointCount(visitor, pointTree, upperBound - cost);
+          } while (cost <= upperBound && pointTree.moveToSibling());


I believe that we can stop counting if cost == upperBound?

Suggested change

-          } while (cost <= upperBound && pointTree.moveToSibling());
+          } while (cost < upperBound && pointTree.moveToSibling());
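To see why `<` suffices, the toy loop below mirrors the do/while shape of the hunk above on a flat array of hypothetical sibling estimates (the `SiblingLoop` class and `strict` flag are illustrative only, and recursion into children is left out) and counts how many siblings each condition visits. Once `cost == upperBound`, the "greater than or equal" question is already answered, so `<=` only buys one more useless sibling visit.

```java
class SiblingLoop {
  // Returns how many siblings are visited before the loop exits.
  // strict == true uses `cost < upperBound`; false uses `cost <= upperBound`.
  // siblingEstimates must be non-empty, like a node that has at least one child.
  static int siblingsVisited(long[] siblingEstimates, long upperBound, boolean strict) {
    long cost = 0;
    int i = 0;
    do {
      cost += siblingEstimates[i++];
    } while ((strict ? cost < upperBound : cost <= upperBound) && i < siblingEstimates.length);
    return i;
  }
}
```

With estimates `{5, 5, 5}` and a bound of 5, the strict condition stops after the first sibling while `<=` goes on to visit a second one.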

jpountz · 2024-03-26T10:47:19Z

lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java

+      }
+      assert !pointTree.moveToParent();
+      return pointTree;
+    }


What about pulling the point tree in the constructor instead of doing it lazily (for simplicity)?
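The trade-off behind this question can be sketched with a hypothetical holder that memoizes an expensive object (here it stands in for a `PointTree`; `LazyCursorHolder` and its `Supplier`-based factory are assumptions, not Lucene APIs). Eager construction in the constructor is simpler, but it pays the cost even when no estimation ever happens; the lazy variant builds at most once, on first use. #13498, mentioned further down this thread, constructs the PointTree lazily in NumericComparator to avoid a performance regression.

```java
import java.util.function.Supplier;

class LazyCursorHolder<T> {
  private final Supplier<T> factory; // hypothetical expensive constructor
  private T cached;                  // null until first requested

  LazyCursorHolder(Supplier<T> factory) {
    this.factory = factory;
    // Eager alternative: this.cached = factory.get();
  }

  // Builds the object on first use and returns the same instance afterwards.
  T get() {
    if (cached == null) {
      cached = factory.get();
    }
    return cached;
  }
}
```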

jpountz

LGTM

gf2121 · 2024-03-26T11:37:10Z

Thanks for the review, @jpountz!

gf2121 · 2024-03-27T07:49:38Z

Nightly benchmark:
https://home.apache.org/~mikemccand/lucenebench/TermDTSort.html
https://home.apache.org/~mikemccand/lucenebench/TermDayOfYearSort.html
https://home.apache.org/~mikemccand/lucenebench/2024.03.26.08.24.36.html

I opened a PR to add an annotation:
https://github.com/mikemccand/luceneutil/pull/261/files

gf2121 added 5 commits March 22, 2024 16:29

better estimate

350eaef

Merge remote-tracking branch 'origin/main' into better_estimate

3269f9e

CHANGES

e74272e

iter

66ad432

java doc

00dc7c9

gf2121 requested a review from jpountz March 26, 2024 03:49

jpountz reviewed Mar 26, 2024


gf2121 added 2 commits March 26, 2024 19:10

review iter

836a028

private

854b00e

jpountz approved these changes Mar 26, 2024


gf2121 merged commit 99b9636 into apache:main Mar 26, 2024
3 checks passed

asfgit pushed a commit that referenced this pull request Mar 26, 2024

Break point estimation when threshold exceeded (#13199)

fafd16b

This was referenced Mar 26, 2024

New structure for numeric dynamic pruning #13217

Closed

Disjunction as CompetitiveIterator for numeric dynamic pruning #13221

Merged

iverase mentioned this pull request Jun 18, 2024

Avoid performance regression by constructing lazily the PointTree in NumericComparator #13498

Merged

kkewwei mentioned this pull request Jul 9, 2024

Pruning of estimating the point value count since BooleanScorerSupplier #13554

Open

kkewwei mentioned this pull request Nov 12, 2024

Pruning of estimating the point value count in BooleanScorerSupplier #13988

Open
