Break point estimate when threshold exceeded #13199
This is a big speedup! I left some minor comments.
   *
   * <p>TODO: Broad-first will help extimation terminate earlier?
   */
  public static long estimatePointCount(
Nit: To make the contracts of these functions clearer, I'd rather make this function private and then have another public function, isEstimatedPointCountGreaterThanOrEqualTo (probably tagged with @lucene.internal so that we can evolve it as we want), that calls this private function?
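A minimal sketch of the contract split suggested here, assuming a simplified stand-in where the tree is flattened into an array of per-leaf counts. Only the name isEstimatedPointCountGreaterThanOrEqualTo comes from the comment; the class name, the long[] parameter, and both signatures are hypothetical and are not Lucene's actual API.

```java
// Hypothetical illustration only; the real methods take a visitor and a point tree.
final class PointCountContractSketch {
  private PointCountContractSketch() {}

  /** Public contract: only answers whether the estimate reaches the bound. */
  public static boolean isEstimatedPointCountGreaterThanOrEqualTo(
      long[] leafCounts, long upperBound) {
    return estimatePointCount(leafCounts, upperBound) >= upperBound;
  }

  /**
   * Private helper: stops summing once the bound is reached, so its return
   * value is only meaningful when compared against upperBound.
   */
  private static long estimatePointCount(long[] leafCounts, long upperBound) {
    long cost = 0;
    for (int i = 0; i < leafCounts.length && cost < upperBound; i++) {
      cost += leafCounts[i];
    }
    return cost;
  }

  public static void main(String[] args) {
    long[] leaves = {40, 70, 30};
    System.out.println(isEstimatedPointCountGreaterThanOrEqualTo(leaves, 100));
    System.out.println(isEstimatedPointCountGreaterThanOrEqualTo(leaves, 500));
  }
}
```

Keeping the early-terminating count private means callers cannot mistake a truncated sum for a full estimate.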
-      cost += estimatePointCount(visitor, pointTree);
-    } while (pointTree.moveToSibling());
+      cost += estimatePointCount(visitor, pointTree, upperBound - cost);
+    } while (cost <= upperBound && pointTree.moveToSibling());
I believe that we can stop counting if cost == upperBound?
-    } while (cost <= upperBound && pointTree.moveToSibling());
+    } while (cost < upperBound && pointTree.moveToSibling());
    }
    assert !pointTree.moveToParent();
    return pointTree;
  }
What about pulling the point tree in the constructor instead of doing it lazily (for simplicity)?
LGTM
Thanks for the review, @jpountz!
Typically, we estimate the point value count only to compare it against a threshold, so all we need is a boolean indicating whether the point count exceeds that threshold. This PR proposes passing the threshold into the intersect logic and breaking out of the recursion once the threshold is exceeded.
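The idea can be sketched on a toy tree, assuming each leaf simply reports a point count. The Node class, the visited counter, and the estimate signature below are hypothetical stand-ins for illustration; the real code walks a PointTree with a visitor and also accounts for how each cell relates to the query, which this sketch ignores.

```java
import java.util.List;

// Toy stand-in for a BKD point tree; not Lucene's PointTree API.
final class BoundedEstimateSketch {

  /** A node is either a leaf holding `count` points or an inner node with children. */
  static final class Node {
    final long count;          // points in this leaf (0 for inner nodes)
    final List<Node> children; // empty for leaves

    Node(long count) {
      this.count = count;
      this.children = List.of();
    }

    Node(Node... children) {
      this.count = 0;
      this.children = List.of(children);
    }
  }

  static int visited; // number of nodes touched, tracked only for illustration

  /** Sums leaf counts but returns as soon as the running cost reaches upperBound. */
  static long estimate(Node node, long upperBound) {
    visited++;
    if (node.children.isEmpty()) {
      return node.count;
    }
    long cost = 0;
    for (Node child : node.children) {
      // Each child only needs to cover what is left of the budget.
      cost += estimate(child, upperBound - cost);
      if (cost >= upperBound) {
        break; // threshold reached: skip the remaining siblings
      }
    }
    return cost;
  }

  public static void main(String[] args) {
    Node tree = new Node(
        new Node(new Node(40), new Node(70)),
        new Node(new Node(30), new Node(20)));

    visited = 0;
    long bounded = estimate(tree, 100);
    System.out.println("bounded: cost=" + bounded + ", visited=" + visited);

    visited = 0;
    long full = estimate(tree, Long.MAX_VALUE);
    System.out.println("full: cost=" + full + ", visited=" + visited);
  }
}
```

With a bound of 100, the walk stops after the first subtree because its two leaves already contribute 110 points, so the second subtree is never visited.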
Dynamic pruning relies heavily on point count estimation, so I ran luceneutil for it. Here are the benchmark results on wikimedium10m:
M2 Chip
Intel Chip
PS: When profiling, I noticed that PointTree construction is expensive, so I tried to make it reusable. This optimization also contributed to the speed-up.