-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes to prune numeric comparator #2
base: branch_9x
Are you sure you want to change the base?
Conversation
if (comparator instanceof NumericComparator) { | ||
comparator.setScorer(scorer, numHits); | ||
} else { | ||
comparator.setScorer(scorer); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of overloading setScore
, would it make sense to add a setNumHits
method to NumericComparator
and call that before calling setScorer
if comparator instaceof NumericComparator
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense!
int lowerBoundCompetitiveDocs = cardinalityVisitor.getVisitedCount(); | ||
nonCompetitiveDocsToSkip[0] = lowerBoundCompetitiveDocs - numHits; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you subtract the number of deleted documents in the segment from the lower bound, would this work in the presence of deletes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right, this won't work in case of deletes and would only work for match all queries. I will add a logic to only enable this optimization where there are no deletes and its a match all query in the next iteration.
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
Description
TopFieldCollector, where the number of hits to collect is known before hand, can we make use of it to only collect most competitive hits collected by NumericCollector? Competitive iterator in NumericCollector will make use of BKD (when defined criteria is met) and will traverse the points in ascending order and if the query sort order is ascending too it works pretty well, as most competitive hits would be collected first and rest of them can be discarded fast. When query sort order is descending, it can cause priority queue churning and read amplification because of doc values retrieval because of collection of non-competitive in the beginning, as most competitive docs are towards the end of BKD.
If we know how many hits we need to collect, can we directly move to the right node of the tree and start collecting competitive docs thereafter?
This could be helpful for the cases where segment is big and difference between numHits << docs in segment (matching query) as it could prune large set of non competitive docs.
BKD and numeric comparator code is new to me and I might be missing few critical cases, but here is my best effort to implement something closest.