Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to prune numeric comparator #2

Open
wants to merge 1 commit into
base: branch_9x
Choose a base branch
from

Conversation

rishabhmaurya
Copy link
Owner

@rishabhmaurya rishabhmaurya commented Sep 1, 2023

Description

TopFieldCollector, where the number of hits to collect is known before hand, can we make use of it to only collect most competitive hits collected by NumericCollector? Competitive iterator in NumericCollector will make use of BKD (when defined criteria is met) and will traverse the points in ascending order and if the query sort order is ascending too it works pretty well, as most competitive hits would be collected first and rest of them can be discarded fast. When query sort order is descending, it can cause priority queue churning and read amplification because of doc values retrieval because of collection of non-competitive in the beginning, as most competitive docs are towards the end of BKD.
If we know how many hits we need to collect, can we directly move to the right node of the tree and start collecting competitive docs thereafter?
This could be helpful for the cases where segment is big and difference between numHits << docs in segment (matching query) as it could prune large set of non competitive docs.

BKD and numeric comparator code is new to me and I might be missing few critical cases, but here is my best effort to implement something closest.

Comment on lines +137 to +141
if (comparator instanceof NumericComparator) {
comparator.setScorer(scorer, numHits);
} else {
comparator.setScorer(scorer);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of overloading setScore, would it make sense to add a setNumHits method to NumericComparator and call that before calling setScorer if comparator instaceof NumericComparator?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense!

Comment on lines +316 to +317
int lowerBoundCompetitiveDocs = cardinalityVisitor.getVisitedCount();
nonCompetitiveDocsToSkip[0] = lowerBoundCompetitiveDocs - numHits;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you subtract the number of deleted documents in the segment from the lower bound, would this work in the presence of deletes?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right, this won't work in case of deletes and would only work for match all queries. I will add a logic to only enable this optimization where there are no deletes and its a match all query in the next iteration.

Copy link

github-actions bot commented Mar 7, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Mar 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants