Stops using `IndexSearcher::count` in Lucene.
#234
I would add a CHANGELOG entry for this even though it's a small internal change.
We don't actually test the internal behavior, but I can probably live with that. Once I've checked the changelog, I should be able to approve this.
👍 And I agree w/ Mike: it would be good to call out this expected performance boost in the CHANGELOG (50% fewer Lucene queries in the typical case of fetching the first page).
@michael-mclawhorn Is the test I added insufficient somehow? In our case, we only use the …
I think the test this PR adds is a good addition. Mike's point is more that we're making an internal code change that should result in a performance boost, but we aren't checking the internal implementation anywhere. Personally, I'm 100% OK with not testing this particular difference.
It looks like `IndexSearcher::count` fires off an entire query and then counts the number of documents that were hit. Currently, we use it to determine the total number of dimension rows that satisfy a given query. However, this is unnecessary and wasteful, because the `TopDocs` object returned by the query for the actual data _also_ contains the total number of documents hit by the query.
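A toy model of the change, not Lucene itself, may make the saving concrete. `ToyIndex`, `Page`, `count` and `search` below are hypothetical stand-ins (Java 16+ for the record): `count` plays the role of `IndexSearcher::count`, which re-runs the whole query only to count matches, while `search` plays the role of a search returning `TopDocs`, collecting one page and counting every match in the same pass. Each `ToyIndex` tracks how many times a query is executed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class SinglePassPaging {

    /** Hypothetical stand-in for TopDocs: one page of hits plus the total hit count. */
    record Page(List<String> hits, int totalHits) {}

    /** Hypothetical toy "index" (not Lucene) that counts query executions. */
    static final class ToyIndex {
        private final List<String> rows;
        int queriesExecuted = 0;

        ToyIndex(List<String> rows) {
            this.rows = rows;
        }

        /** Analogue of IndexSearcher::count: runs the whole query just to count matches. */
        int count(Predicate<String> query) {
            queriesExecuted++;
            int n = 0;
            for (String row : rows) {
                if (query.test(row)) n++;
            }
            return n;
        }

        /** Analogue of a TopDocs search: collects one page AND counts all matches in one pass. */
        Page search(Predicate<String> query, int offset, int limit) {
            queriesExecuted++;
            List<String> page = new ArrayList<>();
            int total = 0;
            for (String row : rows) {
                if (!query.test(row)) continue;
                if (total >= offset && page.size() < limit) page.add(row);
                total++;
            }
            return new Page(page, total);
        }
    }

    /** Before: one query for the page, then a second query just for the count. */
    static Page pageWithSeparateCount(ToyIndex index, Predicate<String> q, int offset, int limit) {
        List<String> hits = index.search(q, offset, limit).hits();
        int total = index.count(q);
        return new Page(hits, total);
    }

    /** After: reuse the total that the page query already computed. */
    static Page pageWithSinglePass(ToyIndex index, Predicate<String> q, int offset, int limit) {
        return index.search(q, offset, limit);
    }

    public static void main(String[] args) {
        List<String> rows = List.of("US-CA", "US-NY", "US-TX", "CA-ON", "US-WA");
        Predicate<String> usOnly = r -> r.startsWith("US-");

        ToyIndex before = new ToyIndex(rows);
        Page p1 = pageWithSeparateCount(before, usOnly, 0, 2);

        ToyIndex after = new ToyIndex(rows);
        Page p2 = pageWithSinglePass(after, usOnly, 0, 2);

        System.out.println(p1 + " queries=" + before.queriesExecuted);
        System.out.println(p2 + " queries=" + after.queriesExecuted);
    }
}
```

Running `main` prints the same page (`[US-CA, US-NY]`) and total (`4`) for both approaches, with `queries=2` for the separate count and `queries=1` for the single pass. One caveat when mapping this back to real Lucene: from Lucene 8.0 onward `TopDocs.totalHits` is a `TotalHits` object whose count may be only a lower bound unless accurate hit counting is requested, so it is worth checking the Lucene version in use if an exact total is required.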