Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stops using IndexSearcher::count in Lucene. #234

Merged
merged 2 commits into from
Apr 12, 2017
Merged

Stops using IndexSearcher::count in Lucene. #234

merged 2 commits into from
Apr 12, 2017

Conversation

archolewa
Copy link
Contributor

--It looks like IndexSearcher::count fires off an entire query, and
then counts the number of documents that were hit. Currently, we are
using it to determine the total number of dimension rows that satisfy a
given query. However, this is unnecessary and wasteful, because the
TopDocs object returned by the query for the actual data also
contains information on the total number of documents hit by the query.

@michael-mclawhorn
Copy link
Contributor

I would put a CHANGELOG entry around this even though it's a small internal change.

@michael-mclawhorn michael-mclawhorn self-assigned this Apr 12, 2017
Copy link
Contributor

@michael-mclawhorn michael-mclawhorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actually test the internal behavior but I can probably live with that. Once I've checked out the changelog I should be able to approve this.

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 And I agree w/ Mike, it would be good to call out this expected performance boost in the Changelog (50% fewer Lucene queries in the typical case of getting the 1st page)

@archolewa
Copy link
Contributor Author

@michael-mclawhorn Is the test I added insufficient somehow? In our case, we only use the documentCount to populate the pagination metadata, and I verify that the pagination metadata is correct.

@cdeszaq
Copy link
Collaborator

cdeszaq commented Apr 12, 2017

I think the test this PR adds is a good addition. I think Mike's point is more that we're making an internal code change which should result in a performance boost, but we're not really checking the internal implementation anywhere.

Personally, I'm 100% OK with not testing this particular difference:

  1. It doesn't (shouldn't) change the correctness of the result
  2. It's going to be difficult to test, since it's very intimate with the class itself (and even that particular method)
  3. I don't think we get much value from testing it

--It looks like `IndexSearcher::count` fires off an entire query, and
then counts the number of documents that were hit. Currently, we are
using it to determine the total number of dimension rows that satisfy a
given query. However, this is unnecessary and wasteful, because the
`TopDocs` object returned by the query for the actual data _also_
contains information on the total number of documents hit by the query.
@cdeszaq cdeszaq merged commit 7167bd8 into master Apr 12, 2017
@cdeszaq cdeszaq deleted the remove-count branch April 12, 2017 20:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants