Stops using `IndexSearcher::count` in Lucene. #234

archolewa · 2017-04-12T15:55:42Z

It looks like `IndexSearcher::count` fires off an entire query and then counts the number of documents that were hit. Currently, we use it to determine the total number of dimension rows that satisfy a given query. However, this is unnecessary and wasteful, because the `TopDocs` object returned by the query for the actual data _also_ contains the total number of documents hit by the query.
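
A minimal sketch of the idea, using stand-in types rather than the real Lucene classes: `CountingSearcher` and `PageResult` are hypothetical analogues of `IndexSearcher` and `TopDocs`, introduced only to make the number of query passes visible. It assumes, as described above, that the search result already carries the total hit count, so a separate count call costs a second full pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Illustration only: these are stand-in types, NOT the Lucene API. */
class TotalHitsSketch {

    /** Hypothetical analogue of TopDocs: one page of rows plus the total hit count. */
    static final class PageResult {
        final List<String> rows;
        final int totalHits;

        PageResult(List<String> rows, int totalHits) {
            this.rows = rows;
            this.totalHits = totalHits;
        }
    }

    /** Hypothetical analogue of IndexSearcher that records how many query passes it runs. */
    static final class CountingSearcher {
        private final List<String> documents;
        int queriesExecuted = 0;

        CountingSearcher(List<String> documents) {
            this.documents = documents;
        }

        /** Runs a full pass over the documents and returns only the number of matches. */
        int count(Predicate<String> query) {
            queriesExecuted++;
            int hits = 0;
            for (String doc : documents) {
                if (query.test(doc)) {
                    hits++;
                }
            }
            return hits;
        }

        /** Runs a full pass and returns the first n matches together with the total. */
        PageResult search(Predicate<String> query, int n) {
            queriesExecuted++;
            List<String> page = new ArrayList<>();
            int hits = 0;
            for (String doc : documents) {
                if (query.test(doc)) {
                    hits++;
                    if (page.size() < n) {
                        page.add(doc);
                    }
                }
            }
            return new PageResult(page, hits);
        }
    }

    /** Old approach: fetch the page, then run a second pass just to get the total. */
    static int totalWithSeparateCount(CountingSearcher searcher, Predicate<String> query, int perPage) {
        searcher.search(query, perPage);
        return searcher.count(query);
    }

    /** New approach: read the total from the result of the page query itself. */
    static int totalFromSearchResult(CountingSearcher searcher, Predicate<String> query, int perPage) {
        return searcher.search(query, perPage).totalHits;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("apple", "avocado", "banana", "apricot");
        Predicate<String> startsWithA = row -> row.startsWith("a");

        CountingSearcher before = new CountingSearcher(rows);
        int totalBefore = totalWithSeparateCount(before, startsWithA, 2);

        CountingSearcher after = new CountingSearcher(rows);
        int totalAfter = totalFromSearchResult(after, startsWithA, 2);

        System.out.println("separate count: total=" + totalBefore + ", queries=" + before.queriesExecuted);
        System.out.println("from result:    total=" + totalAfter + ", queries=" + after.queriesExecuted);
    }
}
```

Both approaches report the same total, so the pagination metadata is unchanged; the only difference is the number of passes over the index.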

michael-mclawhorn · 2017-04-12T16:01:15Z

I would put a CHANGELOG entry around this even though it's a small internal change.

michael-mclawhorn

We don't actually test the internal behavior but I can probably live with that. Once I've checked out the changelog I should be able to approve this.

cdeszaq

👍 And I agree w/ Mike, it would be good to call out this expected performance boost in the Changelog (50% fewer Lucene queries in the typical case of getting the 1st page)

archolewa · 2017-04-12T18:19:27Z

@michael-mclawhorn Is the test I added insufficient somehow? In our case, we only use the documentCount to populate the pagination metadata, and I verify that the pagination metadata is correct.

cdeszaq · 2017-04-12T18:21:50Z

I think the test this PR adds is a good addition. I think Mike's point is more that we're making an internal code change which should result in a performance boost, but we're not really checking the internal implementation anywhere.

Personally, I'm 100% OK with not testing this particular difference:

- It doesn't (shouldn't) change the correctness of the result
- It's going to be difficult to test, since it's very intimate with the class itself (and even that particular method)
- I don't think we get much value from testing it

michael-mclawhorn self-assigned this Apr 12, 2017

michael-mclawhorn requested changes Apr 12, 2017

cdeszaq approved these changes Apr 12, 2017

cdeszaq added the NEED 1 REVIEW label Apr 12, 2017

cdeszaq added the PERFORMANCE label Apr 12, 2017

archolewa force-pushed the remove-count branch from d1d2861 to b49db93 Compare April 12, 2017 19:45

archolewa force-pushed the remove-count branch from b49db93 to 49957c1 Compare April 12, 2017 19:46

Merge branch 'master' into remove-count

26b32f2

cdeszaq merged commit 7167bd8 into master Apr 12, 2017

cdeszaq deleted the remove-count branch April 12, 2017 20:05
