-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Bulk Scorer For ToParentBlockJoinQuery #13697
Conversation
lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java
Show resolved
Hide resolved
lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java
Show resolved
Hide resolved
lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java
Outdated
Show resolved
Hide resolved
lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java
Show resolved
Hide resolved
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into it, it's exciting that ToParentBlockJoinQuery
may soon be able to take advantage of our specialized bulk scorers for disjunctions and conjunctions! I left a few comments but this looks on the right track to me.
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java
Outdated
Show resolved
Hide resolved
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java
Outdated
Show resolved
Hide resolved
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java
Outdated
Show resolved
Hide resolved
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java
Outdated
Show resolved
Hide resolved
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java
Outdated
Show resolved
Hide resolved
lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java
Show resolved
Hide resolved
lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java
Outdated
Show resolved
Hide resolved
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java
Outdated
Show resolved
Hide resolved
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java
Outdated
Show resolved
Hide resolved
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java
Outdated
Show resolved
Hide resolved
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java
Outdated
Show resolved
Hide resolved
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java
Show resolved
Hide resolved
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java
Show resolved
Hide resolved
lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java
Outdated
Show resolved
Hide resolved
lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java
Show resolved
Hide resolved
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java
Show resolved
Hide resolved
throw new AssertionError(); | ||
|
||
float score = 0; | ||
if (scoreMode != ScoreMode.None) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issue with not advancing childApproximation
is that testNextDocValidationForToParentBjq
now fails. I don't see a way to fix this test without advancing the iterator past the child docs. The other option is to remove this test since we can no longer detect that particular edge case when scoreMode == ScoreMode.None
. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to look at the test case a little closer, but I'd rather we didn't unnecessarily advance to make a test case happy. If there's a much sparser query clause leading a conjunctive iteration, the eager advancing of the child iterator to the next parent could add meaningful overhead. I think anyway?
I'm also not totally sure why we even have that check in the TPBJQ logic. It seems like a slightly odd place to do that check? Maybe there's a different place we could do that check if necessary? Maybe we could do a check in ParentApproximation#advance
to validate that the child iterator doesn't provide a doc that's present in the parent bitset? Another alternative might be an assert
that eagerly advances to the next parent (like you had)? With an assert
, we could bypass the potential performance drain while still getting some validation coverage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think an assert
is an option for two reasons:
- As the tests in
TestBlockJoinValidation
demonstrate, the edge cases being tested can happen in production. - Using an
assert
to advance the iterator would introduce a material change to the logic that would not be active in production builds
I will investigate adding this error checking logic to ParentApproximation#advance
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a clear understanding of why it's important to do this error-check in the first place? It's unclear to me at the moment, and I'm a little worried we're bending the production code to a specific test case without understanding whether-or-not this is important (and why). My best guess here is that the check is trying to ensure we don't incorrectly allow parent documents to contribute to the store accumulation if the docs are indexed incorrectly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my understanding, this error check is important to ensure that parent docs do not match child doc queries. If such a thing were to happen in production, I think (at a minimum) there would be negative scoring implications. Perhaps @jpountz could provide some more background on why these checks were added.
Based on the tests that are written to exercise this error check, I do not believe we are bending production code to fit a test case. From what I see, it's the opposite: the tests are written to simulate a scenario that could happen in production.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a deep dive into how to avoid advancing childApproximation
and the approach as of 5a993e3 is so close to working, except it doesn't when the child approximation isn't exact and a two-phase iterator is required. I will continue investigating how to apply my solution with a two-phase iterator, but given the deadline with FF I think we should prepare for the outcome where we roll back to advancing childApproximation
in scoreChildDocs
.
I think this is a really interesting optimization and if we run out of time to implement it for Lucene 9.12 & 10, we can do it for a future release.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that we have these checks because Lucene could otherwise fail with confusing errors if the child query matched parent documents, which would suggest a bug in Lucene rather than a problem in the application code.
It's ok to remove some of these checks if they cause issues (these checks are best effort anyway), but I'd err on the side of keeping them when they're cheap and e.g. don't impact the set of docs that we're evaluating.
Keeping the removal of this call to advance() works for me, it's somewhat orthogonal to this PR. I agree that removing this wasteful call to advance() would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java
Outdated
Show resolved
Hide resolved
lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinScorer.java
Outdated
Show resolved
Hide resolved
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java
Outdated
Show resolved
Hide resolved
lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinValidation.java
Outdated
Show resolved
Hide resolved
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java
Outdated
Show resolved
Hide resolved
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java
Outdated
Show resolved
Hide resolved
throw new AssertionError(); | ||
|
||
float score = 0; | ||
if (scoreMode != ScoreMode.None) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that we have these checks because Lucene could otherwise fail with confusing errors if the child query matched parent documents, which would suggest a bug in Lucene rather than a problem in the application code.
It's ok to remove some of these checks if they cause issues (these checks are best effort anyway), but I'd err on the side of keeping them when they're cheap and e.g. don't impact the set of docs that we're evaluating.
Keeping the removal of this call to advance() works for me, it's somewhat orthogonal to this PR. I agree that removing this wasteful call to advance() would be nice.
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java
Outdated
Show resolved
Hide resolved
There's a couple of recent test failures, in main as well as 9x, that may have to do with this change, judging from the area that it touches:
|
No description provided.