Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Bulk Scorer For ToParentBlockJoinQuery #13697
Add Bulk Scorer For ToParentBlockJoinQuery #13697
Changes from 42 commits
6e398eb
79ca811
fbd6ca5
5d55d19
1f39771
7e648ad
8bba0db
57cb4c0
c1dce29
9be3a20
e15105f
3e57ff9
3a1859f
25079b7
8f5a0b0
0f93801
ec3d967
cfd780e
ac14952
8786e58
4768012
2ce93ca
b0dd8cd
bb72ece
50e0db7
58b9868
ef49457
f26cb70
ab1fe69
e58cb26
e83ba94
e19d594
2ef6c37
ca0cd07
51cf4fd
f558d23
d3b7d5c
9d4cc56
e154b43
f76db73
12343b7
5a993e3
64672fa
f287a3b
a62d87c
5a41c63
448af12
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issue with not advancing
childApproximation
is thattestNextDocValidationForToParentBjq
now fails. I don't see a way to fix this test without advancing the iterator past the child docs. The other option is to remove this test since we can no longer detect that particular edge case whenscoreMode == ScoreMode.None
. Thoughts?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to look at the test case a little closer, but I'd rather we didn't unnecessarily advance to make a test case happy. If there's a much sparser query clause leading a conjunctive iteration, the eager advancing of the child iterator to the next parent could add meaningful overhead. I think anyway?
I'm also not totally sure why we even have that check in the TPBJQ logic. It seems like a slightly odd place to do that check? Maybe there's a different place we could do that check if necessary? Maybe we could do a check in
ParentApproximation#advance
to validate that the child iterator doesn't provide a doc that's present in the parent bitset? Another alternative might be anassert
that eagerly advances to the next parent (like you had)? With anassert
, we could bypass the potential performance drain while still getting some validation coverage?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think an
assert
is an option for two reasons:TestBlockJoinValidation
demonstrate, the edge cases being tested can happen in production.assert
to advance the iterator would introduce a material change to the logic that would not be active in production buildsI will investigate adding this error checking logic to
ParentApproximation#advance
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a clear understanding of why it's important to do this error-check in the first place? It's unclear to me at the moment, and I'm a little worried we're bending the production code to a specific test case without understanding whether-or-not this is important (and why). My best guess here is that the check is trying to ensure we don't incorrectly allow parent documents to contribute to the store accumulation if the docs are indexed incorrectly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my understanding, this error check is important to ensure that parent docs do not match child doc queries. If such a thing were to happen in production, I think (at a minimum) there would be negative scoring implications. Perhaps @jpountz could provide some more background on why these checks were added.
Based on the tests that are written to exercise this error check, I do not believe we are bending production code to fit a test case. From what I see, it's the opposite: the tests are written to simulate a scenario that could happen in production.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a deep dive into how to avoid advancing
childApproximation
and the approach as of 5a993e3 is so close to working, except it doesn't when the child approximation isn't exact and a two-phase iterator is required. I will continue investigating how to apply my solution with a two-phase iterator, but given the deadline with FF I think we should prepare for the outcome where we roll back to advancingchildApproximation
inscoreChildDocs
.I think this is a really interesting optimization and if we run out of time to implement it for Lucene 9.12 & 10, we can do it for a future release.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that we have these checks because Lucene could otherwise fail with confusing errors if the child query matched parent documents, which would suggest a bug in Lucene rather than a problem in the application code.
It's ok to remove some of these checks if they cause issues (these checks are best effort anyway), but I'd err on the side of keeping them when they're cheap and e.g. don't impact the set of docs that we're evaluating.
Keeping the removal of this call to advance() works for me, it's somewhat orthogonal to this PR. I agree that removing this wasteful call to advance() would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Mikep86 & @jpountz!