Add Bulk Scorer For ToParentBlockJoinQuery #13697

Mikep86 · 2024-08-28T22:54:26Z

No description provided.

lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java

lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java

jpountz

Thanks for looking into it, it's exciting that ToParentBlockJoinQuery may soon be able to take advantage of our specialized bulk scorers for disjunctions and conjunctions! I left a few comments but this looks on the right track to me.
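For readers new to the idea: a bulk scorer scores a window of doc IDs [min, max) at a time. For block joins, this has a subtlety. Children are indexed immediately before their parent, so the first parent in a window can own children whose doc IDs fall below min. The toy model below shows that boundary handling only. It assumes a java.util.BitSet of parent doc IDs and a per-doc child score array (Float.NaN meaning "no match"), and it aggregates with Max only. WindowedParentScoring and scoreWindow are hypothetical names. This is not the PR's BlockJoinBulkScorer.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of scoring parents in doc ID windows; not Lucene's BulkScorer API. */
final class WindowedParentScoring {
  /**
   * Returns parents in [min, max) that have at least one matching child, mapped to their max
   * child score. childScores is indexed by doc ID; Float.NaN marks a non-matching doc.
   */
  static Map<Integer, Float> scoreWindow(BitSet parents, float[] childScores, int min, int max) {
    Map<Integer, Float> scored = new LinkedHashMap<>();
    // Children precede their parent, so the first parent in the window may own children
    // below min: resume the child scan right after the last parent before the window.
    // previousSetBit(-1) returns -1, so min == 0 starts the scan at doc 0.
    int childStart = parents.previousSetBit(min - 1) + 1;
    for (int parent = parents.nextSetBit(min);
        parent >= 0 && parent < max;
        parent = parents.nextSetBit(parent + 1)) {
      float best = Float.NEGATIVE_INFINITY;
      boolean matched = false;
      for (int doc = childStart; doc < parent; doc++) {
        if (!Float.isNaN(childScores[doc])) {
          matched = true;
          best = Math.max(best, childScores[doc]);
        }
      }
      if (matched) {
        scored.put(parent, best);
      }
      childStart = parent + 1; // the next block starts right after this parent
    }
    return scored;
  }
}
```

If you call it for [0, 5) and then for [5, 11), each parent is scored exactly once, even though the window boundary splits a parent from some of its children. This is the property that any approach scoring in multiple batches has to preserve.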

lucene/CHANGES.txt

Mikep86 · 2024-09-09T19:32:51Z

lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java

-              throw new AssertionError();
+
+      float score = 0;
+      if (scoreMode != ScoreMode.None) {
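The hunk above starts the per-parent score at 0 and aggregates child scores only when scoreMode != ScoreMode.None. The sketch below shows one way to write that aggregation. It accumulates in double and narrows to float once, in the spirit of the "Calculate scores using doubles" commit. JoinScoreMode is a hypothetical stand-in for org.apache.lucene.search.join.ScoreMode (None, Avg, Max, Total, Min). ChildScoreAggregator is illustrative and is not the PR's code.

```java
/** Hypothetical stand-in for org.apache.lucene.search.join.ScoreMode. */
enum JoinScoreMode { None, Avg, Max, Total, Min }

/** Aggregates one parent's matching child scores (illustrative, not the PR's code). */
final class ChildScoreAggregator {
  static float aggregate(JoinScoreMode mode, float[] childScores) {
    if (mode == JoinScoreMode.None || childScores.length == 0) {
      return 0f; // no scores needed (or no children): keep the initial score of 0
    }
    // Accumulate in double to limit rounding error, then narrow to float once.
    double acc =
        switch (mode) {
          case Max -> Double.NEGATIVE_INFINITY;
          case Min -> Double.POSITIVE_INFINITY;
          default -> 0d;
        };
    for (float s : childScores) {
      switch (mode) {
        case Max -> acc = Math.max(acc, s);
        case Min -> acc = Math.min(acc, s);
        default -> acc += s; // Avg and Total both start from the sum
      }
    }
    if (mode == JoinScoreMode.Avg) {
      acc /= childScores.length;
    }
    return (float) acc;
  }
}
```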


One issue with not advancing childApproximation is that testNextDocValidationForToParentBjq now fails. I don't see a way to fix this test without advancing the iterator past the child docs. The other option is to remove this test since we can no longer detect that particular edge case when scoreMode == ScoreMode.None. Thoughts?

I'll have to look at the test case a little closer, but I'd rather we didn't advance unnecessarily just to make a test case happy. If a much sparser query clause is leading a conjunctive iteration, eagerly advancing the child iterator to the next parent could add meaningful overhead. At least, I think so?

I'm also not totally sure why we even have that check in the TPBJQ logic. It seems like a slightly odd place to do that check? Maybe there's a different place we could do that check if necessary? Maybe we could do a check in ParentApproximation#advance to validate that the child iterator doesn't provide a doc that's present in the parent bitset? Another alternative might be an assert that eagerly advances to the next parent (like you had)? With an assert, we could bypass the potential performance drain while still getting some validation coverage?
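Here is a rough sketch of the check proposed here. It models the parent filter as a java.util.BitSet, whereas Lucene uses its own org.apache.lucene.util.BitSet. Any doc that the child iterator lands on must not be a parent. Its parent is the next set bit, because children are indexed immediately before their parent. ParentBlockCheck, parentOf and the exception messages are made up for illustration.

```java
import java.util.BitSet;

/** Illustrative only: maps a child doc to its parent and rejects docs that are parents. */
final class ParentBlockCheck {
  /** Returns the parent doc ID of the block containing childDoc. */
  static int parentOf(BitSet parents, int childDoc) {
    if (parents.get(childDoc)) {
      // The child query matched a parent doc: the index or the queries are inconsistent.
      throw new IllegalStateException("child query matched parent doc " + childDoc);
    }
    // Children precede their parent, so the parent is the next set bit.
    int parent = parents.nextSetBit(childDoc);
    if (parent < 0) {
      throw new IllegalStateException("child doc " + childDoc + " has no parent after it");
    }
    return parent;
  }
}
```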

I don't think an assert is an option for two reasons:

As the tests in TestBlockJoinValidation demonstrate, the edge cases being tested can happen in production.

Using an assert to advance the iterator would introduce a material change to the logic that would not be active in production builds.

I will investigate adding this error checking logic to ParentApproximation#advance.

Do we have a clear understanding of why it's important to do this error check in the first place? It's unclear to me at the moment, and I'm a little worried we're bending the production code to a specific test case without understanding whether or not this is important (and why). My best guess is that the check is trying to ensure we don't incorrectly allow parent documents to contribute to the score accumulation if the docs are indexed incorrectly?

Based on my understanding, this error check is important to ensure that parent docs do not match child doc queries. If such a thing were to happen in production, I think (at a minimum) there would be negative scoring implications. Perhaps @jpountz could provide some more background on why these checks were added.

Based on the tests that are written to exercise this error check, I do not believe we are bending production code to fit a test case. From what I see, it's the opposite: the tests are written to simulate a scenario that could happen in production.

I did a deep dive into how to avoid advancing childApproximation, and the approach as of 5a993e3 nearly works, except when the child approximation isn't exact and a two-phase iterator is required. I will continue investigating how to apply my solution with a two-phase iterator, but given the feature freeze (FF) deadline, I think we should prepare for the outcome where we roll back to advancing childApproximation in scoreChildDocs.

I think this is a really interesting optimization and if we run out of time to implement it for Lucene 9.12 & 10, we can do it for a future release.

Thoughts?
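Some context on the two-phase case: a two-phase iterator first returns candidates from a cheap approximation and then confirms each one with matches(). The approximation can therefore return docs that are not real matches. One possible consequence is that validating candidates before confirmation can flag docs the query would have rejected. This is an illustration, not a diagnosis of 5a993e3. The toy below models the approximation as an int array and matches() as an IntPredicate. TwoPhaseSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.IntPredicate;

/** Toy two-phase iteration: a cheap approximation followed by an exact matches() check. */
final class TwoPhaseSketch {
  /** Returns the approximation's candidates that also pass the confirmation phase. */
  static List<Integer> confirmedMatches(int[] approximation, IntPredicate matches) {
    List<Integer> confirmed = new ArrayList<>();
    for (int doc : approximation) {
      if (matches.test(doc)) { // only now is doc known to be a real match
        confirmed.add(doc);
      }
    }
    return confirmed;
  }

  /** Parent check on raw candidates: may report docs that would never match. */
  static boolean approximationHitsParent(BitSet parents, int[] approximation) {
    for (int doc : approximation) {
      if (parents.get(doc)) {
        return true;
      }
    }
    return false;
  }

  /** Parent check applied only after confirmation. */
  static boolean confirmedHitsParent(BitSet parents, int[] approximation, IntPredicate matches) {
    for (int doc : confirmedMatches(approximation, matches)) {
      if (parents.get(doc)) {
        return true;
      }
    }
    return false;
  }
}
```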

I believe that we have these checks because Lucene could otherwise fail with confusing errors if the child query matched parent documents, which would suggest a bug in Lucene rather than a problem in the application code.

It's ok to remove some of these checks if they cause issues (these checks are best effort anyway), but I'd err on the side of keeping them when they're cheap and e.g. don't impact the set of docs that we're evaluating.

Keeping the removal of this call to advance() works for me, it's somewhat orthogonal to this PR. I agree that removing this wasteful call to advance() would be nice.
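To make the "cheap" criterion concrete: when scores are needed, the per-parent loop roughly walks the child matches until it reaches a doc at or beyond the parent. Checking whether it stopped exactly on the parent therefore costs nothing extra. When scores are not needed and the loop is skipped, the same check would require an extra advance, which is the trade-off discussed in this thread. Below is a toy version over a sorted array of child matches. ChildLoopSketch and scoreBlock are hypothetical names, not Lucene's BlockJoinScorer.

```java
/** Toy per-parent child loop; childDocs must be sorted ascending, scores[i] belongs to childDocs[i]. */
final class ChildLoopSketch {
  /** Sums child scores in [firstChildDoc, parentDoc), validating only when it iterates anyway. */
  static double scoreBlock(
      int[] childDocs, float[] scores, int firstChildDoc, int parentDoc, boolean needsScores) {
    if (!needsScores) {
      return 0; // children are never visited, so a child match on parentDoc goes undetected
    }
    double sum = 0;
    for (int i = 0; i < childDocs.length; i++) {
      int doc = childDocs[i];
      if (doc < firstChildDoc) {
        continue; // belongs to an earlier block
      }
      if (doc >= parentDoc) {
        // The loop stops here anyway, so this validation is free.
        if (doc == parentDoc) {
          throw new IllegalStateException("child query matched parent doc " + parentDoc);
        }
        break;
      }
      sum += scores[i];
    }
    return sum;
  }
}
```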

Thanks @Mikep86 & @jpountz!

lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinScorer.java

lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinValidation.java

jpountz · 2024-09-10T21:20:18Z


javanna · 2024-09-16T07:17:31Z

There are a couple of recent test failures, in main as well as 9.x, that may be related to this change, judging from the area it touches:

FAILED:  org.apache.lucene.search.join.TestBlockJoinBulkScorer.testSetMinCompetitiveScoreWithScoreModeMax

Error Message:
java.lang.AssertionError: expected:<{16=5.0, 10=10.0, 2=6.0}> but was:<{2=6.0, 10=10.0}>

Stack Trace:
java.lang.AssertionError: expected:<{16=5.0, 10=10.0, 2=6.0}> but was:<{2=6.0, 10=10.0}>
        at __randomizedtesting.SeedInfo.seed([8531AA8A82E0A86C:D0DE9F82717ED75C]:0)
        at app/junit@4.13.1/org.junit.Assert.fail(Assert.java:89)
        at app/junit@4.13.1/org.junit.Assert.failNotEquals(Assert.java:835)
        at app/junit@4.13.1/org.junit.Assert.assertEquals(Assert.java:120)
        at app/junit@4.13.1/org.junit.Assert.assertEquals(Assert.java:146)
        at app//org.apache.lucene.search.join.TestBlockJoinBulkScorer.assertScores(TestBlockJoinBulkScorer.java:301)
        at app//org.apache.lucene.search.join.TestBlockJoinBulkScorer.testSetMinCompetitiveScoreWithScoreModeMax(TestBlockJoinBulkScorer.java:397)
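For reference when reading this failure: under the contract of Scorable#setMinCompetitiveScore, the scorer may skip documents whose score is strictly less than the given minimum. A parent that scores exactly at the threshold must still be collected. The sketch below is a hypothetical helper that computes the expected hits under that contract. The scores mirror the expected map in the message, but the threshold values are made up and do not come from the failing seed.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical reference filter for expected hits under a minimum competitive score. */
final class MinCompetitiveFilter {
  /** Keeps every parent whose score is >= minCompetitiveScore; only lower scores may be skipped. */
  static Map<Integer, Float> competitive(Map<Integer, Float> parentScores, float minCompetitiveScore) {
    Map<Integer, Float> kept = new TreeMap<>(); // sorted by parent doc ID for stable comparisons
    for (Map.Entry<Integer, Float> e : parentScores.entrySet()) {
      if (e.getValue() >= minCompetitiveScore) {
        kept.put(e.getKey(), e.getValue());
      }
    }
    return kept;
  }
}
```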

jpountz · 2024-09-16T07:19:46Z

FYI @Mikep86 opened a PR at #13785.

Mikep86 added 14 commits August 26, 2024 13:46

Added BlockJoinBulkScorer

6e398eb

BlockJoinBulkScorer development

79ca811

Fix assertion failures

fbd6ca5

Added TestBlockJoinBulkScorer

5d55d19

Test development

1f39771

Compute expected scores and compare them to actual scores

7e648ad

Randomize score mode. Score multiple random indices.

8bba0db

Filter out empty child docs

57cb4c0

Randomize search score mode

c1dce29

Updated approach to handle when scoring in multiple batches

9be3a20

Increase test iterations, fix assertion error

e15105f

fix assertion error

3e57ff9

Handle when score supplier is null

3a1859f

Fix min score computation

25079b7

Mikep86 commented Aug 28, 2024

lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java

Mikep86 commented Aug 28, 2024

lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java

Mikep86 commented Aug 28, 2024

lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java (outdated)

Mikep86 commented Aug 28, 2024

lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java

Add license

8f5a0b0

Mikep86 commented Aug 28, 2024

lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java (outdated)

jpountz reviewed Aug 29, 2024


Mikep86 added 6 commits August 29, 2024 18:18

Change batching approach

0f93801

Remove unnecessary null check

ec3d967

Scoring computation adjustments

cfd780e

Remove unnecessary scorer null checks

ac14952

Check that there are no matches when score supplier is null

8786e58

Stop scoring once we've scored the last parent

4768012

jpountz reviewed Aug 30, 2024


Mikep86 added 2 commits August 30, 2024 18:17

Calculate scores using doubles

2ce93ca

Remove currentMin

b0dd8cd

Updated CHANGES.txt

d3b7d5c

jpountz approved these changes Sep 7, 2024

lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java

lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java (outdated)

Fix test

9d4cc56

Mikep86 commented Sep 9, 2024

lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java

Scoring optimizations

e154b43

jpountz approved these changes Sep 9, 2024

lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java

lucene/CHANGES.txt (outdated)

Mikep86 commented Sep 9, 2024


Mikep86 added 2 commits September 9, 2024 15:54

Add/improve comments

f76db73

Move error check into ParentApproximation#advance

12343b7

Mikep86 commented Sep 10, 2024

lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java (outdated)

Mikep86 commented Sep 10, 2024

lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinScorer.java (outdated)

Mikep86 commented Sep 10, 2024

lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java (outdated)

Mikep86 commented Sep 10, 2024

lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinValidation.java (outdated)

ParentApproximation#advance logic adjustments

5a993e3

Mikep86 commented Sep 10, 2024

lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java (outdated)

jpountz approved these changes Sep 10, 2024

Mikep86 added 2 commits September 11, 2024 08:11

Revert ParentApproximation#advance error-checking logic

64672fa

This reverts commits 12343b7 and 5a993e3

Fix test

f287a3b

jpountz approved these changes Sep 11, 2024

lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java (outdated)

Mikep86 added 3 commits September 11, 2024 08:32

Merge branch 'main' into nested-query_bulk-scorer

a62d87c

Fix build error

5a41c63

Improve comment

448af12

jpountz approved these changes Sep 11, 2024


jpountz merged commit 45da83b into apache:main Sep 11, 2024
3 checks passed

jpountz deleted the nested-query_bulk-scorer branch September 11, 2024 13:02

Mikep86 added a commit to Mikep86/lucene that referenced this pull request Sep 11, 2024

Back-port changes from apache#13697

7b19c84

Mikep86 mentioned this pull request Sep 11, 2024

Back-port Add Bulk Scorer For ToParentBlockJoinQuery (#13697) to 9.x #13764

Merged

jpountz pushed a commit that referenced this pull request Sep 11, 2024

Back-port Add Bulk Scorer For ToParentBlockJoinQuery (#13697) to 9.x (#…

4d379db

…13764)


Mikep86 commented Aug 28, 2024

jpountz left a comment

Mikep86 Sep 9, 2024

gsmiller Sep 9, 2024

Mikep86 Sep 10, 2024

gsmiller Sep 10, 2024

Mikep86 Sep 10, 2024

Mikep86 Sep 10, 2024

jpountz Sep 10, 2024

gsmiller Sep 11, 2024

jpountz Sep 10, 2024

javanna commented Sep 16, 2024

jpountz commented Sep 16, 2024
