Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Break point estimate when threshold exceeded #13199

Merged
merged 7 commits into from
Mar 26, 2024
Merged

Conversation

gf2121
Copy link
Contributor

@gf2121 gf2121 commented Mar 22, 2024

Typically, we estimate the point value count to compare to a threshold and all we need is just a boolean which represents whether the point count is greater than this threshold. This PR proposes to parse the threshold into the intersect logic and break the recursion when the threshold is exceeded.

Dynamic pruning is a case that heavily using estimate point count so i run luceneutil for it. Here is the benchmark result on wikimedium10m:

M2 Chip

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
           HighTermDayOfYearSort      374.33      (1.8%)      429.74      (0.6%)   14.8% (  12% -   17%) 0.000
                      TermDTSort      388.72      (1.7%)      474.43      (1.5%)   22.1% (  18% -   25%) 0.000

Intel Chip

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      TermDTSort      123.32      (2.6%)      158.34      (2.5%)   28.4% (  22% -   34%) 0.000
           HighTermDayOfYearSort      391.59      (1.8%)      577.73      (2.7%)   47.5% (  42% -   52%) 0.000

PS: When profiling i noticed that PointTree construction cost a lot so i tried to make it reusable, this optimization also contributed to this speed-up.

@gf2121 gf2121 requested a review from jpountz March 26, 2024 03:49
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a big speedup! I left some minor comments.

*
* <p>TODO: Broad-first will help extimation terminate earlier?
*/
public static long estimatePointCount(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: To make contracts of these functions clearer, I'd rather make this function private, and them have another isEstimatedPointCountGreaterThanOrEqualTo public function (and probably tagged with @lucene.internal so that we can evolve it as we want) that calls this private function?

cost += estimatePointCount(visitor, pointTree);
} while (pointTree.moveToSibling());
cost += estimatePointCount(visitor, pointTree, upperBound - cost);
} while (cost <= upperBound && pointTree.moveToSibling());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that we can stop counting if cost == upperBound?

Suggested change
} while (cost <= upperBound && pointTree.moveToSibling());
} while (cost < upperBound && pointTree.moveToSibling());

}
assert !pointTree.moveToParent();
return pointTree;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about pulling the point tree in the constructor instead of doing it lazily (for simplicity)?

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gf2121 gf2121 merged commit 99b9636 into apache:main Mar 26, 2024
3 checks passed
@gf2121
Copy link
Contributor Author

gf2121 commented Mar 26, 2024

Thanks for review @jpountz !

@gf2121
Copy link
Contributor Author

gf2121 commented Mar 27, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants