-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-27815][SQL] Predicate pushdown in one pass for cascading joins #24956
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #106853 has finished for PR 24956 at commit
|
9b6fe2e to
d8b63e2
Compare
|
Test build #106904 has finished for PR 24956 at commit
|
|
Test build #106946 has finished for PR 24956 at commit
|
8492e11 to
f46c5c6
Compare
|
Test build #106948 has finished for PR 24956 at commit
|
| CollapseRepartition, | ||
| CollapseProject, | ||
| CollapseWindow, | ||
| CombineFilters, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us keep this.
|
LGTM except one comment |
|
Test build #107136 has finished for PR 24956 at commit
|
|
retest this please |
|
Test build #107151 has finished for PR 24956 at commit
|
|
retest this please |
1 similar comment
|
retest this please |
|
Test build #107170 has finished for PR 24956 at commit
|
|
Thanks! Merged to master. |
|
What about updating the JIRA instead since the commit(74f1176) is permanant? @gatorsmile and @cloud-fan . |
|
K. Let me do that. |
|
Thanks! |
What changes were proposed in this pull request?
This PR makes the predicate pushdown logic in catalyst optimizer more efficient by unifying two existing rules
PushdownPredicatesandPushPredicateThroughJoin. Previously pushing down a predicate for queries such asFilter(Join(Join(Join)))requires n steps. This patch essentially reduces this to a single pass.To make this actually work, we need to unify a few rules such as
CombineFilters,PushDownPredicateandPushDownPrdicateThroughJoin. Otherwise cases such asFilter(Join(Filter(Join)))still requires several passes to fully push down predicates. This unification is done by composing several partial functions, which makes a minimal code change and can reuse existing UTs.Results show that this optimization can improve the catalyst optimization time by 16.5%. For queries with more joins, the performance is even better. E.g., for TPC-DS q64, the performance boost is 49.2%.
How was this patch tested?
Existing UTs + new a UT for the new rule.