-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicate to support higher-order functions: ArrayExists, ArrayFilter, MapFilter #23079
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @aokolnychyi : I'd like to propose renaming the rule you introduced to add a BTW thank you very much for introducing that rule. It's really neat! |
|
Test build #98975 has finished for PR 23079 at commit
|
|
@rednaxelafx I am glad the rule gets more adoption. Renaming also makes sense to me. Shall we extend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test cases for ArrayFilter and ArrayExists seem to be identical. As we have those tests anyway, would it make sense to cover different lambda functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I intentionally made all three lambda the same (the MapFilter one only differs in the lambda parameter). I can encapsulate this lambda function into a test utility function. Let me update the PR and see what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we add a withNewFunctions method in HigherOrderFunction? Then we can simplify this rule to
case f: HigherOrderFunction => f.withNewFunctions(f.functions.map(replaceNullWithFalse))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if that's useful or not. First of all, the replaceNullWithFalse handling doesn't apply to all higher-order functions. In fact it only applies to a very narrow set, ones where a lambda function returns BooleanType and is immediately used as a predicate. So having a generic utility can certainly help make this PR slightly simpler, but I don't know how useful it is for other cases.
I'd prefer waiting for more such transformation cases to introduce a new utility for the pattern. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah i see. Sorry I missed it. Then it's safer to use a white-list here.
…ons: ArrayExists, ArrayFilter, MapFilter
710c886 to
6646a96
Compare
|
Test build #98996 has finished for PR 23079 at commit
|
|
LGTM as well. |
|
thanks, merging to master! |
…higher-order functions: ArrayExists, ArrayFilter, MapFilter ## What changes were proposed in this pull request? Extend the `ReplaceNullWithFalse` optimizer rule introduced in SPARK-25860 (apache#22857) to also support optimizing predicates in higher-order functions of `ArrayExists`, `ArrayFilter`, `MapFilter`. Also rename the rule to `ReplaceNullWithFalseInPredicate` to better reflect its intent. Example: ```sql select filter(a, e -> if(e is null, null, true)) as b from ( select array(null, 1, null, 3) as a) ``` The optimized logical plan: **Before**: ``` == Optimized Logical Plan == Project [filter([null,1,null,3], lambdafunction(if (isnull(lambda e#13)) null else true, lambda e#13, false)) AS b#9] +- OneRowRelation ``` **After**: ``` == Optimized Logical Plan == Project [filter([null,1,null,3], lambdafunction(if (isnull(lambda e#13)) false else true, lambda e#13, false)) AS b#9] +- OneRowRelation ``` ## How was this patch tested? Added new unit test cases to the `ReplaceNullWithFalseInPredicateSuite` (renamed from `ReplaceNullWithFalseSuite`). Closes apache#23079 from rednaxelafx/catalyst-master. Authored-by: Kris Mok <kris.mok@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…icate ## What changes were proposed in this pull request? Based on apache#22857 and apache#23079, this PR did a few updates - Limit the data types of NULL to Boolean. - Limit the input data type of replaceNullWithFalse to Boolean; throw an exception in the testing mode. - Create a new file for the rule ReplaceNullWithFalseInPredicate - Update the description of this rule. ## How was this patch tested? Added a test case Closes apache#23139 from gatorsmile/followupSpark-25860. Authored-by: gatorsmile <gatorsmile@gmail.com> Signed-off-by: DB Tsai <d_tsai@apple.com>
What changes were proposed in this pull request?
Extend the
ReplaceNullWithFalseoptimizer rule introduced in SPARK-25860 (#22857) to also support optimizing predicates in higher-order functions ofArrayExists,ArrayFilter,MapFilter.Also rename the rule to
ReplaceNullWithFalseInPredicateto better reflect its intent.Example:
The optimized logical plan:
Before:
After:
How was this patch tested?
Added new unit test cases to the
ReplaceNullWithFalseInPredicateSuite(renamed fromReplaceNullWithFalseSuite).