Skip to content

Conversation

@rednaxelafx
Copy link
Contributor

@rednaxelafx rednaxelafx commented Nov 18, 2018

What changes were proposed in this pull request?

Extend the ReplaceNullWithFalse optimizer rule introduced in SPARK-25860 (#22857) to also support optimizing predicates in higher-order functions of ArrayExists, ArrayFilter, MapFilter.

Also rename the rule to ReplaceNullWithFalseInPredicate to better reflect its intent.

Example:

select filter(a, e -> if(e is null, null, true)) as b from (
  select array(null, 1, null, 3) as a)

The optimized logical plan:
Before:

== Optimized Logical Plan ==
Project [filter([null,1,null,3], lambdafunction(if (isnull(lambda e#13)) null else true, lambda e#13, false)) AS b#9]
+- OneRowRelation

After:

== Optimized Logical Plan ==
Project [filter([null,1,null,3], lambdafunction(if (isnull(lambda e#13)) false else true, lambda e#13, false)) AS b#9]
+- OneRowRelation

How was this patch tested?

Added new unit test cases to the ReplaceNullWithFalseInPredicateSuite (renamed from ReplaceNullWithFalseSuite).

@rednaxelafx
Copy link
Contributor Author

cc @aokolnychyi : I'd like to propose renaming the rule you introduced to add a -InPredicate suffix, because obviously we can't replace arbitrary nulls with false, but only the ones that are going to be directly used in a boolean predicate context (e.g. If(cond, _, _), Filter etc that you've already nicely identified). Does that make sense to you?

BTW thank you very much for introducing that rule. It's really neat!

@SparkQA
Copy link

SparkQA commented Nov 18, 2018

Test build #98975 has finished for PR 23079 at commit 710c886.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@aokolnychyi
Copy link
Contributor

@rednaxelafx I am glad the rule gets more adoption. Renaming also makes sense to me.

Shall we extend ReplaceNullWithFalseEndToEndSuite as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test cases for ArrayFilter and ArrayExists seem to be identical. As we have those tests anyway, would it make sense to cover different lambda functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I intentionally made all three lambda the same (the MapFilter one only differs in the lambda parameter). I can encapsulate this lambda function into a test utility function. Let me update the PR and see what you think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we add a withNewFunctions method in HigherOrderFunction? Then we can simplify this rule to

case f: HigherOrderFunction => f.withNewFunctions(f.functions.map(replaceNullWithFalse))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if that's useful or not. First of all, the replaceNullWithFalse handling doesn't apply to all higher-order functions. In fact it only applies to a very narrow set, ones where a lambda function returns BooleanType and is immediately used as a predicate. So having a generic utility can certainly help make this PR slightly simpler, but I don't know how useful it is for other cases.
I'd prefer waiting for more such transformation cases to introduce a new utility for the pattern. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah i see. Sorry I missed it. Then it's safer to use a white-list here.

@SparkQA
Copy link

SparkQA commented Nov 19, 2018

Test build #98996 has finished for PR 23079 at commit 6646a96.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class ReplaceNullWithFalseInPredicateEndToEndSuite extends QueryTest with SharedSQLContext

@aokolnychyi
Copy link
Contributor

LGTM as well.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@asfgit asfgit closed this in a09d5ba Nov 20, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…higher-order functions: ArrayExists, ArrayFilter, MapFilter

## What changes were proposed in this pull request?

Extend the `ReplaceNullWithFalse` optimizer rule introduced in SPARK-25860 (apache#22857) to also support optimizing predicates in higher-order functions of `ArrayExists`, `ArrayFilter`, `MapFilter`.

Also rename the rule to `ReplaceNullWithFalseInPredicate` to better reflect its intent.

Example:
```sql
select filter(a, e -> if(e is null, null, true)) as b from (
  select array(null, 1, null, 3) as a)
```
The optimized logical plan:
**Before**:
```
== Optimized Logical Plan ==
Project [filter([null,1,null,3], lambdafunction(if (isnull(lambda e#13)) null else true, lambda e#13, false)) AS b#9]
+- OneRowRelation
```
**After**:
```
== Optimized Logical Plan ==
Project [filter([null,1,null,3], lambdafunction(if (isnull(lambda e#13)) false else true, lambda e#13, false)) AS b#9]
+- OneRowRelation
```

## How was this patch tested?

Added new unit test cases to the `ReplaceNullWithFalseInPredicateSuite` (renamed from `ReplaceNullWithFalseSuite`).

Closes apache#23079 from rednaxelafx/catalyst-master.

Authored-by: Kris Mok <kris.mok@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…icate

## What changes were proposed in this pull request?

Based on apache#22857 and apache#23079, this PR did a few updates

- Limit the data types of NULL to Boolean.
- Limit the input data type of replaceNullWithFalse to Boolean; throw an exception in the testing mode.
- Create a new file for the rule ReplaceNullWithFalseInPredicate
- Update the description of this rule.

## How was this patch tested?
Added a test case

Closes apache#23139 from gatorsmile/followupSpark-25860.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants