[SPARK-13484][SQL] Prevent illegal NULL propagation when filtering outer-join results #11371

maropu · 2016-02-25T16:11:44Z

What changes were proposed in this pull request?

This PR prevents illegal NULL propagation in the query below:

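import org.apache.spark.sql.functions.{col, lit}

val a = sqlContext.range(10).select(col("id"), lit(0).as("count"))
val b = sqlContext.range(10).select((col("id") % 3).as("id")).groupBy("id").count()
// Expected: the rows of a that have no match in b; actual: no rows.
a.join(b, a("id") === b("id"), "left_outer").filter(b("count").isNull)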

It returns nothing because b("count") is marked as non-nullable, so the Optimizer rewrites the isNull filter condition to always false, even though the left outer join can produce NULLs in that column.
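To make the failure mode concrete, here is a minimal sketch in plain Scala. It is a toy model, not Spark's Catalyst code: `Attr`, `IsNull`, `Literal`, and `simplify` are hypothetical names chosen for illustration. It only shows how an "IS NULL on a non-nullable attribute is always false" simplification gives the wrong answer when the attribute's nullable flag is stale.

```scala
// Toy model only: these names are hypothetical and are not Catalyst classes.
final case class Attr(name: String, nullable: Boolean)

sealed trait Expr
final case class IsNull(attr: Attr) extends Expr
final case class Literal(value: Boolean) extends Expr

object NullPropagationSketch {
  // Mimics an optimizer simplification: IS NULL on a non-nullable
  // attribute can never be true, so it is folded to a constant.
  def simplify(e: Expr): Expr = e match {
    case IsNull(a) if !a.nullable => Literal(false)
    case other                    => other
  }

  def main(args: Array[String]): Unit = {
    // b("count") as captured from b before the join: treated as non-nullable.
    val staleCount = Attr("count", nullable = false)
    // The same column as an output of the left outer join:
    // unmatched rows are padded with NULLs on the right side.
    val joinedCount = staleCount.copy(nullable = true)

    println(simplify(IsNull(staleCount)))  // folded to a constant false
    println(simplify(IsNull(joinedCount))) // left as an IsNull check
  }
}
```

With the stale flag the filter is folded away and no rows can pass; with the flag the join actually produces, the check survives and is evaluated per row.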

How was this patch tested?

Added a test for the query above in DataFrameJoinSuite.

SparkQA · 2016-02-25T17:51:08Z

Test build #51972 has finished for PR 11371 at commit 9f8ff3d.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-02-25T17:59:32Z

Test build #51976 has finished for PR 11371 at commit 568afee.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2016-02-26T01:34:19Z

Jenkins, retest this please.

mengxr · 2016-02-26T02:43:54Z

cc @yhuai

SparkQA · 2016-02-26T03:21:03Z

Test build #52013 has finished for PR 11371 at commit 568afee.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-02-26T09:37:27Z

Test build #52034 has finished for PR 11371 at commit c305776.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-02-26T09:42:58Z

Test build #52035 has finished for PR 11371 at commit d3733ba.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yhuai · 2016-02-26T19:39:29Z

@maropu Thank you for the PR. My thought is that we may need to have a place to correct those nullable fields in the analyzer. Let me also think about it.

maropu · 2016-02-27T01:49:15Z

@yhuai okay.

maropu · 2016-02-27T15:18:28Z

@yhuai I added a new rule, SolveIllegalReferences, to resolve these kinds of illegal references.

SparkQA · 2016-02-27T16:44:06Z

Test build #52128 has finished for PR 11371 at commit f1718d6.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2016-02-29T03:40:16Z

Jenkins, retest this please.

SparkQA · 2016-02-29T05:01:28Z

Test build #52159 has finished for PR 11371 at commit f1718d6.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-03-01T11:14:58Z

Test build #52232 has finished for PR 11371 at commit c17c2b2.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2016-03-02T01:34:18Z

Jenkins, retest this please.

SparkQA · 2016-03-02T03:32:50Z

Test build #52278 has finished for PR 11371 at commit c17c2b2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-03-02T08:10:59Z

Test build #52300 has finished for PR 11371 at commit 9c981fe.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2016-03-06T16:28:54Z

@yhuai ping

rxin · 2016-03-15T07:59:38Z

cc @cloud-fan

cloud-fan · 2016-03-15T08:13:00Z

I think the fundamental problem is that we give users a resolved attribute, but it may not be the real column at the point where it is used. For example, b("count") is not actually a column of the join output. Instead of adding special handling, how about my proposal in #11632?

maropu · 2016-03-23T08:14:08Z

@cloud-fan If your pull request (#11632) is merged, I think the query at the top will throw an analysis exception, right? SPARK-13484 essentially says that these kinds of queries should be resolved correctly for the sake of usability. Anyway, I agree with your idea in #11632, so I'd like to continue this discussion based on #11632.
What do you think? cc: @mengxr

SparkQA · 2016-04-15T05:48:16Z

Test build #55897 has finished for PR 11371 at commit 0549f88.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-15T05:54:00Z

Test build #55899 has finished for PR 11371 at commit 1e45943.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-15T10:14:57Z

Test build #55911 has finished for PR 11371 at commit 441d9a5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2016-05-13T08:49:32Z

Jenkins, retest this please.

SparkQA · 2016-05-13T10:16:25Z

Test build #58552 has finished for PR 11371 at commit bd13652.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yhuai · 2016-05-25T02:25:20Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+      case q: LogicalPlan =>
+        q.transform {
+          case f @ Filter(filterCondition, ExtractJoinOutputAttributes(join, joinOutputMap)) =>


How about we use a q.transformUp to fix the nullability in a bottom-up way? For every node, we create an AttributeMap using the output of its child. Then, we use transformExpressions to fix the nullability if necessary. Let me try it out and ping you when I have a version.

Okay, I'll wait for your ping.

https://github.com/apache/spark/pull/13290/files This is the approach that I mentioned above.

okay, I'll check it.
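The bottom-up idea from the review comment above can be sketched in plain Scala. This is a toy model under stated assumptions, not the code in #13290: `AttrRef`, `Plan`, `Relation`, `LeftOuterJoin`, `Filter`, and `fix` are hypothetical stand-ins for Catalyst's attributes, logical plans, and rule. A plain `Map` keyed by attribute id plays the role of the AttributeMap built from each child's output.

```scala
// Toy model only: hypothetical stand-ins, not Catalyst's LogicalPlan or AttributeMap.
final case class AttrRef(id: Int, name: String, nullable: Boolean)

sealed trait Plan {
  def children: Seq[Plan]
  def output: Seq[AttrRef]
}
final case class Relation(output: Seq[AttrRef]) extends Plan {
  val children: Seq[Plan] = Nil
}
final case class LeftOuterJoin(left: Plan, right: Plan) extends Plan {
  val children: Seq[Plan] = Seq(left, right)
  // Unmatched left rows are padded with NULLs, so right-side output becomes nullable.
  val output: Seq[AttrRef] = left.output ++ right.output.map(_.copy(nullable = true))
}
// A filter whose condition references attributes captured by the user (possibly stale).
final case class Filter(references: Seq[AttrRef], child: Plan) extends Plan {
  val children: Seq[Plan] = Seq(child)
  val output: Seq[AttrRef] = child.output
}

object FixNullabilitySketch {
  // Bottom-up: fix the children first, then align this node's references
  // with the nullability that its (already fixed) child actually produces.
  def fix(plan: Plan): Plan = plan match {
    case r: Relation         => r
    case LeftOuterJoin(l, r) => LeftOuterJoin(fix(l), fix(r))
    case Filter(refs, child) =>
      val fixedChild = fix(child)
      val nullability: Map[Int, Boolean] =
        fixedChild.output.map(a => a.id -> a.nullable).toMap
      val fixedRefs = refs.map { a =>
        nullability.get(a.id).map(n => a.copy(nullable = n)).getOrElse(a)
      }
      Filter(fixedRefs, fixedChild)
  }

  def main(args: Array[String]): Unit = {
    val aId    = AttrRef(1, "id", nullable = false)
    val bId    = AttrRef(2, "id", nullable = false)
    val bCount = AttrRef(3, "count", nullable = false) // stale: captured from b
    val join   = LeftOuterJoin(Relation(Seq(aId)), Relation(Seq(bId, bCount)))
    val fixed  = fix(Filter(Seq(bCount), join)).asInstanceOf[Filter]
    println(fixed.references) // the count reference is now marked nullable
  }
}
```

Matching on an attribute id rather than a name is a design choice of this sketch: both sides of the join expose a column called "id", so names alone would be ambiguous.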

…ter-join results

## What changes were proposed in this pull request?

This PR adds a rule at the end of the analyzer to correct the nullable fields of attributes in a logical plan by using the nullable fields of the corresponding attributes in its children logical plans (these plans generate the input rows).

This is another approach for addressing SPARK-13484 (the first approach is #11371).

Close #113711

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
Author: Yin Huai <yhuai@databricks.com>

Closes #13290 from yhuai/SPARK-13484.

(cherry picked from commit 5eea332)
Signed-off-by: Cheng Lian <lian@databricks.com>

maropu force-pushed the spark13484 branch from 9f8ff3d to 568afee Compare February 25, 2016 16:12

maropu force-pushed the spark13484 branch 2 times, most recently from c305776 to d3733ba Compare February 26, 2016 07:55

maropu force-pushed the spark13484 branch from afa05d7 to f1718d6 Compare February 27, 2016 15:19

maropu force-pushed the spark13484 branch from f1718d6 to c17c2b2 Compare March 1, 2016 09:49

maropu force-pushed the spark13484 branch 2 times, most recently from 0549f88 to 1e45943 Compare April 15, 2016 04:40

maropu added 7 commits May 12, 2016 23:21

Avoid illegal NULL propagation

1723a1a

Add comments

bc3a3a7

Add a new rule to solve illegal references

b56da9f

Use foreach not map

274c542

Solve illegal references in Projects

4a7121e

Add tests in DataFrameJoinSuite

a4903b6

Fix test codes in ResolveNaturalJoinSuite

bd13652

maropu force-pushed the spark13484 branch from 441d9a5 to bd13652 Compare May 12, 2016 14:25

yhuai reviewed May 25, 2016

yhuai mentioned this pull request May 25, 2016

[SPARK-13484] [SQL] Prevent illegal NULL propagation when filtering outer-join results #13290

Closed

maropu closed this Jun 7, 2016

maropu deleted the spark13484 branch July 5, 2017 11:43
