
Conversation

@maropu (Member) commented Feb 25, 2016

What changes were proposed in this pull request?

This PR prevents illegal NULL propagation in the query below:

val a = sqlContext.range(10).select(col("id"), lit(0).as("count"))
val b = sqlContext.range(10).select((col("id") % 3).as("id")).groupBy("id").count()
a.join(b, a("id") === b("id"), "left_outer").filter(b("count").isNull)

It returns nothing because b("count") is marked as non-nullable, so the Optimizer folds the filter condition into an always-false predicate.
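To make the symptom concrete, here is a hedged sketch that reuses the a and b DataFrames from the snippet above; the counts in the comments follow from that query, and the folding behavior is the one described in this paragraph, not code taken from the PR:

// Because b("count") comes from an aggregate and is marked non-nullable,
// the Optimizer can fold b("count").isNull into a literal false, even though
// the left outer join produces NULLs for the unmatched rows of `a`.
val joined = a.join(b, a("id") === b("id"), "left_outer")
joined.filter(b("count").isNull).count()
// Correct result: 7 (ids 3..9 of `a` have no match in `b`).
// Result before this fix: 0, because the filter condition folds to false.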

How was this patch tested?

Added a test for the query above in DataFrameJoinSuite.
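A hedged sketch of what such a regression test in DataFrameJoinSuite might look like; the test name, imports, and exact assertion are assumptions for illustration, not the PR's actual test code:

// Assumes the usual suite imports, e.g. org.apache.spark.sql.functions._
test("SPARK-13484: filter on a non-nullable column after a left outer join") {
  val a = sqlContext.range(10).select(col("id"), lit(0).as("count"))
  val b = sqlContext.range(10).select((col("id") % 3).as("id")).groupBy("id").count()
  val df = a.join(b, a("id") === b("id"), "left_outer").filter(b("count").isNull)
  // ids 3..9 of `a` have no match in `b`, so 7 rows should survive the filter.
  assert(df.count() === 7)
}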

@SparkQA commented Feb 25, 2016

Test build #51972 has finished for PR 11371 at commit 9f8ff3d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 25, 2016

Test build #51976 has finished for PR 11371 at commit 568afee.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu (Member, Author) commented Feb 26, 2016

Jenkins, retest this please.

@mengxr (Contributor) commented Feb 26, 2016

cc @yhuai

@SparkQA commented Feb 26, 2016

Test build #52013 has finished for PR 11371 at commit 568afee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu force-pushed the spark13484 branch 2 times, most recently from c305776 to d3733ba on February 26, 2016 07:55

@SparkQA commented Feb 26, 2016

Test build #52034 has finished for PR 11371 at commit c305776.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 26, 2016

Test build #52035 has finished for PR 11371 at commit d3733ba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai (Contributor) commented Feb 26, 2016

@maropu Thank you for the PR. My thought is that we may need to have a place to correct those nullable fields in the analyzer. Let me also think about it.

@maropu (Member, Author) commented Feb 27, 2016

@yhuai okay.

@maropu (Member, Author) commented Feb 27, 2016

@yhuai I added a new rule, SolveIllegalReferences, to resolve these kinds of illegal references.

@SparkQA commented Feb 27, 2016

Test build #52128 has finished for PR 11371 at commit f1718d6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu (Member, Author) commented Feb 29, 2016

Jenkins, retest this please.

@SparkQA commented Feb 29, 2016

Test build #52159 has finished for PR 11371 at commit f1718d6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Mar 1, 2016

Test build #52232 has finished for PR 11371 at commit c17c2b2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu (Member, Author) commented Mar 2, 2016

Jenkins, retest this please.

@SparkQA commented Mar 2, 2016

Test build #52278 has finished for PR 11371 at commit c17c2b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Mar 2, 2016

Test build #52300 has finished for PR 11371 at commit 9c981fe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu (Member, Author) commented Mar 6, 2016

@yhuai ping

@rxin (Contributor) commented Mar 15, 2016

cc @cloud-fan

@cloud-fan (Contributor) commented:

I think the fundamental problem is that we give users a resolved attribute, but it may not be the real column by the time it is used. For example, b("count") is not actually a column of the join output. Instead of adding some special handling, how about my proposal at #11632?
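A small illustration of this point, assuming the DataFrame API of that era; a and b are the DataFrames from the PR description, and the comments state expected behavior rather than the PR's code:

// b("count") is resolved against b's own plan, where the aggregate output is
// non-nullable, so the Column keeps a non-nullable attribute reference.
val countCol = b("count")
val joined = a.join(b, a("id") === b("id"), "left_outer")
// The join's own output schema marks the right side as nullable for a left
// outer join, so the attribute the user holds no longer matches the real
// column of the join.
joined.schema("count").nullable  // expected: true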

@maropu (Member, Author) commented Mar 23, 2016

@cloud-fan If your pull request (#11632) is merged, I think the query at the top will throw an analysis exception, right? SPARK-13484 essentially indicates that these kinds of queries should be resolved correctly for the sake of usability. Anyway, I agree with your idea in #11632, so I'd like to discuss this based on it.
What do you think? cc: @mengxr

@maropu force-pushed the spark13484 branch 2 times, most recently from 0549f88 to 1e45943 on April 15, 2016 04:40

@SparkQA commented Apr 15, 2016

Test build #55897 has finished for PR 11371 at commit 0549f88.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 15, 2016

Test build #55899 has finished for PR 11371 at commit 1e45943.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 15, 2016

Test build #55911 has finished for PR 11371 at commit 441d9a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu (Member, Author) commented May 13, 2016

Jenkins, retest this please.

@SparkQA commented May 13, 2016

Test build #58552 has finished for PR 11371 at commit bd13652.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
  case q: LogicalPlan =>
    q.transform {
      case f @ Filter(filterCondition, ExtractJoinOutputAttributes(join, joinOutputMap)) =>

Contributor commented:

How about we use a q.transformUp to fix the nullability in a bottom-up way? For every node, we create an AttributeMap using the output of its child. Then, we use transformExpressions to fix the nullability if necessary. Let me try it out and ping you when I have a version.
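A minimal sketch of the bottom-up idea described in this comment, assuming the Catalyst internals of that era (Rule, transformUp, AttributeMap, transformExpressions); it is only an illustration of the approach, not the rule that was eventually merged via #13290:

import org.apache.spark.sql.catalyst.expressions.{AttributeMap, AttributeReference}
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical rule name: walk the plan bottom-up and realign each attribute's
// nullability with the nullability reported by the child that produces it.
object FixNullabilitySketch extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    case leaf: LeafNode => leaf  // leaves have no children to consult
    case q: LogicalPlan =>
      // Map each attribute produced by the children to itself (keyed by exprId).
      val childOutput = AttributeMap(q.children.flatMap(_.output).map(a => a -> a))
      q transformExpressions {
        case a: AttributeReference
            if childOutput.contains(a) && childOutput(a).nullable != a.nullable =>
          a.withNullability(childOutput(a).nullable)
      }
  }
}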

@maropu (Member, Author) commented:

Okay, I'll wait for your ping.

@yhuai (Contributor) commented May 25, 2016:

https://github.com/apache/spark/pull/13290/files This is the approach that I mentioned above.

@maropu (Member, Author) commented:

okay, I'll check it.

asfgit pushed a commit that referenced this pull request Jun 2, 2016
…ter-join results

## What changes were proposed in this pull request?
This PR adds a rule at the end of the analyzer to correct the nullable fields of attributes in a logical plan, using the nullable fields of the corresponding attributes in its children logical plans (the plans that generate the input rows).

This is another approach for addressing SPARK-13484 (the first approach is #11371).

Close #11371

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
Author: Yin Huai <yhuai@databricks.com>

Closes #13290 from yhuai/SPARK-13484.

(cherry picked from commit 5eea332)
Signed-off-by: Cheng Lian <lian@databricks.com>
asfgit pushed a commit that referenced this pull request Jun 2, 2016
…ter-join results

## What changes were proposed in this pull request?
This PR adds a rule at the end of the analyzer to correct the nullable fields of attributes in a logical plan, using the nullable fields of the corresponding attributes in its children logical plans (the plans that generate the input rows).

This is another approach for addressing SPARK-13484 (the first approach is #11371).

Close #11371

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
Author: Yin Huai <yhuai@databricks.com>

Closes #13290 from yhuai/SPARK-13484.
@maropu closed this Jun 7, 2016
@maropu deleted the spark13484 branch July 5, 2017 11:43