[SPARK-24879][SQL] Fix NPE in Hive partition pruning filter pushdown #21832

PenguinToast · 2018-07-20T20:06:03Z

What changes were proposed in this pull request?

We get an NPE when a filter on a partition column has the form `col in (x, null)`. The cause is that the filter converter in HiveShim does not handle nulls correctly. This patch fixes the bug while still pushing down as many partition pruning predicates as possible, by filtering nulls out of any `in` predicate. Since Hive only supports very simple partition pruning filters, this change should preserve correctness.
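To illustrate the idea in the description above, here is a minimal, self-contained sketch. It is not the code from HiveShim: `Expr`, `Lit`, `Attr`, `In`, and `PartitionFilterSketch.convertIn` are hypothetical stand-ins for the Catalyst expressions and the filter converter, and the output string format is only assumed to resemble a metastore filter.

```scala
// Hypothetical, simplified stand-ins for Catalyst expressions (not Spark's real classes).
sealed trait Expr
final case class Lit(value: Any) extends Expr // Lit(null) models a NULL literal
final case class Attr(name: String) extends Expr
final case class In(attr: Attr, values: Seq[Expr]) extends Expr

object PartitionFilterSketch {
  // Render a literal for the filter string, or None if it cannot be pushed down.
  private def render(e: Expr): Option[String] = e match {
    case Lit(i: Int)    => Some(i.toString)
    case Lit(s: String) => Some("\"" + s + "\"")
    case _              => None
  }

  // Convert `attr IN (values)` into a filter string, dropping NULL literals first.
  // Returns None (no pushdown) if nothing is left or any remaining value is unsupported.
  def convertIn(in: In): Option[String] = {
    val nonNull = in.values.filter {
      case Lit(null) => false
      case _         => true
    }
    val rendered = nonNull.map(render)
    if (rendered.nonEmpty && rendered.forall(_.isDefined)) {
      Some(rendered.flatten.map(v => s"${in.attr.name} = $v").mkString("(", " or ", ")"))
    } else {
      None
    }
  }
}
```

In this sketch, `intcol in (1, null)` is converted as if it were `intcol in (1)`, while a list containing only `null` produces no pushed-down filter at all, so the predicate would be evaluated by Spark after the partitions are fetched.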

How was this patch tested?

Unit tests, manual tests

gatorsmile · 2018-07-20T20:07:37Z

ok to test

gatorsmile · 2018-07-20T20:09:37Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/client/FiltersSuite.scala

    """stringcol = 'p1" and q="q1' and 'p2" and q="q2' = stringcol""")

+  filterTest("SPARK-24879 null literals should be ignored for IN constructs",
+    Seq(a("intcol", IntegerType) in (Literal(1), Literal(null))),


Let us add more test cases for better test coverage

gatorsmile · 2018-07-20T20:11:48Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala

    object ExtractableLiterals {
      def unapply(exprs: Seq[Expression]): Option[Seq[String]] = {
-        val extractables = exprs.map(ExtractableLiteral.unapply)
+        // SPARK-24879: The Hive filter parser does not support "null", but we still want to push


Nit: "Hive filter parser" -> "Hive metastore filter parser"

gatorsmile · 2018-07-20T20:28:26Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala

-        val extractables = exprs.map(ExtractableLiteral.unapply)
+        // SPARK-24879: The Hive filter parser does not support "null", but we still want to push
+        // down as many predicates as we can while still maintaining correctness. "x in (a, b,
+        // null)" can be rewritten as "x in (a, b)" for the purposes of partition pruning, so we


Maybe we should write down the rules here.
1 in (2, NULL) -> NULL
1 in (1, NULL) -> true
1 in (2) -> false

NULL is not equal to FALSE. Since all the pushed down predicates are NULL intolerant and connected by AND or OR, NULL can be treated as FALSE.
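The rules above can be checked with a small model of SQL three-valued logic. This is a sketch under stated assumptions, not Spark code: `InSemanticsSketch` and all of its members are hypothetical names, `None` models NULL, and the value on the left of `IN` is assumed to be non-NULL.

```scala
object InSemanticsSketch {
  // SQL three-valued result: Some(true), Some(false), or None for NULL.
  type Tri = Option[Boolean]

  // `value IN (items)`, where a None item models a NULL literal in the list.
  def sqlIn(value: Int, items: Seq[Option[Int]]): Tri =
    if (items.contains(Some(value))) Some(true)
    else if (items.contains(None)) None
    else Some(false)

  // Three-valued AND: FALSE dominates, otherwise NULL propagates.
  def and(a: Tri, b: Tri): Tri = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // Three-valued OR: TRUE dominates, otherwise NULL propagates.
  def or(a: Tri, b: Tri): Tri = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // A filter keeps a row (or partition) only when the predicate is TRUE.
  def keeps(p: Tri): Boolean = p.contains(true)

  // Collapse NULL to FALSE, which is the effect of dropping NULL from the IN list.
  def nullAsFalse(p: Tri): Tri = Some(p.contains(true))
}
```

For every combination of inputs, `keeps(and(a, b))` equals `keeps(and(nullAsFalse(a), nullAsFalse(b)))`, and the same holds for `or`. That is why collapsing NULL to FALSE does not change which partitions survive, as long as the predicates are combined only with AND and OR. A NOT would break the equivalence, since NOT NULL is NULL but NOT FALSE is TRUE.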

gatorsmile · 2018-07-20T20:33:51Z

Test this please

gatorsmile · 2018-07-20T21:59:08Z

add to whitelist

SparkQA · 2018-07-21T00:22:09Z

Test build #93369 has finished for PR 21832 at commit ce86fbe.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-07-21T02:58:48Z

LGTM

Thanks! Merged to master/2.3

## What changes were proposed in this pull request?

We get a NPE when we have a filter on a partition column of the form `col in (x, null)`. This is due to the filter converter in HiveShim not handling `null`s correctly. This patch fixes this bug while still pushing down as much of the partition pruning predicates as possible, by filtering out `null`s from any `in` predicate. Since Hive only supports very simple partition pruning filters, this change should preserve correctness.

## How was this patch tested?

Unit tests, manual tests

Author: William Sheu <william.sheu@databricks.com>

Closes #21832 from PenguinToast/partition-pruning-npe.

(cherry picked from commit bbd6f0c)
Signed-off-by: Xiao Li <gatorsmile@gmail.com>

Filter out null values for partition pruning predicates

388caa9

gatorsmile reviewed Jul 20, 2018


Prevent partition pruning filters with null from being pushed down

34d74e3

gatorsmile reviewed Jul 20, 2018


Clearly document the nuances of this change

ce86fbe

asfgit closed this in bbd6f0c Jul 21, 2018

c21 mentioned this pull request Feb 24, 2021

[SPARK-34515][SQL] Fix NPE if InSet contains null value during getPartitionsByFilter #31632

Closed
