-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-24879][SQL] Fix NPE in Hive partition pruning filter pushdown #21832
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
ok to test |
| """stringcol = 'p1" and q="q1' and 'p2" and q="q2' = stringcol""") | ||
|
|
||
| filterTest("SPARK-24879 null literals should be ignored for IN constructs", | ||
| Seq(a("intcol", IntegerType) in (Literal(1), Literal(null))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us add more test cases for better test coverage
| object ExtractableLiterals { | ||
| def unapply(exprs: Seq[Expression]): Option[Seq[String]] = { | ||
| val extractables = exprs.map(ExtractableLiteral.unapply) | ||
| // SPARK-24879: The Hive filter parser does not support "null", but we still want to push |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-> Hive metastore filter parser
| val extractables = exprs.map(ExtractableLiteral.unapply) | ||
| // SPARK-24879: The Hive filter parser does not support "null", but we still want to push | ||
| // down as many predicates as we can while still maintaining correctness. "x in (a, b, | ||
| // null)" can be rewritten as "x in (a, b)" for the purposes of partition pruning, so we |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should write down the rules here.
1 in (2, NULL) -> NULL
1 in (1, NULL) -> true
1 in (2) -> false
NULL is not equal to FALSE. Since all the pushed down predicates are NULL intolerant and connected by AND or OR, NULL can be treated as FALSE.
|
Test this please |
|
add to whitelist |
|
Test build #93369 has finished for PR 21832 at commit
|
|
LGTM Thanks! Merged to master/2.3 |
## What changes were proposed in this pull request? We get a NPE when we have a filter on a partition column of the form `col in (x, null)`. This is due to the filter converter in HiveShim not handling `null`s correctly. This patch fixes this bug while still pushing down as much of the partition pruning predicates as possible, by filtering out `null`s from any `in` predicate. Since Hive only supports very simple partition pruning filters, this change should preserve correctness. ## How was this patch tested? Unit tests, manual tests Author: William Sheu <william.sheu@databricks.com> Closes #21832 from PenguinToast/partition-pruning-npe. (cherry picked from commit bbd6f0c) Signed-off-by: Xiao Li <gatorsmile@gmail.com>
What changes were proposed in this pull request?
We get a NPE when we have a filter on a partition column of the form
col in (x, null). This is due to the filter converter in HiveShim not handlingnulls correctly. This patch fixes this bug while still pushing down as much of the partition pruning predicates as possible, by filtering outnulls from anyinpredicate. Since Hive only supports very simple partition pruning filters, this change should preserve correctness.How was this patch tested?
Unit tests, manual tests