[SPARK-13484][SQL] Prevent illegal NULL propagation when filtering outer-join results #11371
Test build #51972 has finished for PR 11371 at commit
Test build #51976 has finished for PR 11371 at commit
Jenkins, retest this please.
cc @yhuai
Test build #52013 has finished for PR 11371 at commit
c305776 to d3733ba
Test build #52034 has finished for PR 11371 at commit
Test build #52035 has finished for PR 11371 at commit
@maropu Thank you for the PR. My thought is that we may need to have a place to correct those nullable fields in the analyzer. Let me also think about it.
@yhuai okay.
@yhuai I added a new rule
Test build #52128 has finished for PR 11371 at commit
Jenkins, retest this please.
Test build #52159 has finished for PR 11371 at commit
Test build #52232 has finished for PR 11371 at commit
Jenkins, retest this please.
Test build #52278 has finished for PR 11371 at commit
Test build #52300 has finished for PR 11371 at commit
@yhuai ping
cc @cloud-fan
I think the fundamental problem is that we give users the resolved attribute, but it may not be the real column by the time it is used. For example,
@cloud-fan If your pull request (#11632) is merged, I think the query at the top will throw an analysis exception, right?
0549f88 to 1e45943
Test build #55897 has finished for PR 11371 at commit
Test build #55899 has finished for PR 11371 at commit
Test build #55911 has finished for PR 11371 at commit
Jenkins, retest this please.
Test build #58552 has finished for PR 11371 at commit
    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
      case q: LogicalPlan =>
        q.transform {
          case f @ Filter(filterCondition, ExtractJoinOutputAttributes(join, joinOutputMap)) =>
How about we use a q.transformUp to fix the nullability in a bottom-up way? For every node, we create an AttributeMap using the output of its child. Then, we use transformExpressions to fix the nullability if necessary. Let me try it out and ping you when I have a version.
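To make the bottom-up idea concrete, here is a rough sketch in plain Scala. It is not Catalyst code: `Attr`, `Plan`, `Relation`, `LeftOuterJoin`, `Filter` and `FixNullability` are hypothetical toy stand-ins, and a plain `Map` keyed by attribute id plays the part of the `AttributeMap`. The only point is that each node takes nullability from its child's output instead of trusting the attribute the user captured earlier.

```scala
// Toy model only: these classes are hypothetical stand-ins, not Catalyst's API.
final case class Attr(id: Int, name: String, nullable: Boolean)

sealed trait Plan {
  def output: Seq[Attr]
}

final case class Relation(output: Seq[Attr]) extends Plan

// A left outer join: every attribute coming from the right side becomes nullable.
final case class LeftOuterJoin(left: Plan, right: Plan) extends Plan {
  def output: Seq[Attr] = left.output ++ right.output.map(_.copy(nullable = true))
}

// A filter whose condition references attributes, possibly with stale nullability.
final case class Filter(refs: Seq[Attr], child: Plan) extends Plan {
  def output: Seq[Attr] = child.output
}

object FixNullability {
  // Rewrite children first, then copy nullability from the child's output
  // onto every attribute reference held by the current node.
  def apply(plan: Plan): Plan = plan match {
    case r: Relation => r
    case LeftOuterJoin(l, r) => LeftOuterJoin(apply(l), apply(r))
    case Filter(refs, child) =>
      val fixedChild = apply(child)
      val nullableById = fixedChild.output.map(a => a.id -> a.nullable).toMap
      val fixedRefs = refs.map { a =>
        nullableById.get(a.id).fold(a)(n => a.copy(nullable = n))
      }
      Filter(fixedRefs, fixedChild)
  }
}
```

In this sketch a reference captured from the right-hand relation before the join starts out non-nullable and comes out nullable after the rewrite, because the join's output says so. References to left-side attributes are left unchanged.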
Okay, I'll wait for your ping.
https://github.com/apache/spark/pull/13290/files This is the approach that I mentioned above.
okay, I'll check it.
…ter-join results
## What changes were proposed in this pull request?
This PR adds a rule at the end of the analyzer to correct nullable fields of attributes in a logical plan by using nullable fields of the corresponding attributes in its children logical plans (these plans generate the input rows).
This is another approach for addressing SPARK-13484 (the first approach is #11371).
Close #113711
Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
Author: Yin Huai <yhuai@databricks.com>
Closes #13290 from yhuai/SPARK-13484.
(cherry picked from commit 5eea332)
Signed-off-by: Cheng Lian <lian@databricks.com>
What changes were proposed in this pull request?
This PR prevents illegal NULL propagation in the query below:
It returns nothing because b("count") is not nullable, so the Optimizer evaluates the filter condition as always false.
How was this patch tested?
Added a test for the query above in DataFrameJoinSuite.
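The query itself is not reproduced in this description, so the following is only a toy illustration of the failure mode, written in plain Scala rather than against Spark. It assumes the filter is an IS NULL check on a column from the right side of a left outer join. `Col`, `IsNull`, `Lit`, `simplify` and `runFilter` are hypothetical names, not Catalyst classes.

```scala
// Hypothetical toy expressions; not Catalyst classes.
sealed trait Expr
final case class Col(name: String, nullable: Boolean) extends Expr
final case class IsNull(child: Expr) extends Expr
final case class Lit(value: Boolean) extends Expr

// Fold IS NULL over a column declared non-nullable into a constant false,
// which is the kind of simplification an optimizer is entitled to make.
def simplify(e: Expr): Expr = e match {
  case IsNull(Col(_, false)) => Lit(false)
  case other => other
}

// Rows of (a.count, b.count) after a left outer join;
// None marks a left row that found no match on the right.
val joined: Seq[(Long, Option[Long])] = Seq((1L, Some(1L)), (1L, None))

def runFilter(cond: Expr, rows: Seq[(Long, Option[Long])]): Seq[(Long, Option[Long])] =
  simplify(cond) match {
    case Lit(keep) => rows.filter(_ => keep)    // constant condition
    case IsNull(_) => rows.filter(_._2.isEmpty) // real null check on b.count
    case _: Col => rows                         // not a predicate in this toy; pass through
  }
```

With the stale, non-nullable column the condition is folded to false and every row is dropped, including the unmatched one that really holds a null. With the column marked nullable, the unmatched row survives the filter.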