-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: make UnKnownColumn
s not equal to others physical exprs
#11536
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am approving this PR as it fixes a bug and a query that didn't run before will now run
However, I think the proposed fix might be treating the symptom rather than the root cause and I think more investigation is warranted. We could merge this PR and file a follow on ticket to investigate for example
Thank you very much @jonahgao
@@ -431,7 +431,12 @@ impl ExecutionPlan for InterleaveExec { | |||
self: Arc<Self>, | |||
children: Vec<Arc<dyn ExecutionPlan>>, | |||
) -> Result<Arc<dyn ExecutionPlan>> { | |||
Ok(Arc::new(InterleaveExec::try_new(children)?)) | |||
// New children may not be able to interleave; in that case, we fall back to UnionExec. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see how this particular change is beneficial as now the plan runs where it didn't before, so that is hard to argue with and I would be ok merging it in
However, I feel like this might be silently masking some bug in the planner. My personal preference would be to make this an error
if !can_interleave(children.iter()) {
return internal_err!("Can not create InterleaveExec: new children can not be interleaved");
}
And then track down / fix whatever optimizer pass is causing the children to no longer be interleavable
FYI @mustafasrepo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied. Thank you @alamb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see how this particular change is beneficial as now the plan runs where it didn't before, so that is hard to argue with and I would be ok merging it in
However, I feel like this might be silently masking some bug in the planner. My personal preference would be to make this an error
if !can_interleave(children.iter()) { return internal_err!("Can not create InterleaveExec: new children can not be interleaved"); }And then track down / fix whatever optimizer pass is causing the children to no longer be interleavable
FYI @mustafasrepo
This makes sense, invariant should be handled outside the operator during planning. This checks the whether assumptions met. Thanks for this.
UnionExec
if can't interleave UnKnownColumns
not equal to each other
UnKnownColumns
not equal to each otherUnKnownColumn
s not equal to each other
@alamb I think the root clause should be UnKnownColumn. I changed to an alternative fix and updated the description in the PR. |
UnKnownColumn
s not equal to each otherUnKnownColumn
s not equal to others physical exprs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me -- thank you @jonahgao and @mustafasrepo
.map(|x| self == x) | ||
.unwrap_or(false) | ||
fn eq(&self, _other: &dyn Any) -> bool { | ||
// UnknownColumn is not a valid expression, so it should not be equal to any other expression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Thank you @alamb for the review. |
…#11536) * fix: fall back to `UnionExec` if can't interleave * alternative fix * check interleavable in with_new_children * link to pr
…#11536) * fix: fall back to `UnionExec` if can't interleave * alternative fix * check interleavable in with_new_children * link to pr
Which issue does this PR close?
Closes #11409.
Rationale for this change
After certain optimizations such asProjectionPushdown
, the children ofInterleaveExec
can no longer meet the interleave condition because their output_partitioning becomes inconsistent.Perhaps in this situation, we could consider falling back toUnionExec
fromInterleaveExec
instead of throwing an error.Update
We determine that union inputs can interleave when they have the same hash partition spec.
datafusion/datafusion/physical-plan/src/union.rs
Line 499 in 12d82c4
But this may be incorrect if the expressions in the partition spec contain UnKnownColumn.
When the partitioning expression no longer appears in the plan's output, for example, if the projection expressions in
ProjectionExec
no longer include it, this expression is converted into anUnknownColumn
, usingexpr::to_string()
as its name. This expression comes from the child of current plan. It is invalid for the current plan and shouldn't be used for evaluation or making decisions against the current plan.datafusion/datafusion/physical-plan/src/projection.rs
Line 138 in 12d82c4
Even if two UnknownColumns have the same name, their source expressions might not be the same.
For example, in issue #11409, the partition expressions of both inputs of
InterleaveExec
areUnKnownColumn { name: "v0@0" }
, which corresponds to aColumn
expression named v0 with index 0. However, thisColumn
expression references the output of the child plan of these inputs, and their corresponding source expressions areCAST(t1.v0 AS Float64)
andt2.v0
, respectively.After
ProjectionPushdown
creates new ProjectionExec, it performs some degree of restoration on the UnKnownColumn, they become different and no longer meet the interleave condition.What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
No