Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate and better tests for expression re-rewriting / aliasing #3727

Merged
merged 3 commits into from
Oct 12, 2022

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Oct 5, 2022

Draft as it builds on #3710

Which issue does this PR close?

re #3704

Rationale for this change

There are at least three places that we have rediscovered the issue of names changing when rewriting expressions such as #3704, #3555 and https://github.com/apache/arrow-datafusion/blob/master/datafusion/optimizer/src/simplify_expressions.rs#L316

I hope we can avoid other tricky bugs in the future

What changes are included in this PR?

  1. introduce a function rewrite_preserving_name that handles aliasing during rewrites
  2. Change code to use rewrite_preserving_name
  3. Add tests

Are there any user-facing changes?

no

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules labels Oct 5, 2022
@alamb alamb force-pushed the alamb/cleanup_rename branch from cec55d1 to a82ad9e Compare October 5, 2022 18:31
}

Ok(expr)
rewrite_preserving_name(expr, &mut expr_rewrite)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -637,7 +621,8 @@ mod test {
let mut config = OptimizerConfig::default();
let plan = rule.optimize(&plan, &mut config)?;
assert_eq!(
"Projection: a IN ([CAST(Int32(1) AS Int64), CAST(Int8(4) AS Int64), Int64(8)])\n EmptyRelation",
"Projection: a IN ([CAST(Int32(1) AS Int64), CAST(Int8(4) AS Int64), Int64(8)]) AS a IN (Map { iter: Iter([Int32(1), Int8(4), Int64(8)]) })\
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @liukun4515 -- I think this is an improvement (and maybe a bug fix 🤔 )

let expr = expr.rewrite(&mut expr_rewriter)?;
add_alias_if_changed(&original_name, expr)
})
.map(|expr| rewrite_preserving_name(expr, &mut expr_rewriter))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR basically refactors the code into rewrite_preserving_name, adds tests, and calls it in a few places

@alamb alamb requested a review from liukun4515 October 5, 2022 18:33
@alamb alamb marked this pull request as ready for review October 5, 2022 18:33
/// `Expr::Sort` as appropriate
fn name_for_alias(expr: &Expr) -> Result<String> {
match expr {
Expr::Sort { expr, .. } => name_for_alias(expr),
Copy link
Contributor

@liukun4515 liukun4515 Oct 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this issue #3710

But I want to know why we need to do the special branch for Sort Expr?

Copy link
Contributor Author

@alamb alamb Oct 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically because calling Expr::name() on a Expr::Sort will throw an error:

https://github.com/apache/arrow-datafusion/blob/7c5c2e5/datafusion/expr/src/expr.rs#L1133-L1135

I am not super thrilled in general about how this works -- I wonder if I should support calling Expr::name() on Expr::Sort 🤔

@liukun4515
Copy link
Contributor

liukun4515 commented Oct 6, 2022

I think there are may place which use the ExprRewriter to rewrite the expr in the optimizer like ConstEvaluator and other.

Do we need to fix all of them?

@alamb
Copy link
Contributor Author

alamb commented Oct 6, 2022

I think there are may place which use the ExprRewriter to rewrite the expr in the optimizer like ConstEvaluator and other.
Do we need to fix all of them?

That is an excellent point @liukun4515 -- I think we probably do. I will file a follow on ticket to track and do so

@alamb
Copy link
Contributor Author

alamb commented Oct 6, 2022

Github actions is having issues I think -- the CI failures are not related to changes in this PR

@liukun4515
Copy link
Contributor

I think there are may place which use the ExprRewriter to rewrite the expr in the optimizer like ConstEvaluator and other.
Do we need to fix all of them?

That is an excellent point @liukun4515 -- I think we probably do. I will file a follow on ticket to track and do so

I have less knowledge about the alias for the expr, can you explain when and where we should do the alias for the expr which will be rewrited.

@alamb
Copy link
Contributor Author

alamb commented Oct 6, 2022

I have less knowledge about the alias for the expr, can you explain when and where we should do the alias for the expr which will be rewrited.

I will file a ticket that explains the issue in more detail

@alamb
Copy link
Contributor Author

alamb commented Oct 11, 2022

I have less knowledge about the alias for the expr, can you explain when and where we should do the alias for the expr which will be rewrited.

@liukun4515 I have filed #3794 which describes the issue more generally and explains why and when aliases are needed

@liukun4515
Copy link
Contributor

I have less knowledge about the alias for the expr, can you explain when and where we should do the alias for the expr which will be rewrited.

@liukun4515 I have filed #3794 which describes the issue more generally and explains why and when aliases are needed

I will take a look your issue with this pr #3788, but it will take more time for this.

@alamb
Copy link
Contributor Author

alamb commented Oct 12, 2022

Unless there are objections, I plan to merge this PR as it doesn't change any behavior and adds additional test coverage. We can continue to improve the code in other places it may be renaming incorrectly as part of #3794

@xudong963
Copy link
Member

Thanks @alamb , LGTM. Let's merge!

@xudong963 xudong963 merged commit e27e86b into apache:master Oct 12, 2022
@ursabot
Copy link

ursabot commented Oct 12, 2022

Benchmark runs are scheduled for baseline = 61c38b7 and contender = e27e86b. e27e86b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@alamb alamb deleted the alamb/cleanup_rename branch October 12, 2022 16:07
@alamb
Copy link
Contributor Author

alamb commented Oct 12, 2022

🥳

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants