Refactor: Add `LogicalPlan::observe_expressions` to walk expressions #4906

alamb · 2023-01-14T12:53:59Z

Which issue does this PR close?

Inspired by the bug in #4900 and discussion with @avantgardnerio on #4701 (comment)

Rationale for this change

While reading #4900, I realized there were likely other places in `Expr` that could contain subqueries, so other bugs similar to #4898 may be lurking.

However, there was no good way to walk all expressions in a `LogicalPlan` other than calling `LogicalPlan::expressions()`, which clones them all.

What changes are included in this PR?

Add `LogicalPlan::inspect_expressions`, which calls a function on each expression
Rewrite subquery traversal to use this function

Are these changes tested?

Are there any user-facing changes?

alamb · 2023-01-14T12:54:40Z

datafusion/expr/src/logical_plan/plan.rs

@@ -233,42 +233,60 @@ impl LogicalPlan {
    /// logical plan node. This does not include expressions in any
    /// children
    pub fn expressions(self: &LogicalPlan) -> Vec<Expr> {
+        let mut exprs = vec![];
+        self.observe_expressions(|e| exprs.push(e.clone()));


The basic idea is to refactor the current code so that, instead of cloning all expressions, it just calls a function on each one.
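To illustrate the idea, here is a minimal sketch of the callback approach. It assumes toy `Expr` and `Plan` types that stand in for DataFusion's `Expr` and `LogicalPlan` (they are not the real definitions). The excerpt above only shows the closure being handed each `&Expr`, so the sketch assumes a plain `FnMut(&Expr)` observer with no error return. The cloning `expressions()` is then rebuilt on top of the observer, mirroring the two added lines in the diff.

```rust
// Toy stand-ins for DataFusion's `Expr` and `LogicalPlan` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
}

enum Plan {
    // Hypothetical projection node holding a list of expressions.
    Projection { exprs: Vec<Expr> },
    // Hypothetical filter node holding a single predicate.
    Filter { predicate: Expr },
}

impl Plan {
    /// Calls `f` on each expression of this node (not its children),
    /// borrowing rather than cloning.
    fn observe_expressions<F: FnMut(&Expr)>(&self, mut f: F) {
        match self {
            Plan::Projection { exprs } => {
                for e in exprs {
                    f(e);
                }
            }
            Plan::Filter { predicate } => f(predicate),
        }
    }

    /// The cloning API rebuilt on top of the observer, as in the diff above.
    fn expressions(&self) -> Vec<Expr> {
        let mut exprs = vec![];
        self.observe_expressions(|e| exprs.push(e.clone()));
        exprs
    }
}

fn main() {
    let projection = Plan::Projection {
        exprs: vec![Expr::Column("a".to_string()), Expr::Literal(1)],
    };
    let filter = Plan::Filter {
        predicate: Expr::Column("b".to_string()),
    };

    // Count column references across both nodes without cloning anything.
    let mut columns = 0;
    for plan in [&projection, &filter] {
        plan.observe_expressions(|e| {
            if matches!(e, Expr::Column(_)) {
                columns += 1;
            }
        });
    }
    println!("columns = {columns}");

    // Only callers that need owned copies pay for the clone.
    println!("{:?}", projection.expressions());
}
```

Callers that only need to look at expressions (for example, to count or search them) pay no cloning cost; only `expressions()` clones.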

alamb · 2023-01-14T12:55:39Z

datafusion/expr/src/logical_plan/plan.rs


-    fn collect_subqueries(expr: &Expr, sub: &mut Vec<Arc<LogicalPlan>>) {
-        match expr {


This `match` misses many possible locations for subqueries (such as `IS NULL`); @askoa fixed one in #4900, but there are very likely more.
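A hedged sketch of why a generic walk is more robust than a hand-written `match` over a few variants. It uses a hypothetical, heavily simplified `Expr` enum (the real DataFusion `Expr` has many more variants). Each variant's children are listed once in `children()`, and subquery collection simply visits every node, so a subquery nested under `IS NULL` or an alias is not missed. The names `children`, `walk`, and the `String` placeholder for a subquery plan are illustrative assumptions.

```rust
// Hypothetical, heavily simplified expression tree (not DataFusion's `Expr`).
enum Expr {
    Column(String),
    // Placeholder for a subquery; the String stands in for its plan.
    ScalarSubquery(String),
    IsNull(Box<Expr>),
    Alias(Box<Expr>, String),
    BinaryExpr(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Direct children of this node. Each variant's children are listed
    /// here exactly once, so every traversal below sees all of them.
    fn children(&self) -> Vec<&Expr> {
        match self {
            Expr::Column(_) | Expr::ScalarSubquery(_) => vec![],
            Expr::IsNull(inner) | Expr::Alias(inner, _) => vec![&**inner],
            Expr::BinaryExpr(left, right) => vec![&**left, &**right],
        }
    }
}

/// Pre-order walk that calls `f` on every node in the tree.
fn walk<F: FnMut(&Expr)>(expr: &Expr, f: &mut F) {
    f(expr);
    for child in expr.children() {
        walk(child, f);
    }
}

/// Collects every subquery, wherever it is nested, because the walk does
/// not special-case any variant.
fn collect_subqueries(expr: &Expr) -> Vec<String> {
    let mut found = Vec::new();
    walk(expr, &mut |e: &Expr| {
        if let Expr::ScalarSubquery(plan) = e {
            found.push(plan.clone());
        }
    });
    found
}

fn main() {
    // (sq1 IS NULL) op (a op (sq2 AS s))
    let expr = Expr::BinaryExpr(
        Box::new(Expr::IsNull(Box::new(Expr::ScalarSubquery("sq1".to_string())))),
        Box::new(Expr::BinaryExpr(
            Box::new(Expr::Column("a".to_string())),
            Box::new(Expr::Alias(
                Box::new(Expr::ScalarSubquery("sq2".to_string())),
                "s".to_string(),
            )),
        )),
    );
    println!("{:?}", collect_subqueries(&expr));
}
```

Adding a new variant only requires updating `children()`; every walker picks it up automatically instead of each hand-written `match` needing its own fix.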

alamb · 2023-01-14T13:22:38Z

datafusion/expr/src/logical_plan/plan.rs

+                        // use a synthetic plan so the visitor sees a
+                        // LogicalPlan::Subquery (even though it is
+                        // actually a Subquery alias)
+                        let synthetic_plan = LogicalPlan::Subquery(subquery.clone());


While this behavior is strange, it is consistent with what `collect_subqueries` was doing.

alamb · 2023-01-14T13:22:57Z

datafusion/optimizer/src/push_down_filter.rs

@@ -396,7 +396,7 @@ fn push_down_all_join(
            .chain(once(keep_condition.into_iter().reduce(Expr::and).unwrap()))
            .collect()
    } else {
-        plan.expressions()
+        expr


Drive-by cleanup to avoid (yet another) copy of the expressions.

askoa · 2023-01-14T14:38:53Z

datafusion/expr/src/expr_visitor.rs

+/// Convenience function for using a mutable function as an expression visitor
+///
+/// TODO make this match names in physical plan
+pub fn walk_expr_down<F, E>(expr: &Expr, f: F) -> std::result::Result<(), E>


avantgardnerio

Makes sense 👍
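The excerpt above shows only the signature of `walk_expr_down`; the bound on `F` is not visible. The sketch below assumes `F: FnMut(&Expr) -> Result<(), E>` and uses a toy `Expr` enum rather than DataFusion's, to show how a `Result`-returning closure lets the caller stop the walk early via `?`. The private helper `go` (a hypothetical name) borrows the closure as `&mut G` so the recursion does not build a new closure type at every level.

```rust
use std::convert::Infallible;

// Toy expression tree (hypothetical; not DataFusion's `Expr`).
enum Expr {
    Column(String),
    Literal(i64),
    Not(Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Calls `f` on `expr` and then on its descendants (pre-order).
/// Assumed bound: the first `Err` returned by `f` stops the walk.
fn walk_expr_down<F, E>(expr: &Expr, mut f: F) -> Result<(), E>
where
    F: FnMut(&Expr) -> Result<(), E>,
{
    // Recursive helper that borrows the closure instead of moving it.
    fn go<G, R>(expr: &Expr, f: &mut G) -> Result<(), R>
    where
        G: FnMut(&Expr) -> Result<(), R>,
    {
        f(expr)?;
        match expr {
            Expr::Column(_) | Expr::Literal(_) => Ok(()),
            Expr::Not(inner) => go(inner, f),
            Expr::And(left, right) => {
                go(left, f)?;
                go(right, f)
            }
        }
    }
    go(expr, &mut f)
}

fn main() {
    // (NOT a) AND 42
    let expr = Expr::And(
        Box::new(Expr::Not(Box::new(Expr::Column("a".to_string())))),
        Box::new(Expr::Literal(42)),
    );

    // A closure that never fails: count every node.
    let mut nodes = 0;
    walk_expr_down(&expr, |_| {
        nodes += 1;
        Ok::<(), Infallible>(())
    })
    .unwrap();

    // A closure that rejects column references and stops at the first one.
    let mut visited = 0;
    let result = walk_expr_down(&expr, |e| {
        visited += 1;
        match e {
            Expr::Column(name) => Err(format!("unexpected column {name}")),
            _ => Ok(()),
        }
    });

    println!("visited {visited} of {nodes} nodes: {result:?}");
}
```

Returning `Err` from the closure aborts the traversal, so in the second call the `Literal` node is never visited.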

ursabot · 2023-01-15T11:42:23Z

Benchmark runs are scheduled for baseline = d37dccf and contender = d49c805. d49c805 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

github-actions bot added the logical-expr Logical plan and expressions label Jan 14, 2023

alamb commented Jan 14, 2023


Refactor: Add LogicalPlan::observe_expressions to walk expressions

9f1b465

alamb force-pushed the alamb/walk_all_exprs branch from 112bd27 to 9f1b465 January 14, 2023 13:16

alamb changed the title ~~Alamb/walk all exprs~~ Refactor: Add LogicalPlan::observe_expressions to walk expressions Jan 14, 2023

github-actions bot added the optimizer Optimizer rules label Jan 14, 2023

alamb changed the title ~~Refactor: Add LogicalPlan::observe_expressions to walk expressions~~ Refactor: Add LogicalPlan::observe_expressions to walk expressions Jan 14, 2023

alamb marked this pull request as ready for review January 14, 2023 13:20

alamb mentioned this pull request Jan 14, 2023

fix: Visit subqueries in Expr::Alias #4900

Merged

alamb commented Jan 14, 2023


alamb mentioned this pull request Jan 14, 2023

WIP Stop copying Exprs so much in logical planning #4907

Closed

askoa reviewed Jan 14, 2023


avantgardnerio approved these changes Jan 14, 2023


alamb merged commit d49c805 into apache:master Jan 15, 2023

alamb mentioned this pull request Jan 15, 2023

Improve documentation for ExprVisitor, port simple uses to new walking function #4916

Merged

alamb deleted the alamb/walk_all_exprs branch August 8, 2023 20:14
