-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Avoid LogicalPlan::clone() in LogicalPlan::map_children when possible
#9999
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1b98219 to
5fcbede
Compare
| let new_children = self | ||
| .inputs() | ||
| .into_iter() | ||
| .cloned() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this call to cloned() (and the self.with_new_exprs below) is the point of this PR.
As a subsequent PRs I plan to rewrite the Optimizer and the various passes to use this method to rewrite the plans without copying them
| where | ||
| F: FnMut(Self) -> Result<Transformed<Self>>, | ||
| { | ||
| Ok(match self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This follows the (really very cool) pattern @peter-toth came up with in #9913
5fcbede to
36b28ad
Compare
36b28ad to
56af5e1
Compare
| } | ||
| } | ||
|
|
||
| /// Converts a `Arc<LogicalPlan>` without copying, if possible. Copies the plan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the code that avoids copying for Arc<LogicalPlan> when possible (my performance results show it is possible most of the time)
Also, you can see it with a local change like this:
diff --git a/datafusion/expr/src/logical_plan/tree_node.rs b/datafusion/expr/src/logical_plan/tree_node.rs
index 97e2f7f56..a8570c0af 100644
--- a/datafusion/expr/src/logical_plan/tree_node.rs
+++ b/datafusion/expr/src/logical_plan/tree_node.rs
@@ -338,10 +338,16 @@ impl TreeNode for LogicalPlan {
/// Converts a `Arc<LogicalPlan>` without copying, if possible. Copies the plan
/// if there is a shared reference
fn unwrap_arc(plan: Arc<LogicalPlan>) -> LogicalPlan {
- Arc::try_unwrap(plan)
- // if None is returned, there is another reference to this
- // LogicalPlan, so we can not own it, and must clone instead
- .unwrap_or_else(|node| node.as_ref().clone())
+ match Arc::try_unwrap(plan) {
+ Ok(plan) => {
+ println!("unwrapped!");
+ plan
+ }
+ Err(plan) => {
+ println!("BOO copying");
+ plan.as_ref().clone()
+ }
+ }
}
/// Applies `f` to rewrite a `Arc<LogicalPlan>` without copying, if possibleAnd then running
cargo test --test sqllogictests
...
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!There still is plenty of copying going on (957 copies), but there are 22,913 less copies!
andrewlamb@Andrews-MacBook-Pro:~/Software/arrow-datafusion$ cargo test --test sqllogictests | grep BOO | wc -l
Finished test [unoptimized + debuginfo] target(s) in 0.13s
Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-518eef2279430877)
957
andrewlamb@Andrews-MacBook-Pro:~/Software/arrow-datafusion$ cargo test --test sqllogictests | grep unwrapped | wc -l
Finished test [unoptimized + debuginfo] target(s) in 0.13s
Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-518eef2279430877)
22913
LogicalPlan::clone() in LogicalPlan::map_children when possibleLogicalPlan::clone() in LogicalPlan::map_children when possible
| { | ||
| Ok(input_plans | ||
| .into_iter() | ||
| .map(unwrap_arc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder of we can drop this .map(unwrap_arc) and just call .map_until_stop_and_collect(|plan| rewrite_arc(plan, f))? because in that case we don't need the .update_data and we don't need to unwrap the remaining Arcs in case of TreeNodeRecursion::Stop is returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an excellent suggestion -- the code is both less verbose, and and it saves having to make a second Vec
I made this change in b29ebd2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the idea and the implementation @alamb and @jayzhan211!
Thanks @peter-toth -- I agree I am quite happy with this formulation compared to the initial versions. It is pretty slick and a great example of collaboration I think. I am now very excited to use it to stop the copying in the Optimizer -- I like refactoring as much as the next person, but it really helps to have an user visible goal (faster planning) |
jayzhan211
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we can avoid unwrap and wrap with Arc if the transform state is TreeNodeRecursion::Stop
|
Thanks @peter-toth and @jayzhan211 |
Which issue does this PR close?
This is the first step towards making DataFusion planning (much) faster -- #9637 (based on ideas from #9708 and #9768). 🙏 @jayzhan211
Rationale for this change
Profiling (see #9637) shows that slowest part of planning is copying
LogicalPlanandExprin theOptimizerThus avoiding this copying as much as possible is key.
The TreeNode API is fast becoming the standard way to rewrite
LogicalPlanin DataFusion so it is worth investing time making them faster (so everything implemented in terms of them gets faster). When I rewrote theOptimizerto use this API (in #9948 -- it gets 10% faster)What changes are included in this PR?
LogicalPlan::map_childrento avoid cloning inputs when possible (by rewritingArcas much as possible)Are these changes tested?
Functionally: By existing CI
Performance benchmarks: 1-2% faster in planning benchmarks (this only the beginning! Subsequent PRs are even better)
Details