-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache common plan properties to eliminate recursive calls in physical plan #9346
Cache common plan properties to eliminate recursive calls in physical plan #9346
Conversation
# Conflicts: # datafusion/physical-plan/src/streaming.rs
# Conflicts: # datafusion/physical-plan/src/projection.rs
Fix deploying DataFusion site error
Super excited! I did an initial review already but will do another deep dive tonight |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did another round of review and LGTM except a few minor things (whose comments I left inline).
One general comment is to add a succinct, single line docstring comment to create_cache
functions. Something like:
This function creates the cache object that stores plan properties such as equivalence properties, output ordering etc.
Looking forward to getting some eyes on this from the community!
I plan to review this PR today |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this PR @mustafasrepo -- I agree with @ozankabak and @gruuya that the use of the term "cache" is confusing.
I also think we should remove the existing methods like ExecutionPlan::execution_mode
to avoid potential bugs due to inconsistent implementations.
In addition to improving the stack depth, I think this PR may also significantly improve planning speed (as it avoids redundant calculations)
Otherwise I think this PR looks good to me
# Conflicts: # datafusion/physical-plan/src/repartition/mod.rs
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR is looking quite good now. I'll try and give it another careful look tomorrow. Maybe we can wait to see if @gruuya has any additional comments before merging.
I do think it would be better to make it impossible for the API's to be inconsistent (aka #9346 (comment)) but I think it would be ok to leave the API as it is currently in this PR
No further comments from me, aside from thanks once again @mustafasrepo for handling this! Perhaps just a side note that even |
@metesynnada did some debugging of |
Yup, agreed. My own brief investigation led me to believe that in the case of |
I was curious on how this affected planning performance. Here are my results comparing 0c46d7f (parent of first commit in this PR) with a8fac85. Seems like there are quite big regressions in physical planning. But I may have messed up. Results
|
Perhaps it's the eager computation of plan properties now, as opposed to lazy computation before (meaning some props were not computed/needed)? |
I re-ran benchmarks in my machine. The results are below Results
Main Run 2 vs Main Run 1
Branch Run 2 vs Branch Run 1
|
Sorry for the noise, I must have somehow messed up. I also get green numbers now. I wanted to merge main into this PR to get a direct comparison, but there are some conflicts. I guess it would be good to do a final run when this PR merges cleanly with main. Results
|
# Conflicts: # datafusion/core/src/physical_optimizer/join_selection.rs # datafusion/physical-plan/src/aggregates/mod.rs # datafusion/physical-plan/src/coalesce_partitions.rs # datafusion/physical-plan/src/joins/cross_join.rs # datafusion/physical-plan/src/lib.rs # datafusion/physical-plan/src/sorts/sort_preserving_merge.rs # datafusion/physical-plan/src/unnest.rs # datafusion/physical-plan/src/windows/window_agg_exec.rs # datafusion/physical-plan/src/work_table.rs
@alamb, I did one final review to make sure all looks good. Since this will attract a lot of merge conflicts, it'd be a good idea to merge this sooner rather than later. Given the benchmark confusion is out of the way too, are you OK with merging? |
I am running benchmark numbers now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I compared this branch with b220f03 (the merge base)
git checkout b220f03fffda22c70f03fa84e244cf04f0e6644c
cargo bench --bench sql_planner
git checkout feature/remove_default_cache
cargo bench --bench sql_planner
All the performance seems to be faster or the same. Results: results.txt
Thanks again everyone
I have a follow on PR proposal #9389 to improve the documentation |
Thanks everyone for the awesome collaboration! |
* deps: upgrade datafusion to 37.1.0 * feat: re-implement SessionContext::tables The method was removed upstream but is used in many tests for `datafusion-python`. Ref: apache/datafusion#9627 * feat: upgrade dataframe write_parquet and write_json The options to write_parquet changed. write_json has a new argument that I defaulted to None. We can expose that config later. Ref: apache/datafusion#9382 * feat: impl new ExecutionPlanProperties for DatasetExec Ref: apache/datafusion#9346 * feat: add upstream variant and method params - `WindowFunction` and `AggregateFunction` have `null_treatment` options. - `ScalarValue` and `DataType` have new variants - `SchemaProvider::table` now returns a `Result` * lint: allow(deprecated) for make_scalar_function * feat: migrate functions.rs `datafusion` completed an Epic that ported many of the `BuiltInFunctions` enum to `SclarUDF`. I created new macros to simplify the port, and used these macros to refactor a few existing functions. Ref: apache/datafusion#9285 * fixme: commented out last failing test This is a bug upstream in datafusion FAILED datafusion/tests/test_functions.py::test_array_functions - pyo3_runtime.PanicException: range end index 9 out of range for slice of length 8 * chore: update Cargo.toml package info
Which issue does this PR close?
Closes #9084 .
Rationale for this change
In great analysis by @gruuya at the issue 9084. @gruuya recognized that stack usage (depth) increases a lot during logical and physical planning. The root cause of aggressive stack usage is
.clone
ofLogicalPlan
enum.API
s of theArc<dyn ExecutionPlan>
, such asEquivalenceProperties
,output_partitioning
,output_ordering
, etc.In the PR9084, @gruuya could reduce physical plan stack usage by caching
equivalence_properties
forProjectionExec
.This PR introduces a new struct to cache PlanProperties (
PlanPropertiesCache
). With this struct,schema
,output_partitioning
,equivalence_properties
,output_ordering
is cached. This caching mechanism removes recursive calls during getter methods. Also, given.cache
method is implemented, default implementations of the.output_partitioning
,.equivalence_properties
,output_ordering
works out of the box.With these changes stack depth decreases considerably, Since recursive calls are eliminated in the
PhysicalPlan
.As an example Flame graph for the query 54 is converted from following graph
to following graph
.
What changes are included in this PR?
Are these changes tested?
Existing tests should work
Are there any user-facing changes?
api change