-
Couldn't load subscription status.
- Fork 1.7k
Add ExecutionPlan::reset_state #17028
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dbcd08c to
f26b319
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @adriangb -- this PR makes sense to me except for the expect / panic -- otherwise I think it is good to go
| "implementations have drifted and this is no longer safe even if `new()` still works, ", | ||
| "for example if `new()` now does something different than just calling `compute_properties(...).unwrap()`", | ||
| "\n", | ||
| "This is clearly a bug, please report it!" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should return DatafusionError::Internal here rather than panic'ing as it is a better UX (if you want fail fast in debug builds, perhaps you could add a debug_assert)
I also recommend converting the explanation into comments and leaving the panic message like "Internal inconsistency in SortExec"
The rationale is that if a user sees this message it is not going to mean anything to them and they can't fix it, and this text will obscure the conclusion (this is a bug they can not do anything to fix). A developer will come to the source location and can read the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb this already had the possibility to panic in it because it called SortExec::new():
datafusion/datafusion/physical-plan/src/sorts/sort.rs
Lines 865 to 872 in eb2b8c0
| impl SortExec { | |
| /// Create a new sort execution plan that produces a single, | |
| /// sorted output partition. | |
| pub fn new(expr: LexOrdering, input: Arc<dyn ExecutionPlan>) -> Self { | |
| let preserve_partitioning = false; | |
| let (cache, sort_prefix) = | |
| Self::compute_properties(&input, expr.clone(), preserve_partitioning) | |
| .unwrap(); |
datafusion/datafusion/physical-plan/src/sorts/sort.rs
Lines 1115 to 1119 in eb2b8c0
| fn with_new_children( | |
| self: Arc<Self>, | |
| children: Vec<Arc<dyn ExecutionPlan>>, | |
| ) -> Result<Arc<dyn ExecutionPlan>> { | |
| let mut new_sort = SortExec::new(self.expr.clone(), Arc::clone(&children[0])) |
Returning a Result would be a major breaking change to the ExecutionPlan::with_new_children API which I don't think we should do in this PR.
I do think ExecutionPlan::with_new_children returning a Result would be a good thing. In general I think trait methods should err on the side of returning a result in case some implementation needs to. If none of them do I'd expect compilation to make it pretty much a non issue for performance. But maybe let's do that as it's own PR if we really want to.
| /// Note to implementers: unlike [`ExecutionPlan::with_new_children`] this method does not accept new children as an argument, | ||
| /// thus it is expected that any cached plan properties will remain valid after the reset. | ||
| /// | ||
| /// [`DynamicFilterPhysicalExpr`]: datafusion_physical_expr::expressions::DynamicFilterPhysicalExpr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should also add a note to DynamicFilterPhysicalExpr saying any ExecutionPlan that uses them should also implement reset_state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| let (cache, sort_prefix) = Self::compute_properties( | ||
| &new_sort.input, | ||
| new_sort.expr.clone(), | ||
| new_sort.preserve_partitioning, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we put the logic into reset_state if the changes are due to the reset_state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for this change is:
- We need to do similar work as
with_new_children(i.e. cloneSortExec) but each method has slightly different requirements (with_new_childrenneeds to resetcachewhilereset_stateneeds to resetfilter. - To solve this I created the new
SortExec::clonedwhich does neither of those two things and moved the resetting ofcacheintowith_new_childrenand the resetting offilterintoreset_state.
In other words, it doesn't make sense to put Self::compute_properties(...) in reset_state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function returns a Result though, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good point yep I'll do that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed your original point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend putting the PR into upgrading doc.
Co-authored-by: Robert Ream <robert@stably.io>
Good suggestion, added! |
|
Are we concerned that |
I suggest we file a ticket to look into this in the future for anyone who might be interested in that usecase, and not worry about it in this PR |
|
done! #17060 should we wait for anything else before merging this? |
Nope, I think we are good -- let's merge it! |
|
It seems like we should probably backport this to branch-49 for the 49.0.1 release |
* Add ExecutionPlan::reset_state Co-authored-by: Robert Ream <robert@stably.io> * Update datafusion/sqllogictest/test_files/cte.slt * Add reference * fmt * add to upgrade guide * add explain plan, implement in more plans * fmt * only explain --------- Co-authored-by: Robert Ream <robert@stably.io>
* Add ExecutionPlan::reset_state * Update datafusion/sqllogictest/test_files/cte.slt * Add reference * fmt * add to upgrade guide * add explain plan, implement in more plans * fmt * only explain --------- Co-authored-by: Robert Ream <robert@stably.io>
* Add ExecutionPlan::reset_state Co-authored-by: Robert Ream <robert@stably.io> * Update datafusion/sqllogictest/test_files/cte.slt * Add reference * fmt * add to upgrade guide * add explain plan, implement in more plans * fmt * only explain --------- Co-authored-by: Robert Ream <robert@stably.io>
* Enable physical filter pushdown for hash joins (apache#16954) (cherry picked from commit b10f453) * Add ExecutionPlan::reset_state (apache#17028) * Add ExecutionPlan::reset_state Co-authored-by: Robert Ream <robert@stably.io> * Update datafusion/sqllogictest/test_files/cte.slt * Add reference * fmt * add to upgrade guide * add explain plan, implement in more plans * fmt * only explain --------- Co-authored-by: Robert Ream <robert@stably.io> * Add dynamic filter (bounds) pushdown to HashJoinExec (apache#16445) (cherry picked from commit ff77b70) * Push dynamic pushdown through CooperativeExec and ProjectionExec (apache#17238) (cherry picked from commit 4bc0696) * Fix dynamic filter pushdown in HashJoinExec (apache#17201) (cherry picked from commit 1d4d74b) * Fix HashJoinExec sideways information passing for partitioned queries (apache#17197) (cherry picked from commit 64bc58d) * disallow pushdown of volatile functions (apache#16861) * dissallow pushdown of volatile PhysicalExprs * fix * add FilteredVec helper to handle filter / remap pattern (#34) * checkpoint: Address PR feedback in https://github.com/apach... * add FilteredVec to consolidate handling of filter / remap pattern * lint * Add slt test for pushing volatile predicates down (#35) --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> (cherry picked from commit 94e8548) * fix bounds accumulator reset in HashJoinExec dynamic filter pushdown (apache#17371) --------- Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com> Co-authored-by: Robert Ream <robert@stably.io> Co-authored-by: Jack Kleeman <jackkleeman@gmail.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Fixes #16998. Closes #17016.