fix bounds accumulator reset in HashJoinExec dynamic filter pushdown #17371
@xudong963 would you mind reviewing? I also think maybe we should encapsulate the dynamic filter + the bounds accumulator in an optional struct to make the bad state unrepresentable.
Done!
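For readers following along, here is a minimal sketch of what "an optional struct" could look like. It is not the code from this PR: `DynamicFilter`, `BoundsAccumulator`, `JoinDynamicFilter`, and `HashJoin` are simplified, hypothetical stand-ins for the real DataFusion types. The point is only that a join either has both pieces or neither, and that rebuilding the node keeps the filter while starting with a fresh, lazily initialized accumulator.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for the dynamic filter expression shared with the probe side.
#[derive(Debug, Default)]
struct DynamicFilter;

/// Hypothetical stand-in for the shared bounds accumulator.
#[derive(Debug, Default)]
struct BoundsAccumulator;

/// Filter and accumulator travel together, so one cannot exist without the other.
#[derive(Debug, Clone)]
struct JoinDynamicFilter {
    filter: Arc<DynamicFilter>,
    /// Created lazily on first use (assumed to happen during execute()).
    bounds_accumulator: OnceLock<Arc<BoundsAccumulator>>,
}

impl JoinDynamicFilter {
    fn new(filter: Arc<DynamicFilter>) -> Self {
        Self { filter, bounds_accumulator: OnceLock::new() }
    }

    /// Returns the accumulator, initializing it on the first call.
    fn accumulator(&self) -> &Arc<BoundsAccumulator> {
        self.bounds_accumulator
            .get_or_init(|| Arc::new(BoundsAccumulator::default()))
    }
}

/// Toy join node: `None` means dynamic filter pushdown is not in use.
#[allow(dead_code)]
#[derive(Debug)]
struct HashJoin {
    children: Vec<String>,
    dynamic_filter: Option<JoinDynamicFilter>,
}

impl HashJoin {
    /// Rebuild with new children: keep the same filter so references pushed
    /// into the children stay valid, but start with an uninitialized accumulator.
    fn with_new_children(&self, children: Vec<String>) -> Self {
        Self {
            children,
            dynamic_filter: self
                .dynamic_filter
                .as_ref()
                .map(|df| JoinDynamicFilter::new(Arc::clone(&df.filter))),
        }
    }
}
```

With this shape there is no way to construct a join that holds a filter but no accumulator slot, which is the "bad state" the comment above refers to.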
    self.projection.as_ref(),
    )?,
    // Keep the dynamic filter, bounds accumulator will be reset
    dynamic_filter: self.dynamic_filter.clone(),
Since it's possible that subsequent optimization rules can break reference integrity, does it make sense to preserve this unconditionally?
Maybe we can add an additional check and see if the dynamic filter is preserved on the incoming right child?
I wonder if in any case we can reset the bounds accumulator to OnceLock::new() since it's lazily initialized during execute()
> Since it's possible that subsequent optimization rules can break reference integrity, does it make sense to preserve this unconditionally?
FWIW this should never cause incorrect results, just disable the optimization (the filter never gets updated).
For HashJoinExec the dynamic filter and bounds accumulator go hand in hand: it makes sense to copy them together.
> I wonder if in any case we can reset the bounds accumulator to OnceLock::new() since it's lazily initialized during execute()
We could, but I'm not sure that's a case we'll ever hit. When would with_new_children be called after execution has started?
Maybe it would be reasonable to always call DynamicFilterPhysicalExpr::update(lit(true)) from ExecutionPlan::with_new_children 🤔
> We could, but I'm not sure that's a case we'll ever hit. When would with_new_children be called after execution has started?
Yeah, while probably possible from a usage perspective I agree it's unrealistic.
> FWIW this should never cause incorrect results, just disable the optimization (the filter never gets updated).
Good point - I guess the downside is just the stale dynamic_filter being left around in the case it becomes orphaned. But that's probably not a big deal.
* Enable physical filter pushdown for hash joins (apache#16954) (cherry picked from commit b10f453)
* Add ExecutionPlan::reset_state (apache#17028)
* Add ExecutionPlan::reset_state Co-authored-by: Robert Ream <robert@stably.io>
* Update datafusion/sqllogictest/test_files/cte.slt
* Add reference
* fmt
* add to upgrade guide
* add explain plan, implement in more plans
* fmt
* only explain
--------- Co-authored-by: Robert Ream <robert@stably.io>
* Add dynamic filter (bounds) pushdown to HashJoinExec (apache#16445) (cherry picked from commit ff77b70)
* Push dynamic pushdown through CooperativeExec and ProjectionExec (apache#17238) (cherry picked from commit 4bc0696)
* Fix dynamic filter pushdown in HashJoinExec (apache#17201) (cherry picked from commit 1d4d74b)
* Fix HashJoinExec sideways information passing for partitioned queries (apache#17197) (cherry picked from commit 64bc58d)
* disallow pushdown of volatile functions (apache#16861)
* dissallow pushdown of volatile PhysicalExprs
* fix
* add FilteredVec helper to handle filter / remap pattern (#34)
* checkpoint: Address PR feedback in https://github.com/apach...
* add FilteredVec to consolidate handling of filter / remap pattern
* lint
* Add slt test for pushing volatile predicates down (#35)
--------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> (cherry picked from commit 94e8548)
* fix bounds accumulator reset in HashJoinExec dynamic filter pushdown (apache#17371)
--------- Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com> Co-authored-by: Robert Ream <robert@stably.io> Co-authored-by: Jack Kleeman <jackkleeman@gmail.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Any system that applies additional optimizer rules or otherwise manipulates plans will end up calling with_new_children, which wipes out the bounds accumulator and thus disables the optimization.
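To make the failure mode concrete, here is a toy model, not DataFusion code: `DynamicFilter`, `Scan`, `Join`, and both `rebuild_*` methods are hypothetical names invented for this sketch. It assumes the scan holds a shared reference to the filter and that the join publishes bounds once its build side finishes. If rebuilding the join drops the dynamic filter state, the scan keeps its placeholder and never sees any bounds; results stay correct, but the pruning is lost.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a dynamic filter; `None` means "always true".
#[derive(Debug, Default)]
struct DynamicFilter {
    bounds: Mutex<Option<(i64, i64)>>,
}

impl DynamicFilter {
    fn update(&self, min: i64, max: i64) {
        *self.bounds.lock().unwrap() = Some((min, max));
    }

    fn current(&self) -> Option<(i64, i64)> {
        *self.bounds.lock().unwrap()
    }
}

/// Toy probe-side scan holding the pushed-down filter.
struct Scan {
    pushed_filter: Arc<DynamicFilter>,
}

/// Toy join node that may own the other end of the shared filter.
struct Join {
    dynamic_filter: Option<Arc<DynamicFilter>>,
}

impl Join {
    /// Rebuild that loses the dynamic filter state (the problem described above).
    fn rebuild_dropping_state(&self) -> Join {
        Join { dynamic_filter: None }
    }

    /// Rebuild that keeps pointing at the same shared filter.
    fn rebuild_keeping_filter(&self) -> Join {
        Join { dynamic_filter: self.dynamic_filter.clone() }
    }

    /// Simulates the build side finishing and publishing its bounds.
    fn finish_build(&self, min: i64, max: i64) {
        if let Some(filter) = &self.dynamic_filter {
            filter.update(min, max);
        }
    }
}

/// Returns what the scan observes after the join is rebuilt and executed.
fn scan_sees_bounds(keep_filter: bool) -> Option<(i64, i64)> {
    let filter = Arc::new(DynamicFilter::default());
    let scan = Scan { pushed_filter: Arc::clone(&filter) };
    let join = Join { dynamic_filter: Some(filter) };

    let rebuilt = if keep_filter {
        join.rebuild_keeping_filter()
    } else {
        join.rebuild_dropping_state()
    };
    rebuilt.finish_build(1, 10);
    scan.pushed_filter.current()
}
```

Calling `scan_sees_bounds(true)` yields the published bounds, while `scan_sees_bounds(false)` yields `None`: the scan still filters with its "always true" placeholder.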