@adriangb adriangb commented Sep 2, 2025

Any system that applies additional optimizer rules or otherwise manipulates plans will end up calling with_new_children, which wipes out the bounds accumulator and thereby disables the optimization.
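A toy model of the failure mode described above, assuming a plan node that rebuilds itself from scratch when its children are replaced. `JoinNode`, `SharedState`, and both `with_new_children_*` methods are hypothetical stand-ins for illustration, not the actual DataFusion types or signatures.

```rust
use std::sync::Arc;

// Hypothetical stand-in for whatever shared state the optimization needs.
#[derive(Debug, Default)]
struct SharedState;

// Hypothetical plan node; not the real HashJoinExec.
#[derive(Debug)]
struct JoinNode {
    children: Vec<String>,
    // None means the optimization is disabled for this node.
    pushdown_state: Option<Arc<SharedState>>,
}

impl JoinNode {
    fn new(children: Vec<String>) -> Self {
        Self { children, pushdown_state: None }
    }

    // Simulates an optimizer pass that enables the optimization.
    fn with_pushdown(mut self) -> Self {
        self.pushdown_state = Some(Arc::new(SharedState));
        self
    }

    fn pushdown_enabled(&self) -> bool {
        self.pushdown_state.is_some()
    }

    // Lossy variant: rebuilding via the constructor silently drops the state.
    fn with_new_children_lossy(&self, children: Vec<String>) -> Self {
        Self::new(children)
    }

    // Preserving variant: only the children change, the state is carried over.
    fn with_new_children_preserving(&self, children: Vec<String>) -> Self {
        Self { children, pushdown_state: self.pushdown_state.clone() }
    }
}
```

In this sketch, any later pass that goes through the lossy variant returns a node with `pushdown_enabled() == false`, even though nothing about the plan required turning the optimization off.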

@github-actions github-actions bot added core Core DataFusion crate physical-plan Changes to the physical-plan crate labels Sep 2, 2025
adriangb commented Sep 3, 2025

@xudong963 would you mind reviewing? I also think maybe we should encapsulate the dynamic filter + the bounds accumulator in an optional struct to make the bad state unrepresentable.

adriangb commented Sep 3, 2025

> I also think maybe we should encapsulate the dynamic filter + the bounds accumulator in an optional struct to make the bad state unrepresentable.

Done!
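A minimal sketch of what "making the bad state unrepresentable" could look like, assuming the filter and the accumulator were previously tracked as two independent optional fields. All type and field names here are hypothetical and are not taken from the merged code.

```rust
use std::sync::{Arc, OnceLock};

// Hypothetical placeholders for the two pieces of state.
#[derive(Debug)]
struct DynamicFilter;
#[derive(Debug)]
struct BoundsAccumulator;

// Loose layout: nothing stops one field being Some while the other is None.
struct LooseState {
    filter: Option<Arc<DynamicFilter>>,
    accumulator: Option<Arc<BoundsAccumulator>>,
}

impl LooseState {
    // The invariant has to be checked by hand everywhere it matters.
    fn is_consistent(&self) -> bool {
        self.filter.is_some() == self.accumulator.is_some()
    }
}

// Bundled layout: the pair lives or dies together.
#[derive(Clone, Debug)]
struct PushdownState {
    filter: Arc<DynamicFilter>,
    // Lazily filled in at execution time.
    accumulator: OnceLock<Arc<BoundsAccumulator>>,
}

impl PushdownState {
    fn new() -> Self {
        Self { filter: Arc::new(DynamicFilter), accumulator: OnceLock::new() }
    }
}

// Hypothetical node: either the whole bundle is present or none of it is.
struct Node {
    pushdown: Option<PushdownState>,
}
```

With the bundled layout, a "filter without an accumulator slot" cannot be constructed, so code that copies the node only has one field to remember.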

@adriangb adriangb requested a review from xudong963 September 4, 2025 00:21
self.projection.as_ref(),
)?,
// Keep the dynamic filter, bounds accumulator will be reset
dynamic_filter: self.dynamic_filter.clone(),
Contributor

Since it's possible that subsequent optimization rules can break reference integrity, does it make sense to preserve this unconditionally?

Maybe we can add an additional check to see whether the dynamic filter is preserved on the incoming right child?

I wonder if in any case we can reset the bounds accumulator to OnceLock::new() since it's lazily initialized during execute()
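The lazy-initialization behaviour mentioned here can be sketched with the standard library alone. `JoinState`, `BoundsAccumulator`, and the `rebuilt` method are hypothetical names for illustration; only `std::sync::OnceLock` itself is a real API.

```rust
use std::sync::{Arc, OnceLock};

// Hypothetical accumulator; the field is only there to make init observable.
#[derive(Debug)]
struct BoundsAccumulator {
    partitions: usize,
}

// Hypothetical holder for the lazily created accumulator.
struct JoinState {
    accumulator: OnceLock<Arc<BoundsAccumulator>>,
}

impl JoinState {
    fn new() -> Self {
        Self { accumulator: OnceLock::new() }
    }

    // What an execute()-style method might do: the first caller creates the
    // accumulator, every later caller gets the same Arc back.
    fn accumulator(&self, partitions: usize) -> Arc<BoundsAccumulator> {
        Arc::clone(
            self.accumulator
                .get_or_init(|| Arc::new(BoundsAccumulator { partitions })),
        )
    }

    // What a with_new_children-style rebuild might do: start from an empty
    // cell, so the next execution initializes it again.
    fn rebuilt(&self) -> Self {
        Self { accumulator: OnceLock::new() }
    }
}
```

Because the cell is empty until first use, handing a rebuilt node a fresh `OnceLock::new()` costs nothing if execution has not started yet.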

Contributor Author

> Since it's possible that subsequent optimization rules can break reference integrity, does it make sense to preserve this unconditionally?

FWIW this should never cause incorrect results, just disable the optimization (the filter never gets updated).
For HashJoinExec the dynamic filter and bounds accumulator go hand in hand: it makes sense to copy them together.

> I wonder if in any case we can reset the bounds accumulator to OnceLock::new() since it's lazily initialized during execute()

We could, but I'm not sure that's a case we'll ever hit. When would with_new_children be called after execution has started?

Contributor Author

Maybe it would be reasonable to always call DynamicFilterPhysicalExpr::update(lit(true)) from ExecutionPlan::with_new_children 🤔
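To make the "reset to a pass-through predicate" idea concrete, here is a small model of a shared, updatable filter. It is a sketch under stated assumptions: `Predicate`, `DynamicFilter`, and their methods are hypothetical and only imitate the shape of a dynamic filter; they are not the DataFusion API.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical predicate; True plays the role of a literal `true`.
#[derive(Clone, Debug, PartialEq)]
enum Predicate {
    True,
    Between { min: i64, max: i64 },
}

// Hypothetical filter handle: clones share one underlying predicate.
#[derive(Clone, Debug)]
struct DynamicFilter {
    inner: Arc<RwLock<Predicate>>,
}

impl DynamicFilter {
    // Starts as pass-through, so an un-updated filter never drops rows.
    fn new() -> Self {
        Self { inner: Arc::new(RwLock::new(Predicate::True)) }
    }

    // Replaces the predicate for every holder of this handle.
    fn update(&self, predicate: Predicate) {
        *self.inner.write().unwrap() = predicate;
    }

    fn matches(&self, value: i64) -> bool {
        let guard = self.inner.read().unwrap();
        match &*guard {
            Predicate::True => true,
            Predicate::Between { min, max } => *min <= value && value <= *max,
        }
    }
}
```

In this model a filter that is never updated, or that is reset to `Predicate::True`, keeps every row: results stay correct and only the pruning is lost. A handle created independently of the one a scan holds cannot affect that scan, which is the orphaned-filter situation discussed in this thread.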

Contributor

> We could, but I'm not sure that's a case we'll ever hit. When would with_new_children be called after execution has started?

Yeah, while it's probably possible from a usage perspective, I agree it's unrealistic.

> FWIW this should never cause incorrect results, just disable the optimization (the filter never gets updated).

Good point - I guess the downside is just a stale dynamic_filter being left around if it becomes orphaned. But that's probably not a big deal.

@adriangb adriangb merged commit 64c4027 into apache:main Sep 5, 2025
28 checks passed
@adriangb adriangb deleted the fix-hash-join branch September 5, 2025 09:39
LiaCastaneda added a commit to DataDog/datafusion that referenced this pull request Sep 9, 2025
* Enable physical filter pushdown for hash joins (apache#16954)

(cherry picked from commit b10f453)

* Add ExecutionPlan::reset_state (apache#17028)

* Add ExecutionPlan::reset_state

Co-authored-by: Robert Ream <robert@stably.io>

* Update datafusion/sqllogictest/test_files/cte.slt

* Add reference

* fmt

* add to upgrade guide

* add explain plan, implement in more plans

* fmt

* only explain

---------

Co-authored-by: Robert Ream <robert@stably.io>

* Add dynamic filter (bounds) pushdown to HashJoinExec (apache#16445)

(cherry picked from commit ff77b70)

* Push dynamic pushdown through CooperativeExec and ProjectionExec (apache#17238)

(cherry picked from commit 4bc0696)

* Fix dynamic filter pushdown in HashJoinExec (apache#17201)

(cherry picked from commit 1d4d74b)

* Fix HashJoinExec sideways information passing for partitioned queries (apache#17197)

(cherry picked from commit 64bc58d)

* disallow pushdown of volatile functions (apache#16861)

* disallow pushdown of volatile PhysicalExprs

* fix

* add FilteredVec helper to handle filter / remap pattern (#34)

* checkpoint: Address PR feedback in https://github.com/apach...

* add FilteredVec to consolidate handling of filter / remap pattern

* lint

* Add slt test for pushing volatile predicates down (#35)

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
(cherry picked from commit 94e8548)

* fix bounds accumulator reset in HashJoinExec dynamic filter pushdown (apache#17371)

---------

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
Co-authored-by: Robert Ream <robert@stably.io>
Co-authored-by: Jack Kleeman <jackkleeman@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>