Skip to content

Commit f097ad4

Browse files
LiaCastanedaasubiottoEeshanBembiEeshanebembi-crdb
authored
Cherry-pick filter passthrough AggregateExec and UnionExec (#58)
* Allow filter pushdown through AggregateExec (apache#18404) ## Which issue does this PR close? - Closes apache#18399 ## Rationale for this change Right now filters cannot pass through `AggregateExec` nodes, preventing filter pushdown optimization in queries with GROUP BY/DISTINCT operations. ## What changes are included in this PR? - Implemented `gather_filters_for_pushdown()` for `AggregateExec` that allows filters on grouping columns to pass through to children - Supports both Pre phase (static filters) and Post phase (dynamic filters from joins) Essentially, filter will pass through in the scenarios @asolimando mentioned [here](apache#18399 (comment)) ## Are these changes tested? Yes, added three tests: - `test_aggregate_filter_pushdown`: Positive case with aggregate functions - `test_no_pushdown_aggregate_filter_on_non_grouping_column`: Negative case ensuring filters on aggregate results are not pushed ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> (cherry picked from commit 076b091) * physical-plan: push filters down to UnionExec children (apache#18054) Filters are safe to be pushed down, so we can override the default behavior here. Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com> (cherry picked from commit 0ecd59b) * fix: prevent UnionExec panic with empty inputs (apache#17449) * fix: prevent UnionExec panic with empty inputs This commit fixes a panic in UnionExec when constructed with empty inputs. Previously, UnionExec::new(vec![]) would cause an index out of bounds panic at union.rs:542 when trying to access inputs[0]. Changes: - Made UnionExec::new() return Result<Self> with proper validation - Made union_schema() return Result<SchemaRef> with empty input checks - Added descriptive error messages for empty input cases - Updated all call sites to handle the new Result return type - Added comprehensive tests for edge cases Error messages: - "UnionExec requires at least one input" - "Cannot create union schema from empty inputs" The fix maintains backward compatibility for valid inputs while preventing crashes and providing clear error messages for invalid usage. Fixes apache#17052 * refactor: address PR review comments for UnionExec empty inputs fix - Add new try_new method that returns Result<Arc<dyn ExecutionPlan>> - Deprecate existing new method in favor of try_new - Optimize single-input case: try_new returns the input directly - Remove redundant assert!(result.is_err()) from tests - Rename test_union_multiple_inputs_still_works to test_union_schema_multiple_inputs - Update all call sites to use appropriate API (try_new for new code, deprecated new for tests) This maintains backward compatibility while providing better error handling and optimization for single-input cases. * Fix cargo fmt and clippy warnings - Add proper feature gates for parquet_encryption in datasource-parquet - Format code to pass cargo fmt checks - All tests passing * Fix clippy --------- Co-authored-by: Eeshan <eeshan@Eeshans-MacBook-Pro.local> Co-authored-by: ebembi-crdb <ebembi@cockroachlabs.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> (cherry picked from commit b122a16) --------- Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com> Co-authored-by: Alfonso Subiotto Marqués <alfonso.subiotto@polarsignals.com> Co-authored-by: EeshanBembi <33062610+EeshanBembi@users.noreply.github.com> Co-authored-by: Eeshan <eeshan@Eeshans-MacBook-Pro.local> Co-authored-by: ebembi-crdb <ebembi@cockroachlabs.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
1 parent f209f98 commit f097ad4

File tree

13 files changed

+727
-30
lines changed

13 files changed

+727
-30
lines changed

datafusion/core/src/physical_planner.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1243,7 +1243,7 @@ impl DefaultPhysicalPlanner {
12431243
}
12441244

12451245
// N Children
1246-
LogicalPlan::Union(_) => Arc::new(UnionExec::new(children.vec())),
1246+
LogicalPlan::Union(_) => UnionExec::try_new(children.vec())?,
12471247
LogicalPlan::Extension(Extension { node }) => {
12481248
let mut maybe_plan = None;
12491249
let children = children.vec();

datafusion/core/tests/physical_optimizer/enforce_distribution.rs

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1783,6 +1783,7 @@ fn union_to_interleave() -> Result<()> {
17831783
);
17841784

17851785
// Union
1786+
#[allow(deprecated)]
17861787
let plan = Arc::new(UnionExec::new(vec![left, right]));
17871788

17881789
// final agg
@@ -1827,6 +1828,7 @@ fn union_not_to_interleave() -> Result<()> {
18271828
);
18281829

18291830
// Union
1831+
#[allow(deprecated)]
18301832
let plan = Arc::new(UnionExec::new(vec![left, right]));
18311833

18321834
// final agg

0 commit comments

Comments
 (0)