-
Couldn't load subscription status.
- Fork 1.7k
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
In the following example, the Dynamic Filter Pushdown is built from t2 (k, v) and correctly pushed to t1 (k), to reduce the number of scanned rows from t1:
select *
from t1 -- 1M rows, will become the right side
join t2 on t1.k = t2.k
where t2.v = 1000000; -- 1 row, will become the left side
HashJoinExec join_type=Inner build_time=78.306µs, join_time=22.402µs
DataSourceExec t2.parquet
predicate=v@1 = 1000000
output_rows=1
DataSourceExec t1.parquet
predicate=DynamicFilterPhysicalExpr [ k@0 >= 1000000 AND k@0 <= 1000000 ] -- dynamic filter added here
output_rows=1 -- only returns 1 rowNow if the t2 side is greater than t1, then the order of the join switches, as the smaller relation becomes the build side. However, the Dynamic Filter Pushdown is still applied to the right relation (now t2) instead of t1:
select *
from t1 -- 1M rows, will become the left side
join t2 on t1.k = t2.k
where t2.v >= 1000000; -- 9M rows, will become the right side
HashJoinExec join_type=Inner build_time=54.3462ms, join_time=2.403µs -- join is more expensive
DataSourceExec t1.parquet
output_rows=1000000
DataSourceExec t2.parquet
predicate=v@1 >= 1000000 AND DynamicFilterPhysicalExpr [ k@0 >= 3 AND k@0 <= 999998 ] -- dynamic filter added here
output_rows=0 -- should be 1, possibly due to issue https://github.com/apache/datafusion/issues/17188To Reproduce
copy (select i as k from generate_series(1, 1000000) t(i)) to 't1.parquet';
copy (select i as k, i as v from generate_series(1, 10000000) t(i)) to 't2.parquet';
create external table t1 stored as parquet location 't1.parquet';
create external table t2 stored as parquet location 't2.parquet';
-- works
explain analyze select *
from t1
join t2 on t1.k = t2.k
where t2.v = 1000000;
-- pushed to the wrong relation
explain analyze select *
from t1
join t2 on t1.k = t2.k
where t2.v >= 1000000;Expected behavior
Dynamic Filter applied to the correct table.
Additional context
No response
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working