Describe the bug
When the DataFusion logical planner builds the AGGREGATE plan, it adds additional columns to group_expr based on functional dependencies. However, for queries that aggregate over a table obtained through a UNION operation, the functional dependency is still preserved in the schema of the UNION plan, even though it no longer holds after the UNION. This causes the wrong column to be added as a group-by column in the aggregation plan.
To Reproduce
Any query that aggregates over a UNION triggers the issue. For example, the query below:
with t1 as (
select i_manufact_id, count(*) as extra from item
group by i_manufact_id
),
t2 as (
select i_manufact_id, count(*) as extra from item
group by i_manufact_id
)
select i_manufact_id, sum(extra)
from (select * from t1
union all select * from t2) tmp1
group by i_manufact_id
order by i_manufact_id;
This leads to a logical plan whose Aggregate node incorrectly includes the extra column in its group-by list:
Aggregate: groupBy=[[tmp1.i_manufact_id, tmp1.extra]], aggr=[[sum(tmp1.extra)]]
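To see why the additional group-by column matters, here is a minimal Python sketch that emulates the two groupings on a handful of rows. The rows are hypothetical (they are not taken from the item table) and are chosen so that the two UNION ALL branches disagree on extra for one i_manufact_id; the helper group_sum is likewise an illustration, not DataFusion code.

```python
from collections import defaultdict

def group_sum(rows, keys, value):
    """Emulate `SELECT keys, sum(value) ... GROUP BY keys` over a list of dict rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[value]
    return dict(totals)

# Hypothetical contents of tmp1 (t1 UNION ALL t2); values are made up for illustration.
tmp1 = [
    {"i_manufact_id": 1, "extra": 3},  # from the first branch
    {"i_manufact_id": 2, "extra": 4},  # from the first branch
    {"i_manufact_id": 1, "extra": 5},  # from the second branch
    {"i_manufact_id": 2, "extra": 4},  # from the second branch
]

# Grouping the query asks for: one group per i_manufact_id.
intended = group_sum(tmp1, ["i_manufact_id"], "extra")

# Grouping the plan performs: i_manufact_id 1 is split into two groups.
planned = group_sum(tmp1, ["i_manufact_id", "extra"], "extra")

print(intended)  # {(1,): 8, (2,): 8}
print(planned)   # {(1, 3): 3, (1, 5): 5, (2, 4): 8}
```

With the extra key, i_manufact_id 1 shows up in two groups instead of one, which is the kind of duplicated group the report describes.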
Expected behavior
The UNION logical plan shouldn't retain functional dependencies from the tables involved in the UNION. In the example below, both Table 1 and Table 2 have the functional dependency col1 -> col2. However, after select * from table1 UNION select * from table2, the functional dependency col1 -> col2 no longer holds, because col1 = a maps to both 1 and 2 (and col1 = b to both 2 and 4).
Table 1:
col1 | col2
-----|-----
a | 1
b | 2
Table 2:
col1 | col2
-----|-----
a | 2
b | 4
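The claim can be checked mechanically. The sketch below, in plain Python rather than DataFusion's API, tests whether col1 -> col2 holds on each table and on their union; the helper name holds is an assumption introduced only for this illustration.

```python
from collections import defaultdict

def holds(rows, determinant, dependent):
    """Return True if `determinant -> dependent` holds for every row in `rows`."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[determinant]].add(row[dependent])
    # The dependency holds when each determinant value maps to exactly one dependent value.
    return all(len(values) == 1 for values in seen.values())

table1 = [{"col1": "a", "col2": 1}, {"col1": "b", "col2": 2}]
table2 = [{"col1": "a", "col2": 2}, {"col1": "b", "col2": 4}]

# No row appears in both tables, so UNION and UNION ALL yield the same rows here.
union = table1 + table2

print(holds(table1, "col1", "col2"))  # True
print(holds(table2, "col1", "col2"))  # True
print(holds(union, "col1", "col2"))   # False
```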
Additional context
This bug causes wrong results for TPC-DS queries 33, 56, 60, and 66: duplicated groups appear in the results.