Don't preserve functional dependency when generating UNION logical plan #44
Which issue does this PR close?
Rationale for this change
This PR discards functional dependencies (FDs) when generating the UNION logical plan, so that FDs that no longer hold are not used by later operations such as aggregation.
When the DataFusion logical planner builds the `AGGREGATE` plan, it adds additional columns to the `group_expr` based on the functional dependencies.

However, for queries that aggregate over a table obtained through a `UNION` operation, the functional dependency is still preserved in the schema of the `UNION` plan, even though it no longer holds after the `UNION`.

Table 1:
Table 2:
In both Table 1 and Table 2, the functional dependency `col1 -> col2` holds. However, after `select * from table1 UNION select * from table2`, the functional dependency `col1 -> col2` no longer holds.

This causes trouble in further aggregation based on the UNION results. Consider the following query:
Because the functional dependency is wrongly preserved, the query produces an incorrect logical plan in the final aggregation step.
Without the changes in this PR, the result of the added test would contain duplicated groups:
[image]
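The rationale above can be illustrated with a minimal sketch in plain Python. It does not use DataFusion, and the rows are hypothetical stand-ins rather than the tables from this PR. The helper names `fd_holds` and `union_distinct` are made up for illustration. The sketch checks that `col1 -> col2` holds in each input on its own but fails once the two inputs are combined with UNION semantics.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if each `determinant` value maps to exactly one `dependent` value."""
    seen = {}
    for row in rows:
        key, value = row[determinant], row[dependent]
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


def union_distinct(left, right):
    """Mimic SQL UNION: concatenate both inputs and drop duplicate rows."""
    out, seen = [], set()
    for row in left + right:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            out.append(row)
    return out


# Hypothetical data (assumption): col1 -> col2 holds within each table.
table1 = [{"col1": 1, "col2": "a"}, {"col1": 2, "col2": "b"}]
table2 = [{"col1": 1, "col2": "c"}, {"col1": 3, "col2": "d"}]

unioned = union_distinct(table1, table2)

print(fd_holds(table1, "col1", "col2"))   # FD holds in table1
print(fd_holds(table2, "col1", "col2"))   # FD holds in table2
print(fd_holds(unioned, "col1", "col2"))  # FD is violated: col1 = 1 maps to "a" and "c"
```

The value `col1 = 1` is paired with a different `col2` in each input, so neither input violates the dependency on its own, but the combined result does.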
What changes are included in this PR?
Functional dependencies are no longer preserved when generating the `UNION` logical plan.
Are these changes tested?
Yes
Are there any user-facing changes?
The target columns described by an FD will no longer be wrongly included in the aggregate `group_by` columns.
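As a rough illustration of the user-facing effect, the following sketch (plain Python, not DataFusion code) compares grouping by `col1` alone with grouping by `col1` plus the FD target `col2`. The rows, the `amount` column, and the `group_sum` helper are assumptions made up for this example. They stand in for a UNION result in which `col1 -> col2` does not hold.

```python
from collections import defaultdict


def group_sum(rows, group_cols, value_col):
    """Sum `value_col` per distinct combination of `group_cols`."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        totals[key] += row[value_col]
    return dict(totals)


# Hypothetical UNION result (assumption): col1 = 1 appears with two col2 values.
rows = [
    {"col1": 1, "col2": "a", "amount": 10},
    {"col1": 1, "col2": "c", "amount": 5},
    {"col1": 2, "col2": "b", "amount": 7},
]

# Grouping the user asked for: one group per col1 value.
intended = group_sum(rows, ["col1"], "amount")

# Grouping with the FD target added to the group columns, as a stale FD would cause.
expanded = group_sum(rows, ["col1", "col2"], "amount")

print(intended)  # two groups
print(expanded)  # three groups: col1 = 1 is split in two
```

With the stale dependency, `col1 = 1` shows up in two separate groups, which matches the duplicated groups described in the rationale.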