fix: UnnestExec preserves relevant equivalence properties of input (#16985)
## Which issue does this PR close?
- Closes #15231.
## What changes are included in this PR?
- In `UnnestExec`'s `compute_properties` we now construct
its `EquivalenceProperties` using what we can from the input plan, so
that we preserve sort ordering of unrelated columns (and avoid
unnecessary sorts further up in the plan).
## Are these changes tested?
- Adds test cases to the sqllogictests for `UnnestExec` in `unnest.slt`
## Are there any user-facing changes?
No
## Explanation
Given a struct or array value `col`, `unnest(col)` takes the N elements
of `col` and "spreads" these onto N rows, where all other columns in the
statement are preserved. Said another way, when we unnest a column we
are inserting a lateral cross-join against its elements, which by
construction:
- Duplicates every other column once for each array/map element
- Replaces the original collection column with one (or more) “element”
columns
- Expands one input row into zero (if empty) or many output rows
E.g. (from `unnest.slt`):
https://github.com/apache/datafusion/blob/6d9b76e4a30f6234ffa3f8100b5d4c2735558ca6/datafusion/sqllogictest/test_files/unnest.slt#L699-L712
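The row-expansion behaviour described above can be sketched in plain Rust without any DataFusion types. This is only an illustrative model, not the `UnnestExec` implementation: the `Row` struct and the `unnest_items` and `is_sorted_by_id` helpers are hypothetical names invented for this sketch.

```rust
/// A toy input row: one scalar column `id` and one list column `items`.
/// (Hypothetical type, used only for illustration.)
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: i64,
    items: Vec<i64>,
}

/// Unnest `items`: emit one `(id, item)` output row per list element.
/// The `id` value is duplicated for each element, and a row whose
/// list is empty contributes zero output rows.
fn unnest_items(input: &[Row]) -> Vec<(i64, i64)> {
    input
        .iter()
        .flat_map(|row| row.items.iter().map(move |item| (row.id, *item)))
        .collect()
}

/// Returns true if the output rows are in non-decreasing order of `id`.
fn is_sorted_by_id(rows: &[(i64, i64)]) -> bool {
    rows.windows(2).all(|w| w[0].0 <= w[1].0)
}
```

If the input is sorted by `id`, the output of `unnest_items` is still sorted by `id`, because each input row expands into a contiguous run of output rows. The same `id` can now appear on several rows, though, so it can no longer be treated as unique.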
The [`EquivalenceProperties`
struct](https://github.com/apache/datafusion/blob/66d6995b8f626f28f811489bd2cb552b6c64a85f/datafusion/physical-expr/src/equivalence/properties/mod.rs#L133-L146)
has three types of properties:
1. equivalence groups (expressions with the same value)
2. ordering equivalence classes (expressions that define the same
ordering)
3. table constraints - a set of columns that form a primary key or a
unique key
In this PR we construct the `UnnestExec` node's `EquivalenceProperties`
by using the input plan's equivalence properties for the columns that
are not transformed - except for table constraints, which we discard
entirely. The reasoning for discarding constraints is that because we're
duplicating the other columns across rows, we are invalidating any
uniqueness or primary-key constraint. We also need to do some twiddling
with the mapping of the projection (indices change due to the
unnesting).
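
As a rough illustration of the index twiddling, here is a minimal sketch in plain Rust. It is not the code in this PR: `passthrough_mapping` and `project_ordering` are hypothetical helpers, and the sketch assumes that each unnested column is expanded in place into one or more output columns and that an ordering is just a list of input column indices.

```rust
use std::collections::HashMap;

/// Hypothetical helper: map each untouched input column index to its
/// index in the output schema.
/// `widths[i]` is the number of output columns produced by input column `i`
/// (assumed to be 1 for untouched columns, possibly more for a struct unnest).
/// `unnested[i]` is true for input columns that are transformed.
fn passthrough_mapping(widths: &[usize], unnested: &[bool]) -> HashMap<usize, usize> {
    let mut mapping = HashMap::new();
    let mut next_out = 0;
    for (i, (&width, &is_unnested)) in widths.iter().zip(unnested).enumerate() {
        if !is_unnested {
            mapping.insert(i, next_out);
        }
        // Every input column, unnested or not, occupies `width` output slots.
        next_out += width;
    }
    mapping
}

/// Rewrite a lexicographic ordering from input indices to output indices,
/// keeping only the longest prefix that consists of untouched columns.
/// Anything after the first unnested column is dropped.
fn project_ordering(ordering: &[usize], mapping: &HashMap<usize, usize>) -> Vec<usize> {
    ordering
        .iter()
        .map_while(|col| mapping.get(col).copied())
        .collect()
}
```

Truncating at the first unnested column is a conservative choice in this sketch: a prefix of a lexicographic ordering is still a valid ordering, while the columns after a transformed column can no longer be relied on.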