Describe the bug
During the EnforceSorting optimizer run, a valid plan may be made invalid by the removal of a necessary CoalescePartitionsExec. The result is a planning-time failure in the SanityChecker with an error like: `does not satisfy distribution requirements: HashPartitioned[[a@0]]. Child-0 output partitioning: UnknownPartitioning(2)`.
We start with a valid input plan:

```
SortExec: expr=[a@0 ASC], preserve_partitioning=[false]
  AggregateExec: mode=SinglePartitioned, gby=[a@0 as a1], aggr=[]
    CoalescePartitionsExec
      ProjectionExec: expr=[a@0 as a, b@1 as value]
        UnionExec
          DataSourceExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], file_type=parquet
          DataSourceExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], file_type=parquet
```
EnforceSorting then removes the coalesce, making the plan invalid:

```
SortPreservingMergeExec: [a@0 ASC]
  SortExec: expr=[a@0 ASC], preserve_partitioning=[true]
    AggregateExec: mode=SinglePartitioned, gby=[a@0 as a1], aggr=[]
      ProjectionExec: expr=[a@0 as a, b@1 as value]
        UnionExec
          DataSourceExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], file_type=parquet
          DataSourceExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], file_type=parquet
```
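To see why the second plan fails the check, a simplified model of distribution-requirement satisfaction helps. This sketch uses illustrative types, not DataFusion's actual `Partitioning`/`Distribution` API: a SinglePartitioned aggregate requires hash partitioning on its group-by keys, a coalesced (single-partition) input trivially satisfies that, but the union's `UnknownPartitioning(2)` does not.

```rust
// Simplified model (illustrative types, not DataFusion's) of the
// distribution check that fails in the SanityChecker above.
#[derive(Debug, Clone, PartialEq)]
enum Partitioning {
    /// Hash-partitioned on these expressions, e.g. ["a@0"].
    Hash(Vec<String>),
    /// Partition scheme is unknown; the usize is the partition count.
    Unknown(usize),
    /// Exactly one partition, e.g. the output of CoalescePartitionsExec.
    Single,
}

enum Distribution {
    HashPartitioned(Vec<String>),
    SinglePartition,
    Unspecified,
}

/// Returns true if `output` satisfies the required `distribution`.
fn satisfies(output: &Partitioning, distribution: &Distribution) -> bool {
    match distribution {
        Distribution::Unspecified => true,
        Distribution::SinglePartition => matches!(output, Partitioning::Single),
        Distribution::HashPartitioned(keys) => match output {
            Partitioning::Hash(k) => k == keys,
            // One partition holds all rows, so any distribution is satisfied;
            // this is why the CoalescePartitionsExec made the plan valid.
            Partitioning::Single => true,
            Partitioning::Unknown(_) => false,
        },
    }
}
```

With the coalesce in place the aggregate's child is `Single` and the check passes; after EnforceSorting removes it, the child is `Unknown(2)` and the check fails, matching the reported error.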
To Reproduce
A test case demonstrates this: 670eff3
We discovered this bug using our own constructed (and valid) LogicalPlans, which were converted to a physical plan by the default planner and then had a coalesce inserted during the EnforceDistribution optimizer pass (to satisfy the UnionExec -> AggregateExec distribution requirements). EnforceSorting would then remove that coalesce, rendering the plan invalid again. We do not believe this is unique to our plans; rather, this behavior in EnforceSorting is a bug that could impact others as well.
Expected behavior
EnforceSorting should not take a valid plan and make it invalid, thereby causing a failure in the planning sanity check.
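The invariant can be stated as: every physical optimizer rule should map a valid plan to a valid plan, so checking validity after each rule pinpoints the offending rule instead of failing only at the final sanity check. A minimal, hypothetical sketch of that idea (toy plan type and helpers, not DataFusion's API):

```rust
// Toy plan tree and validity check illustrating the invariant each rule
// should preserve. Removing the Coalesce under the single-partition
// Aggregate flips `is_valid` from true to false.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Source,               // one partition per source
    Union(Vec<Plan>),     // concatenates child partitions
    Coalesce(Box<Plan>),  // merges all partitions into one
    Aggregate(Box<Plan>), // SinglePartitioned mode: needs one input partition
}

fn partition_count(plan: &Plan) -> usize {
    match plan {
        Plan::Source => 1,
        Plan::Union(children) => children.iter().map(partition_count).sum(),
        Plan::Coalesce(_) => 1,
        Plan::Aggregate(_) => 1,
    }
}

/// Simplified sanity check: every single-partition aggregate must see
/// exactly one input partition.
fn is_valid(plan: &Plan) -> bool {
    match plan {
        Plan::Source => true,
        Plan::Union(children) => children.iter().all(is_valid),
        Plan::Coalesce(child) => is_valid(child),
        Plan::Aggregate(child) => partition_count(child) == 1 && is_valid(child),
    }
}

/// Apply rules in sequence, checking validity after each one, so a rule
/// that breaks the plan is caught immediately.
fn optimize(mut plan: Plan, rules: &[fn(Plan) -> Plan]) -> Result<Plan, String> {
    for (i, rule) in rules.iter().enumerate() {
        plan = rule(plan);
        if !is_valid(&plan) {
            return Err(format!("rule #{i} produced an invalid plan"));
        }
    }
    Ok(plan)
}
```

In this model, `Aggregate(Coalesce(Union[Source, Source]))` is valid and `Aggregate(Union[Source, Source])` is not, mirroring the two plans shown above.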
Additional context
We already have a proposed solution: #14637
While debugging, I did a minor refactor of `parallelize_sorts` and its helper `remove_bottleneck_in_subplan`. The reason for the refactor (also summarized here) was that I noticed a pattern of several necessary nodes being removed, and then added back later. I elected to simplify the code by tightening up how we build the `PlanWithCorrespondingCoalescePartitions`, so that we correctly identify which nodes should be removed in the first place, and then remove only those nodes. The refactor is isolated in this commit: 0661ed7
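The "identify first, then remove only those" idea can be sketched as a single traversal that threads the ancestor's single-partition requirement down the tree, instead of removing coalesces eagerly and re-inserting them later. This is a hypothetical illustration of the pattern, with a made-up `Node` type, not the actual DataFusion code:

```rust
// Hypothetical sketch of "decide removability, then remove": a coalesce is
// dropped only when no ancestor relies on its single-partition output.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Source,
    Sort(Box<Node>),
    Coalesce(Box<Node>),
    Aggregate(Box<Node>), // requires a single input partition
}

/// Rebuild the tree, removing only coalesces that no ancestor needs.
/// `needed_by_ancestor` records whether some ancestor requires a
/// single-partition input from this subtree.
fn remove_unneeded_coalesces(node: Node, needed_by_ancestor: bool) -> Node {
    match node {
        Node::Coalesce(child) if !needed_by_ancestor => {
            // Removable: no ancestor depends on single-partition output.
            remove_unneeded_coalesces(*child, false)
        }
        Node::Coalesce(child) => {
            // Necessary: keep it, and it discharges the requirement below.
            Node::Coalesce(Box::new(remove_unneeded_coalesces(*child, false)))
        }
        Node::Sort(child) => {
            Node::Sort(Box::new(remove_unneeded_coalesces(*child, needed_by_ancestor)))
        }
        Node::Aggregate(child) => {
            // The aggregate itself needs a single-partition input.
            Node::Aggregate(Box::new(remove_unneeded_coalesces(*child, true)))
        }
        Node::Source => Node::Source,
    }
}
```

In this sketch, a coalesce under an aggregate is kept, while a coalesce under only a sort is dropped, which is the distinction the removal in the bug above fails to make.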
Update:
- We are no longer refactoring the building of the coalesce context.
- Instead, we have added docs to EnforceSorting, its subrules, and some of the helper methods.
- We still have a refactor of the `pushdown_sorts` subrule.