fix: unify ordering display with optimization path #20362

Open
adriangb wants to merge 5 commits into apache:main from pydantic:fix-complex-projection-ordering
@adriangb commented Feb 15, 2026

Summary

Unify the ordering display path with the optimization path so EXPLAIN output always matches what the optimizer sees.

FileScanConfig previously had two independent paths computing orderings:

  1. Optimization (eq_properties()): validates orderings at table-schema level via validated_output_ordering(), then projects through EquivalenceProperties::project().
  2. Display (fmt_as()): independently recomputed via get_projected_output_ordering(), which validated post-projection and could disagree with path 1.

The display path dropped valid orderings when any projection expression was complex (e.g. a + 1), even if the ordering column itself was a simple column reference. This PR replaces the display computation with eq_properties().oeq_class(), the same orderings the optimizer uses.
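A minimal sketch of the "single source of truth" idea, using toy types rather than DataFusion's. `ProjExpr`, `ToyScan`, `project_orderings`, and `output_orderings` are hypothetical names for illustration only; `output_orderings` merely plays the role that `eq_properties().oeq_class()` plays in this PR, and the projection logic is far simpler than `EquivalenceProperties::project()`.

```rust
/// Toy stand-in for a projection expression; NOT a DataFusion type.
#[derive(Debug, Clone, PartialEq)]
enum ProjExpr {
    /// Plain reference to a table column by index.
    Column(usize),
    /// Any non-trivial expression such as `a + 1` (contents are not modelled).
    Complex(String),
}

/// An ordering is a list of column indices, most significant key first.
type Ordering = Vec<usize>;

/// Project table-level orderings through a projection.
/// A sort key survives when the projection exposes its column as a plain
/// reference. An ordering is cut at the first key that does not survive,
/// because a prefix of a valid ordering is still a valid ordering.
fn project_orderings(table_orderings: &[Ordering], projection: &[ProjExpr]) -> Vec<Ordering> {
    table_orderings
        .iter()
        .map(|ordering| {
            ordering
                .iter()
                .map_while(|table_col| {
                    projection
                        .iter()
                        .position(|expr| *expr == ProjExpr::Column(*table_col))
                })
                .collect::<Ordering>()
        })
        .filter(|projected| !projected.is_empty())
        .collect()
}

/// Toy scan node holding orderings already validated at table-schema level.
struct ToyScan {
    table_orderings: Vec<Ordering>,
    projection: Vec<ProjExpr>,
}

impl ToyScan {
    /// The one place where output orderings are computed.
    fn output_orderings(&self) -> Vec<Ordering> {
        project_orderings(&self.table_orderings, &self.projection)
    }

    /// What the optimizer consumes.
    fn optimizer_view(&self) -> Vec<Ordering> {
        self.output_orderings()
    }

    /// What EXPLAIN-style output prints, derived from the same call.
    fn display(&self) -> String {
        format!("output_ordering={:?}", self.output_orderings())
    }
}
```

Because `display` and `optimizer_view` both delegate to `output_orderings`, they cannot drift apart, and a complex expression elsewhere in the projection does not affect an ordering whose keys are plain columns.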

Changes

  • Replace get_projected_output_ordering() calls in both DataSource::fmt_as and DisplayAs::fmt_as with self.eq_properties().oeq_class()
  • Delete get_projected_output_ordering and resolve_sort_column_projection (no longer needed)
  • Add 3 regression tests:
    • test_display_ordering_with_complex_projection_multi_file — complex projections no longer drop valid orderings
    • test_display_ordering_dropped_for_overlapping_stats — overlapping file stats correctly suppress orderings
    • test_display_ordering_matches_eq_properties — display and optimization paths agree
  • Update SLT expectations to reflect equivalence-aware ordering display (e.g. simplified orderings when filter constants are present, additional equivalent orderings from monotonic projections like CAST)
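
To illustrate why an equivalence-aware display can print a shorter ordering when filter constants are present, here is a small sketch. It assumes an ordering is just a list of column names and that a filter such as `a = 5` makes `a` constant; `simplify_ordering` is a hypothetical helper, not a DataFusion function.

```rust
/// Drop sort keys that are known to be constant. A constant column cannot
/// influence the relative order of rows, so `[a, b]` with constant `a`
/// describes the same ordering as `[b]`.
fn simplify_ordering<'a>(ordering: &[&'a str], constants: &[&str]) -> Vec<&'a str> {
    ordering
        .iter()
        .copied()
        .filter(|col| !constants.iter().any(|constant| *constant == *col))
        .collect()
}
```

For example, `simplify_ordering(&["a", "b"], &["a"])` yields `["b"]`, which is the kind of simplified ordering the updated SLT expectations may show.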

Test plan

  • cargo test -p datafusion-datasource (100 tests pass)
  • SLT tests updated and passing: sort_pushdown, union, window, monotonic_projection_test, topk, group_by, joins

🤖 Generated with Claude Code

Previously, `get_projected_output_ordering` used
`ordered_column_indices_from_projection` which was all-or-nothing: if any
expression in the projection wasn't a simple Column, it returned None for
the entire projection — even if the sort columns themselves were simple
column refs.

Replace it with `resolve_sort_column_projection` which only requires
sort-column positions to resolve to simple Columns. Each ordering is now
independently evaluated: orderings on simple column refs get validated
with statistics even when other projection expressions are complex.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
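
The difference between the two strategies described in this commit message can be sketched as follows. This is an illustrative model only: a projection is a slice where `Some(i)` is a plain reference to table column `i` and `None` stands for any complex expression, and both function names are hypothetical rather than the actual DataFusion signatures.

```rust
/// All-or-nothing: give up on the whole projection as soon as one
/// expression is not a plain column reference.
fn all_or_nothing(projection: &[Option<usize>]) -> Option<Vec<usize>> {
    projection.iter().copied().collect()
}

/// Per-ordering: only the output positions named by the sort keys have to
/// resolve to plain columns; other projection expressions are ignored.
fn resolve_sort_columns(
    projection: &[Option<usize>],
    sort_positions: &[usize],
) -> Option<Vec<usize>> {
    sort_positions
        .iter()
        .map(|&pos| projection.get(pos).copied().flatten())
        .collect()
}
```

With a projection of `[Some(0), None, Some(2)]`, the first function returns `None`, while the second still resolves an ordering on output positions 0 and 2 and only fails for an ordering that touches position 1.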
@github-actions bot added the datasource label (Changes to the datasource crate) Feb 15, 2026
Replace the independent display computation (get_projected_output_ordering)
with orderings extracted from eq_properties().oeq_class(), so EXPLAIN
output always matches what the optimizer actually sees.

Previously, fmt_as() independently recomputed orderings via
get_projected_output_ordering(), which validated post-projection and
would drop valid orderings when any projection expression was complex
(e.g. `a + 1`). Now both display and optimization use the same path:
validate at table-schema level, then project through
EquivalenceProperties::project().

- Delete get_projected_output_ordering and resolve_sort_column_projection
- Update DataSource::fmt_as and DisplayAs::fmt_as to use eq_properties()
- Add regression tests for complex projections with multi-file groups
- Update SLT expectations for equivalence-aware ordering display

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@adriangb changed the title from "fix: handle complex projections in ordering validation" to "fix: unify ordering display with optimization path" Feb 15, 2026
@github-actions bot added the sqllogictest label (SQL Logic Tests (.slt)) Feb 15, 2026
@adriangb requested a review from zhuqi-lucas February 15, 2026 16:28
adriangb and others added 3 commits February 15, 2026 11:28
The partition/file ordering diagrams from the deleted
get_projected_output_ordering are useful context for understanding
why we validate orderings against file statistics. Move them to
validated_output_ordering where they belong.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
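
A rough sketch of why orderings are validated against file statistics: even if every file is sorted internally, reading several files back to back only yields a sorted stream when their value ranges do not overlap. `FileRange` and `group_preserves_ordering` are hypothetical names for a single ascending `i64` sort column, and treating touching boundaries (`max == next min`) as acceptable is a choice of this sketch, not a statement about DataFusion's check.

```rust
/// Hypothetical min/max statistics of one file for a single sort column.
#[derive(Debug, Clone, Copy)]
struct FileRange {
    min: i64,
    max: i64,
}

/// Returns true when the files, read in the listed order, produce an
/// ascending stream, assuming each file is itself sorted ascending.
/// Every file must end no later than the next one begins.
fn group_preserves_ordering(files: &[FileRange]) -> bool {
    files.windows(2).all(|pair| pair[0].max <= pair[1].min)
}
```

A group with ranges `[0, 10]` then `[5, 20]` fails the check, which corresponds to the case where overlapping file statistics suppress the declared ordering.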
@adriangb marked this pull request as ready for review February 16, 2026 00:16
@adriangb

@zhuqi-lucas could you review this change please?

