@Nithurshen Nithurshen commented Dec 22, 2025

Which issue does this PR close?

Rationale for this change

After the high-level API for sort pushdown was established (#19064), it became clear that the optimizer did not handle cases where a prefix of the sort key is logically constant (e.g., because of an equality filter).

For example, in a query like:
SELECT * FROM table WHERE region = 'US' ORDER BY region, time

Here the filter makes the column region constant. Previously, the optimizer attempted to push down Sort(region, time). If the underlying source (e.g., Parquet or a custom TableProvider) only supported sorting by time (or had an index on time), the pushdown would fail.

By stripping the constant prefix, we can push down Sort(time), which the source is more likely to support, leading to significant performance improvements (e.g., skipping row groups via dynamic top-k pushdown).
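To make the idea concrete, here is a minimal sketch of the stripping step in isolation. It is not the code in this PR: SortKey, key, and strip_constant_keys are hypothetical stand-ins for the real physical sort expressions and EquivalenceProperties, and the set of constant columns is assumed to be known already.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for a physical sort expression.
/// This is NOT DataFusion's `PhysicalSortExpr`.
#[derive(Debug, Clone, PartialEq)]
struct SortKey {
    column: String,
    descending: bool,
}

/// Convenience constructor for an ascending sort key (illustrative helper).
fn key(column: &str) -> SortKey {
    SortKey {
        column: column.to_string(),
        descending: false,
    }
}

/// Drop every sort key whose column is known to be constant.
/// A constant column has the same value in every row, so it cannot
/// influence the relative order of any two rows.
fn strip_constant_keys(ordering: &[SortKey], constants: &HashSet<&str>) -> Vec<SortKey> {
    ordering
        .iter()
        .filter(|k| !constants.contains(k.column.as_str()))
        .cloned()
        .collect()
}
```

With region marked as constant, strip_constant_keys applied to the keys (region, time) leaves only (time), which is the ordering the source would then be asked for.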

What changes are included in this PR?

  • Modified the optimize method in physical-optimizer/src/pushdown_sort.rs.
  • The optimizer now checks the input's EquivalenceProperties.
  • Any sort expression that is determined to be constant is filtered out from the required_ordering before calling try_pushdown_sort on the source.
  • Added a new unit test test_sort_pushdown_prefix_removal that uses a MockPlanWithConstants to verify that a SortExec is correctly removed when the sort column is constant.
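The flow described in the list above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual implementation: Pushdown, MockSource, and sort_can_be_removed are hypothetical names, columns are plain strings, and the mock source returns Exact only when the requested ordering is a prefix of its single native ordering.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified outcome of asking a source for an ordering.
#[derive(Debug, PartialEq)]
enum Pushdown {
    Exact,
    Unsupported,
}

/// Mock source that can only emit rows in one fixed column order.
struct MockSource {
    native_order: Vec<String>,
}

impl MockSource {
    /// Exact when the request is a prefix of the native ordering.
    /// An empty request is trivially satisfied.
    fn try_pushdown_sort(&self, required: &[String]) -> Pushdown {
        if self.native_order.starts_with(required) {
            Pushdown::Exact
        } else {
            Pushdown::Unsupported
        }
    }
}

/// Returns true when the sort above the source could be dropped:
/// constant columns are filtered out of the required ordering first,
/// then the reduced ordering is offered to the source.
fn sort_can_be_removed(
    required: &[&str],
    constants: &HashSet<&str>,
    source: &MockSource,
) -> bool {
    let reduced: Vec<String> = required
        .iter()
        .copied()
        .filter(|c| !constants.contains(c))
        .map(String::from)
        .collect();
    source.try_pushdown_sort(&reduced) == Pushdown::Exact
}
```

For a source whose native order is (time), the request (region, time) is accepted once region is known to be constant, and rejected when no constants are known.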

Are these changes tested?

  • Yes, a new unit test test_sort_pushdown_prefix_removal has been added to datafusion/physical-optimizer/src/pushdown_sort.rs.
  • Verified that the optimizer successfully strips constant columns and matches Exact pushdown results in the test case.

Are there any user-facing changes?

  • Users may see improved performance for queries involving ORDER BY and WHERE clauses on sorted data, as the optimizer can now push down sort requirements more aggressively.
  • No changes to public APIs.

@github-actions github-actions bot added the optimizer Optimizer rules label Dec 22, 2025
@Nithurshen Nithurshen closed this Dec 22, 2025
Development

Successfully merging this pull request may close these issues.

Redesign the try_reverse_output to support more cases
