Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overlapping ExecutionPlan::maintains_input_order(), equivalence_properties and maintains_input_order are confusing #8120

Open
alamb opened this issue Nov 10, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Nov 10, 2023

Is your feature request related to a problem or challenge?

While upgrading IOx, to use #8006, I found that the EnforceSorting pass was adding a sort node to an ExecutionPlan, even when the input to that plan was sorted correctly and ExecutionPlan::maintains_input_order returns true, where prior to #8006 it did not.

The issue is that our node did not also correctly report its EquivalenceProperties which now the EnforceSorting pass relies on.

Thus, I think we are in the situation where

  • the combination of output_ordering, and equivalence_properties is used by EnforceSorting
  • maintains_input_order is used in other places

This leads to the situation where implementers of ExecutionPlan have to keep the three methods in sync, which I think this is confusing and error prone.

Describe the solution you'd like

I propose we deprecate maintains_input_order (with a hint about equivalence classes to help other users) and use only output_ordering and equivalence_properties to determine if the order is maintained

Describe alternatives you've considered

No response

Additional context

Example

impl ExecutionPlan for DeduplicateExec {
...
    fn maintains_input_order(&self) -> Vec<bool> {
        vec![true]
    }
...
}

For a plan like this

ProjectionExec: expr=[field_int@1 as field_int, tag1@2 as tag1, time@3 as time]
  DeduplicateExec: [tag1@2 ASC,time@3 ASC]
    SortExec: expr=[tag1@2 ASC,time@3 ASC,__chunk_order@0 ASC]
      RecordBatchesExec: chunks=1, projection=[__chunk_order, field_int, tag1, time]

Prior to #8006

No sort is added,

2023-11-10T11:30:52.620358Z TRACE log: Optimized physical plan by EnforceSorting:
OutputRequirementExec
  ProjectionExec: expr=[field_int@1 as field_int, tag1@2 as tag1, time@3 as time]
    DeduplicateExec: [tag1@2 ASC,time@3 ASC]
      SortExec: expr=[tag1@2 ASC,time@3 ASC,__chunk_order@0 ASC]
        RecordBatchesExec: chunks=1, projection=[__chunk_order, field_int, tag1, time]

After #8006

The EnforceSorting rule adds a SortExec at the top

2023-11-10T11:29:45.120962Z TRACE log: Optimized physical plan by EnforceSorting:
OutputRequirementExec
  SortExec: expr=[tag1@1 ASC,time@2 ASC]
    ProjectionExec: expr=[field_int@1 as field_int, tag1@2 as tag1, time@3 as time]
      DeduplicateExec: [tag1@2 ASC,time@3 ASC]
        SortExec: expr=[tag1@2 ASC,time@3 ASC,__chunk_order@0 ASC]
          RecordBatchesExec: chunks=1, projection=[__chunk_order, field_int, tag1, time]

Adding equivalence_properties fixed the problem:

impl ExecutionPlan for DeduplicateExec {
...
    fn equivalence_properties(&self) -> EquivalenceProperties {
        // deduplicate does not change the equivalence properties
        self.input.equivalence_properties()
    }
...
}
@alamb alamb added the enhancement New feature or request label Nov 10, 2023
@alamb alamb changed the title Overlapping ExecutionPlan::maintains_input_order(), ExecutionPlan::equivalence_properties and ExecutionPlan::maintains_input_order Overlapping ExecutionPlan::maintains_input_order(), equivalence_properties and maintains_input_order are confusing Nov 10, 2023
@mustafasrepo
Copy link
Contributor

@alamb I agree with you we could simplify the API, to prevent inconsistent usage. However, I think keeping maintains_input_order and equivalence_properties is better option. output_ordering information is already in the equivalence_properties. We can default implement output_ordering as

self.equivalence_properties.output_ordering()

This way, when some one loses input_ordering since they didn't implement equivalence_properties. output_ordering would simply be None.

@alamb
Copy link
Contributor Author

alamb commented Nov 10, 2023

output_ordering information is already in the equivalence_properties. We can default implement output_ordering as

I don't understand how this would work for something like SortExec that creates an entirely new sort order that is not derived from its inputs 🤔

@mustafasrepo
Copy link
Contributor

output_ordering information is already in the equivalence_properties. We can default implement output_ordering as

I don't understand how this would work for something like SortExec that creates an entirely new sort order that is not derived from its inputs 🤔

We have a method .with_reorder that takes sort_expr. This API introduces a new ordering to the equivalences. Currently SortExec utilizes this API.

@alamb
Copy link
Contributor Author

alamb commented Nov 10, 2023

Here is a PR that tries to document the current state a bit better: #8128

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants