Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change ExecutionPlan::maintains_input_order to return vector (to support multi children executors better) #5035

Merged
merged 13 commits into from
Jan 25, 2023
Merged

Change ExecutionPlan::maintains_input_order to return vector (to support multi children executors better) #5035

merged 13 commits into from
Jan 25, 2023

Conversation

mustafasrepo
Copy link
Contributor

@mustafasrepo mustafasrepo commented Jan 24, 2023

Which issue does this PR close?

Closes #4980

Rationale for this change

Currently the maintains_input_order function returns a bool value indicating whether the executor maintains ordering. Obviously, for executors with multiple inputs, this incurs information loss -- which input ordering are we talking about? The best we can do is to assign a special meaning like any or all to this function, and this is enough for some executors. Obviously, there are (and will be) executors where this is insufficient.

What changes are included in this PR?

In this PR, we change/generalize the signature of this function to properly support multi-children executors. The initial use case we have is the UnionExec implementation -- it will henceforth produce an output_ordering not only when input orderings are strictly equal, but when there is a subset ordering that satisfy all input orderings.

More use cases that will leverage this change are also on the way, such as:

Are these changes tested?

New tests are added to test the change.

Are there any user-facing changes?

This PR introduces an API change. api change

@github-actions github-actions bot added core Core DataFusion crate physical-expr Physical Expressions labels Jan 24, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mustafasrepo -- this is very exciting. There is one plan that doesn't look right to me and I have some other minor comments. I also plan to run the IOx tests with this change to see if ours come up with anything

@alamb
Copy link
Contributor

alamb commented Jan 24, 2023

FWIW I ran the IOx tests on the changes from this branch https://github.com/influxdata/influxdb_iox/pull/6681 and they seem to have passed 👍

@ozankabak
Copy link
Contributor

Thanks for the review @alamb, we incorporated your suggestions. I am excited about this change, it paves the way to many more optimizations.

@ozankabak
Copy link
Contributor

There seems to be a CI failure (/usr/bin/ld: final link failed: No space left on device), but I think it is unrelated to this PR.

@alamb
Copy link
Contributor

alamb commented Jan 24, 2023

There seems to be a CI failure (/usr/bin/ld: final link failed: No space left on device), but I think it is unrelated to this PR.

#5040

I think we need to split up the CI jobs a bit. I think @tustvold was looking at it, and if he doesn't have a chance I will try and do so later this afternoon

@alamb
Copy link
Contributor

alamb commented Jan 24, 2023

Thanks @mustafasrepo and @ozankabak

@alamb
Copy link
Contributor

alamb commented Jan 24, 2023

FYI any "no space left on device" CI errors should be fixed by merging up from master to get #5047

@ozankabak
Copy link
Contributor

Done, feel free to merge whenever you like.

@alamb
Copy link
Contributor

alamb commented Jan 24, 2023

Done, feel free to merge whenever you like.

Thanks -- I plan to merge this tomorrow so that @mingmwang / @yahoNanJing can review if they want

@alamb alamb added the api change Changes the API exposed to users of the crate label Jan 24, 2023
@alamb alamb merged commit 4a33f27 into apache:master Jan 25, 2023
@alamb
Copy link
Contributor

alamb commented Jan 25, 2023

Thanks again @mustafasrepo and @ozankabak

@ursabot
Copy link

ursabot commented Jan 25, 2023

Benchmark runs are scheduled for baseline = 2aa1490 and contender = 4a33f27. 4a33f27 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@mingmwang
Copy link
Contributor

Sorry for the late response. I just come back from the holiday.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Return Vec<bool> instead of bool in ExecutionPlan::maintains_input_order
5 participants