[SPARK-18079] [SQL] CollectLimitExec.executeToIterator should perform per-partition limits #15614

Closed
wants to merge 1 commit into from

@pwoody pwoody commented Oct 24, 2016

What changes were proposed in this pull request?

This change adds a partition-local limit to CollectLimitExec.executeToIterator.

How was this patch tested?

Added a test to SQLQuerySuite to ensure that no more than the limited number of rows is read from a partition.
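
For readers unfamiliar with the issue, here is a minimal sketch of why a per-partition limit helps and how a row-counting test can observe it. It does not use Spark at all: `PerPartitionLimitSketch`, `CountingIterator`, `fetch`, `globalLimitOnly`, and `withLocalLimit` are hypothetical names, and `fetch` only models the assumption that a partition is fully materialized before its rows are handed to the driver. It is not the code from this patch.

```scala
// Simplified model, not Spark code: partitions are plain iterators.
object PerPartitionLimitSketch {

  // Wraps an iterator and counts how many rows were actually pulled from it.
  final class CountingIterator[A](underlying: Iterator[A]) extends Iterator[A] {
    private var count = 0
    def pulled: Int = count
    override def hasNext: Boolean = underlying.hasNext
    override def next(): A = { count += 1; underlying.next() }
  }

  // Assumption: fetching a partition materializes everything it produces.
  private def fetch[A](partition: Iterator[A]): Vector[A] = partition.toVector

  // Global limit only: the first partition is read in full, then truncated.
  def globalLimitOnly[A](partitions: Seq[Iterator[A]], limit: Int): Iterator[A] =
    partitions.iterator.flatMap(p => fetch(p)).take(limit)

  // Partition-local limit first: each partition yields at most `limit` rows
  // before it is materialized, and the global limit is applied afterwards.
  def withLocalLimit[A](partitions: Seq[Iterator[A]], limit: Int): Iterator[A] =
    partitions.iterator.flatMap(p => fetch(p.take(limit))).take(limit)
}
```

In this model, a limit of 3 over two 100-row partitions pulls 100 rows from the first partition without the local limit and only 3 with it. Both variants return the same rows.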

@pwoody
Author

pwoody commented Oct 24, 2016

@JoshRosen, pinging you since you made the analogous change to executeCollect.

I wonder if it also makes sense to simply push the LocalLimitExec into the planning within SparkStrategies so that we don't need to deal with this individually in CollectLimitExec. Happy to do either approach!

@JoshRosen
Contributor

Jenkins, this is okay to test.

This looks fine to me. There's a larger ongoing discussion in #15596 which relates to the planning of these limit operations; let's see if further optimizations are subsumed by that change.

@pwoody
Author

pwoody commented Oct 24, 2016

Yep, in that PR the LocalLimitExec is added directly during planning. That would reduce this change to simply overriding executeToIterator with child.executeToIterator.take(limit).
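
The planning-based alternative discussed here can be sketched with a toy operator tree. Everything in it is hypothetical and only loosely mirrors the Spark classes named in this thread: `Plan`, `Scan`, `LocalLimit`, `CollectLimit`, and `planLimit` are illustrative stand-ins, and the default `executeToIterator` assumes each partition is materialized in full when it is requested.

```scala
// Toy operator model, not Spark's SparkPlan hierarchy.
object LimitPlanningSketch {

  trait Plan {
    // One lazy iterator per partition.
    def partitions: Seq[Iterator[Int]]
    // Assumed default: materialize each partition in full as it is requested.
    def executeToIterator: Iterator[Int] =
      partitions.iterator.flatMap(p => p.toVector)
  }

  // Leaf operator that records how many rows each partition produced.
  final class Scan(data: Seq[Range]) extends Plan {
    val rowsRead: Array[Int] = Array.fill(data.size)(0)
    def partitions: Seq[Iterator[Int]] =
      data.zipWithIndex.map { case (r, i) =>
        r.iterator.map { x => rowsRead(i) += 1; x }
      }
  }

  // Truncates every partition independently.
  final case class LocalLimit(limit: Int, child: Plan) extends Plan {
    def partitions: Seq[Iterator[Int]] = child.partitions.map(_.take(limit))
  }

  // With LocalLimit guaranteed beneath it, the override is a one-liner.
  final case class CollectLimit(limit: Int, child: Plan) extends Plan {
    def partitions: Seq[Iterator[Int]] = Seq(executeToIterator)
    override def executeToIterator: Iterator[Int] =
      child.executeToIterator.take(limit)
  }

  // Hypothetical planner step: always place LocalLimit under CollectLimit.
  def planLimit(limit: Int, child: Plan): Plan =
    CollectLimit(limit, LocalLimit(limit, child))
}
```

Under these assumptions, `planLimit(3, scan)` reads three rows from the first partition and none from the second, whereas `CollectLimit(3, scan)` without the local limit reads the first partition in full.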

@pwoody
Author

pwoody commented Dec 6, 2016

@JoshRosen Unfortunately, the other PR got closed. Any thoughts on this one independently?

@pwoody
Author

pwoody commented Dec 16, 2016

@JoshRosen holler if you want me to make any other changes here.

@SparkQA

SparkQA commented Dec 16, 2016

Test build #3507 has started for PR 15614 at commit cbd64b0.

@AmplabJenkins

Can one of the admins verify this patch?

@pwoody pwoody closed this Apr 1, 2018
@pwoody pwoody deleted the pw/localLimitIterator branch April 1, 2018 17:00
rshkv added a commit to palantir/spark that referenced this pull request Feb 23, 2021
… per-partition limits

We have an internal product that needs this. See upstream PR [1].

[1] apache#15614

Co-authored-by: Patrick Woody <pwoody@palantir.com>
Co-authored-by: Josh Casale <jcasale@palantir.com>
Co-authored-by: Will Raschkowski <wraschkowski@palantir.com>
rshkv added a commit to palantir/spark that referenced this pull request Feb 25, 2021
rshkv added a commit to palantir/spark that referenced this pull request Feb 26, 2021
rshkv added a commit to palantir/spark that referenced this pull request Feb 26, 2021
jdcasale added a commit to palantir/spark that referenced this pull request Mar 3, 2021
rshkv added a commit to palantir/spark that referenced this pull request Mar 4, 2021