-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-18079] [SQL] CollectLimitExec.executeToIterator should perform per-partition limits #15614
Conversation
@JoshRosen since you had made the analogous change to executeCollect. I wonder if it also makes sense to simply push the LocalLimitExec into the planning within SparkStrategies so that we don't need to deal with this individually in CollectLimitExec. Happy to do either approach! |
Jenkins, this is okay to test. This looks fine to me. There's a larger ongoing discussion in #15596 which relates to the planning of these limit operations; let's see if further optimizations are subsumed by that change. |
Yep, in that PR the LocalLimitExec is added into planning straight up. That would reduce this to simply adding the child.executeToIterator.take(limit) as an override. |
@JoshRosen unfortunately the other PR got closed. thoughts on this independently? |
@JoshRosen holler if you want me to make any other changes here. |
Test build #3507 has started for PR 15614 at commit |
Can one of the admins verify this patch? |
… per-partition limits We have an internal product that needs this. See upstream PR [1]. [1] apache#15614 Co-authored-by: Patrick Woody <pwoody@palantir.com> Co-authored-by: Josh Casale <jcasale@palantir.com> Co-authored-by: Will Raschkowski <wraschkowski@palantir.com>
… per-partition limits We have an internal product that needs this. See upstream PR [1]. [1] apache#15614 Co-authored-by: Patrick Woody <pwoody@palantir.com> Co-authored-by: Josh Casale <jcasale@palantir.com> Co-authored-by: Will Raschkowski <wraschkowski@palantir.com>
… per-partition limits We have an internal product that needs this. See upstream PR [1]. [1] apache#15614 Co-authored-by: Patrick Woody <pwoody@palantir.com> Co-authored-by: Josh Casale <jcasale@palantir.com> Co-authored-by: Will Raschkowski <wraschkowski@palantir.com>
… per-partition limits We have an internal product that needs this. See upstream PR [1]. [1] apache#15614 Co-authored-by: Patrick Woody <pwoody@palantir.com> Co-authored-by: Josh Casale <jcasale@palantir.com> Co-authored-by: Will Raschkowski <wraschkowski@palantir.com>
… per-partition limits We have an internal product that needs this. See upstream PR [1]. [1] apache#15614 Co-authored-by: Patrick Woody <pwoody@palantir.com> Co-authored-by: Josh Casale <jcasale@palantir.com> Co-authored-by: Will Raschkowski <wraschkowski@palantir.com>
… per-partition limits We have an internal product that needs this. See upstream PR [1]. [1] apache#15614 Co-authored-by: Patrick Woody <pwoody@palantir.com> Co-authored-by: Josh Casale <jcasale@palantir.com> Co-authored-by: Will Raschkowski <wraschkowski@palantir.com>
What changes were proposed in this pull request?
This change adds a partition local limit to the executeToIterator method.
How was this patch tested?
Added a test to SQLQuerySuite to ensure that only the limited amount is read from the partition.