@wangyum
Member

@wangyum wangyum commented Mar 5, 2021

What changes were proposed in this pull request?

This PR removes the GlobalLimit operator if its child's maximum row count is not larger than the limit. For example:

val testRelation = LocalRelation.fromExternalRows(Seq("a".attr.int, "b".attr.int, "c".attr.int), 1.to(10).map(_ => Row(1, 2, 3)) )
val query = GlobalLimit(100, testRelation)

The relation here has only 10 rows, which is below the limit of 100, so this GlobalLimit can be removed.
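The idea can be sketched outside Spark with a toy plan model. Everything in the block below (`LimitSketch`, `Plan`, `Relation`, `UnknownSource`, `eliminateLimits`, and this simplified `GlobalLimit`) is a hypothetical stand-in written for illustration. It is not Catalyst's API and not the code in this patch; it only assumes that each plan node can report an optional upper bound on its row count.

```scala
// Toy model of a logical plan. These types are illustrative stand-ins,
// not Spark's classes.
object LimitSketch {
  sealed trait Plan {
    // Upper bound on the number of rows this node can produce, if known.
    def maxRows: Option[Long]
  }

  // A relation with a known, fixed number of rows.
  final case class Relation(rowCount: Long) extends Plan {
    val maxRows: Option[Long] = Some(rowCount)
  }

  // A source whose row count is unknown at planning time.
  case object UnknownSource extends Plan {
    val maxRows: Option[Long] = None
  }

  // A limit caps the row count at `limit`, or lower if the child's bound is tighter.
  final case class GlobalLimit(limit: Long, child: Plan) extends Plan {
    val maxRows: Option[Long] =
      Some(child.maxRows.fold(limit)(m => math.min(m, limit)))
  }

  // Drop a GlobalLimit when its child is already known to return
  // no more than `limit` rows; otherwise keep it.
  def eliminateLimits(plan: Plan): Plan = plan match {
    case GlobalLimit(limit, child) =>
      val newChild = eliminateLimits(child)
      if (newChild.maxRows.exists(_ <= limit)) newChild
      else GlobalLimit(limit, newChild)
    case other => other
  }
}
```

In this sketch, `LimitSketch.eliminateLimits(LimitSketch.GlobalLimit(100, LimitSketch.Relation(10)))` returns the bare `Relation(10)`, mirroring the example above. A limit over a child with an unknown row bound is kept, because the check only succeeds when a bound is actually available.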

Why are the changes needed?

Further optimize the query.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

@github-actions github-actions bot added the SQL label Mar 5, 2021
@SparkQA

SparkQA commented Mar 5, 2021

Test build #135787 has finished for PR 31750 at commit f9ee999.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Mar 6, 2021

cc @cloud-fan This is a part of #31691, split out to make that PR clearer.

* This rule optimizes Limit operators by:
* 1. Eliminate [[Limit]] operators if it's child max row <= limit.
* 2. Combines two adjacent [[Limit]] operators into one, merging the
* 2. Eliminate [[GlobalLimit]] operators if it's child max row <= limit.
Member

nit: how about merging 1. and 2. like Eliminate [[Limit]]/[[GlobalLimit]] ...?

Member Author

Fixed.

@SparkQA

SparkQA commented Mar 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40412/

@SparkQA

SparkQA commented Mar 6, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40412/

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @wangyum and @maropu.
Merged to master.

@SparkQA

SparkQA commented Mar 6, 2021

Test build #135830 has finished for PR 31750 at commit 22957b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum wangyum deleted the SPARK-34628 branch March 7, 2021 01:25
@cloud-fan
Contributor

late LGTM

wangyum added a commit that referenced this pull request May 26, 2023
… if its child max rows not larger than limit number (#1033)

* [SPARK-34628][SQL] Remove GlobalLimit operator if its child max rows not larger than limit number

### What changes were proposed in this pull request?

This PR removes the `GlobalLimit` operator if its child's maximum row count is not larger than the limit. For example:
```
val testRelation = LocalRelation.fromExternalRows(Seq("a".attr.int, "b".attr.int, "c".attr.int), 1.to(10).map(_ => Row(1, 2, 3)) )
val query = GlobalLimit(100, testRelation)
```
The relation here has only 10 rows, which is below the limit of 100, so this `GlobalLimit` can be removed.

### Why are the changes needed?

Further optimize the query.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #31750 from wangyum/SPARK-34628.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>