[SPARK-35231][SQL] logical.Range override maxRowsPerPartition #32350
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #137955 has finished for PR 32350 at commit
|
|
cc @wangyum FYI |
|
existing testsuites
|
|
@wangyum No, there is no test to cover |
|
Yea, please add tests in |
|
To add a similar test in |
extends OrderPreservingUnaryNode {
  override def output: Seq[Attribute] = projectList.map(_.toAttribute)
  override def maxRows: Option[Long] = child.maxRows
  override def maxRowsPerPartition: Option[Long] = child.maxRowsPerPartition
this override is needed for the added test
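To illustrate why the override matters, here is a minimal sketch in plain Scala rather than Catalyst's real classes. It assumes that a plan node's `maxRowsPerPartition` falls back to `maxRows` unless overridden; the names `Plan`, `RangeLike`, `ProjectWithoutOverride`, and `ProjectWithOverride` are hypothetical and exist only for this example.

```scala
object ProjectRowsSketch {
  // Toy stand-in for a logical plan node (not Spark's LogicalPlan).
  trait Plan {
    def maxRows: Option[Long]
    // Assumption for this sketch: the per-partition bound defaults to the global bound.
    def maxRowsPerPartition: Option[Long] = maxRows
  }

  // A range-like leaf that knows its partition count and reports a tighter per-partition bound.
  case class RangeLike(numElements: Long, numSlices: Int) extends Plan {
    def maxRows: Option[Long] = Some(numElements)
    // Ceiling division: the largest partition holds at most ceil(numElements / numSlices) rows.
    override def maxRowsPerPartition: Option[Long] =
      Some((numElements + numSlices - 1) / numSlices)
  }

  // A projection that forwards only maxRows: the child's tighter bound is lost.
  case class ProjectWithoutOverride(child: Plan) extends Plan {
    def maxRows: Option[Long] = child.maxRows
  }

  // A projection that also forwards maxRowsPerPartition, as in the diff above.
  case class ProjectWithOverride(child: Plan) extends Plan {
    def maxRows: Option[Long] = child.maxRows
    override def maxRowsPerPartition: Option[Long] = child.maxRowsPerPartition
  }
}
```

In this toy model, wrapping `RangeLike(100, 3)` in `ProjectWithoutOverride` reports a per-partition bound of 100, while `ProjectWithOverride` reports 34, which is the value the added test expects.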
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #138058 has finished for PR 32350 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #138071 has finished for PR 32350 at commit
|
maropu
left a comment
Please update the PR and the description. I feel what they say is different from what this PR looks like.
    child
  case GlobalLimit(l, child) if canEliminate(l, child) =>
    child
  case LocalLimit(l, child) if !plan.isStreaming && canEliminateLocalLimit(l, child) =>
It is not possible that a user's query reaches this optimization path now?
In a streaming case, maxRowsPerPartition can be filled? (we need the condition !plan.isStreaming here?)
> It is not possible that a user's query reaches this optimization path now?
An end user's query should not reach this path, I think. This path exists only so that a similar test can be added in CombiningLimitsSuite.
> In a streaming case, maxRowsPerPartition can be filled? (we need the condition !plan.isStreaming here?)
org.apache.spark.sql.streaming.StreamSuite.SPARK-30657: streaming limit optimization from StreamingLocalLimitExec to LocalLimitExec fails if this condition is not added.
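The guard discussed here can be sketched with a toy plan model. This is not Catalyst's rule, only an illustration under stated assumptions: `Plan`, `Source`, and `EliminateLocalLimitSketch` are hypothetical names, and the elimination condition is assumed to be "the child's per-partition bound is known and does not exceed the limit".

```scala
object EliminateLocalLimitSketch {
  // Toy stand-in for a logical plan (not Spark's LogicalPlan).
  sealed trait Plan {
    def maxRowsPerPartition: Option[Long]
    def isStreaming: Boolean
  }

  // A leaf with an optional known per-partition row bound.
  case class Source(rowsPerPartition: Option[Long], isStreaming: Boolean = false) extends Plan {
    def maxRowsPerPartition: Option[Long] = rowsPerPartition
  }

  // A per-partition limit on top of a child plan.
  case class LocalLimit(limit: Long, child: Plan) extends Plan {
    def maxRowsPerPartition: Option[Long] =
      Some(child.maxRowsPerPartition.fold(limit)(math.min(limit, _)))
    def isStreaming: Boolean = child.isStreaming
  }

  // Assumed condition: the limit is redundant when the child can never exceed it.
  def canEliminateLocalLimit(limit: Long, child: Plan): Boolean =
    child.maxRowsPerPartition.exists(_ <= limit)

  // Drop redundant local limits, but never for streaming plans.
  def apply(plan: Plan): Plan = plan match {
    case LocalLimit(l, child) if !plan.isStreaming && canEliminateLocalLimit(l, child) =>
      apply(child)
    case LocalLimit(l, child) => LocalLimit(l, apply(child))
    case other => other
  }
}
```

In this sketch a batch `LocalLimit(34, Source(Some(34)))` collapses to its child, while the same plan with a streaming source keeps its limit, mirroring the `!plan.isStreaming` condition in the diff.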
def limit(limitExpr: Expression): LogicalPlan = Limit(limitExpr, logicalPlan)

def localLimit(limitExpr: Expression): LogicalPlan = LocalLimit(limitExpr, logicalPlan)
Since this is used only once now, could you use LocalLimit directly in the test?
checkPlanAndMaxRowsPerPartition(
  Range(0, 100, 1, 3).select().localLimit(34),
  Range(0, 100, 1, 3).select(),
  34
)
Could we make the test more simple? For example:
assert(Range(0, 100, 1, 3).maxRowsPerPartition === Some(34))
assert(Range(0, 100, 1, 4).maxRowsPerPartition === Some(25))
assert(Range(0, 100, 1, 3).select('id).maxRowsPerPartition === Some(34))
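The expected values in these assertions follow from ceiling division: 100 elements over 3 slices gives at most 34 rows in a partition, and over 4 slices gives 25. A minimal standalone sketch of that arithmetic is below. It does not use Spark; `RangeRowsSketch`, `numElements`, and this `maxRowsPerPartition` are hypothetical helpers, and falling back to the total element count when `numSlices` is absent is an assumption of the sketch.

```scala
object RangeRowsSketch {
  // Number of elements in the half-open range [start, end) with a non-zero step.
  // BigInt avoids overflow when end - start does not fit in a Long.
  def numElements(start: Long, end: Long, step: Long): BigInt = {
    require(step != 0, "step must be non-zero")
    val span = BigInt(end) - BigInt(start)
    if (span.signum != BigInt(step).signum) {
      BigInt(0) // empty range: zero span, or the step points away from end
    } else {
      val (q, r) = span /% BigInt(step)
      if (r == 0) q else q + 1
    }
  }

  // Upper bound on rows in any single partition.
  // Assumption: without a known numSlices, fall back to the total element count.
  def maxRowsPerPartition(
      start: Long,
      end: Long,
      step: Long,
      numSlices: Option[Int]): Option[Long] = {
    val n = numElements(start, end, step)
    val perPartition = numSlices match {
      case Some(k) =>
        require(k > 0, "numSlices must be positive")
        (n + BigInt(k) - 1) / BigInt(k) // ceiling division
      case None => n
    }
    // Report no bound if the value does not fit in a Long.
    if (perPartition.isValidLong) Some(perPartition.toLong) else None
  }
}
```

For example, `RangeRowsSketch.maxRowsPerPartition(0, 100, 1, Some(3))` evaluates to `Some(34)`, matching the first assertion above.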
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
Test build #138193 has finished for PR 32350 at commit
|
maropu
left a comment
cc: @wangyum
|
Thank you, @zhengruifeng . Merged to master. |
|
Thank you so much! |
What changes were proposed in this pull request?
When `numSlices` is available, `logical.Range` should compute an exact `maxRowsPerPartition`.
Why are the changes needed?
`maxRowsPerPartition` is used in the optimizer, so we should provide an exact value when possible.
Does this PR introduce any user-facing change?
No
How was this patch tested?
existing testsuites