Skip to content

Conversation

@zml1206
Copy link
Contributor

@zml1206 zml1206 commented Sep 27, 2023

What changes were proposed in this pull request?

This PR add a new optimizer rule EliminateWindowPartitions, it remove window partition if partition expressions are foldable.
sql1:
select row_number() over(order by a) b from t
sql2:
select row_number() over(partition by 1 order by a) b from t
After this PR, the optimizedPlan for sql1 and sql2 is the same.

Why are the changes needed?

Foldable partition is redundant, remove it not only can simplify plan, but some rules can also take effect when the partitions are all foldable, such as LimitPushDownThroughWindow.

Does this PR introduce any user-facing change?

No

How was this patch tested?

UT

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Sep 27, 2023
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OptimizeWindowPartitions -> EliminateWindowPartitions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
plan.transformAllExpressionsWithPruning(_.containsAnyPattern(WINDOW_EXPRESSION), ruleId) {
plan.transformAllExpressionsWithPruning(_.containsPattern(WINDOW_EXPRESSION), ruleId) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case we @ WindowExpression(_, ws @ WindowSpecDefinition(ps, _, _)) =>
case we @ WindowExpression(_, ws @ WindowSpecDefinition(ps, _, _)) if ps.exists(_.foldable) =>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add more test cases:

  • partitions only unfoldable.
  • Mix unfoldable and foldable partition specs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beliefer Thanks, updated all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we can simplify the code here with transformWithPruning only, then we can remove the removeWindowExpressionPartitions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If do as you say, this will cause the partitionSpec in WindowExpression to be inconsistent with the partitionSpec in Window, which may cause hidden dangers.

Copy link
Contributor

@beliefer beliefer Oct 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is merge the logic of removeWindowExpressionPartitions into transformWithPruning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

      case w @ Window(windowExpressions, partitionSpec, _, _) if partitionSpec.exists(_.foldable) =>
        val newWindowExpressions =
          windowExpressions.map(_.transformWithPruning(_.containsPattern(WINDOW_EXPRESSION)) {
            case wsd @ WindowSpecDefinition(ps, _, _) if ps.exists(_.foldable) =>
              wsd.copy(partitionSpec = ps.filter(!_.foldable))
          }.asInstanceOf[NamedExpression])

Copy link
Contributor Author

@zml1206 zml1206 Oct 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be done? WINDOW_EXPRESSION mismatch WindowSpecDefinition.

      case w @ Window(we, ps, _, _) if ps.exists(_.foldable) =>
        val newWe = we.map(_.transformWithPruning(_.containsPattern(WINDOW_EXPRESSION)) {
          case _we @ WindowExpression(_, wsd @ WindowSpecDefinition(_ps, _, _))
            if _ps.exists(_.foldable) =>
            val newWsd = wsd.copy(partitionSpec = _ps.filter(!_.foldable))
            _we.copy(windowSpec = newWsd)
          }.asInstanceOf[NamedExpression])
        w.copy(windowExpressions = newWe, partitionSpec = ps.filter(!_.foldable))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't actually built it, just for reference. You should ensure that it is correct.

Could we make the variable name more readable? e.g. we => windowExprs or windowExpressions and others.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adjusted the above code

      case w @ Window(windowExprs, partitionSpec, _, _) if partitionSpec.exists(_.foldable) =>
        val newWe = windowExprs.map(_.transformWithPruning(_.containsPattern(WINDOW_EXPRESSION)) {
          case windowExpr @ WindowExpression(_, wsd @ WindowSpecDefinition(ps, _, _))
            if ps.exists(_.foldable) =>
            val newWsd = wsd.copy(partitionSpec = ps.filter(!_.foldable))
            windowExpr.copy(windowSpec = newWsd)
        }.asInstanceOf[NamedExpression])
        w.copy(windowExpressions = newWe, partitionSpec = partitionSpec.filter(!_.foldable))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, I'm missing. Please rename newWe to newWindowExprs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@zml1206 zml1206 changed the title [SPARK-45352][SQL] Remove window partition if partition expressions are foldable [SPARK-45352][SQL] Eliminate foldable window partitions Oct 8, 2023
@zml1206
Copy link
Contributor Author

zml1206 commented Oct 30, 2023

cc @wangyum

Copy link
Contributor

@beliefer beliefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except one comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use withNewChildren instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which object uses withNewChildren? What is the purpose of using withNewChildren?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reduce object copy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks.

@beliefer
Copy link
Contributor

cc @cloud-fan

/**
* Remove window partition if partition expressions are foldable
*/
object EliminateWindowPartitions extends Rule[LogicalPlan] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's put it in a new file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Row(1) :: Row(1) :: Nil)
}

test("SPARK-45352: Remove window partition if partition expression are foldable") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for a new optimizer rule, we should add unit tests, like LimitPushdownSuite

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

val window1 = new WindowSpec(Seq(), Seq(sortOrder), UnspecifiedFrame)
val window2 = new WindowSpec(Seq(lit(1).expr), Seq(sortOrder), UnspecifiedFrame)
val df1 = ds.select(row_number().over(window1).alias("num"))
val df2 = ds.select(row_number().over(window2).alias("num"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for end-to-end tests, let's use end-to-end APIs, e.g. Window.partitionBy($"key").orderBy($"value")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks.

@zml1206
Copy link
Contributor Author

zml1206 commented Jan 2, 2024

@cloud-fan Already fixed unit test, could you have time to take a look?Thanks.

case windowExpr @ WindowExpression(_, wsd @ WindowSpecDefinition(ps, _, _))
if ps.exists(_.foldable) =>
val newWsd = wsd.copy(partitionSpec = ps.filter(!_.foldable))
windowExpr.withNewChildren(Seq(windowExpr.windowFunction, newWsd))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: windowExpr.copy(windowSpec = newWsd)

testRelation
.select(a, b,
windowExpr(RowNumber(),
windowSpec(Nil, b.desc :: Nil, windowFrame)).as("rn"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beliefer can you help confirm that no partition column means a single partition in the window operator?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. All the data rows in a single partition.

windowExpr(RowNumber(),
windowSpec(a :: Nil, b.desc :: Nil, windowFrame)).as("rn"))

val correctAnswer =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: val correctAnswer = originalQuery

Row(1) :: Row(1) :: Nil)
}

test("SPARK-45352: Eliminate foldable window partitions") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we turn on/off the new optimizer rule and run the test twice to make sure the result is the same?

@zml1206
Copy link
Contributor Author

zml1206 commented Jan 3, 2024

Updated all. @cloud-fan

@zml1206 zml1206 requested a review from cloud-fan January 4, 2024 13:35
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 8facc99 Jan 8, 2024
@zml1206
Copy link
Contributor Author

zml1206 commented Jan 8, 2024

Thank you for review. @cloud-fan @beliefer

zml1206 added a commit to zml1206/spark that referenced this pull request May 7, 2025
### What changes were proposed in this pull request?
This PR add a new optimizer rule `EliminateWindowPartitions`, it remove window partition if partition expressions are foldable.
sql1:
`select row_number() over(order by a) b from t `
sql2:
`select row_number() over(partition by 1 order by a) b from t `
After this PR, the `optimizedPlan` for sql1 and sql2 is the same.

### Why are the changes needed?
Foldable partition is redundant, remove it not only can simplify plan, but some rules can also take effect when the partitions are all foldable, such as `LimitPushDownThroughWindow`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#43144 from zml1206/SPARK-45352.

Authored-by: zml1206 <zhuml1206@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants