Skip to content

Conversation

@zml1206
Copy link
Contributor

@zml1206 zml1206 commented Feb 22, 2023

What changes were proposed in this pull request?

Extend the CollapseWindow rule to collapse Window nodes, when one window in subquery.

Why are the changes needed?

select a, b, c, row_number() over (partition by a order by b) as d from
( select a, b, rank() over (partition by a order by b) as c from t1) t2

== Optimized Logical Plan ==
before
Window [row_number() windowspecdefinition(a#11, b#12 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS d#26], [a#11], [b#12 ASC NULLS FIRST]
+- Window [rank(b#12) windowspecdefinition(a#11, b#12 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS c#25], [a#11], [b#12 ASC NULLS FIRST]
   +- InMemoryRelation [a#11, b#12], StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *(1) Project [_1#6 AS a#11, _2#7 AS b#12]
            +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1 AS _1#6, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2 AS _2#7]
               +- *(1) MapElements org.apache.spark.sql.DataFrameSuite$$Lambda$1517/1628848368@3a479fda, obj#5: scala.Tuple2
                  +- *(1) DeserializeToObject staticinvoke(class java.lang.Long, ObjectType(class java.lang.Long), valueOf, id#0L, true, false, true), obj#4: java.lang.Long
                     +- *(1) Range (0, 10, step=1, splits=2)

after
Window [rank(b#12) windowspecdefinition(a#11, b#12 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS c#25, row_number() windowspecdefinition(a#11, b#12 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS d#26], [a#11], [b#12 ASC NULLS FIRST]
+- InMemoryRelation [a#11, b#12], StorageLevel(disk, memory, deserialized, 1 replicas)
      +- *(1) Project [_1#6 AS a#11, _2#7 AS b#12]
         +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1 AS _1#6, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2 AS _2#7]
            +- *(1) MapElements org.apache.spark.sql.DataFrameSuite$$Lambda$1518/1928028672@4d7a64ca, obj#5: scala.Tuple2
               +- *(1) DeserializeToObject staticinvoke(class java.lang.Long, ObjectType(class java.lang.Long), valueOf, id#0L, true, false, true), obj#4: java.lang.Long
                  +- *(1) Range (0, 10, step=1, splits=2)

Does this PR introduce any user-facing change?

No

How was this patch tested?

UT

@github-actions github-actions bot added the SQL label Feb 22, 2023
@zml1206 zml1206 changed the title collapse two adjacent windows with the same partition/order in subquery [SPARK-42525][core]collapse two adjacent windows with the same partition/order in subquery Feb 22, 2023
@zml1206 zml1206 changed the title [SPARK-42525][core]collapse two adjacent windows with the same partition/order in subquery [SPARK-42525][CORE]collapse two adjacent windows with the same partition/order in subquery Feb 22, 2023
@zml1206
Copy link
Contributor Author

zml1206 commented Feb 22, 2023

cc @cloud-fan @wangyum

@wangyum
Copy link
Member

wangyum commented Feb 23, 2023

@zml1206 Could you update the PR title to [SPARK-42525][SQL] Collapse ...?

@zml1206 zml1206 changed the title [SPARK-42525][CORE]collapse two adjacent windows with the same partition/order in subquery [SPARK-42525][SQL]collapse two adjacent windows with the same partition/order in subquery Feb 23, 2023
@zml1206 zml1206 changed the title [SPARK-42525][SQL]collapse two adjacent windows with the same partition/order in subquery [SPARK-42525][SQL] Collapse two adjacent windows with the same partition/order in subquery Feb 23, 2023
@wangyum wangyum closed this in 0c3c819 Feb 26, 2023
@wangyum
Copy link
Member

wangyum commented Feb 26, 2023

Merged to master.

@cloud-fan
Copy link
Contributor

the change LGTM but the PR title is a bit confusing. How is it related to subquery?

@zml1206
Copy link
Contributor Author

zml1206 commented Feb 27, 2023

the change LGTM but the PR title is a bit confusing. How is it related to subquery?

subquery is one of the cases where the qualifiers are different,it's really confusing, is there anything else I can do to modify it

@cloud-fan
Copy link
Contributor

... with semantically-same partition/order

@zml1206 zml1206 changed the title [SPARK-42525][SQL] Collapse two adjacent windows with the same partition/order in subquery [SPARK-42525][SQL] Collapse two adjacent windows with semantically-same partition/order Feb 27, 2023
zml1206 added a commit to zml1206/spark that referenced this pull request Jul 7, 2023
…ion/order in subquery

### What changes were proposed in this pull request?
Extend the CollapseWindow rule to collapse Window nodes, when one window in subquery.

### Why are the changes needed?

```
select a, b, c, row_number() over (partition by a order by b) as d from
( select a, b, rank() over (partition by a order by b) as c from t1) t2

== Optimized Logical Plan ==
before
Window [row_number() windowspecdefinition(a#11, b#12 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS d#26], [a#11], [b#12 ASC NULLS FIRST]
+- Window [rank(b#12) windowspecdefinition(a#11, b#12 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS c#25], [a#11], [b#12 ASC NULLS FIRST]
   +- InMemoryRelation [a#11, b#12], StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *(1) Project [_1#6 AS a#11, _2#7 AS b#12]
            +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1 AS _1#6, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2 AS _2#7]
               +- *(1) MapElements org.apache.spark.sql.DataFrameSuite$$Lambda$1517/16288483683a479fda, obj#5: scala.Tuple2
                  +- *(1) DeserializeToObject staticinvoke(class java.lang.Long, ObjectType(class java.lang.Long), valueOf, id#0L, true, false, true), obj#4: java.lang.Long
                     +- *(1) Range (0, 10, step=1, splits=2)

after
Window [rank(b#12) windowspecdefinition(a#11, b#12 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS c#25, row_number() windowspecdefinition(a#11, b#12 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS d#26], [a#11], [b#12 ASC NULLS FIRST]
+- InMemoryRelation [a#11, b#12], StorageLevel(disk, memory, deserialized, 1 replicas)
      +- *(1) Project [_1#6 AS a#11, _2#7 AS b#12]
         +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1 AS _1#6, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2 AS _2#7]
            +- *(1) MapElements org.apache.spark.sql.DataFrameSuite$$Lambda$1518/19280286724d7a64ca, obj#5: scala.Tuple2
               +- *(1) DeserializeToObject staticinvoke(class java.lang.Long, ObjectType(class java.lang.Long), valueOf, id#0L, true, false, true), obj#4: java.lang.Long
                  +- *(1) Range (0, 10, step=1, splits=2)
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT

Closes apache#40115 from zml1206/SPARK-42525.

Authored-by: zml1206 <zhuml1206@gmail.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>

# Conflicts:
#	sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants