Skip to content

Conversation

@nyingping
Copy link
Contributor

@nyingping nyingping commented Feb 15, 2022

What changes were proposed in this pull request?

At present, the sliding window adopts the form of expand + filter, but in some cases, filter is not necessary.

Filtering is required if the sliding window is irregular. When the window length is divided by the slide length the result is an integer (I believe this is also the case for most work scenarios in practice for sliding window), there is no need to filter, which can save calculation resources and improve performance.

Why are the changes needed?

save calculation resources and improve performance.

Does this PR introduce any user-facing change?

NO

How was this patch tested?

UT and benchmark.

simple benchmark in this commit ,thanks HeartSaVioR@d532b6f


-------case 1

spark.range(numOfRow)
.selectExpr("CAST(id AS timestamp) AS time")
.select(window(col("time"), "15 seconds", "3 seconds", "2 seconds"))
.count()

Result:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_291-b10 on Windows 10 10.0
AMD64 Family 23 Model 96 Stepping 1, AuthenticAMD
sliding windows:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
old logic                                           799            866          70         12.5          79.9       1.0X
new logic                                            58             68           9        171.2           5.8      13.7X

-------case 2

spark.range(numOfRow)
.selectExpr("CAST(id AS timestamp) AS time")
.select(window(col("time"), "10 seconds", "5 seconds", "2 seconds"))
.count()

Result:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_291-b10 on Windows 10 10.0
AMD64 Family 23 Model 96 Stepping 1, AuthenticAMD
sliding windows:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
old logic                                           359            409          52         27.8          35.9       1.0X
new logic                                            47             54           4        212.6           4.7       7.6X

@github-actions github-actions bot added the SQL label Feb 15, 2022
@nyingping
Copy link
Contributor Author

I'll check it later

Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems OK.
@viirya Would you mind validating this fix as a follow-up of #35362? Thanks in advance!

@viirya
Copy link
Member

viirya commented Feb 16, 2022

Is this a follow-up of #35362? Looks like a different one. But seems okay. Will re-check it later.

@HeartSaVioR
Copy link
Contributor

Yeah I meant additional optimization along with previous one. Sorry if I confused you.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@nyingping
Copy link
Contributor Author

Sorry,it's my fault.I mixed the update history of the branch of the previous with the present, caused interference and misunderstanding.

@nyingping nyingping changed the title [SPARK-38214][SS]No need to filter data when the sliding window length is not redundant [SPARK-38214][SS]No need to filter windows when windowDuration is multiple of slideDuration Feb 17, 2022
Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 pending tests. Thanks for the contribution!

@HeartSaVioR
Copy link
Contributor

I'll leave this in a day to see the chance of another reviews from others. I'll merge this tomorrow if there's no new feedback.

@HeartSaVioR
Copy link
Contributor

OK, no feedback on working hour in US timezone.

Thanks! Merging to master.

@HeartSaVioR
Copy link
Contributor

Thanks @nyingping for the contribution! I merged into master.

@nyingping
Copy link
Contributor Author

@HeartSaVioR Thank you for review very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants