[SPARK-38214][SS] No need to filter windows when windowDuration is multiple of slideDuration #35526
improve structured streaming window of calculated
I'll check it later.
HeartSaVioR
left a comment
sql/core/src/test/scala/org/apache/spark/sql/DataFrameTimeWindowingSuite.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Is this a follow-up of #35362? Looks like a different one. But seems okay. Will re-check it later.
Yeah, I meant an additional optimization along with the previous one. Sorry if I confused you.
Can one of the admins verify this patch?
Sorry, it's my fault. I mixed the update history of the previous branch with the present one, which caused interference and misunderstanding.
HeartSaVioR
left a comment
+1 pending tests. Thanks for the contribution!
I'll leave this open for a day to give others a chance to review. I'll merge this tomorrow if there's no new feedback.
OK, no feedback during working hours in the US timezone. Thanks! Merging to master.
Thanks @nyingping for the contribution! I merged into master.
@HeartSaVioR Thank you very much for the review!
What changes were proposed in this pull request?
At present, sliding windows are implemented as expand + filter, but in some cases the filter is not necessary.
Filtering is required only when the sliding window is irregular, that is, when the window length is not a multiple of the slide length. When the window length divided by the slide length is an integer (which I believe is also the case for most sliding-window scenarios in practice), there is no need to filter, which saves computing resources and improves performance.
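The reasoning can be illustrated with a small pure-Python model. This is only a sketch under stated assumptions, not the Analyzer code from this PR: the function names are hypothetical, timestamps and durations are plain integers, and the candidate windows are generated by stepping back from the most recent window start at or before the timestamp.

```python
def candidate_windows(ts, window, slide, start_time=0):
    """Expand step: list every window that might contain ts (hypothetical helper)."""
    # Integer ceiling of window / slide: the maximum number of overlapping windows.
    max_overlapping = -(-window // slide)
    # Start of the most recent window that begins at or before ts.
    last_start = ts - (ts - start_time) % slide
    return [
        (last_start - i * slide, last_start - i * slide + window)
        for i in range(max_overlapping)
    ]


def contains(win, ts):
    """Filter predicate: windows are half-open intervals [start, end)."""
    start, end = win
    return start <= ts < end


def assign_windows(ts, window, slide, start_time=0):
    """Return the windows ts belongs to, skipping the filter when it is redundant."""
    candidates = candidate_windows(ts, window, slide, start_time)
    if window % slide == 0:
        # Every candidate already contains ts, so no filter is needed.
        return candidates
    # Irregular case: the oldest candidate may end at or before ts.
    return [w for w in candidates if contains(w, ts)]
```

With `window=10, slide=5`, every candidate produced by the expand step contains the timestamp, so the filter removes nothing. With `window=5, slide=2` and `ts=3`, the expand step yields three candidates, and the oldest one, `(-2, 3)`, excludes the timestamp, so the filter is still required.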
Why are the changes needed?
To save computing resources and improve performance.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
UT and benchmark.
A simple benchmark is in this commit (thanks to HeartSaVioR): HeartSaVioR@d532b6f
-------case 1
Result:
-------case 2
Result: