-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-14353] Dataset Time Window window API for R
#12141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #54819 has finished for PR 12141 at commit
|
| #'} | ||
| setMethod("window", signature(x = "Column"), | ||
| function(x, windowDuration, slideDuration = NULL, startTime = NULL) { | ||
| if (!is.null(slideDuration) && !is.null(startTime)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check we check the type of windowDuration, slideDuration and startTime?
|
@brkyvz LGTM, only one minor comment. |
|
@davies Addressed your comment |
|
Test build #54917 has finished for PR 12141 at commit
|
|
Retest this please
|
| "var_samp", | ||
| "weekofyear", | ||
| "when", | ||
| "window", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be added to https://github.com/apache/spark/blob/master/R/pkg/R/generics.R?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also need to check window from the stats package can still be called?
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/window.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. window can be called using stats::window
|
retest this please |
|
Test build #54930 has finished for PR 12141 at commit
|
|
Test build #54934 has finished for PR 12141 at commit
|
|
Test build #54938 has finished for PR 12141 at commit
|
|
Test build #54946 has finished for PR 12141 at commit
|
| })) | ||
| maskedCompletely <- masked[!funcHasAny] | ||
| namesOfMaskedCompletely <- c("cov", "filter", "sample") | ||
| namesOfMaskedCompletely <- c("cov", "filter", "sample", "window") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this fails that means the current definition will block the stats::window function completely, (ie. user will not be able to call it without altering the code with stats:: prefix)
this is in fact delibrate - to avoid masking the function by accident. for instance, please set the generic (in R/pkg/R/generics.R) as window(x, ...) to match the stats package definition - generally that would allow both to be called at the same time.
@sun-rui @shivaram
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see... Thanks for pointing it out!
|
Test build #54953 has finished for PR 12141 at commit
|
|
LGTM, merging into master. |
What changes were proposed in this pull request?
The
windowfunction was added to Dataset with this PR.This PR adds the R API for this function.
With this PR, SQL, Java, and Scala will share the same APIs as in users can use:
window(timeColumn, windowDuration)window(timeColumn, windowDuration, slideDuration)window(timeColumn, windowDuration, slideDuration, startTime)In Python and R, users can access all APIs above, but in addition they can do
window(timeColumn, windowDuration, startTime=...)that is, they can provide the startTime without providing the
slideDuration. In this case, we will generate tumbling windows.How was this patch tested?
Unit tests + manual tests