[SPARK-19818][SparkR] rbind should check for name consistency of input data frames #17159
The current implementation accepts data frames with different schemas. See issues below:
Test build #73888 has finished for PR 17159 at commit
Test build #73895 has finished for PR 17159 at commit
Test build #73897 has finished for PR 17159 at commit
hmm... this is somewhat by design in Spark - Do you see this as something that might be unexpected for R users (in which case
@felixcheung OK, did not know it was by design. It does seem that the
I think it's a good idea to get SparkR
Makes sense. Made changes to rbind and added tests. Please take a look. Thanks.
Test build #73941 has finished for PR 17159 at commit
Test build #73947 has finished for PR 17159 at commit
 #'
-#' Union two or more SparkDataFrames. This is equivalent to \code{UNION ALL} in SQL.
+#' Union two or more SparkDataFrames by row. In contrast to \link{union}, this method
+#' requires that the input SparkDataFrames have the same column names.
I'd just say, as in R's rbind, this method requires...
btw, should we care about data type matching - does R's rbind check?
Thanks. Updated doc. R's rbind seems to do type conversion similarly to union:
> df <- data.frame(name = c("Michael", "Andy", "Justin"), age = c(1, 30, 19))
> df2 <- df
> df2$age <- as.character(df2$age)
> rbind(df, df2)
     name age
1 Michael   1
2    Andy  30
3  Justin  19
4 Michael   1
5    Andy  30
6  Justin  19
> str(rbind(df, df2))
'data.frame':	6 obs. of  2 variables:
 $ name: Factor w/ 3 levels "Andy","Justin",..: 3 1 2 3 1 2
 $ age : chr  "1" "30" "19" "1" ...
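For comparison, here is a small base R sketch (no Spark session needed) of the name check the doc comment refers to. The data frame `df3` and its `years` column are made up for illustration. Base R's `rbind` on data frames is expected to signal an error when column names differ, so the call is wrapped in `tryCatch` to capture the message rather than abort.

```r
# Illustrative data: df3 uses "years" where df uses "age" (hypothetical names)
df <- data.frame(name = c("Michael", "Andy"), age = c(1, 30))
df3 <- data.frame(name = c("Justin"), years = c(19))

# Capture the error message instead of stopping the script
result <- tryCatch(
  rbind(df, df3),
  error = function(e) conditionMessage(e)
)
print(result)
```

If the names matched, `result` would be the combined data frame; with mismatched names it holds the error text.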
Test build #73954 has finished for PR 17159 at commit
merged to master. thanks
What changes were proposed in this pull request?
Added checks for name consistency of input data frames in rbind.
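A minimal sketch of what such a name-consistency check could look like, written against plain R data frames so it runs without Spark. The helper name `check_same_names` and the error text are assumptions for illustration and are not taken from the SparkR source.

```r
# Hypothetical helper (not the SparkR implementation): verify that every
# input has exactly the same column names, in the same order.
check_same_names <- function(...) {
  nm <- lapply(list(...), names)
  # unique() on a list of character vectors collapses identical name sets
  if (length(unique(nm)) != 1) {
    stop("Names of input data frames are different.")
  }
  invisible(TRUE)
}

a <- data.frame(name = "Andy", age = 30)
b <- data.frame(name = "Justin", age = 19)
c2 <- data.frame(name = "Michael", years = 1)  # "years" is a deliberate mismatch

check_same_names(a, b)  # passes silently
mismatch <- tryCatch(
  check_same_names(a, c2),
  error = function(e) conditionMessage(e)
)
print(mismatch)
```

Comparing the full name vectors, rather than only the column counts, also rejects inputs whose columns are the same but in a different order. That matters for a union that combines columns by position.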
How was this patch tested?
new test.