Joins with custom suffixes #48

Merged
merged 4 commits into from
May 14, 2017

christophsax
Contributor

Support for the suffix argument in joins.

The suffix argument is passed straight through as the suffixes argument of merge.data.table(). The two non-merge methods, semi_join() and anti_join(), have no suffix argument.

Also added a small test.

Fixes #40.
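
As a minimal illustration of what the suffix pair does, the sketch below uses base R's merge() on two small data frames. The data and the "_left"/"_right" suffixes are made up for illustration and are not taken from the PR; merge.data.table() accepts a suffixes argument of the same form.

```r
# Two hypothetical tables sharing the key `id` and a clashing column `value`.
x <- data.frame(id = c(1, 2), value = c("a", "b"))
y <- data.frame(id = c(1, 2), value = c("A", "B"))

# With the default suffixes the clashing column becomes value.x / value.y.
default_names <- names(merge(x, y, by = "id"))

# A custom pair replaces the default suffixes.
custom_names <- names(merge(x, y, by = "id", suffixes = c("_left", "_right")))

print(default_names)
print(custom_names)
```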

Member

lionel- commented May 11, 2017

Thanks. By the way, you can use git pull --rebase to avoid those merge commits.

lionel- requested a review from krlmlr on May 11, 2017, 19:49
Member

krlmlr left a comment

Looks good. I wonder if extracting a function with the argument list x, y, by, copy, suffix, all.x, all.y would improve this code.

R/joins.R Outdated
by <- dplyr::common_by(by, x, y)
y <- dplyr::auto_copy(x, y, copy = copy)
out <- merge(x, y, by.x = by$x, by.y = by$y, all = FALSE, allow.cartesian = TRUE)
out <- merge(x, y, by.x = by$x, by.y = by$y, all = FALSE, suffixes = suffix,
Member

Could you please write

out <- merge(
  x, y, by.x = ...,
  all = FALSE, ..., allow.cartesian = TRUE
)

We do have a style guide, but I'm not sure this particular part is covered.

Member

I think if the first argument is on a new line, every argument should be on its own line, and so should the closing parenthesis. Otherwise you fill the lines up to the 72nd or 80th column and leave the closing parenthesis with the final argument.

Member

I'd allow multiple related arguments on the same line in multi-line calls. No need to take up too much vertical space. What's the rationale behind your approach?

Member

We use either a vertical arrangement or a horizontal one. Here I would use the latter. I don't think the layout you suggested is standard in the tidyverse.
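
To make the two arrangements concrete, here is a small sketch using base R's merge() on made-up data frames. The tables and the key name k are assumptions for illustration, and allow.cartesian is left out because it is specific to merge.data.table(). The two calls differ only in layout.

```r
x <- data.frame(k = c(1, 2, 3), v = c("a", "b", "c"))
y <- data.frame(k = c(2, 3, 4), v = c("B", "C", "D"))
suffix <- c(".x", ".y")

# Horizontal: fill each line up to the column limit and keep the closing
# parenthesis with the final argument.
horizontal <- merge(x, y, by.x = "k", by.y = "k", all = FALSE,
                    suffixes = suffix)

# Vertical: one argument per line, closing parenthesis on its own line.
vertical <- merge(
  x,
  y,
  by.x = "k",
  by.y = "k",
  all = FALSE,
  suffixes = suffix
)
```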

Member

This decision will matter more in the styler project; CC @lorenzwalthert. For this PR I'm fine either way.

Contributor Author

If I stick to this section in the style guide and to @krlmlr's suggestion to put related arguments on one line, I would write:

merge(
  x, y, 
  by.x = by$x, by.y = by$y, 
  all.x = TRUE, 
  suffixes = suffix, 
  allow.cartesian = TRUE
)

Ok?

Member

Looks good.

Contributor Author

And if we extract it in a function, we would have something like this?

join_using_merge <- function(x, y, by, copy, suffix, 
                             all.x = FALSE, all.y = FALSE){
  by <- dplyr::common_by(by, x, y)
  y <- dplyr::auto_copy(x, y, copy = copy)
  out <- merge(
    x, y, 
    by.x = by$x, by.y = by$y, 
    all.x = all.x, all.y = all.y,
    suffixes = suffix, 
    allow.cartesian = TRUE
  )
  grouped_dt(out, groups(x)) 
}

#' @rdname join.tbl_dt
inner_join.data.table <- function(x, y, by = NULL, copy = FALSE, 
                                  suffix = c(".x", ".y"), ...){
  join_using_merge(x, y, by = by, copy = copy, suffix = suffix)
}

#' @rdname join.tbl_dt
left_join.data.table <- function(x, y, by = NULL, copy = FALSE, 
                                  suffix = c(".x", ".y"), ...){
  join_using_merge(x, y, by = by, copy = copy, suffix = suffix, all.x = TRUE)
}

#' @rdname join.tbl_dt
right_join.data.table <- function(x, y, by = NULL, copy = FALSE, 
                                  suffix = c(".x", ".y"), ...){
  join_using_merge(x, y, by = by, copy = copy, suffix = suffix, all.y = TRUE)
}

#' @rdname join.tbl_dt
full_join.data.table <- function(x, y, by = NULL, copy = FALSE, 
                                  suffix = c(".x", ".y"), ...){
  join_using_merge(x, y, 
    by = by, 
    copy = copy, 
    suffix = suffix, 
    all.x = TRUE, all.y = TRUE
    )
}
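
The way all.x and all.y select the join type can be checked with a stripped-down stand-in for the helper. This is only a sketch: join_sketch is a hypothetical name, it uses base R's merge() on made-up data frames, and it omits the dplyr and data.table pieces (common_by, auto_copy, grouped_dt, allow.cartesian).

```r
# Hypothetical stand-in for the helper, built on base merge() only.
join_sketch <- function(x, y, by, suffix = c(".x", ".y"),
                        all.x = FALSE, all.y = FALSE) {
  merge(
    x, y,
    by = by,
    all.x = all.x, all.y = all.y,
    suffixes = suffix
  )
}

# Made-up tables: ids 2 and 3 match, 1 is only in x, 4 is only in y.
x <- data.frame(id = c(1, 2, 3), v = c("a", "b", "c"))
y <- data.frame(id = c(2, 3, 4), v = c("B", "C", "D"))

inner <- join_sketch(x, y, by = "id")
left  <- join_sketch(x, y, by = "id", all.x = TRUE)
right <- join_sketch(x, y, by = "id", all.y = TRUE)
full  <- join_sketch(x, y, by = "id", all.x = TRUE, all.y = TRUE)

# Row counts per join type.
print(c(inner = nrow(inner), left = nrow(left),
        right = nrow(right), full = nrow(full)))
```

Keeping all.x and all.y as separate flags means each exported join only states how it differs from an inner join.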

Contributor Author

@krlmlr, happy with the code above? Would commit in that case.

Member

I like the join_using_merge() function. Please unindent the closing paren in the call in full_join.data.table().

Member

krlmlr left a comment

Thanks, just a few very minor remarks before merge.

R/joins.R Outdated
out <- merge(x, y, by.x = by$x, by.y = by$y, all.y = TRUE, allow.cartesian = TRUE)
grouped_dt(out, groups(x))
left_join.data.table <- function(x, y, by = NULL, copy = FALSE,
suffix = c(".x", ".y"), ...){
Member

Please review the indentation.

R/joins.R Outdated

#' @rdname join.tbl_dt
full_join.data.table <- function(x, y, by = NULL, copy = FALSE,
suffix = c(".x", ".y"), ...){
Member

Same.

NEWS.md Outdated
@@ -1,6 +1,8 @@
# dtplyr 0.0.2.9000

- joins use extended `merge.data.table()` and the `on` argument, introduced in
- Joins with custom suffixes (#40).
Member

Could you please make this a bit more verbose, and credit yourself?

krlmlr merged commit 2308ff2 into tidyverse:master on May 14, 2017
Member

krlmlr commented May 14, 2017

Thanks!

Development

Successfully merging this pull request may close these issues.

Joins with custom suffixes on tbl_dt()
3 participants