Option to coalesce by column with data frames? #48

Closed
DavisVaughan opened this issue Apr 24, 2020 · 1 comment · Fixed by #80

Comments

@DavisVaughan

DavisVaughan commented Apr 24, 2020

vctrs treats a data frame row as missing only when every column in that row is missing, so coalesce() on data frames might not do what you expect. In the reprex below, only the row with all missing values is updated. It might be nice to have a way to update each column separately.

You could map2() over the data frames, but that would require that you'd already cast them to the same data frame type, and I don't think it generalizes that nicely to more than two data frames.

We may need separate notions of vec_coalesce() and df_coalesce() to cover this new case.

# devtools::install_github("r-lib/funs")

library(funs)

df1 <- data.frame(x = c(NA, 1, NA), y = c(1, NA, NA))
df2 <- data.frame(x = c(2, 2, 2), y = c(2, 2, 2))

df1
#>    x  y
#> 1 NA  1
#> 2  1 NA
#> 3 NA NA

coalesce(df1, df2)
#>    x  y
#> 1 NA  1
#> 2  1 NA
#> 3  2  2

Created on 2020-04-24 by the reprex package (v0.3.0)
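
For comparison, here is a minimal base R sketch of the column-by-column behaviour described above. The helpers `coalesce_vec()` and `df_coalesce_cols()` are hypothetical names used only for illustration; they are not part of funs, vctrs, or dplyr. The sketch assumes every data frame has the same column names, compatible column types, and the same number of rows, and it does no type casting.

```r
# Hypothetical helper: take x where present, otherwise fall back to y.
coalesce_vec <- function(x, y) {
  missing <- is.na(x)
  x[missing] <- y[missing]
  x
}

# Hypothetical helper: coalesce any number of data frames column by column.
# Reduce() folds the binary step over the inputs from left to right.
df_coalesce_cols <- function(...) {
  dfs <- list(...)
  Reduce(function(acc, df) {
    # Match columns by name, then coalesce each pair of columns
    acc[] <- Map(coalesce_vec, acc, df[names(acc)])
    acc
  }, dfs)
}

df1 <- data.frame(x = c(NA, 1, NA), y = c(1, NA, NA))
df2 <- data.frame(x = c(2, 2, 2), y = c(2, 2, 2))

res <- df_coalesce_cols(df1, df2)
res$x  # 2 1 2
res$y  # 1 2 2
```

Folding with Reduce() is one way around the "more than two data frames" concern, though it sidesteps the casting question entirely.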

Inspired by
https://github.com/tidyverse/dplyr/pull/5142/files#diff-3680f0191de36a0e61d4b24cdb1ab150R149

rows_patch.data.frame <- function(x, y, by = NULL, ..., copy = FALSE, inplace = NULL) {
  y <- auto_copy(x, y, copy = copy)
  y_key <- df_key(y, by)
  x_key <- df_key(x, names(y_key))
  df_inplace(inplace)

  idx <- vctrs::vec_match(y[y_key], x[x_key])
  # FIXME: Check key in x? https://github.com/r-lib/vctrs/issues/1032

  # FIXME: Do we need vec_coalesce()
  new_data <- map2(x[idx, names(y)], y, coalesce)

  x[idx, names(y)] <- new_data
  x
}
@lionel-

lionel- commented Jul 15, 2020

Also tackled in tidyverse/dplyr#5334

df1 <- data.frame(x = c(NA, 1, NA), y = c(1, NA, NA))
df2 <- data.frame(x = c(2, 2, 2), y = c(2, 2, 2))

dplyr::coalesce(df1, df2)
#>   x y
#> 1 2 1
#> 2 1 2
#> 3 2 2

funs::coalesce(df1, df2)
#>    x  y
#> 1 NA  1
#> 2  1 NA
#> 3  2  2

I'm tempted to offer a direction argument in general whenever the semantics are useful both across rows and across columns. But in this case, a potentially better way to tackle this is the "complete-cases" viewpoint, which might be more consistent. Currently the row-coalescence behaviour is a bit off because the target row must be completely missing, but the source row may itself be partially missing:

df1 <- data.frame(x = c(NA, 1, NA), y = c(NA, NA, NA))
df2 <- data.frame(x = c(2, 2, 2), y = c(2, 2, 2))
df3 <- data.frame(x = c(NA, 3, 3), y = c(3, 3, 3))

# Only fully missing rows are coalesced
funs::coalesce(df1, df2)
#>   x  y
#> 1 2  2
#> 2 1 NA
#> 3 2  2

# But we allow partially missing coalescence
funs::coalesce(df1, df3)
#>    x  y
#> 1 NA  3
#> 2  1 NA
#> 3  3  3

# Once partially filled out, no more coalescence is possible
funs::coalesce(df1, df3, df2)
#>    x  y
#> 1 NA  3
#> 2  1 NA
#> 3  3  3

Davis will add a complete-cases predicate to vctrs, but how do we slice-coalesce the values? Maybe we need a binary vec_coalesce() operation?
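
One possible reading of the "complete-cases" viewpoint, sketched in base R with `complete.cases()` standing in for the planned vctrs predicate. The function name `row_coalesce_complete()` is hypothetical, and the rule it applies (replace a target row that has any missing value, but only with a source row that is fully complete) is an assumption about what the binary operation might look like, not a settled design.

```r
# Hypothetical binary operation: a row of x is replaced only when it is
# incomplete AND the matching row of y is complete.
row_coalesce_complete <- function(x, y) {
  replace <- !complete.cases(x) & complete.cases(y)
  if (any(replace)) {
    x[replace, ] <- y[replace, ]
  }
  x
}

df1 <- data.frame(x = c(NA, 1, NA), y = c(NA, NA, NA))
df2 <- data.frame(x = c(2, 2, 2), y = c(2, 2, 2))
df3 <- data.frame(x = c(NA, 3, 3), y = c(3, 3, 3))

# Fold the binary operation over several sources, left to right
res <- Reduce(row_coalesce_complete, list(df1, df3, df2))
res$x  # 2 3 3
res$y  # 2 3 3
```

Under this rule the partially missing first row of df3 is skipped and later filled from df2, so coalescence no longer stalls on a half-filled row. The trade-off is that the partially observed row 2 of df1 is overwritten as a whole, which may or may not be the desired semantics.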
