Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

map_dfc() fails when the result contains list columns #376

Closed
yutannihilation opened this issue Aug 23, 2017 · 3 comments · Fixed by #912
Closed

map_dfc() fails when the result contains list columns #376

yutannihilation opened this issue Aug 23, 2017 · 3 comments · Fixed by #912
Labels
breaking change ☠️ API change likely to affect existing code bug an unexpected problem or unintended behavior map 🗺️ vctrs ♣️
Milestone

Comments

@yutannihilation
Copy link
Member

Sometimes we have a data.frame-like list and want to apply some function and harvest the result as data.frame. map_dfc() is quite useful for this purpose:

not_nested <- list(
  col1 = c("Apple", "Orange"),
  col2 = c("Baseball", "Football")
)

purrr::map_dfc(not_nested, purrr::map_chr, sprintf, fmt = "I like %s")
#> # A tibble: 2 x 2
#>            col1            col2
#>           <chr>           <chr>
#> 1  I like Apple I like Baseball
#> 2 I like Orange I like Football

But it fails if the result contains list columns:

nested <- list(
  col1 = list(
    c("Apple", "Banana"),
    c("Orange")
  ),
  col2 = list(
    c("Baseball", "Soccer"),
    c("Football")
  )
)

purrr::map_dfc(nested, purrr::map, sprintf, fmt = "I like %s")
#> Error in cbind_all(x): Argument 2 must be length 2, not 1

Is it possible to get the result as data.frame like bellow?

tibble::as_tibble(
  purrr::map(nested, purrr::map, sprintf, fmt = "I like %s")
)
#> # A tibble: 2 x 2
#>        col1      col2
#>      <list>    <list>
#> 1 <chr [2]> <chr [2]>
#> 2 <chr [1]> <chr [1]>

(As the error indicates, this seems up to cbind_all(). Should I file this issue to dplyr's repo?)

@hadley hadley added the bug an unexpected problem or unintended behavior label Feb 5, 2018
@lionel-
Copy link
Member

lionel- commented Nov 16, 2018

The problem is that bind_cols() and bind_rows() implicitly splice inputs:

dplyr::bind_cols(y = list(3, 4))
#> # A tibble: 1 x 2
#>      V1    V2
#>   <dbl> <dbl>
#> 1     3     4

dplyr::bind_cols(list(y = 3:4, z = 4:5))
#> # A tibble: 2 x 2
#>       y     z
#>   <int> <int>
#> 1     3     4
#> 2     4     5

So list-columns are spliced as well unless we wrap all inputs:

dplyr::bind_cols(x = 1:2, y = list(3, 4))
#> Error: Argument 2 must be length 2, not 1

dplyr::bind_cols(list(x = 1:2, y = list(3, 4)))
#> # A tibble: 2 x 2
#>       x y
#>   <int> <list>
#> 1     1 <dbl [1]>
#> 2     2 <dbl [1]>

These functions need to replaced with versions that use explicit splicing with !!! before they can support list-columns.

In the meantime, we can wrap list-columns in lists to protect them. This might break existing code, but I think it's worth moving towards a consistent handling of lists in data frames.

@lionel- lionel- added breaking change ☠️ API change likely to affect existing code and removed breaking change ☠️ API change likely to affect existing code labels Nov 16, 2018
@lionel-
Copy link
Member

lionel- commented Nov 19, 2018

Will need to wait until a vctrs replacement for bind_cols() / bind_rows().

@lionel- lionel- added this to the vctrs milestone Nov 30, 2018
@DavisVaughan DavisVaughan mentioned this issue Jun 18, 2019
2 tasks
@hadley
Copy link
Member

hadley commented Aug 24, 2022

I think we can probably now switch from dplyr::bind_cols() to vctrs::vec_cbind():

library(purrr)

nested <- list(
  col1 = list(
    c("Apple", "Banana"),
    c("Orange")
  ),
  col2 = list(
    c("Baseball", "Soccer"),
    c("Football")
  )
)

str(vctrs::vec_cbind(!!!map(nested, map, sprintf, fmt = "I like %s")))
#> 'data.frame':    2 obs. of  2 variables:
#>  $ col1:List of 2
#>   ..$ : chr  "I like Apple" "I like Banana"
#>   ..$ : chr "I like Orange"
#>  $ col2:List of 2
#>   ..$ : chr  "I like Baseball" "I like Soccer"
#>   ..$ : chr "I like Football"

Created on 2022-08-24 by the reprex package (v2.0.1)

hadley added a commit that referenced this issue Sep 8, 2022
The key idea is to introduce a new family of "combining" functions: `list_c()`, `list_rbind()`, and `list_cbind()`, which replace `flatten_lgl()`, `flatten_int()`, `flatten_dbl()`, `flatten_chr()` (now `list_c()`), `flatten_dfc()` (`list_cbind()`), and `flatten_dfr()` (`list_rbind()`). The new functions are straightforward wrappers around vctrs functions, but somehow feel natural in purrr to me. 

This leaves `flatten()`, which had a rather idiosyncratic interface. It's now been replaced by `list_flatten()` which now always removes a single layer of list hierarchy (and nothing else). While working on this I realised that this was actually what `splice()` did, so overall this feels like a major improvement in naming consistency.

With those functions in place we can deprecate `map_dfr()` and `map_dfc()` which are actually "flat" map functions because they combine, rather than simplify, the results. They have never actually belonged with `map_int()` and friends because they don't satisfy the invariant `length(map(.x, .f)) == length(.x)`, and `.f` must return a length-1 result. This also strongly implies that `flat_map()` would just be `map_c()` and is thus not necessary.

* Fixes #376 by deprecating `map_dfc()`
* Fixes #405 by clearly ruling against `map_c()`
* Fixes #472 by deprecating `map_dfr()`
* Fixes #575 by introducing `list_c()`, `list_rbind()`, and `list_cbind()`
* Fixes #757 by deprecating `flatten_dfr()` and `flatten_dfc()`
* Fixes #758 by introducing `list_rbind()` and `list_cbind()`
* Part of #900
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change ☠️ API change likely to affect existing code bug an unexpected problem or unintended behavior map 🗺️ vctrs ♣️
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants