Implement `map_vec()` #683

DavisVaughan · 2019-06-18T22:38:46Z

This PR implements `map_vec()`, a vctrs-powered `map()` that attempts to simplify the output automatically by computing the common type of the result elements. If a common type can be found, the result is simplified to that type; otherwise, a list is returned.

This is a draft PR for now; I'm not proposing that it be merged yet. I'd like to get feedback and see what others think about it.

Add tests
Add examples

This function maintains the invariant that:

vec_size(.x) == vec_size(map_vec(.x, .f))
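To make the simplify-or-list behavior concrete, here is a minimal R sketch. It is not the code in this PR: `map_vec_sketch()` is a hypothetical name, it uses `lapply()` plus `rlang::as_function()` rather than purrr internals, and it assumes `.x` is an atomic vector or a bare list. It collects the results in a list, tries `vctrs::vec_ptype_common()`, falls back to the list when no common type exists (or when `.ptype` is a list), requires every result to have size 1 before combining with `vctrs::vec_c()`, and asserts that the output has one element per input element.

```r
library(vctrs)

# Hypothetical sketch of the semantics described in this PR, not its actual code.
# Assumes `.x` is an atomic vector or a bare list (data frames are not handled).
map_vec_sketch <- function(.x, .f, ..., .ptype = NULL) {
  .f <- rlang::as_function(.f)  # allow purrr-style `~ .x` lambdas

  # Apply `.f` to every element and keep the results in a list
  out <- lapply(.x, .f, ...)

  if (is.null(.ptype)) {
    # Try to find a common type; NULL means "no simplification possible"
    .ptype <- tryCatch(vec_ptype_common(!!!out), error = function(e) NULL)
    if (is.null(.ptype)) {
      return(out)
    }
  }

  # A list `.ptype` means "don't simplify"
  if (is.list(.ptype) && !is.data.frame(.ptype)) {
    return(out)
  }

  # When simplifying, every result must have size 1
  sizes <- vapply(out, vec_size, integer(1))
  bad <- which(sizes != 1L)
  if (length(bad) > 0) {
    stop(sprintf("Result %d must have size 1, not size %d.", bad[[1]], sizes[[bad[[1]]]]),
         call. = FALSE)
  }

  res <- vec_c(!!!out, .ptype = .ptype)

  # Invariant: one output element per input element
  stopifnot(vec_size(res) == length(out))
  res
}

map_vec_sketch(1:2, ~ .x)                # integer vector
map_vec_sketch(list(1, "x"), ~ .x)       # no common type, so the list comes back
map_vec_sketch(1:2, ~ Sys.Date() + .x)   # Date vector
```

Returning the list on failure (instead of erroring) is the design choice this PR proposes; the sketch keeps that choice so the examples in this thread can be compared against it.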

After glancing at the issues, this PR seems to touch #679, #633, #472, and #376.

For now, here are a number of examples that demonstrate the features of map_vec():

library(vctrs)
library(purrr)
library(tibble)
options(rlang_backtrace_on_error = "reminder")

# Auto simplified to an integer vector
map_vec(1:2, ~ .x)
#> [1] 1 2

# Prevent simplification with `.ptype = list()`
map_vec(1:2, ~ .x, .ptype = list())
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2

# If a `.ptype` is specified, and casting to that type
# is not possible, an error is raised
map_vec(1:2, ~ .x, .ptype = factor())
#> Can't cast <integer> to <factor<>>

# If simplification is possible, all elements must have size 1,
# otherwise an error is raised
map_vec(1:2, ~ if (.x == 1L) 1:2 else 3)
#> Result 1 must be a single double, not an integer vector of length 2

# If you use `.ptype = list()`, this is pseudo-relaxed
# (The size of the output is still the same as the 
# size of the input (2), but each inner element can vary in size)
map_vec(1:2, ~ if (.x == 1L) 1:2 else 3, .ptype = list())
#> [[1]]
#> [1] 1 2
#> 
#> [[2]]
#> [1] 3

# The best thing about `map_vec()` is its flexibility with other
# non-atomic types, for example, simplifying to a date vector
map_vec(1:2, ~ Sys.Date() + .x)
#> [1] "2019-06-19" "2019-06-20"

# If a common type cannot be determined, a list is returned
map_vec(list(1, "x"), ~ .x)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] "x"

# Note that just because a common type isn't found doesn't mean you
# can't still forcibly cast to a certain type.
map_vec(list(1, "x"), ~ .x, .ptype = character())
#> [1] "1" "x"

# The underlying `vec_c()` engine knows how to combine 
# data frame output, but this isn't a `map_dfr()` replacement
map_vec(1:2, ~ tibble(x = .x))
#> # A tibble: 2 x 1
#>       x
#>   <int>
#> 1     1
#> 2     2

# You can enforce the structure of the data frame output with a ptype.
# This has the same result as before but coerces the integers to characters
ptype <- tibble(x = character())
map_vec(1:2, ~ tibble(x = .x), .ptype = ptype)
#> # A tibble: 2 x 1
#>   x    
#>   <chr>
#> 1 1    
#> 2 2

# And this errors
map_vec(1:2, ~ tibble(x = .x, y = 1), .ptype = ptype)
#> Lossy cast from <tbl_df<
#>   x: integer
#>   y: double
#> >> to <tbl_df<x:character>>.
#> Dropped variables: `y`

# Or you can enforce a partial structure with a partial_frame()
# This ensures that a `y` column exists
partial_ptype <- partial_frame(y = numeric())
map_vec(1:2, ~ tibble(x = .x), .ptype = partial_ptype)
#> # A tibble: 2 x 2
#>       x     y
#>   <int> <dbl>
#> 1     1    NA
#> 2     2    NA

# Turning on strict mode forces the user to provide a .ptype
withr::with_options(
  list(vctrs.no_guessing = TRUE),
  map_vec(1:5, ~ .x)
)
#> Error: strict mode is activated; you must supply complete `.ptype`.

DavisVaughan · 2019-06-18T22:45:07Z

A few questions:

Should a `.name_repair` argument be passed to `vec_c()`?
If a `.ptype` is specified and `vec_c()` fails because it cannot cast to that `.ptype`, we currently get a vctrs cast error. Should that error be more informative? Can it be, given that we don't have the index where the cast failure occurred?
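One possible answer to the second question, sketched in R under the assumption that a per-element cast is acceptable: cast each result to `.ptype` individually with `vctrs::vec_cast()` before combining, so that a failure can be reported together with its index. `cast_each_with_index()` is a hypothetical helper, not part of vctrs or purrr, and casting element by element adds per-call overhead compared with a single `vec_c()`.

```r
library(vctrs)

# Hypothetical helper: cast each result to `.ptype` separately so that a
# failed cast can be reported with the index of the offending result.
cast_each_with_index <- function(results, .ptype) {
  out <- vector("list", length(results))
  for (i in seq_along(results)) {
    # `out[i] <- list(...)` (rather than `out[[i]] <-`) keeps NULL results in place
    out[i] <- list(tryCatch(
      vec_cast(results[[i]], .ptype),
      error = function(e) {
        stop(sprintf("Can't cast result %d to `.ptype`: %s", i, conditionMessage(e)),
             call. = FALSE)
      }
    ))
  }
  names(out) <- names(results)
  out
}

# The already-cast pieces can then be combined as usual
vec_c(!!!cast_each_with_index(list(1L, 2L), double()), .ptype = double())

# A failing cast now names the offending index ("result 2")
try(cast_each_with_index(list(1L, "a"), integer()))
```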

DavisVaughan · 2019-06-18T22:51:10Z

Other things vctrs can help with:

Using `vec_rbind()` in `map_dfr()`
Using `vec_cbind()` in `map_dfc()`
Possibly powering `map_dbl()` and friends by just calling `map_vec(.ptype = double())`
Altering the current recycling rules in purrr. Current thinking suggests that `purrr::map2(1:2, integer(), ~ .x)` should produce an error.
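A rough illustration of these vctrs building blocks, using only exported vctrs functions on plain data frames. This is not how purrr's `map_dfr()`, `map_dfc()`, or `map_dbl()` are implemented; it only shows what `vec_rbind()`, `vec_cbind()`, a `.ptype`-constrained `vec_c()`, and `vec_recycle_common()` do with list-of-results inputs. The last part shows the vctrs recycling rule that would make `map2(1:2, integer(), ~ .x)` an error, assuming purrr adopted it.

```r
library(vctrs)

# map_dfr()-style: row-bind a list of per-element data frames
dfs <- lapply(1:3, function(i) data.frame(x = i, y = letters[i], stringsAsFactors = FALSE))
rows <- vec_rbind(!!!dfs)

# map_dfc()-style: column-bind a list of data frames
pieces <- list(data.frame(a = 1:2), data.frame(b = c("u", "v"), stringsAsFactors = FALSE))
cols <- vec_cbind(!!!pieces)

# map_dbl()-style: combine size-1 results with a fixed double prototype
dbl <- vec_c(!!!lapply(1:3, function(i) i / 2), .ptype = double())

# vctrs recycling: size-1 inputs recycle, but sizes 2 and 0 are incompatible
recycled <- vec_recycle_common(1:2, 5L)
recycle_error <- tryCatch(vec_recycle_common(1:2, integer()), error = function(e) e)
inherits(recycle_error, "error")
```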

lionel- · 2019-06-19T06:45:49Z

Last time we talked about this with Hadley, the idea was to have `funs::s()` simplify an output. Our `sapply()` would then be `s(map(...))`. This approach seems better than multiplying the number of functions in purrr and other packages, because it is more composable.

Similarly, `map_int(...)` could be `int(map(...))`. However, the typed variants are more efficient because they can skip the intermediate list representation. In general, performance is a reason to provide variants.

Does this suggest we should have `map_vec()` in addition to `s(map())` if it helps performance? The prototype should be computed in advance, and then we'd build the output vector and `vec_slice()` each result into it. This would require exporting some of the C API from vctrs.

DavisVaughan · 2019-06-21T13:28:42Z

I see that `funs::s()` is mentioned here. I guess it would look something like `s(x, ptype = NULL)`?

I am a bit worried about performance here. If purrr were being implemented from scratch today, I feel like a C version of `map_vec()` would still exist just for its performance gains. I think even `map_int()` and some of the more common variants might still exist, because they are so much easier to type than `map_vec(x, fun, .ptype = int())` when you want to guarantee that the result is an integer and get the performance.

I do think `funs::s()` will generally be a nice way to avoid multiplying these kinds of variants in other packages, but purrr feels like an exception.

DavisVaughan · 2019-06-29T23:15:13Z

Thought: how can we compute the prototype of the output in advance if we need every element of the output to determine the common type? Unless `.ptype` is supplied, it seems we can't. That means two allocations are needed: one for a list holding the n results, and one for the common-type container of size n.
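The trade-off can be illustrated with a pure-R sketch. `map_prealloc_sketch()` is hypothetical and only mimics the two code paths: with a `.ptype`, it allocates the final container once with `vctrs::vec_init()` and writes each cast result into it with `vec_slice<-` (the approach suggested in the earlier comment); without one, it must first hold all results in a list, compute the common type, and then allocate a second container of size n. A real implementation would do this in C; this R loop is for exposition only and will not be faster than `vec_c()`. Unlike the list fallback proposed in this PR, the sketch simply errors when no common type exists.

```r
library(vctrs)

# Hypothetical sketch contrasting the two allocation strategies; not purrr's code.
# Assumes `.x` is an atomic vector or a bare list and `.f` returns size-1 results.
map_prealloc_sketch <- function(.x, .f, .ptype = NULL) {
  n <- length(.x)

  if (!is.null(.ptype)) {
    # `.ptype` known up front: allocate the final container once ...
    out <- vec_init(.ptype, n)
    for (i in seq_len(n)) {
      # ... and write each cast result straight into slot i
      vec_slice(out, i) <- vec_cast(.f(.x[[i]]), .ptype)
    }
    return(out)
  }

  # No `.ptype`: allocation 1 is a list holding all n results, because the
  # common type is only known once every result exists
  results <- lapply(.x, .f)
  ptype <- vec_ptype_common(!!!results)  # errors if there is no common type

  # Allocation 2: the common-type container of size n
  out <- vec_init(ptype, n)
  for (i in seq_len(n)) {
    vec_slice(out, i) <- results[[i]]
  }
  out
}

map_prealloc_sketch(1:3, function(x) x * 2L, .ptype = double())
map_prealloc_sketch(1:3, function(x) x * 2L)
```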

hadley · 2022-08-27T18:55:43Z

Superseded by #894

DavisVaughan added 3 commits June 18, 2019 18:13

Document (cb783b4)
Import vctrs (93a4f37)
Implement map_vec() (662edaf)

DavisVaughan mentioned this pull request Jul 22, 2019: Basic map implementation r-lib/vctrs#495 (Closed)

Update map_vec() semantics with new understanding of list handling (6fda8fd)

hadley mentioned this pull request Aug 27, 2022: treat vctrs record rcrd as a list #819 (Closed)

hadley closed this Aug 27, 2022
