Basic map implementation #495

hadley · 2019-07-22T15:24:23Z

Need to benchmark the two strategies: in the first we just put in a list and then vec_c() at the end; in the second we assume that the type is mostly stable so we can create the space for output in the beginning, and only occasionally change.

vec_map <- function(.x, .f, ..., .ptype = NULL) {
  .x <- lapply(.x, .f, ...)
  vec_c(!!!.x, .ptype = .ptype)
}


vec_map <- function(.x, .f, ..., .ptype = NULL) {
  first <- .x[[1]]
  first_out <- .f(.x, ...) 
  
  out <- vec_na(first_out, vec_size(.x)) 
  vec_slice(out, 1) <- first_out

  for (i in seq2(2, vec_size(.x))) {
    outi <- .f(vec_slice(x, i), ...)
    if (vec_is(outi, out)) {
      # change type
    }
    vec_slice(out, i) <- outi
  }
  
  out
}

The text was updated successfully, but these errors were encountered:

hadley · 2019-07-22T15:26:10Z

My implementation isn't quite right because vec_slice() is like [, but we need something like [[.

DavisVaughan · 2019-07-22T15:45:02Z

Size 1 constraint too?

library(vctrs)

vec_map <- function(.x, .f, ..., .ptype = NULL) {
  .x <- lapply(.x, .f, ...)
  vec_c(!!!.x, .ptype = .ptype)
}

vec_map(list(1, 1:2), function(x) {x})
#> [1] 1 1 2

^{Created on 2019-07-22 by the reprex package (v0.2.1)}

hadley · 2019-07-22T16:02:17Z

Oh yeah, good point.

DavisVaughan · 2019-07-22T16:37:25Z

Related / my attempt: tidyverse/purrr#683

lionel- · 2019-07-22T16:39:01Z

Shouldn't we expect vec_map() to pass scalar rows to .fn?

lionel- · 2019-07-22T16:39:45Z

I would expect the [[ extraction with map_vec() rather than vec_map().

DavisVaughan · 2019-07-22T16:40:29Z

i.e. .f(vec_slice(.x, i), ...) rather than .f(.x[[i]], ...). If this is to replace compat-purrr.R then maybe it should do [[ to be less surprising. If it should be vctrs-y then it seems like passing scalar rows to .f would be correct. But then again we don't have a [[ variant, so maybe if we had that then we could say we were "always iterating over the elements returned by vec_rip()".

lionel- · 2019-07-22T18:43:17Z

Since there is no hurry for replacing compat-purrr.R, getting the semantics vctrs-y seems more important.

Unlike slide() which always returns a list, vec_map() would have to wrap in a list or data frame to return ragged elements, otherwise we get a size error.

lionel- · 2019-07-22T18:44:57Z

Also purrr is now about as lightweight as vctrs in terms of dependencies, so purrr itself can replace compat-purrr.R

DavisVaughan · 2019-07-23T12:21:37Z

@lionel- and i talked about this a bit more, based on our notes I have come up with this for vec_map(), which is subtly different from map() when working with lists:

library(vctrs)

vec_map <- function(.x, .f, ..., .ptype = list()) {
  if (is.null(.ptype)) {
    .ptype <- list()
    simplify <- TRUE
  } else {
    simplify <- FALSE
  }
  
  .f <- as_function(.f)
  
  out <- vec_map_impl(.x, .f, ..., .ptype = .ptype)
  
  if (simplify) {
    out <- vec_c(!!!out)
  }
  
  out
}

vec_map_impl <- function(.x, .f, ..., .ptype) {
  out <- vec_init(.ptype, vec_size(.x)) 
  
  for (i in seq2(1, vec_size(.x))) {
    elt <- .f(vec_slice(.x, i), ...)
    elt <- vec_cast(elt, .ptype)
    out <- vec_assign(out, i, elt)
  }
  
  out
}

# each element is castable to a length 1 list
vec_map(1:3, ~.x)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2
#> 
#> [[3]]
#> [1] 3

# cannot do this, unlike map(). c(.x, 1) is cast to a length 2 list, which
# cannot be assigned to a single location in the output
vec_map(1:3, ~c(.x, 1))
#> Error: Incompatible lengths: 2, 1

# must wrap in list for it to work
vec_map(1:3, ~list(c(.x, 1)))
#> [[1]]
#> [1] 1 1
#> 
#> [[2]]
#> [1] 2 1
#> 
#> [[3]]
#> [1] 3 1

vec_map(1:3, ~.x, .ptype = double())
#> [1] 1 2 3

vec_map(1:3, ~.x, .ptype = NULL)
#> [1] 1 2 3

vec_map(1:3, ~if (.x == 1) {"ugh"} else {1}, .ptype = NULL)
#> No common type for `..1` <character> and `..2` <double>.

hadley · 2019-07-23T12:55:13Z

That assumes that .ptype is supplied; I'm mostly interested in the case where it's not and we want to work as efficiently as possible — if we assume that the function is type-stable, we can figure out the type based on the first result, and use it, only widening the type if it becomes necessary later on. I want to understand the performance characteristics of the two approaches so we can figure out which approach to take inside of dplyr.

lionel- · 2019-07-23T13:04:27Z

The simplify step seems suspect to me. Could you init a ptype vector, then cast each result to it and assign-splice the results outside the loop in out? The results would be generated by map().

DavisVaughan · 2019-07-23T18:37:09Z

Yea i think the simplify part is wrong.

cars <- mtcars[1:3,]

# this is wrong
vec_map(1:3, ~list(cars), .ptype = NULL)
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 5 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 6 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 7 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 8 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 9 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1

# ^ should return what this does
vec_map(1:3, ~list(list(cars)), .ptype = NULL)
#> [[1]]
#>                mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 
#> [[2]]
#>                mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 
#> [[3]]
#>                mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1

DavisVaughan · 2019-07-24T13:36:53Z

(This is not related to the performance of the two methods mentioned in the original comment, but this seems relevant to the conversation anyways)

A few more notes that came out of some slack conversation that might be useful to have documented:

purrr::map() can be thought of as a special function that preserves the type and size of each of the outputs from .f, without any casting.

This makes it useful as a building block for flat-map techniques, with map_dfr() being a good example as just map() + bind_rows() (which can return an object with a size != vec_size(.x)).

map_int() and the other suffix functions are different, and don't have the flat-map semantics. Instead they force the output from .f to conform to the required type (integer) and have size 1.

vec_map() should be considered more like map_int() than map(). In fact, if there was a map_list() then that would be equivalent to the defaults of vec_map() (.ptype = list()). vec_map() will force the output from .f to have type list, and size 1.

This results in the following difference between vec_map() and map():

library(purrr)
library(vctrs)
library(rlang, warn.conflicts = FALSE)

vec_map <- function(.x, .f, ..., .ptype = list()) {
  .f <- as_function(.f)
  out <- lapply(.x, .f, ...)
  
  if (vec_size_common(!!! out) != 1L) {
    abort("All results must have size 1.")
  }
  
  vec_c(!!! out, .ptype = .ptype)
}

# maintain type/size of `.f` result, no matter what that may be
map(1:2, ~c(.x, 1))
#> [[1]]
#> [1] 1 1
#> 
#> [[2]]
#> [1] 2 1

# force result of `.f` to be a list of size 1
vec_map(1:2, ~c(.x, 1))
#> All results must have size 1.

# fails because of how this works (which is expected and good).
# result of `.f` is cast to list, but becomes length 2
vec_cast(c(2, 1), list())
#> [[1]]
#> [1] 2
#> 
#> [[2]]
#> [1] 1

# must wrap in a list of size 1
vec_map(1:2, ~list(c(.x, 1)))
#> [[1]]
#> [1] 1 1
#> 
#> [[2]]
#> [1] 2 1

I think this means:

map() should not ever get a .ptype argument
In theory map_int() could use vec_map(.ptype = int())?
slide() should work similarly, and not get a .ptype argument
But there could be vec_slide() which works like vec_map(), and then slide_int() could be built from that

lionel- · 2022-09-12T15:53:03Z

Closing in favour of tidyverse/purrr#894

lionel- added the feature a feature request or enhancement label Apr 20, 2020

maxheld83 mentioned this issue Feb 25, 2021

treat vctrs record rcrd as a list tidyverse/purrr#819

Closed

lionel- closed this as completed Sep 12, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Basic map implementation #495

Basic map implementation #495

hadley commented Jul 22, 2019 •

edited

Loading

hadley commented Jul 22, 2019

DavisVaughan commented Jul 22, 2019

hadley commented Jul 22, 2019

DavisVaughan commented Jul 22, 2019

lionel- commented Jul 22, 2019

lionel- commented Jul 22, 2019

DavisVaughan commented Jul 22, 2019 •

edited

Loading

lionel- commented Jul 22, 2019

lionel- commented Jul 22, 2019

DavisVaughan commented Jul 23, 2019 •

edited

Loading

hadley commented Jul 23, 2019

lionel- commented Jul 23, 2019

DavisVaughan commented Jul 23, 2019

DavisVaughan commented Jul 24, 2019

lionel- commented Sep 12, 2022

Basic map implementation #495

Basic map implementation #495

Comments

hadley commented Jul 22, 2019 • edited Loading

hadley commented Jul 22, 2019

DavisVaughan commented Jul 22, 2019

hadley commented Jul 22, 2019

DavisVaughan commented Jul 22, 2019

lionel- commented Jul 22, 2019

lionel- commented Jul 22, 2019

DavisVaughan commented Jul 22, 2019 • edited Loading

lionel- commented Jul 22, 2019

lionel- commented Jul 22, 2019

DavisVaughan commented Jul 23, 2019 • edited Loading

hadley commented Jul 23, 2019

lionel- commented Jul 23, 2019

DavisVaughan commented Jul 23, 2019

DavisVaughan commented Jul 24, 2019

lionel- commented Sep 12, 2022

hadley commented Jul 22, 2019 •

edited

Loading

DavisVaughan commented Jul 22, 2019 •

edited

Loading

DavisVaughan commented Jul 23, 2019 •

edited

Loading