vec_get() #141

DavisVaughan · 2018-11-26T13:31:57Z

As a complement to vec_slice() @romainfrancois and I have been discussing the utility of a function that would extract 1 observation, but would be more of an analog to [[ than [.

Possible implementation:

vec_rip <- function (x, i) {
  
  if (is.logical(i)) {
    i <- which(i)
  }
  
  stopifnot(is.integer(i) || is.character(i))
  
  # this is new 
  stopifnot(length(i) == 1L)
  
  if (is.null(x)) {
    NULL
  }
  else if (is.data.frame(x)) {
    out <- lapply(x, `[`, i)
    vec_restore(out, x)
  }
  else if (is_vector(x)) {
    d <- vec_dims(x)
    if (d == 1) {
      # this is all that changed?
      if (is.object(x)) {
        # x[i]
        x[[i]]
      }
      else {
        .subset2(x, i)
        # x[[i, drop = FALSE]]
      }
    }
    else if (d == 2) {
      x[i, , drop = FALSE]
    }
    else {
      miss_args <- rep(list(missing_arg()), d - 1)
      eval_bare(expr(x[i, !!!miss_args, drop = FALSE]))
    }
  }
  else {
    stop("`x` must be a vector", call. = FALSE)
  }
}

It would function somewhat like:

vec_rip(list(x = 1, y = 2), 1L)
#> [1] 1
vec_slice(list(x = 1, y = 2), 1L)
#> $x
#> [1] 1

# Not sure about what these should do:
vec_rip(mtcars, 1L)
#>   mpg cyl disp  hp drat   wt  qsec vs am gear carb
#> 1  21   6  160 110  3.9 2.62 16.46  0  1    4    4
vec_rip(matrix(1:12, nrow = 3), 1L)
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    4    7   10

vec_slice(starwars$films, 1L)
#> [[1]]
#> [1] "Revenge of the Sith"     "Return of the Jedi"     
#> [3] "The Empire Strikes Back" "A New Hope"             
#> [5] "The Force Awakens"

vec_rip(starwars$films, 1L)
#> [1] "Revenge of the Sith"     "Return of the Jedi"     
#> [3] "The Empire Strikes Back" "A New Hope"             
#> [5] "The Force Awakens"

# vec_rip() can only extract 1 observation at a time
vec_slice(starwars$films, 1:2)
#> [[1]]
#> [1] "Revenge of the Sith"     "Return of the Jedi"     
#> [3] "The Empire Strikes Back" "A New Hope"             
#> [5] "The Force Awakens"      
#> 
#> [[2]]
#> [1] "Attack of the Clones"    "The Phantom Menace"     
#> [3] "Revenge of the Sith"     "Return of the Jedi"     
#> [5] "The Empire Strikes Back" "A New Hope"

vec_rip(starwars$films, 1:2)
#> Error in vec_rip(starwars$films, 1:2): length(i) == 1L is not TRUE

~~We are currently undecided on what it "should" do for data frames and matrices. A couple ideas:~~

1) Return the 1 row observation as is (so the same as vec_slice(), this is shown above). This doesn't feel right.
2) Extract the 1 row observation, then coerce it to some lower level type. For data.frames, a list and for matrices, a vector.

~~If 2) is chosen, one question that came up is "what should it do for list columns"? Two possibilities:~~

data("starwars", package = "dplyr")
starwars <- starwars[,c("species", "films")]
one_ob <- vctrs::vec_slice(starwars, 1L)

# don't drop a dimension
as.list(one_ob)
#> $species
#> [1] "Human"
#> 
#> $films
#> $films[[1]]
#> [1] "Revenge of the Sith"     "Return of the Jedi"     
#> [3] "The Empire Strikes Back" "A New Hope"             
#> [5] "The Force Awakens"

# drop 1 dimension
lapply(one_ob, .subset2, 1)
#> $species
#> [1] "Human"
#> 
#> $films
#> [1] "Revenge of the Sith"     "Return of the Jedi"     
#> [3] "The Empire Strikes Back" "A New Hope"             
#> [5] "The Force Awakens"

# but with better support for this
# so it doesn't unravel (i.e. better support 
# for vctr rcrd types)
.subset2(as.POSIXlt(Sys.time()), 1)
#> [1] 26.99161

The text was updated successfully, but these errors were encountered:

romainfrancois · 2018-11-26T13:37:14Z

This would be useful in purrr::map() situations when you iterate on some custom vctr or rcrd type

romainfrancois · 2018-11-26T17:31:27Z

Perhaps related to tidyverse/dplyr#3789

hadley · 2018-11-26T17:33:34Z

To get this included in vctrs, you need to answer the question: What are the invariants that this function would satisfy?

jennybc · 2018-11-26T18:01:25Z

A word I would like to insert into this discussion is "atom". As in, atomic vectors have atoms that are of the same type. I think maybe vec_rip() gets one atom? I don't have great words around it, but part of what makes list-columns hard for people is that it often feels like you're dealing with one extra layer of indexing. I see the appeal of working this out.

romainfrancois · 2018-11-26T18:08:36Z

I've been staring at this for some time now, and yes this comes from iterating on list columns (from purrr, or my toy zap).

Not sure about this, here's a draft:

- atom_type(vctrs<T>) = vctrs<T>
- atom_type(list_of<T>) = <T>

- vec_size(atom_type(vctrs<T>)) == 1
- vec_size(atom_type(list_of<T>)) == <??>

- type( !!!atom_type<T>) = <T>

DavisVaughan · 2018-11-26T22:07:51Z

There was a bit of a misunderstanding between @romainfrancois and I regarding list column behavior. I've marked out the relevant info above.

At this point, I think the implementation is just:

vec_rip <- function(x, i) {
  stopifnot(length(i) == 1L)
  out <- vec_slice(x, i)
  
  if (rlang::is_bare_list(x)) {
    out <- out[[1]]
  }
  
  out
}

I think this all makes sense in the context of atoms if you think of them as "a basic building block." Notice that for "a list of elements", the building block is a single element and not a list of length 1 (which aligns with what vec_rip() is trying to achieve).


The basic building block of	a vector of doubles	is a single double.
The basic building block of	a list of elements	is a single element.
The basic building block of	a data frame	is a single row.
The basic building block of	a matrix is	? (a row or a single entry?)
The basic building block of	an array is	?

I'm a little unsure of what the "atom" of a matrix should be. 1 row would make sense, but every element in the matrix is of the same type so 1 single element might also make sense. Same for arrays.

The only other caveat is that a single rcrd should be treated as 1 object, and not as a list of elements.

romainfrancois · 2018-11-27T06:49:43Z

Could be vec_atom(). The way i think about this is this function should return something sensible for when you iterate on « observations » of the vctr type.

lionel- · 2018-11-28T11:01:03Z

At this point, I think the implementation is just:

Not quite because extracting an observation from an atomic vector should zap the names and all vector-related attributes.

I think instead of atom, we should think about observations. Then this function would better be called vec_obs() and would be what a purrr::map_obs() mapper would use to subset its input.

lionel- · 2018-11-28T12:13:10Z

Not quite because extracting an observation from an atomic vector should zap the names and all vector-related attributes.

Hmm... Is that true? What would an observation from a factor be in that case? A level?

factor(c("a", "b"))[[1]]
#> [1] a
#> Levels: a b

# Does that make sense?
vec_obs(factor(c("a", "b")), 1)
#> [1] "a"

lionel- · 2018-11-28T12:45:12Z

Taking an observation from a data frame is a type-preserving operation. Taking an observation from a list or list_of is not? This problem is harder than I thought. Maybe, somehow, because data frames are atomic (scalars), when taken as a whole?

Slight connection to why flatten(list(mtcars, mtcars)) is (or should be) a no-op, the list is already flat because mtcars is a scalar.

@jennybc's intuition that the notion atomicity is important was spot on.

Still it's not clear what parts of the type should be dropped/preserved when taking an observation of an atomic object. Should we trust the designers of S that vector names are dropped under [[? Has an observation of a matrix lost its row name?

lionel- · 2018-12-04T11:05:28Z

If a data frame can be seen as an atomic vector, when subsetting observations, this would imply that there should be a notion of missing row. Is a missing row a row full of missings? Do we need a predicate are_na() (using plural to denote vectorisation) that would return TRUE for rows that have only missing observations? Can it be that for atomic types implemented with recursive data structures, is_na(x) is every(x, is_na)?

Interestingly:

mtcars[1, ][NA, ]
#>    mpg cyl disp hp drat wt qsec vs am gear carb
#> NA  NA  NA   NA NA   NA NA   NA NA NA   NA   NA

hadley · 2018-12-04T13:46:03Z

(@lionel- that's just mtcars[NA_integer_, ] or vec_na(mtcars))

hadley · 2019-01-28T20:22:51Z

Also worth thinking about this case:

x <- package_version("1.1.1")
identical(x, x[[1]])
#> [1] TRUE

^{Created on 2019-01-28 by the reprex package (v0.2.1.9000)}

krlmlr · 2019-01-28T21:58:19Z

The inverse operation seems interesting too. Would it be too bad if vec_c() wrapped all non-vector objects in a list? The current behavior seems suboptimal either way:

qr <- qr(lm(y ~ x, data.frame(x = 1, y = 1)))
class(qr)
#> [1] "qr"
typeof(qr)
#> [1] "list"
vctrs::vec_c(qr)
#> $qr
#>   (Intercept) x
#> 1           1 1
#> attr(,"assign")
#> [1] 0 1
#> 
#> $qraux
#> [1] 1 1
#> 
#> $pivot
#> [1] 1 2
#> 
#> $tol
#> [1] 1e-07
#> 
#> $rank
#> [1] 1

^{Created on 2019-01-28 by the reprex package (v0.2.1.9000)}

hadley · 2019-03-04T16:22:07Z

Another thought: maybe we need to more explicitly talk about "container" types, and then vec_rip() would only apply to a container.

hadley · 2019-03-05T15:38:33Z

I think this operation should error if you're attempting to index into a vector that is not a recursive, e.g.:

vec_recursive <- function(x) UseMethod("vec_recursive")
vec_recursive.list_of <- function(x) TRUE
vec_recursive.data.frame <- function(x) TRUE
vec_recursive.default <- function(x) {
  if (is_bare_list(x)) {
    TRUE
  } else if (is_vector(x)) {
    FALSE
  } else {
    stop("Non-vector")
  }
}

vec_extract <- function(x, i) {
  stopifnot(length(i) == 1L)
  stopifnot(vec_recursive(x))
  
  vec_slice(x, i)[[1]]
}

This clearly distinguishes it from vec_slice() and I don't think the using [[ to strip attributes is important (or commonly used) enough to be worth mimicking.

Note that vec_extract() would always preserve the invariants of the container class, i.e. x %>% vec_extract(i) %>% vec_size() equals vec_size(x) if x is a data frame, and x %>% vec_extract(i) %>% vec_type() equals x %>% attr("ptype") is x is a list_of.

lionel- · 2019-03-06T15:41:46Z

I think vec_recursive() is not sufficient to determine atomicity. An S3 list can be:

A recursive vector type, e.g. list_of and data frames.
An atomic vector type, e.g. record vector types like dates.
An atom type, e.g. lm objects.

If vec_recursive() returns TRUE, we know we have a vector, but if it returns FALSE we don't know for sure we have an atom.

We might need a more general vec_kind() generic that would return one of:

"atom" for non vectors (the default for S3 lists and non-vector base types)
"atomic" (the default for S3 and bare atomics)
"recursive" (the default for bare lists and data frames)

hadley · 2019-03-06T20:54:18Z

Why does that distinction matter for indexing?

lionel- · 2019-03-06T21:04:54Z

I don't know if it matters for extraction (though vec_is_recursive() shouldn't throw an error with non-vectors to make recursive extraction possible). But that distinction is necessary for vec_is() / vec_assert() without arguments. Otherwise, vec_assert(model_fit) won't throw.

DavisVaughan · 2019-07-04T18:17:45Z

I'm running into the case where it would be nice to have the assignment form of this, [[<-. I do think whatever vctrs solution we come up with for that should work with recursive and non-recursive types. For instance, I might be filling list elements, or I might be filling double elements.

vec_assign() is obviously not what I need to be using here. I want some generic function that can [[<- a single element in both lists and double vectors.

library(vctrs)

out <- vec_init(list(), 3)
elt <- 2:3

# cant use vec_assign, of course
vec_assign(out, 1, elt)
#> No common type for `value` <integer> and `x` <list>.

# this is what i want
out[[1]] <- elt
out
#> [[1]]
#> [1] 2 3
#> 
#> [[2]]
#> NULL
#> 
#> [[3]]
#> NULL

# but i also might "init" a dbl
out <- vec_init(double(), 3)
elt <- 2

# can use vec_assign
vec_assign(out, 1, 2)
#> [1]  2 NA NA

# and i can use this
out[[1]] <- 2
out
#> [1]  2 NA NA

Update) For data frames I think this needs to be able to replace a column, not a row.

romainfrancois · 2019-08-14T08:11:24Z

Looking into dplyr::summarise() for rowwise data frames, I'm going to need something like vec_rip()

lionel- · 2019-08-14T09:09:22Z

It'll probably be called vec_get(), but it's still not clear what the semantics should be.

@DavisVaughan suggested it would return rows as lists in case of a data frame, or as a vector of dimensionality n - 1 in case of an array (for instance a dimless vector if a matrix).

In that case, rowwise extraction in data frames becomes a recursive operation in the sense that you can dig deep into them by calling recursively [[ / vec_get(), in the same way you can dig deep into lists. Arrays become kinda recursive as well, though they can't return a different type.

Despite being recursive, this operation doesn't really connect to algorithms ordinarily used with recursive data structures such as flatten(). However, as Davis noted, vec_get(vec_get(x, i), j) on a 2d structure consistently retrieves the i, j element, which is interesting.

DavisVaughan · 2019-08-14T11:26:08Z

@romainfrancois, so vec_slice() wouldn't be enough for rowwise data frames? Could you share why not? Would returning the first row as a list like this be helpful? A use case might guide the discussion a bit

romainfrancois · 2019-08-15T12:40:29Z

Probably it's the whole rowwise() approach that makes it a problem. When we have a list column x, we want x to stand for an element of the list, not a length 1 list with the element

…_get() r-lib/vctrs#141

jennybc · 2019-08-15T14:03:55Z

When we have a list column x, we want x to stand for an element of the list, not a length 1 list with the element

This is what I was getting at way, way above:

A word I would like to insert into this discussion is "atom". As in, atomic vectors have atoms that are of the same type. I think maybe vec_rip() gets one atom? I don't have great words around it, but part of what makes list-columns hard for people is that it often feels like you're dealing with one extra layer of indexing.

DavisVaughan · 2019-08-15T15:39:17Z

What @romainfrancois does here has essentially the same implementation of what I used when attempting to reimplement flatten() with vctrs semantics.
tidyverse/dplyr@ebbd15c#diff-ccca386aac53cf0029fb15ebff8901d5R206

The vec_get() I used:
https://github.com/DavisVaughan/vctrs/blob/84e0800269805a89ee26c995d1d63ab3bcff2e8c/R/get.R#L5

Essentially you vec_slice() unless something is a list (but not a data frame), in which case you [[i]].

(This is different than what Lionel mentioned above about the other semantics I proposed)

DavisVaughan · 2019-11-12T14:30:46Z

I think that it would also be nice to have the equivalent of vec_chop() for whatever vec_get() ends up being.

So:

vec_slice(x, i) -> vec_chop(x, indices = NULL)

vec_get(x, pos) -> vec_*(x, positions = NULL)

positions would be NULL or a list of single positions like list(1, "x", 3)

It would be useful for vec_cast.list.data.frame() and vec_cast.list.default()

DavisVaughan changed the title ~~vctrs_rip()~~ vec_rip() Nov 26, 2018

hadley mentioned this issue Nov 28, 2018

pluck() converts zero-length objects to NULL tidyverse/purrr#480

Closed

lionel- mentioned this issue Mar 5, 2019

Should vec_equal() compare names? #221

Closed

lionel- added the op:slice label Mar 22, 2019

hadley added the type:container 👛 label Jun 3, 2019

romainfrancois added a commit to tidyverse/dplyr that referenced this issue Aug 15, 2019

temporary workaround for summarise_data_mask() vs rowwise() until vec…

ebbd15c

…_get() r-lib/vctrs#141

DavisVaughan changed the title ~~vec_rip()~~ vec_get() Oct 29, 2019

lionel- added the theory label Apr 20, 2020

mgirlich mentioned this issue Aug 10, 2020

reminder: pluck() converts empty elements to NULL mgirlich/tibblify#11

Closed

DavisVaughan mentioned this issue Sep 2, 2020

Add vec_slice2() and vec_assign2() #1228

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vec_get() #141

vec_get() #141

DavisVaughan commented Nov 26, 2018 •

edited

Loading

romainfrancois commented Nov 26, 2018

romainfrancois commented Nov 26, 2018

hadley commented Nov 26, 2018

jennybc commented Nov 26, 2018

romainfrancois commented Nov 26, 2018

DavisVaughan commented Nov 26, 2018 •

edited

Loading

romainfrancois commented Nov 27, 2018

lionel- commented Nov 28, 2018

lionel- commented Nov 28, 2018 •

edited

Loading

lionel- commented Nov 28, 2018 •

edited

Loading

lionel- commented Dec 4, 2018

hadley commented Dec 4, 2018

hadley commented Jan 28, 2019

krlmlr commented Jan 28, 2019

hadley commented Mar 4, 2019

hadley commented Mar 5, 2019 •

edited

Loading

lionel- commented Mar 6, 2019 •

edited

Loading

hadley commented Mar 6, 2019

lionel- commented Mar 6, 2019

DavisVaughan commented Jul 4, 2019 •

edited

Loading

romainfrancois commented Aug 14, 2019

lionel- commented Aug 14, 2019

DavisVaughan commented Aug 14, 2019

romainfrancois commented Aug 15, 2019

jennybc commented Aug 15, 2019

DavisVaughan commented Aug 15, 2019

DavisVaughan commented Nov 12, 2019

vec_get() #141

vec_get() #141

Comments

DavisVaughan commented Nov 26, 2018 • edited Loading

romainfrancois commented Nov 26, 2018

romainfrancois commented Nov 26, 2018

hadley commented Nov 26, 2018

jennybc commented Nov 26, 2018

romainfrancois commented Nov 26, 2018

DavisVaughan commented Nov 26, 2018 • edited Loading

romainfrancois commented Nov 27, 2018

lionel- commented Nov 28, 2018

lionel- commented Nov 28, 2018 • edited Loading

lionel- commented Nov 28, 2018 • edited Loading

lionel- commented Dec 4, 2018

hadley commented Dec 4, 2018

hadley commented Jan 28, 2019

krlmlr commented Jan 28, 2019

hadley commented Mar 4, 2019

hadley commented Mar 5, 2019 • edited Loading

lionel- commented Mar 6, 2019 • edited Loading

hadley commented Mar 6, 2019

lionel- commented Mar 6, 2019

DavisVaughan commented Jul 4, 2019 • edited Loading

romainfrancois commented Aug 14, 2019

lionel- commented Aug 14, 2019

DavisVaughan commented Aug 14, 2019

romainfrancois commented Aug 15, 2019

jennybc commented Aug 15, 2019

DavisVaughan commented Aug 15, 2019

DavisVaughan commented Nov 12, 2019

DavisVaughan commented Nov 26, 2018 •

edited

Loading

DavisVaughan commented Nov 26, 2018 •

edited

Loading

lionel- commented Nov 28, 2018 •

edited

Loading

lionel- commented Nov 28, 2018 •

edited

Loading

hadley commented Mar 5, 2019 •

edited

Loading

lionel- commented Mar 6, 2019 •

edited

Loading

DavisVaughan commented Jul 4, 2019 •

edited

Loading