Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review vec_is_list() check on vec_proxy() #1208

Open
DavisVaughan opened this issue Aug 5, 2020 · 5 comments
Open

Review vec_is_list() check on vec_proxy() #1208

DavisVaughan opened this issue Aug 5, 2020 · 5 comments
Labels

Comments

@DavisVaughan
Copy link
Member

DavisVaughan commented Aug 5, 2020

@lionel- and I discussed the concept of a list-rcrd in more detail. Essentially this is a list that has some vectorized attributes that can get sliced alongside the list. We determined that one possible implementation of this idea is (in my own words):

  • The list itself is used as the core data structure.

  • The special vectorized attributes are stored as an attribute of the list. The easiest thing to do would be to store them as a named list in a single attribute called something like vctrs:::vectorized.

  • The list explicitly inherits from "list", along with whatever extra class is used on top of that. Say its vctrs_list_rcrd.

  • The vec_proxy.vctrs_list_rcrd method should return a data frame. The first column is a list-col containing the list. The remaining columns are the vectorized attributes.

This would currently work in vctrs, except for the fact that vec_is_list() has the following check: If x inherits explicitly from "list", then vec_proxy(x) must also be a list. This should currently fail here (although I think it doesn't because of a bug).

I don't remember exactly why we have that restriction, but it seems reasonable that we might try to remove it to allow these list-rcrds to work.

One possible implementation follows, but as @lionel- mentioned to me, the important part is that this doesn't require any vctrs specific helper, and can live completely outside vctrs.

library(vctrs)

new_list_rcrd <- function(x, vectorized, ..., class = character()) {
  structure(
    x, 
    `vctrs:::vectorized` = vectorized, 
    ..., 
    class = c(class, "vctrs_list_rcrd", "list")
  )
}

vec_proxy.vctrs_list_rcrd <- function(x, ...) {
  vectorized <- attr(x, "vctrs:::vectorized")
  x <- unclass(x)
  cols <- list(`vctrs:::data` = x)
  cols <- c(cols, vectorized)
  new_data_frame(cols)
}

vec_restore.vctrs_list_rcrd <- function(x, to, ...) {
  data <- x[["vctrs:::data"]]
  x[["vctrs:::data"]] <- NULL
  new_list_rcrd(data, x)
}

x <- new_list_rcrd(
  x = list(a = 1:3, b = 2:5), 
  vectorized = list(
    special1 = c(TRUE, FALSE),
    special2 = c("x", "y")
  )
)

x
#> $a
#> [1] 1 2 3
#> 
#> $b
#> [1] 2 3 4 5
#> 
#> attr(,"vctrs:::vectorized")
#> attr(,"vctrs:::vectorized")$special1
#> [1]  TRUE FALSE
#> 
#> attr(,"vctrs:::vectorized")$special2
#> [1] "x" "y"
#> 
#> attr(,"class")
#> [1] "vctrs_list_rcrd" "list"

vec_slice(x, c(1, 1, 2))
#> $a
#> [1] 1 2 3
#> 
#> $a
#> [1] 1 2 3
#> 
#> $b
#> [1] 2 3 4 5
#> 
#> attr(,"vctrs:::vectorized")
#> attr(,"vctrs:::vectorized")$special1
#> [1]  TRUE  TRUE FALSE
#> 
#> attr(,"vctrs:::vectorized")$special2
#> [1] "x" "x" "y"
#> 
#> attr(,"class")
#> [1] "vctrs_list_rcrd" "list"

Created on 2020-08-05 by the reprex package (v0.3.0)

@lionel-
Copy link
Member

lionel- commented Aug 5, 2020

Probably best to use fields instead of vectorised so it's consistent with new_rcrd().

@lionel-
Copy link
Member

lionel- commented Aug 5, 2020

Summary of the main ideas:

  • Explicit inheritance from one of the base types is indicative of the storage type. In particular, the storage cannot be proxied. Inheriting from "list" when the object typeof() isn't "list" is an internal error.

  • Explicit inheritance also indicates the other attributes of that type: numeric for "integer", listness for "list", etc. In particular, vec_is_list() is only true for objects that explictly inherit from "list".

  • We can still proxy lists so that they have vectorised attributes. But unlike record vectors, we don't allow proxied storage. We require the main data structure to be the list storage, the proxy just provides additional vectorised attributes.

  • As a consequence, [[-based operations don't need to take the proxy. Only [-based operations need the proxy (and probably need to restore afterwards). In practice, this means that we don't take the proxy when we work on elements of the list. When we work on the list itself, we take the proxy and use generic vctrs operations (because the proxy might be a data frame).

@lionel-
Copy link
Member

lionel- commented Aug 5, 2020

Regarding how to implement [[<- and $<- operations in vctrs we could do it in the following way.

For x[[1]] <- y with x a list, we just do vec_slice(x, 1) <- list(y):

  • Converts list(y) to x. This allows the class of x to initialise vectorised attributes for y.
  • Uses vec_slice<- as usual

It's going to be inefficient to update lists elements by elements, so it's always preferable to vectorise the assignment.

@lionel-
Copy link
Member

lionel- commented Aug 17, 2020

Regarding a helper class for record lists and record data frames, I think vctrs::new_rcrd() should gain a data = NULL argument. When supplied, the fields would be stored in an attribute.

@lionel-
Copy link
Member

lionel- commented Aug 26, 2020

#1226 adds a list record helper class for unit tests.

#1228 implements [[ and [[<- as outlined in #1208 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants