data.table conversion turns the class of <expr> column into <list> column #4040

Closed
JodyStats opened this issue Nov 12, 2019 · 2 comments
Labels
non-atomic column e.g. list columns, S4 vector columns

Comments

@JodyStats

JodyStats commented Nov 12, 2019

library(data.table)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:data.table':
#> 
#>     between, first, last
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tibble)
library(drake)

plan <- drake::drake_plan(
  expr = dat[, col := val]
)

drake_plan() returns a tibble to be manipulated. The problem is that one of its columns has class "expr", and when I convert the tibble to a data.table, that column's class turns into "list". I do not intend to manipulate this "expr" column, but it is required by downstream processes, so its "expr" class must be unchanged after the conversion from tibble to data.table. Is this a bug that can be fixed? If not, what is the solution and best practice? Thank you in advance.

# command column class of <expr>
plan
#> # A tibble: 1 x 2
#>   target command              
#>   <chr>  <expr>               
#> 1 expr   dat[, `:=`(col, val)]

# command column class of <expr>
plan %>% mutate(col = "val")
#> # A tibble: 1 x 3
#>   target command               col  
#>   <chr>  <expr>                <chr>
#> 1 expr   dat[, `:=`(col, val)] val

# command column class of <list>
plan <- data.table(plan)[, col := "val"]
plan
#>    target command    col
#>    <char>  <list> <char>
#> 1:   expr  <call>    val

# command column class of <list>
as_tibble(plan)
#> # A tibble: 1 x 3
#>   target command    col  
#>   <chr>  <list>     <chr>
#> 1 expr   <language> val

Created on 2019-11-12 by the reprex package (v0.3.0)

@TysonStanley
Member

This looks like expected behavior. @MichaelChirico just made a PR so that rbindlist() can handle columns of type <expr>, but it sounds like the behavior of data.table() itself is not going to change.

With that in mind, consider the following:

typeof(plan$command)
#> [1] "list"
typeof(plan$command[[1]])
#> [1] "language"
typeof(data.table(plan)[, command])
#> [1] "list"
typeof(data.table(plan)[, command][[1]])
#> [1] "language"

So the unchanged object plan has the same types as its data.table conversion: a list column whose first (and only) element is a "language" object (or expression). This means it should behave similarly, I believe, at least as far as the type of the column is concerned.
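The difference between the class of the column and its underlying type can be sketched in base R. This is only an illustration: "my_expr" is a hypothetical class name standing in for whatever class drake attaches, and the sketch assumes the conversion does nothing more than drop the class attribute.

```r
# "my_expr" is a hypothetical class, used here only as a stand-in
x <- structure(list(quote(dat[, col := val])), class = "my_expr")

class(x)    # the class attribute, which printing methods dispatch on
typeof(x)   # the underlying storage type: a plain list

# Dropping the class attribute leaves the stored calls untouched
y <- unclass(x)
class(y)
typeof(y)
same_call <- identical(x[[1]], y[[1]])
same_call
```

If downstream code dispatches on the class rather than the type, the class would have to be restored after the round trip; whether drake needs that is not established in this thread.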

Is drake running into issues after it is changed to a data table?

@jangorecki
Member

jangorecki commented May 18, 2020

@JodyStats If an expression vector kept the type of its elements stable (language/symbol), then that would make perfect sense, but an expression vector can also mix in other types, such as evaluated atomic scalars.
I was going to provide a minimal example, but it seems that base data.frame does not even let you create a data.frame with an expression column. To make it work you have to trick it and add the column later on, not in the constructor function, but obviously that leads to problems...

d=data.frame(a=1)
d$b = expression(1+2)
d
#  a                 b
#1 1 expression(1 + 2)
str(d)
#'data.frame':	1 obs. of  2 variables:
# $ a: num 1
# $ b:  expression(1 + 2)
rbind(d,d)
#  a                        b
#1 1 expression(1 + 2, 1 + 2)
#2 1                     <NA>
#Warning message:
#In format.data.frame(if (omit) x[seq_len(n0), , drop = FALSE] else x,  :
#  corrupt data frame: columns will be truncated or padded with NAs
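The remark about an expression vector mixing element types can be checked with a small base R sketch (the symbols a and b are arbitrary and are never evaluated):

```r
# One expression vector holding a constant, a symbol, and a call
e <- expression(1, a, a + b)

# Inspect the type of each element
types <- vapply(as.list(e), typeof, character(1))
types
#> [1] "double"   "symbol"   "language"
```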

I would suggest wrapping the expression in a list if you want to keep it in a data.[frame|table] column. If that causes any issues (performance, memory) beyond having to adapt the code that manipulates it, please let us know; that would be an argument for handling it specially.
Closing for now.
In the future, please provide a minimal example that does not require installing extra, unrelated packages.
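A minimal base R sketch of the list-wrapping suggestion, assuming one unevaluated call per row is what needs to be stored. It uses quote() rather than expression() so that each list element is a single call:

```r
d <- data.frame(a = 1)

# Store the unevaluated call as the single element of a list column
d$b <- list(quote(1 + 2))

typeof(d$b)        # the column itself is a list
typeof(d$b[[1]])   # each element is still a language object

# Evaluate each stored call when the result is needed
results <- vapply(d$b, eval, numeric(1))
results
#> [1] 3
```

The row count stays at one because the list has one element per row, which is what avoids the corrupt data frame shown above.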
