unexpected return from transmute() on nested data #20

Closed
leungi opened this issue Jul 28, 2019 · 9 comments

Comments

@leungi

leungi commented Jul 28, 2019

Reprex below.

Issue: select() produces the desired result, but transmute() does not. The extra list(...) wrapper that table.express generates in the expression may be the cause.

  suppressMessages(library(tidyverse))
#> Warning: package 'tidyverse' was built under R version 3.5.3
#> Warning: package 'ggplot2' was built under R version 3.5.3
#> Warning: package 'tidyr' was built under R version 3.5.3
#> Warning: package 'readr' was built under R version 3.5.2
#> Warning: package 'purrr' was built under R version 3.5.3
#> Warning: package 'stringr' was built under R version 3.5.3
#> Warning: package 'forcats' was built under R version 3.5.3
  suppressMessages(library(table.express))
  suppressMessages(library(data.table))
#> Warning: package 'data.table' was built under R version 3.5.3
  
iris %>% 
  nest(-Species) %>% 
  unnest(data)
#> # A tibble: 150 x 5
#>    Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>    <fct>          <dbl>       <dbl>        <dbl>       <dbl>
#>  1 setosa           5.1         3.5          1.4         0.2
#>  2 setosa           4.9         3            1.4         0.2
#>  3 setosa           4.7         3.2          1.3         0.2
#>  4 setosa           4.6         3.1          1.5         0.2
#>  5 setosa           5           3.6          1.4         0.2
#>  6 setosa           5.4         3.9          1.7         0.4
#>  7 setosa           4.6         3.4          1.4         0.3
#>  8 setosa           5           3.4          1.5         0.2
#>  9 setosa           4.4         2.9          1.4         0.2
#> 10 setosa           4.9         3.1          1.5         0.1
#> # ... with 140 more rows

irisDT <- iris %>% 
  nest(-Species) %>% 
  as.data.table()

irisDT[, unlist(data, recursive=FALSE), by = 'Species']
#>        Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>   1:    setosa          5.1         3.5          1.4         0.2
#>   2:    setosa          4.9         3.0          1.4         0.2
#>   3:    setosa          4.7         3.2          1.3         0.2
#>   4:    setosa          4.6         3.1          1.5         0.2
#>   5:    setosa          5.0         3.6          1.4         0.2
#>  ---                                                            
#> 146: virginica          6.7         3.0          5.2         2.3
#> 147: virginica          6.3         2.5          5.0         1.9
#> 148: virginica          6.5         3.0          5.2         2.0
#> 149: virginica          6.2         3.4          5.4         2.3
#> 150: virginica          5.9         3.0          5.1         1.8

irisDT %>% 
  group_by(Species) %>% 
  select(unlist(data, recursive=FALSE))
#>        Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>   1:    setosa          5.1         3.5          1.4         0.2
#>   2:    setosa          4.9         3.0          1.4         0.2
#>   3:    setosa          4.7         3.2          1.3         0.2
#>   4:    setosa          4.6         3.1          1.5         0.2
#>   5:    setosa          5.0         3.6          1.4         0.2
#>  ---                                                            
#> 146: virginica          6.7         3.0          5.2         2.3
#> 147: virginica          6.3         2.5          5.0         1.9
#> 148: virginica          6.5         3.0          5.2         2.0
#> 149: virginica          6.2         3.4          5.4         2.3
#> 150: virginica          5.9         3.0          5.1         1.8

irisDT %>% 
  group_by(Species) %>% 
  transmute(unlist(data, recursive=FALSE))
#>        Species                          V1
#>  1:     setosa 5.1,4.9,4.7,4.6,5.0,5.4,...
#>  2:     setosa 3.5,3.0,3.2,3.1,3.6,3.9,...
#>  3:     setosa 1.4,1.4,1.3,1.5,1.4,1.7,...
#>  4:     setosa 0.2,0.2,0.2,0.2,0.2,0.4,...
#>  5: versicolor 7.0,6.4,6.9,5.5,6.5,5.7,...
#>  6: versicolor 3.2,3.2,3.1,2.3,2.8,2.8,...
#>  7: versicolor 4.7,4.5,4.9,4.0,4.6,4.5,...
#>  8: versicolor 1.4,1.5,1.5,1.3,1.5,1.3,...
#>  9:  virginica 6.3,5.8,7.1,6.3,6.5,7.6,...
#> 10:  virginica 3.3,2.7,3.0,2.9,3.0,3.0,...
#> 11:  virginica 6.0,5.1,5.9,5.6,5.8,6.6,...
#> 12:  virginica 2.5,1.9,2.1,1.8,2.2,2.1,...

irisDT %>% 
  start_expr() %>% 
  group_by(Species) %>% 
  transmute(unlist(data, recursive=FALSE))
#> .DT_[, list(unlist(data, recursive = FALSE)), by = list(Species)]

Created on 2019-07-28 by the reprex package (v0.2.1)

@asardaes
Owner

That is correct, transmute ends up being "too simple-minded" for some cases, since it always puts everything inside a call to list. I can't think of an easy way to identify whether list should be used or not, although I don't think there's anything wrong with using select for this.

transmute could have a parameter .enlist that, when set to FALSE, leaves the expression as given. It wouldn't be so nice, but I'm not sure if there's a better option.
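The effect of the list() wrapper can be reproduced in plain data.table, without table.express. The following is a minimal sketch, assuming only that data.table is installed; the object names (nested, flat, wrapped) are illustrative and not part of either package.

```r
library(data.table)

# Build a nested table: one row per species, with a list column of sub-tables.
# copy(.SD) guards against .SD being reused between groups.
nested <- as.data.table(iris)[, .(data = list(copy(.SD))), by = Species]

# Bare expression in j: the unlisted columns are spliced into the result.
flat <- nested[, unlist(data, recursive = FALSE), by = Species]

# Same expression wrapped in list(): it becomes a single list column (V1).
wrapped <- nested[, list(unlist(data, recursive = FALSE)), by = Species]

dim(flat)
dim(wrapped)
```

The two shapes should correspond to the select() and transmute() outputs in the report above: one row per observation in the first case, and one list-column row per original column and species in the second.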

@leungi
Author

leungi commented Jul 30, 2019

Noted.

I'm leaning towards the .enlist option for use cases like the one shown below, which follows a common tidyverse workflow: mutate() + map() + unnest().

  suppressMessages(library(tidyverse))
  suppressMessages(library(data.table))
  
iris %>% 
  nest(-Species) %>% 
  unnest(data)
#> # A tibble: 150 x 5
#>    Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>    <fct>          <dbl>       <dbl>        <dbl>       <dbl>
#>  1 setosa           5.1         3.5          1.4         0.2
#>  2 setosa           4.9         3            1.4         0.2
#>  3 setosa           4.7         3.2          1.3         0.2
#>  4 setosa           4.6         3.1          1.5         0.2
#>  5 setosa           5           3.6          1.4         0.2
#>  6 setosa           5.4         3.9          1.7         0.4
#>  7 setosa           4.6         3.4          1.4         0.3
#>  8 setosa           5           3.4          1.5         0.2
#>  9 setosa           4.4         2.9          1.4         0.2
#> 10 setosa           4.9         3.1          1.5         0.1
#> # ... with 140 more rows

irisDT <- iris %>% 
  nest(-Species) %>% 
  as.data.table()

irisDT[, unlist(data, recursive=FALSE), by = 'Species']
#>        Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>   1:    setosa          5.1         3.5          1.4         0.2
#>   2:    setosa          4.9         3.0          1.4         0.2
#>   3:    setosa          4.7         3.2          1.3         0.2
#>   4:    setosa          4.6         3.1          1.5         0.2
#>   5:    setosa          5.0         3.6          1.4         0.2
#>  ---                                                            
#> 146: virginica          6.7         3.0          5.2         2.3
#> 147: virginica          6.3         2.5          5.0         1.9
#> 148: virginica          6.5         3.0          5.2         2.0
#> 149: virginica          6.2         3.4          5.4         2.3
#> 150: virginica          5.9         3.0          5.1         1.8

irisDT[, unnest(.SD, data)]
#>        Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>   1:    setosa          5.1         3.5          1.4         0.2
#>   2:    setosa          4.9         3.0          1.4         0.2
#>   3:    setosa          4.7         3.2          1.3         0.2
#>   4:    setosa          4.6         3.1          1.5         0.2
#>   5:    setosa          5.0         3.6          1.4         0.2
#>  ---                                                            
#> 146: virginica          6.7         3.0          5.2         2.3
#> 147: virginica          6.3         2.5          5.0         1.9
#> 148: virginica          6.5         3.0          5.2         2.0
#> 149: virginica          6.2         3.4          5.4         2.3
#> 150: virginica          5.9         3.0          5.1         1.8

## loading table.express renders data.table code unusable
suppressMessages(library(table.express))

irisDT[, unnest(.SD, data)]
#> Error in .subset(x, j): invalid subscript type 'list'

irisDT %>% 
  select(unnest(.SD, data))
#> Error in .subset(x, j): invalid subscript type 'list'

irisDT %>% 
  group_by(Species) %>%
  select(unlist(data, recursive=FALSE))
#>        Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>   1:    setosa          5.1         3.5          1.4         0.2
#>   2:    setosa          4.9         3.0          1.4         0.2
#>   3:    setosa          4.7         3.2          1.3         0.2
#>   4:    setosa          4.6         3.1          1.5         0.2
#>   5:    setosa          5.0         3.6          1.4         0.2
#>  ---                                                            
#> 146: virginica          6.7         3.0          5.2         2.3
#> 147: virginica          6.3         2.5          5.0         1.9
#> 148: virginica          6.5         3.0          5.2         2.0
#> 149: virginica          6.2         3.4          5.4         2.3
#> 150: virginica          5.9         3.0          5.1         1.8

@asardaes
Owner

Are you sure that is the example you meant to post?

@leungi
Author

leungi commented Jul 30, 2019

I extracted out the troublesome piece - unnest(), using the iris data set.

The actual application is something like this - nested model in tibble.
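For readers without access to the linked application, a nested-model workflow of this kind might look roughly like the sketch below in plain data.table. This is an assumption about the general pattern, not the author's actual code; the model formula and the names models and coefs are made up for illustration.

```r
library(data.table)

DT <- as.data.table(iris)

# One row per species; each cell of `fit` holds a fitted lm object.
models <- DT[, .(fit = list(lm(Sepal.Length ~ Sepal.Width, data = .SD))),
             by = Species]

# "Unnest" by expanding each model's coefficients into columns.
# Note that j is a bare expression here, not wrapped in an extra list().
coefs <- models[, as.list(coef(fit[[1L]])), by = Species]
coefs
```

The second step is the one that needs an unwrapped j expression, since as.list() already returns the columns to splice in.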

@asardaes
Owner

Ok. Yes, I think it's worth adding the parameter. It could also help in cases where you want to calculate something that should return a simple vector instead of a 1-column data.table.
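The vector versus 1-column distinction mentioned here is ordinary data.table behaviour and can be seen without table.express. A small sketch, with illustrative variable names:

```r
library(data.table)

DT <- as.data.table(iris)

# j wrapped in list(): the result is a 1-column data.table (column V1).
as_table <- DT[, list(Sepal.Length * 2)]

# Bare j: the result is a plain numeric vector.
as_vector <- DT[, Sepal.Length * 2]

class(as_table)
class(as_vector)
```

Presumably transmute() with .enlist = FALSE would generate the second form, since the parameter is described as leaving the expression as given.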

asardaes added a commit that referenced this issue Jul 31, 2019
asardaes closed this as completed Aug 1, 2019
@leungi
Author

leungi commented Aug 1, 2019

Thanks for the prompt update, as usual!

@leungi
Author

leungi commented Aug 2, 2019

Tested, and it works as designed, but I'm still perplexed as to why the following fails.

suppressMessages(library(data.table))
suppressMessages(library(tidyverse))

irisDT <- as.data.table(iris) %>%
  .[, nest(.SD), by = Species]

# works
irisDT[, unnest(.SD, data)]
#>        Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>   1:    setosa          5.1         3.5          1.4         0.2
#>   2:    setosa          4.9         3.0          1.4         0.2
#>   3:    setosa          4.7         3.2          1.3         0.2
#>   4:    setosa          4.6         3.1          1.5         0.2
#>   5:    setosa          5.0         3.6          1.4         0.2
#>  ---                                                            
#> 146: virginica          6.7         3.0          5.2         2.3
#> 147: virginica          6.3         2.5          5.0         1.9
#> 148: virginica          6.5         3.0          5.2         2.0
#> 149: virginica          6.2         3.4          5.4         2.3
#> 150: virginica          5.9         3.0          5.1         1.8

## loading table.express renders data.table code unusable
suppressMessages(library(table.express))

irisDT %>%
  start_expr() %>%
  transmute(unnest(.SD, data), .enlist = FALSE)
#> .DT_[, unnest(.SD, data)]

# same expression as data.table but fails
irisDT %>%
  transmute(unnest(.SD, data), .enlist = FALSE)
#> Error in .subset(x, j): invalid subscript type 'list'

Created on 2019-08-02 by the reprex package (v0.2.1)

@asardaes
Owner

asardaes commented Aug 2, 2019

That is annoying...

Apparently, the data.frame method for tidyr::unnest has a step with the call: dplyr::transmute(dplyr::ungroup(data), !!!quos). Before you load table.express, transmute dispatches to the data.frame method from dplyr and it works as usual. After loading table.express, a new data.table method for transmute is registered, and that is now chosen within unnest, and that doesn't work, although I also don't know exactly why.
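The dispatch change described here is standard S3 behaviour and can be illustrated in base R alone. The sketch below uses a toy generic and a made-up class (describe, toy_dt) as stand-ins for dplyr::transmute and data.table; none of these names come from the packages involved.

```r
# Toy generic with only a data.frame method (stand-in for a dplyr verb).
describe <- function(x, ...) UseMethod("describe")
describe.data.frame <- function(x, ...) "data.frame method"

# Object whose class vector puts a more specific class before data.frame,
# the way a data.table does.
x <- structure(head(iris), class = c("toy_dt", "data.frame"))

before <- describe(x)  # falls through to the data.frame method

# A later-loaded package defines a method for the more specific class...
describe.toy_dt <- function(x, ...) "toy_dt method"

after <- describe(x)   # ...and the very same call now dispatches to it
```

Code that calls the generic internally, as unnest() does with transmute() according to the comment above, has no say in which method is picked once the more specific one exists.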

If you execute irisDT[, unnest(.SD, data)] after loading table.express, you'll notice that it also stops working. If you set options(datatable.verbose = TRUE), you'll see a message starting with "cedta decided 'tidyr' wasn't data.table aware. Here is call stack with [[1L]] applied:". I don't know the details, but indeed any package that doesn't explicitly import data.table into its namespace cannot execute data.table operations.

That is very unfortunate because it means that any package that internally uses any tidyverse packages without importing data.table will probably break as soon as table.express registers data.table-specific methods for the dplyr generics, and I don't think there's anything I can do about it. I don't know if data.table does special things internally and therefore has that requirement that any package that wants to use it must import it, or if it's an R thing.

I'll open a new issue and see if I can figure a workaround.

@leungi
Author

leungi commented Aug 4, 2019

2e0ed10 solves it.

I noticed as well that part of data.table stops working after loading table.express, as per my example. I'll check that future bug reports are not caused by this; anything that is will be tracked in #21 instead.
