proper nest/unnest functions #3672
Comments
Re the mysterious output, yeah that does look odd (having As far as I know, nest and unnest are feasible in data.table already, though there is a proposal to add a fast unnest: #2146 (Wrapping in |
sure you can simply use
and this seems to work
but the rest seems to break down the list-column format. Ideally, I would like to use |
wow but the nest() ticket has been open for more than 2 years! |
@randomgambit your problem is that you used assignment by reference on top of an aggregating call, and data.table does not seem to handle that well. This works:
library(data.table)
mydf <- data.table(group = c(1,1,1,2,2,2),
                   val = c('hello', 'world','hello', 'world','hello', 'world'),
                   col = c(1,2,3,4,5,6))
# aggregate first and assign the result, then add the column by reference
mydf <- mydf[, .(listcol=list(data.table(val, col))), by=group]
mydf[, newval := group + 1]
mydf
#>    group      listcol newval
#> 1:     1 <data.table>      2
#> 2:     2 <data.table>      3
This also works, with the strange effect that mydf needs to be called twice to be printed (bug?):
We could create a function
As this function doesn't make sense outside of Could you be more precise about what you would like to do with As for unnest it seems that we can get help from tidyr without any conversion to tibble. It might be possible to get the best of both worlds after all!
|
This is expected behavior since at least version 1.9.6; see here for reference. |
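A minimal sketch of the printing behaviour discussed above, assuming it is the documented suppression of printing after `:=` (the object name `nested` is made up for illustration). Appending an empty `[]` is the usual way to force a print:

```r
library(data.table)

mydf <- data.table(group = c(1, 1, 1, 2, 2, 2),
                   val = c("hello", "world", "hello", "world", "hello", "world"),
                   col = c(1, 2, 3, 4, 5, 6))

# Aggregate, then add a column by reference in the same chain.
# `:=` returns its result invisibly, so nothing is printed here.
nested <- mydf[, .(listcol = list(data.table(val, col))), by = group][, newval := group + 1]

# At the console the first bare call may print nothing;
# a trailing empty [] forces the print.
nested[]
```

The same trailing `[]` can be added directly to the end of the chain if the result should be printed immediately.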
Ah that is pretty great. As for
Here is in my opinion what makes
Here, by just looking at the R console I can see that myreg is a I would love being able to encapsulate DTs or tibbles in a DT and get the same functionality. But this requires proper Here, it is important to be able to Thanks! |
The only real difference I see in the data.table approach (below is what I would do...) is that the print doesn't show the size of the objects in the list column.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
library(purrr)
library(data.table)
#>
#> Attaching package: 'data.table'
#> The following object is masked from 'package:purrr':
#>
#> transpose
#> The following objects are masked from 'package:dplyr':
#>
#> between, coalesce, first, last
mydf <- tibble(group = c(1,1,1,2,2,2),
               val = c('hello', 'world','hello', 'world','hello', 'world'),
               col = c(1,2,3,4,5,6))
myreg <- function(df){
  lm(col ~ I(val), data = df)
}
mycoef <- function(obj){
  obj %>% broom::tidy()
}
mydt <- setDT(mydf)
mydt[, .(data = list(.SD)), group] %>%
  .[, {
    myreg = lapply(data, FUN = function(x) myreg(x))
    myoutput = lapply(myreg, FUN = function(x) mycoef(x))
    list(data = data, myreg = myreg, myoutput = myoutput)
  }, group]
#>    group         data myreg myoutput
#> 1:     1 <data.table>  <lm> <tbl_df>
#> 2:     2 <data.table>  <lm> <tbl_df>
# OR this other way with more intermediary steps and unnesting at the end
mydt[, .(data = list(.SD)), group] %>%
  .[, .(data, myreg = lapply(data, FUN = function(x) myreg(x))), group] %>%
  .[, .(data, myreg, myoutput = lapply(myreg, FUN = function(x) mycoef(x))), group] %>%
  unnest(myoutput)
#>    group        term     estimate std.error     statistic   p.value
#> 1:     1 (Intercept)  2.00000e+00  1.000000  2.000000e+00 0.2951672
#> 2:     1 I(val)world -5.43896e-16  1.732051 -3.140185e-16 1.0000000
#> 3:     2 (Intercept)  5.00000e+00  1.414214  3.535534e+00 0.1754797
#> 4:     2 I(val)world  8.15844e-16  1.732051  4.710277e-16 1.0000000
Created on 2019-07-05 by the reprex package (v0.3.0) |
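For comparison, here is a rough sketch of the same nest, model, unnest workflow using data.table and base R only. It is an illustration under assumptions, not the approach proposed in this thread: the column names `model` and `coefs` and the helper `coef_table` are made up, and `coef()` stands in for `broom::tidy()`, so only the estimates are returned.

```r
library(data.table)

mydt <- data.table(group = c(1, 1, 1, 2, 2, 2),
                   val = c("hello", "world", "hello", "world", "hello", "world"),
                   col = c(1, 2, 3, 4, 5, 6))

# Hypothetical helper: turn a fitted model into a small table of estimates.
coef_table <- function(m) {
  cf <- coef(m)
  data.table(term = names(cf), estimate = unname(cf))
}

# Nest: one row per group, the remaining columns wrapped in a list column.
nested <- mydt[, .(data = list(.SD)), by = group]

# Add list columns by reference; wrapping the RHS in .() keeps it a list column.
nested[, model := .(lapply(data, function(d) lm(col ~ val, data = d)))]
nested[, coefs := .(lapply(model, coef_table))]

# Unnest: bind the per-group tables back into rows.
flat <- nested[, rbindlist(coefs), by = group]
flat[]
```

The intercepts should match the reprex output above (2 for group 1 and 5 for group 2), with slopes that are zero up to floating-point error.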
This is what I thought: we're so used to considering data.table and tidyverse as competing paradigms that we don't see how well they can work hand in hand, and they probably would even more if more effort was invested in that direction:
I agree that it prints less nicely. And I find the need to switch between assignment by copy and by reference annoying, but to be fair it's quite nice and readable. I tried to use Now to
|
ah Daniel I hadn't seen your answer, well that gives more options! |
Very interesting. I am all in for a better integration between DT and tidyverse.
Is this related to #3682? Also, perhaps the best solution is to make |
Actually I was wrong to describe it as a bug, it's an unexported method so it's reasonable that it works on tibbles only, and actually the following works so no need to file anything (except it could support data.table as a special class, as it doesn't now) :
|
The trick to use |
got it. Also this syntax is quite interesting My understanding is that this only works because |
Calling it a hack is not appropriate: every data.table query that doesn't update by reference will copy. The hack about TRUE is that it does a shallow copy, which is not really relevant here. Making copies on [ is regular behaviour. |
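To illustrate the copy versus reference distinction mentioned above, here is a small sketch. The object names `alias`, `indep` and `agg` are made up for illustration, and it does not cover the shallow-copy detail:

```r
library(data.table)

dt <- data.table(group = c(1, 1, 2), val = c("a", "b", "c"))

# Plain assignment does not copy: both names point to the same table,
# so an update by reference through `alias` is visible in `dt` too.
alias <- dt
alias[, flag := TRUE]
"flag" %in% names(dt)

# copy() gives an independent table; updating it leaves `dt` untouched.
indep <- copy(dt)
indep[, extra := 1L]
"extra" %in% names(dt)

# A query that does not use `:=` returns a new object.
agg <- dt[, .(n = .N), by = group]
agg[]
```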
Thanks Jan, I was thinking |
I think we can close this issue as a duplicate of #2146. We don't need a fast nest function because AFAIU it is a matter of wrapping in a list. Any comments? |
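As a sketch of what "wrapping in a list" could look like in practice, the two helpers below are hypothetical (`dt_nest` and `dt_unnest` are not part of data.table or tidyr) and assume the list column holds data.tables or data.frames with compatible columns:

```r
library(data.table)

# Hypothetical helper: nest all non-grouping columns into a list column "data".
dt_nest <- function(dt, by_cols) {
  dt[, .(data = list(.SD)), by = by_cols]
}

# Hypothetical helper: expand a list column of tables back into rows.
dt_unnest <- function(dt, col, by_cols) {
  dt[, rbindlist(.SD[[1]]), by = by_cols, .SDcols = col]
}

mydf <- data.table(group = c(1, 1, 1, 2, 2, 2),
                   val = c("hello", "world", "hello", "world", "hello", "world"),
                   col = c(1, 2, 3, 4, 5, 6))

nested <- dt_nest(mydf, "group")
flat <- dt_unnest(nested, "data", "group")
flat[]
```

In this small example the round trip gives back the original rows in the original order. A dedicated fast unnest, as proposed in #2146, would still be of interest for large inputs.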
Hello,
I am trying to get the equivalent of nest()/unnest() in data.table and I wonder if a new function would make more sense. Consider this little example. I am not sure the same can be done with data.table. Look at the mysterious output I get after this. Am I missing something?