Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

the class of objects inherited from data.table in list column may get changed #4324

Closed
HJAllen opened this issue Mar 27, 2020 · 12 comments · Fixed by #4354
Closed

the class of objects inherited from data.table in list column may get changed #4324

HJAllen opened this issue Mar 27, 2020 · 12 comments · Fixed by #4354
Assignees
Labels
bug non-atomic column e.g. list columns, S4 vector columns
Milestone

Comments

@HJAllen
Copy link

HJAllen commented Mar 27, 2020

I am developing a class inheriting data.table but having trouble with the derived class being dropped. As can be seen in the example, when assigning the derived class to a list column, the class is initially retained but is subsequently dropped. Interestingly, it is only dropped for elements upon which an evaluation is performed.

I found one SO thread discussing similar but stated the issue had been resolved in 1.9.4. I am using 1.12.8.
https://stackoverflow.com/questions/24881842/deriving-a-new-class-from-data-table

Is this expected behavior?

# Minimal reproducible example

> tmp <- data.table(A = 1:5, B = letters[1:5])
> 
> tmp[, obj := .(.({
+   data.table(C = 1:5) %>%
+     setattr("class", c("newclass", class(.)))
+ })), by = A]
> tmp
   A B        obj
1: 1 a <newclass>
2: 2 b <newclass>
3: 3 c <newclass>
4: 4 d <newclass>
5: 5 e <newclass>
> 
> str(tmp)
Classes ‘data.table’ and 'data.frame':	5 obs. of  3 variables:
 $ A  : int  1 2 3 4 5
 $ B  : chr  "a" "b" "c" "d" ...
 $ obj:List of 5
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':	5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':	5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':	5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':	5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':	5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, ".internal.selfref")=<externalptr> 
> 
> class(tmp[1, obj[[1]]])
[1] "data.table" "data.frame"```
> tmp
   A B          obj
1: 1 a <data.table>
2: 2 b   <newclass>
3: 3 c   <newclass>
4: 4 d   <newclass>
5: 5 e   <newclass>

# Output of sessionInfo()

R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] exoR_0.0.0.9000   data.table_1.12.8 purrr_0.3.3       tidyr_1.0.2       ggplot2_3.3.0    
 [6] dplyr_0.8.5       lubridate_1.7.4   LHfuns_0.0.0.9000 gfuns_0.0.0.9000  magrittr_1.5     

loaded via a namespace (and not attached):
 [1] xfun_0.12         zoo_1.8-7         tidyselect_1.0.0  remotes_2.1.1     pander_0.6.3     
 [6] lattice_0.20-40   RMariaDB_1.0.8    colorspace_1.4-1  vctrs_0.2.4       testthat_2.3.2   
[11] usethis_1.5.1     rlang_0.4.5       pkgbuild_1.0.6    pillar_1.4.3      glue_1.3.2       
[16] withr_2.1.2       DBI_1.1.0         bit64_0.9-7       readxl_1.3.1      sessioninfo_1.1.1
[21] lifecycle_0.2.0   stringr_1.4.0     cellranger_1.1.0  munsell_0.5.0     gtable_0.3.0     
[26] devtools_2.2.2    memoise_1.1.0     knitr_1.28        strucchange_1.5-2 callr_3.4.2      
[31] ps_1.3.2          fansi_0.4.1       Rcpp_1.0.3        scales_1.1.0      backports_1.1.5  
[36] desc_1.2.0        pkgload_1.0.2     fs_1.3.2          bit_1.1-15.2      hms_0.5.3        
[41] packrat_0.5.0     digest_0.6.25     stringi_1.4.6     processx_3.4.2    grid_3.5.3       
[46] rprojroot_1.3-2   cli_2.0.2         tools_3.5.3       sandwich_2.5-1    tibble_2.1.3     
[51] crayon_1.3.4      pkgconfig_2.0.3   ellipsis_0.3.0    xml2_1.2.5        prettyunits_1.1.1
[56] roxygen2_7.1.0    assertthat_0.2.1  rstudioapi_0.11   R6_2.4.1          compiler_3.5.3   
@jangorecki
Copy link
Member

Thanks for filling out the report.
Are all those attached and loaded packages required to reproduce your issue? if not, then your example is definietly not a minimal one. It is also not obvious how they have been loaded or attached, as that part of the code seems to be missing.

@HJAllen
Copy link
Author

HJAllen commented Mar 27, 2020

Sorry. I ran it again without loading everything. Same result.

Code:

library(data.table)

tmp <- data.table(A = 1:5, B = letters[1:5])
tmp[, obj := .(.({
  x <- data.table(C = 1:5)
  setattr(x, "class", c("newclass", class(x)))
  x
})), by = A]

tmp

str(tmp)

class(tmp[1, obj[[1]]])

tmp

sessionInfo()

Result:

> tmp <- data.table(A = 1:5, B = letters[1:5])
> tmp[, obj := .(.({
+   x <- data.table(C = 1:5)
+   setattr(x, "class", c("newclass", class(x)))
+   x
+ })), by = A]
> 
> tmp
   A B        obj
1: 1 a <newclass>
2: 2 b <newclass>
3: 3 c <newclass>
4: 4 d <newclass>
5: 5 e <newclass>
> 
> str(tmp)
Classes ‘data.table’ and 'data.frame':	5 obs. of  3 variables:
 $ A  : int  1 2 3 4 5
 $ B  : chr  "a" "b" "c" "d" ...
 $ obj:List of 5
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':	5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':	5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':	5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':	5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
  ..$ :Classes ‘newclass’, ‘data.table’ and 'data.frame':	5 obs. of  1 variable:
  .. ..$ C: int  1 2 3 4 5
  .. ..- attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, ".internal.selfref")=<externalptr> 
> 
> class(tmp[1, obj[[1]]])
[1] "data.table" "data.frame"
> 
> tmp
   A B          obj
1: 1 a <data.table>
2: 2 b   <newclass>
3: 3 c   <newclass>
4: 4 d   <newclass>
5: 5 e   <newclass>
> 
> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.12.8

loaded via a namespace (and not attached):
[1] compiler_3.5.3 tools_3.5.3    

@shrektan
Copy link
Member

shrektan commented Mar 28, 2020

If the element inside the list is also a data.table object, the other class attributes will be removed when the element is touched inside data.table::[. It looks like a bug to me.

The mini repro example

library(data.table)
val_dt <- data.table(COL = 'dt')
class(val_dt) <- c('aaa', class(val_dt))
val_df <- data.frame(COL = 'df', stringsAsFactors = FALSE)
class(val_df) <- c('aaa', class(val_df))
dt <- data.table(
  A = 1:2,
  B = list(val_dt, val_dt),
  C = list(val_df, val_df)
)
dt
#>        A          B          C
#>    <int>     <list>     <list>
#> 1:     1 <aaa[1x1]> <aaa[1x1]>
#> 2:     2 <aaa[1x1]> <aaa[1x1]>
dt[1, B[[1]]]
#>       COL
#>    <char>
#> 1:     dt
dt[1, C[[1]]]
#>   COL
#> 1  df
dt
#>        A                 B          C
#>    <int>            <list>     <list>
#> 1:     1 <data.table[1x1]> <aaa[1x1]>
#> 2:     2        <aaa[1x1]> <aaa[1x1]>

Created on 2020-03-28 by the reprex package (v0.3.0)

@shrektan shrektan added the bug label Mar 28, 2020
@shrektan shrektan changed the title Attributes not preserved in list column element inherited from data.table in list column may lost attributes Mar 28, 2020
@shrektan shrektan self-assigned this Mar 31, 2020
@shrektan
Copy link
Member

shrektan commented Apr 4, 2020

Just a note to myself, there're actually two issues, considering DT[,j] and when j returns a data.table:

  1. j will be forced to share the same class as DT(a fix for [R-Forge #5296] Class lost when selecting columns #64), but it should do that only when DT is a "plain" data.table (no other classes)

  2. j actually returns the reference of the stored data.table, when j is a list of data.tables, but it should copy the result.


After a 2nd though, the no.2 point may be expected behavior, as the user should be aware of that. I'm looking forward to your inputs.

@shrektan shrektan changed the title element inherited from data.table in list column may lost attributes the class of objects inherited from data.table in list column may get changed Apr 5, 2020
@jangorecki jangorecki added the non-atomic column e.g. list columns, S4 vector columns label Apr 6, 2020
@HJAllen
Copy link
Author

HJAllen commented Apr 14, 2020

Any progress on this? I see shrektan noted "looking forward to your inputs". Does this mean you are waiting for me or someone on the data.table team? I'm not intending to be pushy but am having to work around the issue in a package under development and my life would be much easier if I could rely on the attribute being retained.

@shrektan
Copy link
Member

@HJAllen The issue you bring up should have been addressed by #4354 but I think the PR needs @mattdowle 's review before merging it. In the meantime, you could use the PR version via remotes::install_github('rdatatable/data.table#4354').

@HJAllen
Copy link
Author

HJAllen commented Apr 15, 2020

Thanks! I will give it a try.

@HJAllen
Copy link
Author

HJAllen commented Apr 24, 2020

I confirm that the fix solves the problem. Thanks!

@HJAllen HJAllen closed this as completed Apr 24, 2020
@jangorecki
Copy link
Member

lets not close this issue till the fix is not yet merged to master branch

@jangorecki jangorecki reopened this Apr 24, 2020
@HJAllen
Copy link
Author

HJAllen commented May 26, 2020

Sorry to bother, but any idea when this might get merged?

@tdeenes
Copy link
Member

tdeenes commented Nov 18, 2020

@jangorecki Any updates on why the fix by @shrektan can not be merged? I just ran into the same problem in a real-world scenario. I see there are some merge conflicts but all of them is trivial to resolve (willing to help if required).

@jangorecki
Copy link
Member

@tdeenes AFAIK the fix can be merged, it is just that we have a long queue of open PRs.

@jangorecki jangorecki added this to the 1.13.5 milestone Nov 18, 2020
@jangorecki jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug non-atomic column e.g. list columns, S4 vector columns
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants