Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.internal.selfref warning when using ..x notation #4876

Closed
thothal opened this issue Jan 15, 2021 · 3 comments
Closed

.internal.selfref warning when using ..x notation #4876

thothal opened this issue Jan 15, 2021 · 3 comments
Labels
programming parameterizing queries: get, mget, eval, env

Comments

@thothal
Copy link

thothal commented Jan 15, 2021

Issue

Consider the following reprex:

library(data.table)
f <- function(l) l[, t := 1]
d <- data.table(i = 1:2, l = list(data.table(x=1), data.table(x=2)))
k <- "l"
lapply(d[, ..k][, l], f)

will raise the following warning

In `[.data.table`(l, , `:=`(t, 1)) :
Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.

Same results for

lapply(d[, .SD, .SDcols = "l"][, l], f)

while the following code does not issue a warning but has the severe consequence that I have to hard code the column names:

lapply(d[, .(l)][, l], f)
lapply(d[, l], f)

Use Case

In my real code I want to use purrr::pmap on a data.table, but the data.table contains more columns than the function has arguments. I could add ... to capture the unnecessary columns (which I will do as a workaround now). But my attempt to filter the data.table via ..necessary_cols upfront, failed and it took me frankly quite a while to hunt the error down.

Thus, overall I am looking for a solution where I can select a dynamic subset of columns (..x would do the trick but does throw the warning) in conjunction with a lapply alike function.

Session Info

R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

attached packages:
[1] data.table_1.13.2

loaded via a namespace (and not attached):
[1] compiler_3.6.3
@jangorecki jangorecki added the programming parameterizing queries: get, mget, eval, env label Jan 15, 2021
@jangorecki
Copy link
Member

Thank you for reporting. I tested solution in #4304 and it seems to address your use case well.

lapply(d[, .(.k), env=list(.k=k)][, l], f)
lapply(d[, .k, env=list(.k=k)], f)

Of course it does not qualify to close this issue.

@tlapak
Copy link
Contributor

tlapak commented Jan 15, 2021

I don't quite get what you're trying to do. Your reprex still has the column name hard coded. And it actually doesn't do anything.

I would argue that sticking data.tables into another data.table is kinda fundamentally not a good idea. The thing is that most ways of subsetting a data.table by columns return a new data.table with a copy of the selected columns. That's why you get the error. The only exception is when you use the format dt[, colname] which gives you just the column and that's just for consistency with data.frame. The fact that there's no copy here when the selected column is of type list is the actual bug I think.

Note that most of the ways you've tried actually don't do what I think you want it to do. The original data.table is not modified except in the lapply(d[, l], f) version:

lapply(d[, l], f)
d[1, l]

#[[1]]
#   x t
#1: 1 1

lapply(d[, ..k][, l], f)
d[1, l]
#[[1]]
#   x
#1: 1

You can currently use lapply(d[[k]], f):

lapply(d[[k]], f)
d[1, l]
#[[1]]
#   x t
#1: 1 1

I still maintain that this isn't a good idea and the fact that this does work might still be considered a bug.

@thothal
Copy link
Author

thothal commented Jan 18, 2021

Well, in the sake of getting a reprex I boiled the code down to a pretty useless example, which, however, reproduced the error.

To provide some more context, look at the (not so minimal) example, which illustrates how I encountered the code in the wildlife. If you see some major flaws with this design, I am always happy to learn and will happily adapt to a better design.

I was creating a (not so minimal) example which resembles a bit more the real use case and I saw the major flaw with my approach. As highlighted by @tlapak the code w/o the copy does not work as intended, as my_dat$data is not changed by reference (unless I would reassign my_dat$data in the observer, which would create yet another copy). However, in my real case (which is way bigger) this did not show up that clearly (if showed up at all). Thus, the need of copy was not that apparent.

This whole exercise of creating a minimal reproducible example did definitely help to gain a better understanding of how and when data is copied in R. Thus, the whole exercise was super useful.

However, I tend to disagree that a data.table in a data.table is per-se (or fundamentally) a bad thing. But you should be knowing when a (silent) copy is made to be sure that you operate on the the right data.

Thanks for the fruitful discussion. I agree that this is not a bug, but by design and due to my not so complete understanding of how the internals of data.table work, but this was clarified now. I will close this issue consequently. Thanks!

library(shiny)
library(data.table)
library(DT)
library(purrr)

dt_module_ui <- function(id, some_ui_param) {
   ns <- NS(id)
   tagList(h3(some_ui_param),
           actionButton(ns("add"), "Add random column"),
           actionButton(ns("reset"), "Reset to Default"),
           DT::dataTableOutput(ns("dat")))
}

dt_module_server <- function(id, default) {
   moduleServer(
      id,
      function(input, output, session) {
         my_dat <- reactiveValues(data = NULL, refresh = 0)
         
         observeEvent(input$reset, {
            ## we definitely should create a copy of the data.table
            ## this avoids the warning alltogether
            my_dat$data <- copy(default) 
         }, ignoreNULL = FALSE)
         
         observeEvent(input$add, {
            ## need the refresh trigger b/c shiny would not detect by ref changes
            my_dat$refresh <- my_dat$refresh + 1
            new_col <- paste0("col_", input$add)
            my_dat$data[, (new_col) := runif(.N)]
         })
         
         output$dat <- DT::renderDataTable({
            my_dat$refresh
            datatable(my_dat$data, options = list(dom = "t"), class = "compact display")
         })
      }
   )
}

my_module_params <- data.table(id = paste0("dat", 1:3),
                               some_ui_param = paste("Data Example", 1:3),
                               default = list(
                                  data.table(x = 1:10, src = "Data 1"),
                                  data.table(x = 11:20, src = "Data 2"),
                                  data.table(x = 21:30, src = "Data 3")
                               ))

args_ui <- intersect(formalArgs(dt_module_ui),
                     names(my_module_params))

args_server <- intersect(formalArgs(dt_module_server),
                         names(my_module_params))

ui <- fluidPage(
   pmap(my_module_params[, ..args_ui], dt_module_ui)
)

server <- function(input, output, session) {
   module_handlers <- pmap(my_module_params[, ..args_server], dt_module_server)
}

shinyApp(ui, server)

@thothal thothal closed this as completed Jan 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
programming parameterizing queries: get, mget, eval, env
Projects
None yet
Development

No branches or pull requests

3 participants