:= changes address of a data table #1729

Closed
vspinu opened this issue Jun 5, 2016 · 5 comments · Fixed by #3545

vspinu commented Jun 5, 2016

I have narrowed this down to a specific dataset, which I attach. It happens only when the data is loaded from an RDS file; if I read the same data with fread, the problem does not occur.

I gave the RDS file a .txt extension so that I could upload it to GitHub.

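library(data.table)

# Print the address of the data.table before and after adding a column by reference
address_change <- function(df){
    cat("address before:", address(df), "\n")
    df[, c("new_var") := 1]
    cat("address after:", address(df), "\n")
}

tt <- readRDS("tt.txt")   # the attached RDS file, renamed to .txt
address_change(tt)
"new_var" %in% names(tt)  # expected TRUE, but returns FALSE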

The address changes consistently on every run:

> address_change(tt)
address before: 0x8aacf60 
address after: 0xbd35150 
> "new_var" %in% names(tt)
[1] FALSE

Any ideas on how to get around this without a copy?

devtools::session_info("data.table")
Session info ---------------------------------------------------------------------------------------------------------
 setting  value                                 
 version  R version 3.2.4 RC (2016-03-02 r70278)
 system   x86_64, linux-gnu                     
 ui       X11                                   
 language                                       
 collate  C                                     
 tz       Europe/Amsterdam                      
 date     2016-06-05                            

Packages -------------------------------------------------------------------------------------------------------------
 package    * version date       source                                
 data.table * 1.9.7   2016-06-01 Github (Rdatatable/data.table@6c12e25)

Attachment: tt.txt

jangorecki (Member) commented Jun 5, 2016

Are you aware of FAQ 5.3, "Reading data.table from RDS or RData file"? I think it answers your question.

arunsrinivasan mentioned this issue Jun 5, 2016
vspinu (Author) commented Jun 5, 2016

Thanks, I was not aware of alloc.col, but I don't see how that answers my question. The FAQ item states that the DT is re-allocated on the next by-reference operation, with a warning. I don't see the warning.

vspinu (Author) commented Jun 5, 2016

That FAQ item does explain why the copy happens and why I don't see the change outside the function call. I guess the issue then boils down to the missing warning.
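One workaround that does not rely on `:=` updating the caller's variable is to return the table from the function and reassign it at the call site. This is a minimal sketch, assuming data.table is installed; `add_new_var` is a hypothetical helper, and a temporary RDS file stands in for the attached tt.txt.

```r
library(data.table)

# Hypothetical helper: add a column, then return the table so the caller
# receives the re-allocated object even when := could not grow it in place.
add_new_var <- function(df) {
  df[, new_var := 1]
  df
}

# Stand-in for the attached tt.txt: a data.table round-tripped through RDS.
path <- tempfile(fileext = ".rds")
saveRDS(data.table(a = 1:3), path)
tt <- readRDS(path)

tt <- add_new_var(tt)  # reassign at the call site
print("new_var" %in% names(tt))
```

Judging by the verbose output later in this thread, `df` is rebound inside the function when the table has to be re-allocated, so returning `df` hands back the object that actually carries the new column.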

arunsrinivasan (Member) commented:
It seems the warning message was removed a long time ago, but it needs to be there.

MichaelChirico (Member) commented:

Setting verbose = TRUE helps clear things up a bit:

library(data.table)

address_change <- function(df){
    cat("df address before:", address(df), "\n")
    df[, c("new_var") := 1, verbose = TRUE]
    cat("df address after:", address(df), "\n")
}

tt <- readRDS("tt.txt")
cat('tt address before:', address(tt), '\n')
address_change(tt)
cat('tt address after:', address(tt), '\n')
"new_var" %in% names(tt)

This produces the following output:

tt address before: 0x7f61c6b5fc70 
df address before: 0x7f61c6b5fc70 
Detected that j uses these columns: c 
.internal.selfref ptr is NULL. This is expected and normal for a data.table loaded from disk. If not, please report to data.table issue tracker.
Growing vector of column pointers from truelength  0  to  1025 . A shallow copy has been taken, see ?alloc.col. Only a potential issue if two variables point to the same data (we can't yet detect that well) and if not you can safely ignore this. To avoid this message you could alloc.col() first, deep copy first using copy(), wrap with suppressWarnings() or increase the 'datatable.alloccol' option.
.internal.selfref ptr is NULL. This is expected and normal for a data.table loaded from disk. If not, please report to data.table issue tracker.
Assigning to all 1000 rows
RHS_list_of_columns == false
df address after: 0x7f61cf6b5200 
tt address after: 0x7f61c6b5fc70 
[1] FALSE
  1. It is a bit strange that the ".internal.selfref ptr is NULL." message is printed twice.
  2. We could also include advice to call setDT after readRDS in this verbose message, since that would have helped:
tt <- readRDS("tt.txt")
setDT(tt)
cat('tt address before:', address(tt), '\n')
address_change(tt)
cat('tt address after:', address(tt), '\n')
"new_var" %in% names(tt)

which produces:

tt address before: 0x7f61cc964f40 
df address before: 0x7f61cc964f40 
Detected that j uses these columns: c 
Assigning to all 1000 rows
RHS_list_of_columns == false
df address after: 0x7f61cc964f40 
tt address after: 0x7f61cc964f40 
[1] TRUE
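Building on the setDT advice, one option is to wrap readRDS in a small reader that restores over-allocation once, so that later `:=` calls inside functions work by reference. This is a minimal sketch, assuming data.table is installed; `read_dt` and `add_col` are hypothetical names, and a temporary RDS file stands in for tt.txt.

```r
library(data.table)

# Hypothetical reader: a data.table loaded with readRDS() has no over-allocated
# column slots, so setDT() is called to restore them before returning it.
read_dt <- function(path) {
  x <- readRDS(path)
  setDT(x)
  x
}

# Hypothetical helper that adds a column purely by reference.
add_col <- function(df) {
  df[, new_var := 1]
  invisible(NULL)
}

path <- tempfile(fileext = ".rds")
saveRDS(data.table(a = 1:3), path)

tt <- read_dt(path)
add_col(tt)  # modifies tt in place; no reassignment needed
print("new_var" %in% names(tt))
```

Doing the setDT step inside the reader means callers never see a data.table with a NULL `.internal.selfref`, which is the condition the verbose output above reports.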
