Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data.table randomly crashes R when using set on a DT with attributes #6410

Closed
hongyuanjia opened this issue Aug 27, 2024 · 9 comments · Fixed by #6419
Closed

data.table randomly crashes R when using set on a DT with attributes #6410

hongyuanjia opened this issue Aug 27, 2024 · 9 comments · Fixed by #6419

Comments

@hongyuanjia
Copy link
Contributor

Minimal reproducible example

I found that set() crashes R when an attribute is present in a DT. The code below may reproduce the error, but it should be run multiple times. I have no idea why.

mtcars <- data.table::setDT(data.table::copy(mtcars))
attr(mtcars, "bullet") <- "some test"
data.table::set(mtcars, NULL, "test", NA_character_)

 *** caught segfault ***
address 0xffffffffffffffff, cause 'invalid permissions'

Output of sessionInfo()

R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas
.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapa
ck.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Asia/Shanghai
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] nvimcom_0.9.45

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5       cli_3.6.2         knitr_1.46        rlang_1.1.3      
 [5] xfun_0.44         processx_3.8.4    purrr_1.0.2       styler_1.10.3    
 [9] glue_1.7.0        clipr_0.8.0       htmltools_0.5.8.1 ps_1.7.6         
[13] fansi_1.0.6       rmarkdown_2.27    R.cache_0.16.0    evaluate_0.23    
[17] tibble_3.2.1      fastmap_1.2.0     yaml_2.3.8        lifecycle_1.0.4  
[21] compiler_4.4.0    fs_1.6.4          pkgconfig_2.0.3   rstudioapi_0.16.0
[25] R.oo_1.26.0       R.utils_2.12.3    digest_0.6.35     R6_2.5.1         
[29] reprex_2.1.0      utf8_1.2.4        pillar_1.9.0      parallel_4.4.0   
[33] callr_3.7.6       magrittr_2.0.3    R.methodsS3_1.8.2 tools_4.4.0      
[37] withr_3.0.0      

@tdeenes
Copy link
Member

tdeenes commented Aug 27, 2024

I can reproduce it in a vanilla R session with data.table 1.15.4:

test <- function() {
  mtcars <- data.table::setDT(data.table::copy(mtcars))
  ## this is the line which causes the problem (`setattr` eliminates it)
  attr(mtcars, "bullet") <- "some test"
  data.table::set(mtcars, NULL, "test", NA_character_)
}
for (i in 1:10) {
  print(i)
  test()
}

sessionInfo:

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Linux Mint 21

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.4.1    data.table_1.15.4

When using setattr() instead of attr() to add the "bullet" attribute, the routine never segfaults, at least on my system:

test2 <- function() {
  mtcars <- data.table::setDT(data.table::copy(mtcars))
  data.table::setattr(mtcars, "bullet", "some test")
  data.table::set(mtcars, NULL, "test", NA_character_)
}
## this runs forever
repeat { test2() }

@tdeenes
Copy link
Member

tdeenes commented Aug 27, 2024

@hongyuanjia As an immediate fix, you can use setattr() - which you should do on data.tables, anyways.

@hongyuanjia
Copy link
Contributor Author

@hongyuanjia Thanks for the temp solution.

@ben-schwen
Copy link
Member

Did some digging. The crash is happening in this line:

SET_VECTOR_ELT(dt, coln, targetcol=allocNAVectorLike(thisvalue, nrow));

And it only happens when the data.table has been created with setDT

@rikivillalba
Copy link
Contributor

rikivillalba commented Aug 28, 2024

Probably related to what is warned in ?setalloccol Details section, as in R do_attrgets() (that is called by attr) creates a shallow duplicate of the object. Also if you do setalloccol(mtcars) after attr<- no sf is caught.

@aitap
Copy link
Contributor

aitap commented Aug 28, 2024

assign calls _selfrefok, which checks for a shallow, non-over-allocated copy. It is noticed and the TRUELENGTH of the object is adjusted:

Thread 1 "R" hit Watchpoint 2: ((VECSEXP)dt)->vecsxp.truelength

Old value = 1035
New value = 11
0x00005555556d9851 in SET_TRUELENGTH (x=x@entry=0x555555e7cee8, v=11) at ../../../src/main/memory.c:4067
4067    void (SET_TRUELENGTH)(SEXP x, R_xlen_t v) { SET_TRUELENGTH(CHK2(x), v); }
(gdb) bt
#0  0x00005555556d9851 in SET_TRUELENGTH (x=x@entry=0x555555e7cee8, v=11) at ../../../src/main/memory.c:4067
#1  0x00007ffff42bf74a in _selfrefok (x=x@entry=0x555555e7cee8, checkNames=checkNames@entry=TRUE, verbose=verbose@entry=FALSE) at assign.c:136
#2  0x00007ffff42c73e6 in selfrefnamesok (verbose=FALSE, x=0x555555e7cee8) at assign.c:144
#3  assign (dt=0x555555e7cee8, rows=0x5555559c4d20, cols=0x5555578af538, newcolnames=0x5555578af4c8, values=0x55555883e3f0) at assign.c:516

data.table/src/assign.c

Lines 135 to 136 in be97437

if (x!=R_ExternalPtrAddr(prot) && !ALTREP(x))
SET_TRUELENGTH(x, LENGTH(x)); // R copied this vector not data.table, it's not actually over-allocated

...but it is too late for the TRUELENGTH checks on dt itself:

(gdb) p names
$5 = (SEXP) 0x5555584756a0
(gdb) p tag
$6 = (SEXP) 0x5555584756a0
(gdb) p checkNames
$7 = TRUE
(gdb) n
_selfrefok (x=x@entry=0x555555e7cee8, checkNames=checkNames@entry=TRUE, verbose=verbose@entry=FALSE) at assign.c:137
137       return checkNames ? names==tag : x==R_ExternalPtrAddr(prot);
(gdb) fini
Run till exit from #0  _selfrefok (x=x@entry=0x555555e7cee8, checkNames=checkNames@entry=TRUE, verbose=verbose@entry=FALSE) at assign.c:137
0x00007ffff42c73e6 in assign (dt=0x555555e7cee8, rows=0x5555559c4d20, cols=0x5555578af538, newcolnames=0x5555578af4c8, values=0x55555883e3f0) at assign.c:516
516         if (!selfrefnamesok(dt,verbose))
Value returned is $8 = 1
(gdb) p oldtncol 
$9 = 1035

data.table/src/assign.c

Lines 507 to 516 in be97437

oldtncol = TRUELENGTH(dt); // TO DO: oldtncol can be just called tl now, as we won't realloc here any more.
if (oldtncol<oldncol) {
if (oldtncol==0) error(_("This data.table has either been loaded from disk (e.g. using readRDS()/load()) or constructed manually (e.g. using structure()). Please run setDT() or setalloccol() on it first (to pre-allocate space for new columns) before assigning by reference to it.")); // #2996
error(_("Internal error: oldtncol(%d) < oldncol(%d). Please report to data.table issue tracker, including result of sessionInfo()."), oldtncol, oldncol); // # nocov
}
if (oldtncol>oldncol+10000L) warning(_("truelength (%d) is greater than 10,000 items over-allocated (length = %d). See ?truelength. If you didn't set the datatable.alloccol option very large, please report to data.table issue tracker including the result of sessionInfo()."),oldtncol, oldncol);
if (oldtncol < oldncol+LENGTH(newcolnames))
error(_("Internal error: DT passed to assign has not been allocated enough column slots. l=%d, tl=%d, adding %d"), oldncol, oldtncol, LENGTH(newcolnames)); // # nocov
if (!selfrefnamesok(dt,verbose))

I suppose that a selfrefok check just before TRUELENGTH(dt) on line 507 would have caught this.

@hongyuanjia
Copy link
Contributor Author

On my side, there are cases in which R did not crash, and I am pretty sure I saw the error message from Line 515.

@tdhock
Copy link
Member

tdhock commented Aug 29, 2024

It is a heisenbug on my windows system: test() does not always crash R, but eventually it will if you run it many times.

> test <- function() {
+ mtcars <- data.table::setDT(data.table::copy(mtcars))
+ attr(mtcars, "bullet") <- "some test"
+ data.table::set(mtcars, NULL, "test", NA_character_)
+ }
> test()
> sessionInfo()
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] stats     graphics  utils     datasets  grDevices methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.4.1    tools_4.4.1       data.table_1.15.4
> test()
> test()
> test()
> test()
> test()
> test()
> test()
> test()
> test()
> for (i in 1:10) {
+ print(i)
+ test()
+ }
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7

Process R exited abnormally with code 5 at Thu Aug 29 08:57:23 2024

@tdhock
Copy link
Member

tdhock commented Sep 12, 2024

I get an error on the first try on windows, using selfref_attr branch, yay!

R version 4.4.1 (2024-06-14 ucrt) -- "Race for Your Life"
Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> setwd('c:/Users/hoct2726/R')
> test <- function() {
+ mtcars <- data.table::setDT(data.table::copy(mtcars))
+ attr(mtcars, "bullet") <- "some test"
+ data.table::set(mtcars, NULL, "test", NA_character_)
+ }
> test()
Error in data.table::set(mtcars, NULL, "test", NA_character_) : 
  It appears that at some ealier point, attributes of this data.table have been reassigned. Please ensure to use setattr() rather than attr<-. Otherwise, please report to data.table issue tracker.
> sessionInfo()
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8    

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] stats     graphics  utils     datasets  grDevices methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.4.1     tools_4.4.1        data.table_1.16.99
> 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants