Skip to content

Core dumping R if using a serialized 'big.matrix' object #100

Open
@HenrikBengtsson

Description

@HenrikBengtsson

Hi, I discovered that bigmemory core dumps (=crashes/terminates) parallel workers if attempting to use 'big.memory' objects. This appears to be because there is an assumption that the object is always used in the same R process that it was created in, which does not work because of the external pointer. Here is a minimal reproducible example:

In one R session, do:

X <- bigmemory::as.big.matrix(matrix(1:2, nrow = 2L))
saveRDS(X, "X.rds")
quit("no")

In another R session, do:

> library(bigmemory)
> X <- readRDS("X.rds")
> X[1,1]

 *** caught segfault ***
address 0x10, cause 'memory not mapped'

Traceback:
 1: CGetNrow(x@address)
 2: nrow(x)
 3: nrow(x)
 4: CCleanIndices(as.double(i), as.double(nrow(x)))
 5: GetElements.bm(x, i, j)
 6: .local(x, i, j, ..., drop)
 7: X[1, 1]
 8: X[1, 1]

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 1
R is aborting now ...
Segmentation fault (core dumped)
> library(bigmemory)
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /home/hb/software/R-devel/tags/R-4-0-2/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/tags/R-4-0-2/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bigmemory_4.5.36

loaded via a namespace (and not attached):
[1] bigmemory.sri_0.1.3 compiler_4.0.2      Rcpp_1.0.5  

Suggestion

Instead of core dumping, detect the problem and give an informative error message:

> library(bigmemory)
> X <- readRDS("X.rds")
> X[1,1]
Error: It is not possible to use a 'big.matrix' that was created in another R process

I don't know the internals, but I assume the problem is that the external pointer:

> X <- bigmemory::as.big.matrix(matrix(1:2, nrow = 2L))
> X@address
<pointer: 0x560457f389b0>

is used without making sure it is still valid.

PS. I consider this a quite serious bug since it can core dump R and parallel workers in R and it's hard to protect against it. People who run parallel code might not even know that bigmemory is used as part of some other package they rely on. This is the first package that I know of that use external pointers that also core dumps R, cf. https://cran.r-project.org/web/packages/future/vignettes/future-4-non-exportable-objects.html. It looks like those other packages can detect the problem and prevent core dumping, so, hopefully, it is not to hard to protect against this.

EDIT 2020-08-15 @ 18:11 UTC: Clarified that the bug fix should be to give an informative error message instead of core dumping.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions