Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential memory problem in Rcpp? #547

Closed
jorainer opened this issue Aug 30, 2016 · 11 comments
Closed

Potential memory problem in Rcpp? #547

jorainer opened this issue Aug 30, 2016 · 11 comments

Comments

@jorainer
Copy link

Dear developers!

We have some randomly occurring errors in the MSnbase package (see issue lgatto/MSnbase#151) that I believe are somehow related with the Rcpp package. First, what makes me believe this is a memory problem is that a) it occurs randomly and b) it can be solved by a call to gc(). Why I believe it's related to Rcpp: in MSnbase we use facilities from Bioconductor's mzR package (https://github.com/sneumann/mzR) to read files. mzR uses routines implemented in C++ to read these files.
Based on how the errors are happening I have the feeling that memory is not cleaned up properly after calling the C++ functions.

In the example below we read the header of a file and create some objects after that. Note that the errors do not appear without the calls to new. Also, calling gc() after each rm(msd) fixes the problem.

library(MSnbase)
library(msdata)
f <- system.file("microtofq/MM8.mzML", package = "msdata")
for (i in 1:5000) {
    if (i %% 200 == 0)
        cat(i, "\n")
    ## open the file.
    msd <- mzR::openMSfile(f)
    fullHead <- mzR::header(msd)
    mzR::close(msd)
    rm(msd)
    ## Process
    process <- new("MSnProcess",
                   processing = paste0("Data loaded [", date(), "]"),
                   files = f,
                   smoothed = NA)
    ## pdata
    .pd <- data.frame(sampleNames = basename(f))
    rownames(.pd) <- .pd$sampleNames
    pdata <- new("NAnnotatedDataFrame",
                 data = .pd)
}
Error in (function (x)  : attempt to apply non-function
Error in (function (x)  : attempt to apply non-function
Error in (function (x)  : attempt to apply non-function
Error in (function (x)  : attempt to apply non-function
Error in (function (x)  : attempt to apply non-function

I would be grateful for any hints how to solve this differently to calling gc after each import.

All packages used are from Bioconductor, so you could grap them using

source("http://www.bioconductor.org/biocLite.R")
biocLite(c("MSnbase", "mzR", "msdata"))

My sessionInfo:

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin15.6.0/x86_64 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] msdata_0.12.0       MSnbase_1.21.8      ProtGenerics_1.4.0 
[4] BiocParallel_1.6.3  mzR_2.6.3           Rcpp_0.12.6        
[7] Biobase_2.32.0      BiocGenerics_0.18.0

loaded via a namespace (and not attached):
 [1] magrittr_1.5          IRanges_2.6.1         zlibbioc_1.18.0      
 [4] doParallel_1.0.10     munsell_0.4.3         colorspace_1.2-6     
 [7] impute_1.46.0         lattice_0.20-33       foreach_1.4.3        
[10] stringr_1.0.0         plyr_1.8.4            tools_3.3.1          
[13] mzID_1.10.2           grid_3.3.1            gtable_0.2.0         
[16] affy_1.50.0           iterators_1.0.8       digest_0.6.9         
[19] preprocessCore_1.34.0 affyio_1.42.0         reshape2_1.4.1       
[22] ggplot2_2.1.0         S4Vectors_0.10.2      codetools_0.2-14     
[25] MALDIquant_1.15       limma_3.28.16         stringi_1.1.1        
[28] BiocInstaller_1.22.3  pcaMethods_1.64.0     scales_0.4.0         
[31] XML_3.98-1.4          stats4_3.3.1          vsn_3.40.0           
@eddelbuettel
Copy link
Member

eddelbuettel commented Aug 30, 2016

I would be nice if you could mock it down to something not involved other dependencies -- if it really is just object creation then that should be possible.

Barring that, can you try memory-profiling eg via valgrind, as describe in Writing R Extensions? Rcpp sees pretty widespread usage, and we so far have no indication we are doing something wrong objects are created or are being unwound. That said, S4 (which is suspect is involved here) may see less widespread use than other parts. Your help in making this more robust would be appreciated, particularly if you could create _minimally reproducible example. Here, in

    msd <- mzR::openMSfile(f)
    fullHead <- mzR::header(msd)
    mzR::close(msd)
    rm(msd)

and the reported calling gc() after each rm(msd) fixes [it] -- could it be that mzR does something in way that is not quite correct?

@jorainer
Copy link
Author

Thanks for your quick reply. I'll definitely try debugging with valgrind.
The problem with the minimal example is that mzR uses external code, i.e. is interfacing the proteowizard C++ code via Rcpp. The problem could thus be in mzR (I didn't see anything obvious there) or further down in proteowizard.

It is just strange that the errors appear when I create new S4 objects using new downstream of the mzR code.

@eddelbuettel
Copy link
Member

mZR was pretty early in using Rcpp Modules, and we as I recall have seen other (unrelated) issues with mzR still using a first generation loader for Modules. That is unlikely to be the issue here, but I guess our best bet may be to mock a similar Modules structure. Either we find a bug in Rcpp, or we develop a more robust framework for mzR to use...

@thirdwing
Copy link
Member

mzR was by me more than two years ago. I will take a look at next week.

@thirdwing
Copy link
Member

I have spent several hours on this. In my personal opinion, this seems not about Rcpp or mzR. It is more likely some conflicts with Biobase, because the following code works without any problem.

library(methods)
f <- system.file("microtofq/MM8.mzML", package = "msdata")

setClass("dummy",
         slots = c(content = "character"),
         prototype = prototype(
             content = "none"
         ))

for (i in 1:5000) {
    if (i %% 200 == 0) {
        cat(i, "\n")
    }
    ## open the file.
    msd <- mzR::openMSfile(f)
    fullHead <- mzR::header(msd)
    mzR::close(msd)
    rm(msd)
    ##gc()
    process <- new("dummy", content = "something")
}

@jorainer
Copy link
Author

Yes, that was also my conclusion. Thus I looked at the Biobase source but did not find anything strange there either. My only guess would be that it might have to do with class inheritance and something going on there in the depths of R, but I couldn't find any hard evidence for that.

@eddelbuettel
Copy link
Member

Shall we close this here then?

@jorainer
Copy link
Author

yes, thanks for all your help!

@eddelbuettel
Copy link
Member

Sorry 'bout the bug. As I sometimes joke, friends don't let friends rely on S4 methods :)

Maybe a small reproducible bug report could be sent to the BioBase / mzR side of things?

@jorainer
Copy link
Author

this bug comes actually from MSnbase, went to mzR and ended here in Rcpp ;) the "fix" with the gc() call works for now, but when I find the time I'll dig deeper

@thirdwing
Copy link
Member

thirdwing commented Sep 21, 2016

I will try to do that. Maybe we can find some interesting things by
tracing down this weird bug.

KK

On 09/21/2016 10:41 AM, Dirk Eddelbuettel wrote:

Sorry 'bout the bug. As I sometimes joke, friends don't let friends
rely on S4 methods :)

Maybe a small reproducible bug report could be sent to the BioBase /
mzR side of things?


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#547 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABebVTaySP3kiwKhgYfi4Oupm2tR7bM0ks5qsUItgaJpZM4JwN3M.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants