Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing unit tests - memory leaks? #151

Closed
lgatto opened this issue Aug 15, 2016 · 19 comments
Closed

Failing unit tests - memory leaks? #151

lgatto opened this issue Aug 15, 2016 · 19 comments

Comments

@lgatto
Copy link
Owner

lgatto commented Aug 15, 2016

This is a follow up on issue #138. There are still randomly failing unit tests.

> f
[1] "/home/lg390/R/x86_64-pc-linux-gnu-library/3.3/msdata/proteomics/TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML.gz"
> tmp <- replicate(1e2, x <- readMSData2(f))
> tmp <- replicate(1e2, x <- readMSData2(f))
Error in (function (x)  : attempt to apply non-function
Error in (function (x)  : attempt to apply non-function
Error in (function (x)  : attempt to apply non-function
Error in (function (x)  : attempt to apply non-function
Error in (function (x)  : attempt to apply non-function
Error in (function (x)  : attempt to apply non-function
Error in (function (x)  : attempt to apply non-function
Error in (function (x)  : attempt to apply non-function
Error in (function (x)  : attempt to apply non-function
Error in (function (x)  : attempt to apply non-function
Error in (function (x)  : attempt to apply non-function
Error in (function (x)  : attempt to apply non-function
>

The randomly failing test have been observed when running source("testthat.R") and R CMD check (albeit less often with the former). The randomness of the failures could point to memory leaks.

@lgatto
Copy link
Owner Author

lgatto commented Aug 15, 2016

lg390@elyacin [02:44:05] [~/dev/00_github] [master *]
-> % make check PKG=MSnbase REL=1 VIG=0
echo "0" | cmp --silent - /tmp/MSnbase.buildflags || echo "0" > /tmp/MSnbase.buildflags
"/opt/Rpatched/bin/R" CMD build --no-build-vignettes MSnbase/ | \
    COLOURS=1 /home/lg390/dev/00_github/maker//include//color-output.sh
* checking for file ‘MSnbase/DESCRIPTION’ ... OK
* preparing ‘MSnbase’:
* checking DESCRIPTION meta-information ... OK
* cleaning src
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* looking to see if a ‘data/datalist’ file should be added
* building ‘MSnbase_1.99.0.tar.gz’

{ time -f "time: %e" "/opt/Rpatched/bin/R" CMD check --no-build-vignettes MSnbase_1.99.0.tar.gz 2>&1; } | \
    COLOURS=1 /home/lg390/dev/00_github/maker//include//color-output.sh && \
    grep "WARNING" MSnbase.Rcheck/00check.log > /dev/null ; \
    if [ $? -eq 0 ] ; then exit 1; fi
* using log directory ‘/home/lg390/dev/00_github/MSnbase.Rcheck’
* using R version 3.3.1 Patched (2016-08-02 r71022)
* using platform: x86_64-pc-linux-gnu (64-bit)
* using session charset: UTF-8
* using option ‘--no-build-vignettes’
* checking for file ‘MSnbase/DESCRIPTION’ ... OK
* this is package ‘MSnbase’ version ‘1.99.0’
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘MSnbase’ can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking dependencies in R code ... NOTE
Unexported objects imported by ':::' calls:
  ‘Biobase:::.showAnnotatedDataFrame’ ‘MALDIquant:::.estimateNoise’
  ‘MALDIquant:::.localMaxima’ ‘MALDIquant:::.movingAverage’
  ‘MALDIquant:::.savitzkyGolay’ ‘MALDIquant:::.which.closest’
  See the note in ?`:::` about the use of this operator.
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking contents of ‘data’ directory ... OK
* checking data for non-ASCII characters ... OK
* checking data for ASCII and uncompressed saves ... OK
* checking line endings in C/C++/Fortran sources/headers ... OK
* checking compiled code ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... OK
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
  Running ‘testthat.R’
 ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
  A collection of 10 features of interest.
  Iterations of EM: 
  1...2...3...4...5...6...7...8...9...10...11...
  [1] 0.07947339
  testthat results ================================================================
  OK: 983 SKIPPED: 0 FAILED: 4
  1. Error: Compare subsetting between OnDiskMSnExp and MSnExp (@test_OnDiskMSnExp.R#228) 
  2. Error: Compare subsetting between OnDiskMSnExp and MSnExp (@test_OnDiskMSnExp.R#228) 
  3. Error: Compare subsetting between OnDiskMSnExp and MSnExp (@test_OnDiskMSnExp.R#228) 
  4. Error: Compare subsetting between OnDiskMSnExp and MSnExp (@test_OnDiskMSnExp.R#228) 

  Error: testthat unit tests failed
  Execution halted
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in ‘inst/doc’ ... WARNING
Package vignettes without corresponding PDF/HTML:
   ‘benchmarking.Rmd’
   ‘MSnbase-demo.Rnw’
   ‘MSnbase-development.Rnw’
   ‘MSnbase-io.Rnw’

* checking running R code from vignettes ...
   ‘benchmarking.Rmd’ ... OK
   ‘MSnbase-demo.Rnw’ ... OK
   ‘MSnbase-development.Rnw’ ... OK
   ‘MSnbase-io.Rnw’ ... OK
 NONE
* checking re-building of vignette outputs ... SKIPPED
* checking PDF version of manual ... OK
* DONE

Status: 1 ERROR, 1 WARNING, 1 NOTE
See
  ‘/home/lg390/dev/00_github/MSnbase.Rcheck/00check.log’
for details.

Command exited with non-zero status 1
time: 604.94 (check time should be < 5 min)
make: *** [check-only] Error 1

Same command, but failing test was at test_OnDiskMSnExp.R#74!

@lgatto lgatto reopened this Aug 15, 2016
@jorainer
Copy link
Collaborator

Heck; I'll have a look at this next week. If it's a memory leak it's easy to track by setting gctorture(on = TRUE) around the failing command.

@lgatto
Copy link
Owner Author

lgatto commented Aug 18, 2016

The problem is that, as previously, it's not the same set of unit tests that fail.
(I have been too busy these last couple of days to look further into it.)

@jorainer
Copy link
Collaborator

jorainer commented Aug 24, 2016

The first error (in readMSData2) is unlikely a memory leak, since we're not creating any objects there at the C-level (thus, if there is a memory leak it's not our fault, so to say).

To validate I run readMSData2 with gctorture(on = TRUE) and this caused no problems.

@jorainer
Copy link
Collaborator

For readMSData2, the Error in (function (x) : attempt to apply non-function is randomly thrown for new calls (e.g. new("NAnnotatedDataFrame", data = .pd)); I'll try to dig a little deeper.

@lgatto
Copy link
Owner Author

lgatto commented Aug 24, 2016

I have also have error popping up for new("NAnnotatedDataFrame", data = .pd), but I don't think the error is actually in NAnnotatedDataFrame or AnnotatedDataFrame (unless there's a bug in dispatching to the super-class).

@jorainer
Copy link
Collaborator

OK, I think I've got the culprit: seems to me that the mzR is doing some nasty things, eventually not freeing memory or similar. What I did:

Run the following test function 5000 times with different settings:

library(MSnbase)
library(msdata)
f <- msdata::proteomics(full.names = TRUE, pattern = "TMT_Erwinia")

test_apply_non_function <- function(files, readFile = FALSE,
                                    readHeader = FALSE, doGc = FALSE,
                                    backend = "Ramp") {
    if (readFile) {
        cat("1.")
        msdata <- mzR::openMSfile(files, backend = backend)
        if (readHeader) fullhd <- mzR::header(msdata)
        mzR::close(msdata)
        msdata <- NULL
        if (doGc) gc()
    }
    cat("2.")
    pdata <- NULL
    ## Create 'fdata' and 'pdata' objects
    if (is.null(pdata)) {
        .pd <- data.frame(sampleNames = basename(files))
        rownames(.pd) <- .pd$sampleNames
        pdata <- new("NAnnotatedDataFrame", data = .pd)
    }
    cat("3.")
}

Running this with:

for (i in 1:5000) {
    cat(i, "...")
    tmp <- test_apply_non_function(f, readFile = FALSE, doGc = FALSE,
                                   backend = "Ramp")
    cat("OK\n")
}

Works nicely, but

for (i in 1:5000) {
    cat(i, "...")
    tmp <- test_apply_non_function(f, readFile = TRUE, doGc = FALSE,
                                   backend = "Ramp")
    cat("OK\n")
}

throws the famous attempt to apply non-function errors. Switching on the garbage collector after each close:

for (i in 1:5000) {
    cat(i, "...")
    tmp <- test_apply_non_function(f, readFile = TRUE, doGc = TRUE,
                                   backend = "Ramp")
    cat("OK\n")
}

works nicely. This is why I expect that the problem is actually coming from the mzR package.

Switching to the pwiz backend unfortunately does not work, as file handles are not properly closed for pwiz (reported as issue sneumann/mzR#48).

@lgatto
Copy link
Owner Author

lgatto commented Aug 25, 2016

Thank you @jotsetung - I agree with you that the problem is in mzR. I see the following things that can be done

  1. Fix the Ramp backend - probably not straightforward at all.
  2. Properly close pwiz - I have pinged the person that implemented it.
  3. In the meantime, add a call to gc() every time we access data on disk.

What do you think?

@jorainer
Copy link
Collaborator

@lgatto I agree. I screened a little through the Ramp code, but no idea where the problem could be (guess some memory not freed properly).
Adding a gc() would do it for now; but performance-wise that would be too bad if we would have to stick with that for long (especially for the on-demand import).

@lgatto
Copy link
Owner Author

lgatto commented Aug 25, 2016

Given that PR sneumann/mzR#49 has now been merged, option 2 above seems to be the way to go. The first one that pushes usage of the pwiz backend closes this issue, ok :-)

@jorainer
Copy link
Collaborator

Guess what, using backend = "pwiz" doesn't resolve the problem. The only way how I can get rid of this random error is by calling gc() after each mzR::close(msdata). Funnily enough I even don't have to remove the msdata variable before. Somehow it feels like there is some memory management problem in Rcpp.

@lgatto
Copy link
Owner Author

lgatto commented Aug 26, 2016

Do you think we should consider bringing this up on the Rcpp list?

@jorainer
Copy link
Collaborator

jorainer commented Aug 26, 2016

I'm trying to do some more tests (to be sure that's not our fault) and will define a neat test case showing the problem.

@jorainer
Copy link
Collaborator

OK, these are the results thus far:
I used the following test function:

test_fun <- function(file, backend = "Ramp", doGc = FALSE, newClass = FALSE,
                     newAdf = FALSE) {
    msd <- mzR::openMSfile(file, backend = backend)
    mzR::close(msd)
    rm(msd)
    if (newClass)
        nc <- new("dummy", content = "something")
    if (newAdf) {
        adf <- new("AnnotatedDataFrame", data = data.frame(a = 1:10))
    }
    if (doGc) gc()
}

Running

library(mzR)
f <- system.file("microtofq/MM8.mzML", package = "msdata")

for (i in 1:5000) {
    if (i %% 200 == 0)
        cat(i, "\n")
    test_fun(f)
}

No problem. Creating a new object with new after closing the connection:

## Define a dummy class
setClass("dummy",
         slots = c(content = "character"),
         prototype = prototype(
             content = "none"
         ))

for (i in 1:5000) {
    if (i %% 200 == 0)
        cat(i, "\n")
    test_fun(f, newClass = TRUE)
}

All OK. BUT:

library(Biobase)
for (i in 1:5000) {
    if (i %% 200 == 0)
        cat(i, "\n")
    test_fun(f, newAdf = TRUE)
}
200 
400 
600 
800 
1000 
1200 
1400 
1600 
1800 
2000 
2200 
2400 
2600 
2800 
3000 
3200 
3400 
3600 
3800 
Error in (function (x)  (from MSnbase_torture_test.R!15615a0v#9) : attempt to apply non-function

Again, calling gc helps. What is strange is that this error appears creating a new AnnotatedDataFrame, but does not for the /dummy/ object above.

I'll have a look at the Biobase package to see whether something strange is going on there.

@jorainer
Copy link
Collaborator

Update on the results above: I've tested the function creating different types of objects after the open/close of the mzML file: it fails for ExpressionSet (BioBase), but works for IRanges (IRanges), DataFrame (S4Vectors) and SummarizedExperiment (SummarizedExperiment; that should be equivalent in complexity to an ExpressionSet).

@jorainer
Copy link
Collaborator

Let's celebrate!!! Rcpp version 0.12.6.4 did the trick! The last version of Rcpp from github somehow seems to fix the problem!!!

hooray

@jorainer
Copy link
Collaborator

too sad, was too early; still getting the error now when using readMSData2.

@jorainer
Copy link
Collaborator

What bugs me is that the same test using readMSData doesn't produce these errors. In principle, both functions do the same, just readMSData2 reads less data (no spectrum data).

jorainer added a commit that referenced this issue Aug 30, 2016
o Add gc() call after data import in readMSData2.
o Add gc() call after spectrum/a import in spectrapply.
@jorainer
Copy link
Collaborator

This should be fixed now - for real - (91255f1). It uses now gc() calls after each mzR::close() which is not that elegant but seems to work.

@lgatto lgatto closed this as completed Aug 30, 2016
lgatto pushed a commit that referenced this issue Oct 27, 2016
* master: (312 commits)
  close #153: update low/highMZ checks in validateOnDiskMSnSet
  suggest microbenchmark and suppress message in benchark vig
  Add unit test for issue #118
  Fix failing tests and example due to r120413 in Biobase
  Fix for loop that iterates over spectra to work with empty objects
  Spectra import without need to read mzML header
  Intermediate fix for issue #151
  add unit test for extractMgfSpectrum2Info
  fix warning in readMgfData if TITLE contains multiple =
  split MSnExp tests and new [[ error test
  fix OnDiskMSnExp [[ subsetting - closes issue 152
  coding style
  add gc to readMSData2 - (hopefully) closes #151
  coding style
  use centroided argument in plot.Spectrum.Spectrum example; see issue #150
  coding style
  set mode in calculateFragments example
  remove expecting warning when peak picking centroided spectrum, not silently returns object
  fix centoided test in spectrum functions - see issue #150
  centroided accessor with na.fail unit test
  ...

From: Laurent <lg390@cam.ac.uk>

git-svn-id: https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/MSnbase@120585 bc3139a8-67e5-0310-9ffc-ced21a209358
lgatto pushed a commit that referenced this issue Apr 9, 2017
* master: (312 commits)
  close #153: update low/highMZ checks in validateOnDiskMSnSet
  suggest microbenchmark and suppress message in benchark vig
  Add unit test for issue #118
  Fix failing tests and example due to r120413 in Biobase
  Fix for loop that iterates over spectra to work with empty objects
  Spectra import without need to read mzML header
  Intermediate fix for issue #151
  add unit test for extractMgfSpectrum2Info
  fix warning in readMgfData if TITLE contains multiple =
  split MSnExp tests and new [[ error test
  fix OnDiskMSnExp [[ subsetting - closes issue 152
  coding style
  add gc to readMSData2 - (hopefully) closes #151
  coding style
  use centroided argument in plot.Spectrum.Spectrum example; see issue #150
  coding style
  set mode in calculateFragments example
  remove expecting warning when peak picking centroided spectrum, not silently returns object
  fix centoided test in spectrum functions - see issue #150
  centroided accessor with na.fail unit test
  ...

From: Laurent <lg390@cam.ac.uk>

git-svn-id: https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/MSnbase@120585 bc3139a8-67e5-0310-9ffc-ced21a209358
lgatto pushed a commit that referenced this issue Sep 7, 2017
* master: (312 commits)
  close #153: update low/highMZ checks in validateOnDiskMSnSet
  suggest microbenchmark and suppress message in benchark vig
  Add unit test for issue #118
  Fix failing tests and example due to r120413 in Biobase
  Fix for loop that iterates over spectra to work with empty objects
  Spectra import without need to read mzML header
  Intermediate fix for issue #151
  add unit test for extractMgfSpectrum2Info
  fix warning in readMgfData if TITLE contains multiple =
  split MSnExp tests and new [[ error test
  fix OnDiskMSnExp [[ subsetting - closes issue 152
  coding style
  add gc to readMSData2 - (hopefully) closes #151
  coding style
  use centroided argument in plot.Spectrum.Spectrum example; see issue #150
  coding style
  set mode in calculateFragments example
  remove expecting warning when peak picking centroided spectrum, not silently returns object
  fix centoided test in spectrum functions - see issue #150
  centroided accessor with na.fail unit test
  ...

From: Laurent <lg390@cam.ac.uk>

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/MSnbase@120585 bc3139a8-67e5-0310-9ffc-ced21a209358
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants