[MSData::Spectrum::getMZIntensityPairs()] Sizes do not match. #170

jorainer · 2016-11-17T13:50:16Z

I'm experiencing some random errors again.
I have a set of 690 mzML files; I select 12 of them for further analysis, filter on retention time, and get the following error when calling spectra on the OnDiskMSnExp:

Error: BiocParallel errors
  element index: 6
  first error: [MSData::Spectrum::getMZIntensityPairs()] Sizes do not match.

At first I thought it must be my files, but when I randomly selected 12 other files, the same error occurred. So it is most likely not these specific files; also, the element index is not always 6.
I also get errors without filtering.

What really puzzles me is that sometimes, especially when called repeatedly, the function works without errors.

jorainer · 2016-11-17T15:09:37Z

It can't be the files, since I can read them without problems using readMSData.

lgatto · 2016-11-17T18:55:05Z

Where does the BiocParallel error come from, i.e. what operation is done in parallel? Is this one OnDiskMSnExp with 12 files, with the filtering done in parallel over the fileNames()?

jorainer · 2016-11-17T21:17:58Z

The spectra call simply reads the data, creates the Spectrum1 objects and returns them. The most obvious difference from the way readMSData reads the file(s) is that readMSData iterates over the individual acquisitionNum values and reads one spectrum at a time, while for OnDiskMSnExp I read all spectra in one call (with mzR::peaks(fileh, idx), where idx holds the indices of all spectra remaining after filtering by rt).

I'll try to do some more tests tomorrow, also with other files to exclude that it's the files.

jorainer · 2016-11-18T11:46:11Z

Some updates:
Loading a single file as an OnDiskMSnExp and calling spectra on it does not cause the error, even when this is done for all 12 files sequentially. I only get the error if I load the 12 files as one experiment and call spectra on this one OnDiskMSnExp.

lgatto · 2016-11-18T12:03:05Z

This is consistent with the BiocParallel error. Progress, I suppose...

jorainer · 2016-11-18T12:41:54Z

Actually, the error comes from ProteoWizard:

mzR::peaks calls RcppPwiz::getPeakList(x), which loads the spectrum and extracts the m/z-intensity pairs with getMZIntensityPairs. This function throws the error if the sizes of the m/z and intensity arrays don't match. That's what I understand from the error.
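The check that fails here can be illustrated with a small R analogue. This is not the ProteoWizard C++ code: `pair_mz_intensity()` is a hypothetical helper that only mimics the idea that the m/z and intensity arrays decoded for one spectrum must have the same length before they can be paired:

```r
## Hypothetical R analogue of the consistency check performed by
## ProteoWizard's getMZIntensityPairs(); NOT the actual pwiz code.
pair_mz_intensity <- function(mz, intensity) {
    if (length(mz) != length(intensity))
        stop("Sizes do not match: ", length(mz), " m/z values vs ",
             length(intensity), " intensity values")
    ## Return a two-column matrix, similar in shape to what mzR::peaks
    ## returns for a single spectrum
    cbind(mz = mz, intensity = intensity)
}

## Consistent arrays are paired without problems
pks <- pair_mz_intensity(c(100.1, 200.2, 300.3), c(10, 20, 30))

## Arrays of different length trigger the error
res <- tryCatch(pair_mz_intensity(c(100.1, 200.2), c(10, 20, 30)),
                error = function(e) conditionMessage(e))
```

If the binary arrays are decoded inconsistently (as suspected in this thread), the two lengths differ and the pairing step is the first place where this becomes visible.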

jorainer · 2016-11-18T13:07:00Z

Now it's getting strange: with the same files, converted differently to mzML (using vendor settings and without zlib compression of the binary data), the error no longer occurs.
A problem with zlib?

jorainer · 2016-11-20T09:48:11Z

Even stranger: when I process all the files in one go I don't get the error, but when I save the OnDiskMSnExp, close R, load the object again and call spectra, I get the error!

I need to build a reproducible example using the test files from msdata.

lgatto · 2016-11-20T12:04:44Z

Strictly speaking, on-disk objects shouldn't be saved/loaded, although in theory it should work as long as the raw files haven't been modified or moved. I just tried with a single file and, indeed, it works. It is really puzzling. Could the saving/loading error be only a red herring, with the problem lying somewhere deeper (for example in your commit 9898ece9a70764fb7b748bca024877c0cea44623)?

jorainer · 2016-11-20T13:29:45Z

Yes, I think it was only by chance that it worked and then failed again. So, saving/loading might not be it.
Also, I can't reproduce this error with files other than mine. I'll try some different settings tomorrow to convert the original wiff files into mzML.

The only thing I know so far is that I get a segfault (memory not mapped) if I use the ramp backend, and the [MSData::Spectrum::getMZIntensityPairs()] Sizes do not match. error if I use pwiz. So apparently the binary spectrum data that are returned are somehow inconsistent. I can't explain why it sometimes works (e.g. if I load one file at a time) and sometimes not (if I load all files at once).

jorainer · 2016-11-21T08:21:33Z

GOT IT!
In the function that reads the spectrum values on the fly (invoked by spectrapply), I was using the mzR::peaks method without first reading the spectra's headers. Now, if I add an mzR::header call before mzR::peaks (even though the header information is not needed), it works.
Possibly, this way the C++ code silently reads additional information (e.g. how the spectrum data are encoded) that is missing if the header information is not read.
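A minimal sketch of the workaround described above, assuming an already opened mzR file handle `fh` and a vector of spectrum indices `idx`. The function name `read_peaks()` and its `read_header` argument are hypothetical; the reader functions are passed as arguments (defaulting to mzR::header and mzR::peaks) only so that the logic can be exercised without real files:

```r
## Hypothetical helper illustrating the workaround: read the header of the
## requested spectra before reading their peaks. Not MSnbase's actual code.
read_peaks <- function(fh, idx, read_header = TRUE,
                       header_fun = mzR::header, peaks_fun = mzR::peaks) {
    if (read_header) {
        ## The header itself is discarded; the call is made only for its
        ## (observed, not understood) effect on the subsequent peaks call.
        header_fun(fh, idx)
    }
    peaks_fun(fh, idx)
}
```

In a real session this would be called as `read_peaks(fh, idx)` on a handle returned by `mzR::openMSfile()`. Because R evaluates default arguments lazily, mzR is only needed when the defaults are actually used.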

I'll do some more tests and push the changes once fixed.

o Ensure that header information is read too if spectra data is loaded for OnDiskMSnExp objects.

jorainer · 2016-11-24T08:25:12Z

Closing the issue as it seems to be fixed for good. @lgatto, could you please bump the version and push to svn?

lgatto · 2016-11-24T20:39:58Z

Done. Version 2.1.2 on hedgehog

CHANGES IN VERSION 2.1.2
------------------------
 o Update readMSnSet2 to save filename <2016-11-09 Wed>
 o Ensure that header information is read too if spectra data is
   loaded for OnDiskMSnExp objects (see issue #170) <2016-11-24 Thu>

I still have to extract your #170 commit and push to release 3.4.

lgatto · 2016-11-24T21:09:21Z

I still have to extract your #170 commit and push to release 3.4.

Done too.

o Ensure that header information is read too if spectra data is loaded for OnDiskMSnExp objects. From: jotsetung <johannes.rainer@gmail.com> git-svn-id: https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/MSnbase@124451 bc3139a8-67e5-0310-9ffc-ced21a209358

* master: update news Fix issue #170 Add spectrapply method and backend option Fix unit test error due to recent changes Add bpi method (issue #168) set filename only when input is a character Update readMSnSet2 to save filename Cite Lazar 2016 in vignette imputation section add imputatation paper to bib update news and description fix typo in impute man page new github devel version From: Laurent <lg390@cam.ac.uk> git-svn-id: https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/MSnbase@124452 bc3139a8-67e5-0310-9ffc-ced21a209358

- Remove the code inserted to fix issue #170; fixes issue #216. - Add tests to the torture.R script checking if the #170 error still happens.

jorainer · 2017-07-17T07:38:41Z

Digging deeper into the pwiz code to understand why this error occurs. Calling mzR::header before mzR::peaks solves the issue but has a major impact on performance, especially for gzipped mzML files. I'm therefore investigating whether I can fix the issue within mzR.

jorainer · 2017-07-17T08:09:07Z

Updates related to mzR code and changes are available in sneumann/mzR#112

jorainer · 2017-07-19T12:30:00Z

After extensive tests and evaluation of multiple approaches, the only solution to this issue seems to be the original one, i.e. calling mzR::header before reading the data with mzR::peaks. It might, however, be slightly modified, as it does not seem necessary to read all headers.

lgatto · 2017-07-19T12:47:07Z

Thanks!

jorainer · 2017-07-20T16:14:23Z

After some tests (many more to come), the issue reported here seems to occur only on macOS, and there only with one specific set of mzML files. So, if all further tests run smoothly, I would suggest making the fix an option but disabling it by default (more explanation later).

Below are some benchmarks comparing reading data with mzR::peaks alone against the fixes that additionally call mzR::header:

library(mzR)
library(msdata)
library(microbenchmark)

## Define the functions to compare.
only_peaks <- function(x) {
    fh <- mzR::openMSfile(x)
    pks <- mzR::peaks(fh)
    mzR::close(fh)
}

peaks_with_all_headers <- function(x) {
    fh <- mzR::openMSfile(x)
    hdr <- mzR::header(fh)
    pks <- mzR::peaks(fh)
    mzR::close(fh)
}

peaks_with_last_header <- function(x) {
    fh <- mzR::openMSfile(x)
    hdr <- mzR::header(fh, length(fh))
    pks <- mzR::peaks(fh)
    mzR::close(fh)
}

## mzML
fl <- system.file("microtofq/MM14.mzML", package = "msdata")
microbenchmark(only_peaks(fl), peaks_with_all_headers(fl),
	       peaks_with_last_header(fl), times = 10)
Unit: milliseconds
                       expr      min       lq     mean   median       uq
             only_peaks(fl) 44.89906 45.89676 47.75040 47.15564 49.25066
 peaks_with_all_headers(fl) 71.15074 73.36380 80.23435 74.95574 80.91604
 peaks_with_last_header(fl) 66.75709 67.77629 80.98064 69.63443 74.46741
      max neval cld
  51.4870    10  a 
 106.6319    10   b
 167.8683    10   b

Not unexpectedly, the call without header is the fastest. For mzML files, reading only the header of the last spectrum is also somewhat faster than reading all of them (by median; the cld column places both header variants in the same group).

Next on a gzipped mzML file:

## gzipped mzML
fl <- system.file("proteomics/TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML.gz", package = "msdata")
microbenchmark(only_peaks(fl), peaks_with_all_headers(fl),
	       peaks_with_last_header(fl), times = 10)
Unit: seconds
                       expr      min       lq     mean   median       uq
             only_peaks(fl) 13.39147 13.52713 13.64382 13.56836 13.80904
 peaks_with_all_headers(fl) 27.62570 27.78864 28.13496 27.99689 28.56590
 peaks_with_last_header(fl) 15.50221 15.67585 16.01584 15.87251 16.03055
      max neval cld
 14.14337    10 a  
 29.14699    10   c
 17.84753    10  b

This is considerably slower overall. Reading the header information of all spectra roughly doubles the run time (median 28.0 s vs 13.6 s), while reading only the last header adds about 2.3 s.

Next we evaluate an mzXML file:

fl <- system.file("lockmass/LockMass_test.mzXML", package = "msdata")
microbenchmark(only_peaks(fl), peaks_with_all_headers(fl),
	       peaks_with_last_header(fl), times = 10)
Unit: milliseconds
                       expr       min       lq      mean    median        uq
             only_peaks(fl)  67.81239  68.1742  70.10934  68.55077  72.94026
 peaks_with_all_headers(fl) 122.98311 126.4965 129.60679 127.83370 131.70625
 peaks_with_last_header(fl) 100.18529 101.1152 104.02154 102.28500 108.03445
       max neval cld
  75.26939    10 a  
 139.63150    10   c
 111.65219    10  b

Similar to the mzML file, reading just the data is fastest, data + last header is second, and data + all headers is almost twice as slow (median 127.8 ms vs 68.6 ms).
Below we repeat the test on a gzipped mzXML file:

## Finally, a gzipped mzXML file (local file, not part of msdata)
fl <- "/Users/jo/data/2017/mzXML/1405_blk1.mzXML.gz"
microbenchmark(only_peaks(fl), peaks_with_all_headers(fl),
	       peaks_with_last_header(fl), times = 10)
Unit: seconds
                       expr      min       lq     mean   median       uq
             only_peaks(fl) 15.58146 15.76791 15.97458 15.96889 16.06633
 peaks_with_all_headers(fl) 30.09662 30.22185 30.64439 30.48337 30.82167
 peaks_with_last_header(fl) 28.57678 28.68009 29.43082 28.90038 29.33533
      max neval cld
 16.65828    10 a  
 31.81355    10   c
 33.78522    10  b

Here too, reading just the data using mzR::peaks is fastest. Reading the header of a single spectrum instead of all of them makes little difference for this file (median 28.9 s vs 30.5 s).

Summarizing:

Performance-wise, it might be helpful to remove the additional call to header.
Gzipped files have a very bad impact on performance; it's better to compress only the (binary) data within the mzML files.
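To put numbers on these observations, the median timings reported in the four microbenchmark tables above can be turned into slowdown factors relative to reading only the peaks (milliseconds for the uncompressed files, seconds for the gzipped ones; the units cancel in the ratio):

```r
## Median run times copied from the microbenchmark outputs above
medians <- data.frame(
    file = c("mzML", "mzML.gz", "mzXML", "mzXML.gz"),
    only_peaks = c(47.15564, 13.56836, 68.55077, 15.96889),
    all_headers = c(74.95574, 27.99689, 127.83370, 30.48337),
    last_header = c(69.63443, 15.87251, 102.28500, 28.90038)
)

## Slowdown relative to calling mzR::peaks alone
medians$all_headers_ratio <- round(medians$all_headers / medians$only_peaks, 2)
medians$last_header_ratio <- round(medians$last_header / medians$only_peaks, 2)
medians[, c("file", "all_headers_ratio", "last_header_ratio")]
```

Read this way, reading all headers costs roughly 1.6 to 2.1 times the baseline, while reading only the last header ranges from about 1.2 times (gzipped mzML) to 1.8 times (gzipped mzXML).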

jorainer · 2017-07-21T05:41:52Z

The first runs for my torture tests are ready:

library(mzR)
SN <- "/Users/jo/data/2016/2016-11/NoSN/"
## SN <- "/Users/jo/data/2017/2017_02/"
## SN <- "/Users/jo/data/2016/2016_06/"
## SN <- "/Users/jo/data/2017/nalden01/"

fl <- dir(SN, full.names = TRUE)

torture_test <- function(files, FUN, iterations = 10) {
    for (i in seq_len(iterations)) {
        cat("\nIteration", i, "of", iterations, "\n\n")
        ## Iterate over the 'files' argument (not the global 'fl')
        for (j in seq_along(files)) {
            if (j %% 20 == 0)
                cat(j, "files processed\n")
            FUN(files[j])
        }
    }
}

fail_fun <- function(x) {
    fh <- mzR::openMSfile(x)
    pks <- mzR::peaks(fh)
    mzR::close(fh)
}
torture_test(fl, FUN = fail_fun)

In brief, the test opens each file, extracts the data of all spectra in the file using mzR::peaks and closes the file again. This is repeated 10 times for all files in one folder.
fail_fun represents the way we would usually read data, but this is what caused the errors described at the top of this issue. I re-ran all the tests on 4 different sets of mzML files:

2016-11/NoSN/: 690 files on which I first got the error (on macOS).
2016_06: 609 other mzML files from our lab.
2017_02: 160 mzML files from our lab.
nalden01: 9 mzML files from another lab.
I'll run the torture tests on:
macOS
Linux
Windows
to evaluate whether the error occurs on all 3 platforms.

A note on the mzML files from our lab: they were converted from ABI wiff format to mzML using ProteoWizard on Windows 7.

jorainer · 2017-07-21T05:47:40Z

Results for macOS:

2016-11/NoSN: 1x OK, 3x FAIL (each failure occurred in the first iteration, after ~40 files).
2016_06: 2x OK, 1x FAIL.
2017_02: 3x FAIL (first iteration after ~120 files).
nalden01: 4x OK.

As described above, the error occurs randomly: more frequently with certain files, but not reproducibly.

sessionInfo:

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin16.7.0/x86_64 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mzR_2.11.5   Rcpp_0.12.12

loaded via a namespace (and not attached):
[1] compiler_3.4.1      ProtGenerics_1.9.0  parallel_3.4.1     
[4] Biobase_2.37.2      codetools_0.2-15    BiocGenerics_0.23.0

- Add an option fastLoad to disable the additional mzR::header call executed before each mzR::peaks call to fetch data on-demand for OnDiskMSnExp objects. This partially reverts the fix for issue #170 as this seems to be macOS and file specific. - Add related unit tests and documentation.

jorainer · 2017-07-21T06:56:10Z

Results for Linux:

2016-11/NoSN: 2x OK.
2016_06: 2x OK.
2017_02: 4x OK.
nalden01: 4x OK.

Apparently, on Linux there is no problem using just mzR::peaks.

sessionInfo:

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu/x86_64 (64-bit)
Running under: Linux Mint 18.1

Matrix products: default
BLAS: /home/jo/R/2017-07/R-3.4.1-BioC3.6-devel/lib/R/lib/x86_64/libRblas.so
LAPACK: /home/jo/R/2017-07/R-3.4.1-BioC3.6-devel/lib/R/lib/x86_64/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=it_IT.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mzR_2.11.5   Rcpp_0.12.12

loaded via a namespace (and not attached):
[1] compiler_3.4.1      ProtGenerics_1.9.0  parallel_3.4.1     
[4] Biobase_2.37.2      codetools_0.2-15    BiocGenerics_0.23.0

jorainer · 2017-07-24T04:29:23Z

Results for Windows:

2016-11/NoSN: 2x OK.
2016_06: 2x OK.
2017_02: 2x OK.
nalden01: 2x OK.

On Windows, too, using mzR::peaks without a preceding mzR::header call works.

sessionInfo:

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=German_Austria.1252  LC_CTYPE=German_Austria.1252
[3] LC_MONETARY=German_Austria.1252 LC_NUMERIC=C
[5] LC_TIME=German_Austria.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] mzR_2.11.4   Rcpp_0.12.11

loaded via a namespace (and not attached):
[1] compiler_3.4.1      ProtGenerics_1.9.0  parallel_3.4.1
[4] Biobase_2.37.2      codetools_0.2-15    BiocGenerics_0.23.0

jorainer · 2017-07-24T04:34:22Z

Conclusion from these tests:

mzR::peaks without mzR::header fails only on macOS; it works on Linux and Windows.
Add an option to MSnbase to enable/disable additionally reading the spectrum headers. Enable header reading by default on macOS systems.
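A minimal sketch of how such a platform-dependent default could be derived, assuming the platform is detected via `Sys.info()[["sysname"]]` (which returns "Darwin" on macOS). `default_fast_load()` is a hypothetical name; this is not necessarily how MSnbase implements its fastLoad option:

```r
## Hypothetical helper: decide whether the extra header read can be skipped.
## fastLoad = TRUE means "call mzR::peaks without a preceding mzR::header".
default_fast_load <- function(sysname = Sys.info()[["sysname"]]) {
    ## The tests above only failed on macOS ("Darwin"), so only there do we
    ## fall back to the slower but safe header + peaks approach.
    !identical(sysname, "Darwin")
}

default_fast_load()
```

Keeping the platform name as an argument makes the decision easy to test on any machine; in MSnbase itself the current setting can be inspected with isMSnbaseFastLoad(), as mentioned later in this thread.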

I will add this to the function called by spectrapply (and also remove the additional gc() call; issue #151), and run torture tests to ensure that everything works properly.

jorainer · 2017-07-24T13:05:58Z

Next I'm running torture tests using MSnbase functions and methods:

library(MSnbase)
library(BiocParallel)  # provides register() and SerialParam()

torturing <- function(x) {
    tmp <- readMSData2(x, msLevel. = 1)
    register(SerialParam())
    for (i in 1:10) {
        cat("--- ", i, " ---", "\n")
        cat("first spectrapply\n")
        sp <- MSnbase::spectrapply(tmp, FUN = function(z) {max(mz(z))})
        rm(sp)
        gc()
        cat("second spectrapply\n")
        sp <- MSnbase::spectrapply(tmp, FUN = function(z) {max(mz(z))})
        rm(sp)
        gc()
        tmp <- filterRt(tmp, rt = c(5, 500))
        cat("third spectrapply after filter rt\n")
        sp <- MSnbase::spectrapply(tmp, FUN = function(z) {max(mz(z))})
        cat("\n\n")
    }
}

This function is run on the same sets of test files on macOS, Linux and Windows.
Settings enabled in spectrapply:

No additional call to gc() in the function called by spectrapply (issue #151, "Failing unit tests - memory leaks?").
On macOS, the function reads the header of the last spectrum before reading the data; on all other systems this is skipped (as it does not seem to be required).

- Update torture script to evaluate the fastLoad option (not reading header prior to read data) and the removal of the additional gc() call in spectrapply. - Tune the functions called by spectrapply,OnDiskMSnExp. - Automatically disable fastLoad on macOS.

jorainer · 2017-07-25T09:02:20Z

Torture test results for macOS:
Each run with fastLoad = TRUE failed with the error message:

Error in object@backend$getPeakList(x) : 
  [MSData::Spectrum::getMZIntensityPairs()] Sizes do not match.

With fastLoad = FALSE:

2016-11/NoSN: 2x OK.
2017_02: 2x OK.
nalden01: 1x OK.
2016_06: 2x OK.

So, for macOS we definitely have to use fastLoad = FALSE. Apart from that all seems to be fine.

jorainer · 2017-07-26T13:23:24Z

Torture test results for Linux:
The tests were run with fastLoad = TRUE:

2016/NoSN: 2x OK.
2017_02: 2x OK.
nalden01: 2x OK.
2016_06: 2x OK.

On Linux there seems to be no need to call mzR::header before mzR::peaks, and skipping it can speed things up considerably.

jorainer · 2017-07-29T17:07:10Z

Finally, torture test results for Windows:
Tests run with fastLoad = TRUE:

2016/NoSN: 2x OK.
2017_02: 2x OK.
nalden01: 2x OK.
2016_06: 2x OK.

It looks like fastLoad = TRUE also works nicely on Windows.

wmoldham · 2018-10-16T18:23:45Z

I just encountered the following error when using either chromatogram() or mz() functions:

Error: BiocParallel errors
  element index: 26
  first error: [MSData::Spectrum::getMZIntensityPairs()] Sizes do not match.

I am analyzing .mzML files generated with msconvert from Thermo .raw files on a Windows 10 machine, and I analyze them with R 3.5.1 running in RStudio 1.1.456 on a Mac. Of course, I don't get the error if I run readMSData() with mode = "inMemory". I read the above thread in detail and was wondering how to apply the solution.

Thanks for your help, and apologies for any missing key details; this is my first post in such a forum.

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RColorBrewer_1.1-2   IPO_1.6.0            CAMERA_1.36.0        rsm_2.10            
 [5] xcms_3.2.0           MSnbase_2.6.4        ProtGenerics_1.12.0  mzR_2.14.0          
 [9] Rcpp_0.12.19         BiocParallel_1.14.2  Biobase_2.40.0       BiocGenerics_0.26.0 
[13] BiocInstaller_1.30.0

loaded via a namespace (and not attached):
 [1] vsn_3.48.1             splines_3.5.1          foreach_1.4.4          Formula_1.2-3         
 [5] assertthat_0.2.0       affy_1.58.0            stats4_3.5.1           latticeExtra_0.6-28   
 [9] RBGL_1.56.0            yaml_2.2.0             impute_1.54.0          pillar_1.3.0          
[13] backports_1.1.2        lattice_0.20-35        glue_1.3.0             limma_3.36.5          
[17] digest_0.6.18          checkmate_1.8.5        colorspace_1.3-2       htmltools_0.3.6       
[21] preprocessCore_1.42.0  Matrix_1.2-14          plyr_1.8.4             MALDIquant_1.18       
[25] XML_3.98-1.16          pkgconfig_2.0.2        zlibbioc_1.26.0        purrr_0.2.5           
[29] scales_1.0.0           RANN_2.6               affyio_1.50.0          tibble_1.4.2          
[33] htmlTable_1.12         IRanges_2.14.12        ggplot2_3.0.0          nnet_7.3-12           
[37] lazyeval_0.2.1         MassSpecWavelet_1.46.0 survival_2.42-6        magrittr_1.5          
[41] crayon_1.3.4           doParallel_1.0.14      MASS_7.3-51            foreign_0.8-71        
[45] graph_1.58.2           data.table_1.11.8      tools_3.5.1            stringr_1.3.1         
[49] S4Vectors_0.18.3       munsell_0.5.0          cluster_2.0.7-1        bindrcpp_0.2.2        
[53] pcaMethods_1.72.0      compiler_3.5.1         mzID_1.18.0            rlang_0.2.2           
[57] grid_3.5.1             iterators_1.0.10       rstudioapi_0.8         htmlwidgets_1.3       
[61] igraph_1.2.2           base64enc_0.1-3        gtable_0.2.0           codetools_0.2-15      
[65] multtest_2.36.0        R6_2.3.0               gridExtra_2.3          knitr_1.20            
[69] dplyr_0.7.7            bindr_0.1.1            Hmisc_4.1-1            stringi_1.2.4         
[73] rpart_4.1-13           acepack_1.4.1          tidyselect_0.2.5

lgatto · 2018-10-16T20:54:15Z

Thank you for the report @wmoldham - Do you get the error on OSX and Windows, or only OSX?

wmoldham · 2018-10-16T20:55:48Z

I have only attempted this on OSX; I don't have easy access to a Windows machine (!), but I can try to find one to reproduce it there.

jorainer · 2018-10-17T05:29:28Z

I had the same error recently on a set of files too (on OSX). For me this happened randomly, i.e. if I called the same function a second time I did not get the error again. That made me think it might be related to garbage collection. Note also that this error is thrown by the ProteoWizard routines that are used in mzR for data import.

lgatto · 2018-10-17T05:56:32Z

@jotsetung - is the fastLoad = TRUE parameter still available? If so, @wmoldham could set it to FALSE (at the cost of slowing down access) if the error persists.

jorainer · 2018-10-17T06:25:48Z

@wmoldham, on macOS it should always be FALSE, but you can check its value with isMSnbaseFastLoad().

wmoldham · 2018-10-18T13:13:51Z

Apologies for the delayed response. After restarting RStudio, I spent yesterday working with the data, including repeating the processing steps that previously yielded the error, and I did not encounter this error again. No problems using readMSData() with mode = "onDisk". I can confirm that isMSnbaseFastLoad() == FALSE. I will get back in touch if I can find a way to reproduce the error! Thanks for your attention.

lauzikaite · 2018-10-31T14:14:45Z

Hi all,

I've been experiencing the same issue on and off when using the findChromPeaks function on an OnDiskMSnExp object.

Error: BiocParallel errors
  element index: 5, 6, 7, 8, 9, 10, ...
  first error: [MSData::Spectrum::getMZIntensityPairs()] Sizes do not match.

I am running this on macOS, with fastLoad = FALSE. I am struggling to reproduce this error, as it keeps coming up with different files, and if I call the same function a second time on the same file, the error does not occur.

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] massFlowR_1.0       ggplot2_3.0.0       bindrcpp_0.2.2      xcms_3.2.0          MSnbase_2.6.2       ProtGenerics_1.12.0 mzR_2.14.0         
 [8] Rcpp_0.12.19        BiocParallel_1.14.2 Biobase_2.40.0      BiocGenerics_0.26.0

loaded via a namespace (and not attached):
 [1] viridis_0.5.1          vsn_3.48.1             tidyr_0.8.1            viridisLite_0.3.0      splines_3.5.1          foreach_1.4.4         
 [7] assertthat_0.2.0       affy_1.58.0            stats4_3.5.1           yaml_2.2.0             impute_1.54.0          pillar_1.3.0          
[13] lattice_0.20-35        glue_1.3.0             limma_3.36.2           digest_0.6.15          RColorBrewer_1.1-2     colorspace_1.3-2      
[19] preprocessCore_1.42.0  Matrix_1.2-14          plyr_1.8.4             MALDIquant_1.18        XML_3.98-1.16          pkgconfig_2.0.1       
[25] devtools_1.13.6        zlibbioc_1.26.0        purrr_0.2.5            scales_1.0.0           RANN_2.6               affyio_1.50.0         
[31] tibble_1.4.2           IRanges_2.14.10        withr_2.1.2            lazyeval_0.2.1         MassSpecWavelet_1.46.0 survival_2.42-6       
[37] magrittr_1.5           crayon_1.3.4           memoise_1.1.0          doParallel_1.0.11      MASS_7.3-50            xml2_1.2.0            
[43] BiocInstaller_1.30.0   tools_3.5.1            stringr_1.3.1          S4Vectors_0.18.3       munsell_0.5.0          pcaMethods_1.72.0     
[49] compiler_3.5.1         mzID_1.18.0            rlang_0.2.2            grid_3.5.1             iterators_1.0.10       rstudioapi_0.7        
[55] igraph_1.2.2           labeling_0.3           testthat_2.0.0         gtable_0.2.0           codetools_0.2-15       multtest_2.36.0       
[61] roxygen2_6.1.0         R6_2.2.2               gridExtra_2.3          dplyr_0.7.6            bindr_0.1.1            commonmark_1.5        
[67] stringi_1.2.4          tidyselect_0.2.4       faahKO_1.20.0

wmoldham · 2018-10-31T14:22:58Z

@lauzikaite, I have had much better stability using the doParallel package, as described in the vignettes linked below. I don't think it completely eliminates the error, but the frequency is dramatically reduced and it no longer interferes with the analysis. Hope it helps you too.

Metabolomics data pre-processing

LCMS data preprocessing and analysis with xcms

jorainer · 2018-11-05T06:40:02Z

@lauzikaite yes, I am aware of this and it keeps happening to me too (macOS). The problem is that I have no idea how we could fix the error. To me it seems to be related to some garbage collection process (in R?) that kicks in randomly.

@wmoldham thanks for your input! I also had the impression that it works better with doParallel, but was not sure whether it was just my imagination.

lauzikaite · 2018-11-05T13:53:43Z

@wmoldham, thank you for the suggestion. I can confirm that using a doParallel cluster reduces the frequency of this issue, as does using the BiocParallel::DoparParam() backend for the BiocParallel::bplapply function. Neither completely eradicates it, however.

@jotsetung, thank you. I just wanted to ask whether I had missed something in my setup to avoid this. So far, the best "fix" for me has been to wrap the findChromPeaks call on multiple files in a while loop together with try.
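A minimal sketch of this "while loop with try" workaround, written as a generic base R helper. `with_retries()` and its `max_tries` argument are hypothetical; in practice `expr_fun` would wrap the failing call, e.g. `function() findChromPeaks(raw_data, param = cwp)`, where `raw_data` and `cwp` are placeholders for your own OnDiskMSnExp and peak-detection parameter objects. Since the error appears randomly, retrying simply re-runs the same call; it does not address the underlying cause:

```r
## Hypothetical helper: re-run a function until it succeeds or until
## max_tries attempts have failed. Errors are caught with try().
with_retries <- function(expr_fun, max_tries = 3) {
    attempt <- 1
    while (attempt <= max_tries) {
        res <- try(expr_fun(), silent = TRUE)
        if (!inherits(res, "try-error"))
            return(res)
        ## Report the failure and try again
        message("Attempt ", attempt, " failed: ",
                conditionMessage(attr(res, "condition")))
        attempt <- attempt + 1
    }
    stop("All ", max_tries, " attempts failed")
}
```

Capping the number of attempts avoids an endless loop if a file is genuinely broken rather than hitting the random error.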

trainorp · 2019-01-26T02:08:31Z

I have also noticed this problem on Mac...
But not on Linux (openSUSE). So strange.

jorainer · 2019-01-26T17:59:24Z

For me (on Mac) it seems to be OK now. I never got this error in my Linux and Windows test environments. It does indeed seem to happen (randomly) only on Mac, and I have absolutely no idea why (the error is thrown by the ProteoWizard C++ code that is used by mzR).

YonghuiDong · 2019-02-01T10:47:32Z

I encountered the same problem on Mac.

jorainer added a commit that referenced this issue Nov 21, 2016

Fix issue #170

5b2f9f4

jorainer closed this as completed Nov 24, 2016

jorainer mentioned this issue May 30, 2017

unit test runtime #216

Closed

jorainer added a commit that referenced this issue May 30, 2017

Remove code reading header in spectrapply

a69c174

jorainer reopened this Jul 17, 2017

jorainer mentioned this issue Jul 17, 2017

Modify/improve the RcppPwiz getPeakList method sneumann/mzR#112

Closed

jorainer added a commit that referenced this issue Jul 21, 2017

Add some test functions to evaluate avoiding issue #170

67c2664

jorainer mentioned this issue Jul 29, 2017

Faster raw data access for Windows and Linux #234

Merged

jorainer closed this as completed Aug 1, 2017

jorainer mentioned this issue Aug 16, 2021

Spectra on macOS: Sizes of mz and intensity arrays don't match. rformassspectrometry/Spectra#212

Closed
