Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify/improve the RcppPwiz getPeakList method #112

Closed
jorainer opened this issue Jul 17, 2017 · 7 comments
Closed

Modify/improve the RcppPwiz getPeakList method #112

jorainer opened this issue Jul 17, 2017 · 7 comments

Comments

@jorainer
Copy link
Collaborator

Background on this:

A call to mzR::header prior to mzR::peaks fixed this, but is performance-wise not an ideal solution.

I'll thus dig deeper into the code to see if I can fix that within mzR.

@jorainer
Copy link
Collaborator Author

First observation: calling mzR::peaks does use an sapply loop in R to extract each spectrum individually from an mzML/mzXML file. Improvement on this could be to move that loop to the C++ code.

@jorainer
Copy link
Collaborator Author

Reading the mz and intensity arrays separately from the (pwiz) msdata::spectrum: sometimes the mz array or the intensity array is of 0 length. While it is not clear why that's the case, descriptions within pwiz::MSData.hpp say:

    /// get a copy of the seed spectrum, optionally with its binary data populated
    /// this is useful for formats like mzML that can delay loading of binary data

The "delay loading of binary data" is puzzling - but might be the reason for the problem?

@jorainer
Copy link
Collaborator Author

One update so far:

  • calling s = slp->spectrum(current_scan -1, false); followed by s = slp->spectrum(s, true); seems to reduce the number of random errors - but does not (yet) remove them completely. This approach might be somehow related to the currently working solution, for MSnbase issue 170, i.e. to read first the header and then the peaks.
  • The option described above, i.e. to extract the mz and intensity arrays separately and to compare them to the expected size is very helpful to avoid the [MSData::Spectrum::getMZIntensityPairs()] Sizes do not match. error.
  • Despite the two points above that significantly reduce the frequency of the random errors, there are from time to time errors.

@jorainer
Copy link
Collaborator Author

Note that peaks works without any problems on linux. On Mac however, no matter if using a manually compiled version of R or the official R, these errors and problems occur. Eventually related to GCC vs clang?

@jorainer
Copy link
Collaborator Author

The best solution to the fix seems to be to read in the C++ function called by mzR::peaks the header information of the last scan before loading the actual spectra data (i.e. the mz and intensity values).
This solution is however not ideal since a) it mixes the concept of reading header and peaks separately (as is no longer does this) and b) it has a very bad performance on mzXML files (on mzML files it is very fast). The performance drops even more if data is read from compressed files (e.g. mzXML.gz).

The code and documentation of the tests are available in https://github.com/jotsetung/mzR/tree/peaks_issue170 - but I don't think it is worth merging any of these changes into mzR, especially since the issue 170 in MSnbase seems to be mac-specific.

@lgatto
Copy link
Collaborator

lgatto commented Jul 19, 2017

Thank you very much for all your efforts with this!

@jorainer
Copy link
Collaborator Author

In the end I really believe the issue I came across is mac specific (and eventually also related only to ABI converted mzML files...). I think we might even switch back (in MSnbase) to use simply mzR::peaks.
I could squeeze eventually some more milliseconds out of it to improve the performance slightly though.

@jorainer jorainer closed this as completed Sep 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants