Memory issue when parsing big mzml file. #9

Open
KenWifi opened this issue Dec 28, 2017 · 1 comment
Labels
wiki Informative questions with answers that might help with lib usage

KenWifi commented Dec 28, 2017

Hi,

Recently, I tried to parse a 2 GB mzML file with the following code:
```java
MZMLFile mzmlFile = new MZMLFile(spectrumFiles.get(0).getAbsolutePath());
ScanCollectionDefault scans = new ScanCollectionDefault();
scans.setDefaultStorageStrategy(StorageStrategy.SOFT);
scans.isAutoloadSpectra(true);
scans.setDataSource(mzmlFile);
mzmlFile.setNumThreadsForParsing(threads);

try {
    scans.loadData(LCMSDataSubset.MS1_WITH_SPECTRA);
    scans.loadData(LCMSDataSubset.MS2_WITH_SPECTRA);
} catch (FileParsingException e) {
    e.printStackTrace();
    System.exit(1);
}
```

Memory usage kept growing up to 4 GB, and the program eventually ran out of memory. BatMass had the same problem. Do you have any experience parsing big files?

Kai

chhh (Owner) commented Jul 29, 2019

I didn't see this question originally, but I'll still leave an answer.

You're trying to load the whole file into memory. The original file might be quite well compressed with gzip or MS-Numpress, so the fully decoded data in memory might be significantly larger than the file itself. First, of course, try loading only MS1 or only MS2 spectra.
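To make the compression point concrete: in mzML, each peak array is stored as base64 text, commonly of zlib-compressed 32- or 64-bit little-endian floats. The JDK-only sketch below round-trips a `double[]` through that kind of encoding so you can compare the size of the encoded text with the 8 bytes per value the decoded array occupies. It is an illustration of the encoding, not MSFTBX's parser; `MzmlArrayCodec` and its `encode`/`decode` methods are hypothetical names, and MS-Numpress is not covered.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class MzmlArrayCodec {

    // Pack values as 64-bit little-endian floats, zlib-compress, then base64-encode.
    public static String encode(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (double v : values) {
            buf.putDouble(v);
        }
        Deflater deflater = new Deflater();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            deflater.setInput(buf.array());
            deflater.finish();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
        } finally {
            deflater.end();
        }
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Reverse the steps: base64-decode, inflate, then read little-endian doubles.
    public static double[] decode(String encoded) {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        Inflater inflater = new Inflater();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            inflater.setInput(compressed);
            byte[] chunk = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                // No progress and not finished means the stream is truncated or needs a dictionary.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported zlib stream");
                }
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt zlib data", e);
        } finally {
            inflater.end();
        }
        ByteBuffer buf = ByteBuffer.wrap(out.toByteArray()).order(ByteOrder.LITTLE_ENDIAN);
        double[] values = new double[buf.remaining() / Double.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getDouble();
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical, very regular intensity array (mostly zeros), which compresses well.
        double[] intensities = new double[10_000];
        for (int i = 0; i < intensities.length; i += 100) {
            intensities[i] = 1000.0 + i;
        }
        String encoded = encode(intensities);
        double[] decoded = decode(encoded);
        System.out.println("encoded text:  " + encoded.length() + " chars");
        System.out.println("decoded array: " + decoded.length * Double.BYTES + " bytes");
    }
}
```

Real spectra compress less dramatically than this artificial array, but the decoded `double[]` always costs 8 bytes per value plus object overhead, regardless of how small the text in the file was.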
If loading only MS1 or only MS2 doesn't help, there's another way, which is slower than the standard mode but won't use much memory for any file size:

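```java
import java.util.Map.Entry;
import java.util.TreeMap;
// The MSFTBX types used below (MZMLFile, IScanCollection, ScanCollectionDefault,
// LCMSDataSubset, StorageStrategy, IScan, ISpectrum) must be imported as well.

// The enclosing method must handle the checked exceptions these calls may throw
// (e.g. FileParsingException from loadData()).
try (final MZMLFile mzml = new MZMLFile("path-to-mzml")) {
    // Create a scan collection with auto-loading of spectra enabled
    IScanCollection scans = new ScanCollectionDefault(true);
    scans.setDataSource(mzml);
    // Only load the data structure (i.e. scan meta-data) without spectra.
    // Set StorageStrategy to SOFT - this allows the garbage collector to reclaim
    // spectra that are dangling in memory but not being used.
    scans.loadData(LCMSDataSubset.STRUCTURE_ONLY, StorageStrategy.SOFT);

    TreeMap<Integer, IScan> index = scans.getMapNum2scan();
    for (Entry<Integer, IScan> e : index.entrySet()) {
        IScan scan = e.getValue();
        // You need to use `fetchSpectrum()`, because the spectrum might have been
        // garbage collected
        ISpectrum spectrum = scan.fetchSpectrum();
        // Process the spectrum here; don't hold on to it after this iteration,
        // otherwise the GC can't reclaim it.
    }
}
```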
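Why this keeps memory flat can be sketched in plain Java: metadata for every scan stays in memory, while each spectrum is held only through a `java.lang.ref.SoftReference` and re-read on demand when it has been reclaimed. This is a hypothetical illustration of the idea behind `StorageStrategy.SOFT` plus `fetchSpectrum()`, not MSFTBX's actual implementation; `SoftScanIndex`, `LazyScan`, `fetchPeaks`, `evict` and the loader function are made-up names, and the loader stands in for re-parsing one spectrum from the file.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntFunction;

public class SoftScanIndex {

    // Hypothetical scan: the scan number is always kept, peak data only softly.
    // Not thread-safe; a sketch for single-threaded iteration.
    public static final class LazyScan {
        public final int num;
        private final IntFunction<double[]> loader; // re-reads peaks (hypothetical)
        private SoftReference<double[]> peaks = new SoftReference<>(null);

        public LazyScan(int num, IntFunction<double[]> loader) {
            this.num = num;
            this.loader = loader;
        }

        // Analogous to getSpectrum(): null if never loaded or already reclaimed by the GC.
        public double[] getPeaks() {
            return peaks.get();
        }

        // Analogous to fetchSpectrum(): return the cached data if still reachable, else reload.
        public double[] fetchPeaks() {
            double[] p = peaks.get();
            if (p == null) {
                p = loader.apply(num);
                peaks = new SoftReference<>(p);
            }
            return p;
        }

        // Simulates the GC clearing the soft reference (for demonstration only).
        public void evict() {
            peaks.clear();
        }
    }

    public static void main(String[] args) {
        int[] loads = {0};
        // Hypothetical loader: a real reader would parse one spectrum from disk here.
        IntFunction<double[]> loader = num -> {
            loads[0]++;
            return new double[] {num * 1.0, num * 10.0};
        };
        TreeMap<Integer, LazyScan> index = new TreeMap<>();
        for (int num = 1; num <= 5; num++) {
            index.put(num, new LazyScan(num, loader));
        }
        double total = 0;
        for (Map.Entry<Integer, LazyScan> e : index.entrySet()) {
            // One spectrum at a time; nothing forces all of them to stay in memory.
            double[] peaks = e.getValue().fetchPeaks();
            total += peaks[1];
        }
        System.out.println("total = " + total + ", loads = " + loads[0]);
    }
}
```

The JVM guarantees that softly reachable objects are cleared before it throws `OutOfMemoryError`, so the trade-off is the one described above: spectra may have to be re-parsed (slower), but heap usage no longer has to grow with file size.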

@chhh chhh added the wiki Informative questions with answers that might help with lib usage label Jul 29, 2019