Unable to process more than one mzML file at a time #356
Comments
Thank you for your message, @shefalilathwal. Here's what I get when reproducing your use case with 3 mzML files:
> suppressPackageStartupMessages(library("MSnbase"))
> fls <- dir("~/Data2/Thermo_HELA_PRT/", full.names = TRUE, pattern = "mzML")
> fls
[1] "/home/lg390/Data2/Thermo_HELA_PRT//Thermo_Hela_PRTC_1.mzML"
[2] "/home/lg390/Data2/Thermo_HELA_PRT//Thermo_Hela_PRTC_2.mzML"
[3] "/home/lg390/Data2/Thermo_HELA_PRT//Thermo_Hela_PRTC_3.mzML"
> x <- readMSData(fls, mode = "onDisk", msLevel = 1)
> system.time(xrt <- rtime(x))
user system elapsed
0 0 0
> head(xrt)
F1.S00001 F1.S00002 F1.S00003 F1.S00004 F1.S00005 F1.S00006
0.3287012 0.7814142 1.0613962 1.3288742 1.5962302 1.8637102
> system.time(xmz <- mz(x))
user system elapsed
4.314 2.900 132.740
> head(xmz)
$F1.S00001
[1] 396.0173 396.0191 396.0210 396.0229 400.9157 400.9176 400.9195 400.9214
[9] 400.9233 400.9252 400.9271 400.9290 400.9309 400.9329 400.9348 400.9367
[17] 400.9386 402.1674 402.1694 402.1713 402.1732 402.1751 402.1770 402.1789
[25] 402.1808 402.1827 402.1847 402.1866 402.1885 402.1904 410.8786 410.8806
[33] 410.8826 410.8846 410.8865 410.8885 410.8905 410.8925 410.8944 410.8964
[41] 410.8984 410.9004 410.9023 413.2510 413.2530 413.2550 413.2570 413.2590
[49] 413.2610 413.2630 413.2650 413.2670 413.2690 413.2710 413.2729 413.2749
[57] 413.2769 413.2789 413.2809 415.0198 415.0218 415.0238 415.0258 415.0278
[65] 415.0298 415.0318 415.0338 415.0358 415.0378 415.0398 415.0418 415.0438
[73] 415.0458 415.0478 415.0499 415.0519 416.0221 416.0241 416.0261 416.0281
[81] 416.0301 416.0321 416.0342 416.0362 416.0382 416.0402 416.0422 416.0442
[89] 416.0462 416.0482 416.0503 417.0704 417.0725 417.0745 417.0765 417.0785
[97] 417.0805 417.0826 417.0846 417.0866
[ reached getOption("max.print") -- omitted 24875 entries ]
$F1.S00002
[1] 396.0197 396.0215 396.0234 396.0253 401.2001 401.2020 401.2039 401.2058
[9] 401.2077 401.2096 401.2115 401.2134 401.2153 401.2172 401.2191 401.2210
[17] 401.2229 401.2248 401.2267 401.8067 401.8087 401.8106 401.8125 401.8144
[25] 401.8163 401.8182 401.8201 401.8220 401.8237 401.8257 401.8276 401.8295
[33] 402.1640 402.1659 402.1678 402.1697 402.1716 402.1735 402.1754 402.1773
[41] 402.1793 402.1812 402.1831 402.1850 402.1869 402.1890 402.1909 402.1928
[49] 402.1948 402.6638 402.6657 402.6676 402.6695 402.6715 402.6734 402.6753
[57] 402.6772 402.6791 402.6810 402.6830 402.6849 402.6868 403.1663 403.1682
[65] 403.1701 403.1720 403.1740 403.1759 403.1778 403.1797 403.1816 403.1836
[73] 403.1855 403.1874 403.1893 403.1912 403.1932 404.1452 404.1471 404.1491
[81] 404.1510 404.1529 404.1548 404.1568 404.1587 404.1606 404.1625 404.1645
[89] 404.1664 404.1683 404.1702 404.1722 404.1741 404.1760 404.1780 404.1799
[97] 404.1818 404.1837 404.1857 404.1876
[ reached getOption("max.print") -- omitted 15946 entries ]
$F1.S00003
[1] 396.0217 396.0235 396.0254 396.0273 401.2021 401.2040 401.2059 401.2078
[9] 401.2097 401.2116 401.2135 401.2154 401.2173 401.2192 401.2211 401.2230
[17] 401.2249 401.2268 402.1643 402.1662 402.1681 402.1700 402.1719 402.1738
[25] 402.1757 402.1777 402.1796 402.1815 402.1834 402.1853 402.1872 402.1891
[33] 402.1910 402.1930 402.9515 402.9534 402.9553 402.9572 402.9591 402.9611
[41] 402.9630 402.9649 402.9668 402.9687 402.9707 402.9726 402.9745 403.1683
[49] 403.1702 403.1722 403.1741 403.1760 403.1779 403.1798 403.1818 403.1837
[57] 403.1856 403.1875 403.1894 403.1914 403.1933 404.1453 404.1472 404.1492
[65] 404.1511 404.1530 404.1549 404.1569 404.1588 404.1607 404.1626 404.1646
[73] 404.1665 404.1684 404.1703 404.1723 405.1490 405.1510 405.1529 405.1549
[81] 405.1568 405.1587 405.1607 405.1626 405.1645 405.1667 405.1686 405.1706
[89] 405.1725 413.2499 413.2519 413.2539 413.2559 413.2579 413.2598 413.2618
[97] 413.2638 413.2658 413.2678 413.2698
[ reached getOption("max.print") -- omitted 10702 entries ]
$F1.S00004
[1] 396.0221 396.0239 396.0258 396.0277 401.2025 401.2044 401.2063 401.2082
[9] 401.2101 401.2120 401.2139 401.2158 401.2177 401.2196 401.2215 401.2234
[17] 401.2253 402.1647 402.1666 402.1685 402.1704 402.1723 402.1742 402.1761
[25] 402.1780 402.1800 402.1819 402.1838 402.1857 402.1876 402.1895 402.1914
[33] 402.1934 403.1687 403.1706 403.1726 403.1745 403.1764 403.1783 403.1802
[41] 403.1822 403.1841 403.1860 403.1879 403.1898 403.1917 403.1937 413.1664
[49] 413.1684 413.1704 413.1724 413.1744 413.1764 413.1784 413.1803 413.1823
[57] 413.1841 413.1861 413.1881 413.1901 413.2499 413.2519 413.2538 413.2558
[65] 413.2578 413.2598 413.2618 413.2638 413.2658 413.2678 413.2698 413.2718
[73] 413.2738 413.2760 413.2780 413.2799 413.2819 414.2559 414.2579 414.2599
[81] 414.2619 414.2639 414.2659 414.2679 414.2699 414.2719 414.2739 414.2759
[89] 414.2779 414.2799 415.0208 415.0228 415.0248 415.0268 415.0288 415.0308
[97] 415.0328 415.0348 415.0368 415.0388
[ reached getOption("max.print") -- omitted 10504 entries ]
$F1.S00005
[1] 396.0212 396.0231 396.0249 396.0268 401.2035 401.2054 401.2073 401.2092
[9] 401.2111 401.2130 401.2149 401.2168 401.2187 401.2207 401.2226 401.2245
[17] 401.2264 402.1638 402.1657 402.1677 402.1696 402.1715 402.1734 402.1753
[25] 402.1772 402.1791 402.1810 402.1830 402.1849 402.1868 402.1886 402.1906
[33] 402.1925 402.1944 403.1678 403.1697 403.1717 403.1736 403.1755 403.1774
[41] 403.1793 403.1813 403.1832 403.1851 403.1870 403.1889 403.1909 404.1448
[49] 404.1467 404.1487 404.1506 404.1525 404.1544 404.1564 404.1583 404.1602
[57] 404.1622 404.1641 404.1660 404.1679 404.1699 404.1718 404.1737 404.1756
[65] 404.1776 404.1795 404.1814 404.1833 404.1853 404.1872 404.1891 408.2977
[73] 408.2997 408.3017 408.3036 408.3056 408.3075 408.3095 408.3114 408.3134
[81] 408.3154 408.3173 408.3193 408.3212 408.3232 411.1594 411.1613 411.1633
[89] 411.1653 411.1673 411.1692 411.1712 411.1732 411.1752 411.1770 411.1789
[97] 411.1809 411.1829 413.2490 413.2510
[ reached getOption("max.print") -- omitted 12053 entries ]
$F1.S00006
[1] 396.0219 396.0237 396.0256 396.0275 401.2023 401.2042 401.2061 401.2080
[9] 401.2099 401.2118 401.2137 401.2156 401.2175 401.2194 401.2213 401.2232
[17] 401.2251 402.1683 402.1702 402.1721 402.1740 402.1759 402.1778 402.1798
[25] 402.1817 402.1836 402.1855 402.1874 402.1893 402.1912 402.1931 403.1685
[33] 403.1704 403.1723 403.1743 403.1762 403.1781 403.1800 403.1819 403.1839
[41] 403.1858 403.1877 403.1896 403.1915 403.7700 403.7720 403.7739 403.7758
[49] 403.7777 403.7797 403.7816 403.7835 403.7854 403.7874 403.7893 403.7912
[57] 403.7931 404.1474 404.1493 404.1513 404.1532 404.1551 404.1571 404.1590
[65] 404.1609 404.1628 404.1648 404.1667 404.1686 404.1705 405.2885 405.2904
[73] 405.2924 405.2943 405.2963 405.2982 405.3001 405.3021 405.3040 405.3059
[81] 405.3079 405.3098 405.3117 408.2984 408.3004 408.3023 408.3043 408.3063
[89] 408.3082 408.3102 408.3121 408.3141 408.3160 408.3180 408.3200 408.3219
[97] 408.3239 411.1601 411.1620 411.1640
[ reached getOption("max.print") -- omitted 10751 entries ]
with the following setup
Indeed, accessing the m/z values takes a while. The files I used are 1.2G each. The timings above are considerably shorter for smaller files. What sizes are your files? Do you access your data on a remote disk? Also tagging @jotsetung, who regularly analyses tens or hundreds of files (for RT alignment and feature grouping). Jo, what's the size of the files you analyse?
@lgatto Thank you for responding so quickly to my message. My files are approx. 500MB each and they are saved on my local disk. The size of the mz list that I get for 2 files is around 670MB (see attached screenshot from R environment variable). Does that sound reasonable to you or is something off here? The files I am using are for polarity switching DDA on a ThermoFisher QExactive. I used msconvert to convert them to .mzML format and filtered them by MS level 1 and single polarity before importing the data with MSnbase.
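As a rough sanity check on the size question: m/z values are stored as doubles (8 bytes each), so 670MB corresponds to somewhere around 84 to 88 million values, depending on whether MB is read as 10^6 or 2^20 bytes. The sketch below illustrates the estimate on a toy list; `xmz_toy` is a made-up stand-in for the output of `mz()`, not real data, and the spectrum counts are arbitrary assumptions.

```r
# Toy stand-in for the list returned by mz(): one numeric vector per spectrum.
# The sizes (1000 spectra, 5000 values each) are arbitrary assumptions.
set.seed(1)
xmz_toy <- replicate(1000, sort(runif(5000, 400, 1600)), simplify = FALSE)

# Total number of m/z values across all spectra.
n_values <- sum(lengths(xmz_toy))

# Back-of-the-envelope size: each double takes 8 bytes.
approx_mb <- n_values * 8 / 1024^2

# Size as reported by R, which adds a small per-vector overhead.
actual_mb <- as.numeric(object.size(xmz_toy)) / 1024^2

cat("values:", n_values, "\n")
cat("estimated MB:", round(approx_mb, 1), "\n")
cat("object.size MB:", round(actual_mb, 1), "\n")
```

On a real object, the same two lines (`sum(lengths(xmz))` and `object.size(xmz)`) would show whether the reported size matches the number of peaks in the files.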
Could be that this is not at all related to the files, but to the parallel processing setup. On Windows, R uses socket-based parallel processing (sometimes also on macOS), and the main process has to start a new R process for each worker and connect to it, which can stall. To avoid these deadlocks I usually initiate the parallel processing setup at the very beginning (after loading the libraries):
library(MSnbase)
library(BiocParallel) # provides register() and DoparParam()
library(doParallel)
registerDoParallel(3) # define number of parallel processes to be used
register(DoparParam(), default = TRUE)
## some code
The files aren't that big (I fixed a typo in my earlier post - mine were 1.2 G, not 12G). @shefalilathwal, could you try to disable parallel processing, as suggested by @jotsetung? Also, you could share 2 files and I would try on my computer.
@lgatto I tried what @jotsetung suggested and added the parallel processing setup at the beginning of my R script and was able to run the files! So, it must have been the connection as @jotsetung suggested. Thank you so much for both of your help! You can close the issue :)
Excellent, thank you for reporting back.
Perhaps something that should eventually be added to the vignette?
Indeed. I won't have time today, but happy to (try to) do tomorrow. We could add a note in the Speed and memory requirements sub-section at the very beginning, in the Introduction.
Done.
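For reference, one way to disable parallel processing is to register a serial backend with BiocParallel before running any other code. This is a minimal sketch, assuming the BiocParallel package is installed; it is not necessarily the exact setup used in this thread.

```r
library(BiocParallel)

# Register a serial backend as the default, so that functions relying on
# bpparam() process files one after the other in the main R process.
register(SerialParam(), default = TRUE)

# Check which backend is now the default and how many workers it uses.
current <- bpparam()
print(class(current))
print(bpnworkers(current))
```

If the hang disappears with the serial backend, the problem is likely in the parallel setup rather than in the files themselves.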
I am using MSnbase to read the MS data from multiple .mzML files. While the data is being read, I am unable to extract the mz and intensity values when I have an MSnobject of more than one file.
For example, in the following code, I get the output for rtime(raw_data), but the mz(raw_data) does not give any output and the code keeps running for hours with no errors or warnings. I just have to forcibly stop the run without any results.
However, if I use only one file at a time, the code runs.
In this case, I get the output in about 15 seconds as shown below. What is going on here and is there a way to get around it? It is desirable to be able to run multiple files together for downstream analysis (for RT alignment and peak grouping across samples)
The sessionInfo() for R-