refactor: update backendInitialize,MsBackendMgf
- Add parameter `nlines` to `backendInitialize`.
- Update parallel processing in `backendInitialize`: perform per-file parallel
  processing if `length(files) > 1` or otherwise pass `BPPARAM` to `readMgf`.
jorainer committed Jan 9, 2024
1 parent d6096fd commit 8d1e215
Showing 6 changed files with 70 additions and 37 deletions.
1 change: 0 additions & 1 deletion DESCRIPTION
@@ -55,4 +55,3 @@ Collate:
'hidden_aliases.R'
'MsBackendMgf.R'
'functions-mgf.R'
-'old_code.R'
1 change: 0 additions & 1 deletion NAMESPACE
@@ -9,7 +9,6 @@ exportMethods(export)
exportMethods(spectraVariableMapping)
importClassesFrom(Spectra,MsBackendDataFrame)
importFrom(BiocParallel,SerialParam)
-importFrom(BiocParallel,bpparam)
importFrom(IRanges,NumericList)
importFrom(MsCoreUtils,rbindFill)
importFrom(S4Vectors,DataFrame)
39 changes: 27 additions & 12 deletions R/MsBackendMgf.R
@@ -50,6 +50,12 @@ NULL
#' for the expected format and examples below or description above for
#' details.
#'
+#' @param nlines for `backendInitialize`: `integer(1)` defining the number of
+#' lines that should be imported and processed from the MGF file(s).
+#' By default (`nlines = -1L`) the full file is imported and processed at
+#' once. If set to a positive integer, the data is imported and processed
+#' *chunk-wise* using [readMgfSplit()].
+#'
#' @param exportTitle `logical(1)` whether the *TITLE* field should be included
#' in the exported MGF file. If `TRUE` (the default) a `spectraVariable`
#' called `"TITLE"` will be used, if no such variable is present either the
@@ -61,8 +67,12 @@ NULL
#' should be exported.
#'
#' @param BPPARAM Parameter object defining the parallel processing
-#' setup to import data in parallel. Defaults to `BPPARAM =
-#' bpparam()`. See [bpparam()] for more information.
+#' setup. If parallel processing is enabled (with `BPPARAM` different than
+#' `SerialParam()`, the default) and length of `files` is larger than one,
+#' import is performed in parallel on a per-file basis. If data is to be
+#' imported from a single file (i.e., length of `files` is one), parsing
+#' of the imported file is performed in parallel. See also [SerialParam()]
+#' for information on available parallel processing setup options.
#'
#' @param ... Currently ignored.
#'
@@ -84,10 +94,6 @@ NULL
#' fls <- dir(system.file("extdata", package = "MsBackendMgf"),
#' full.names = TRUE, pattern = "mgf$")
#'
-#' ## Parallel processing setup: disabling parallel processing by registering
-#' ## serial processing. See ?bbparam for details and other options
-#' register(SerialParam())
-#'
#' ## Create an MsBackendMgf backend and import data from test mgf files.
#' be <- backendInitialize(MsBackendMgf(), fls)
#' be
@@ -146,7 +152,7 @@ setClass("MsBackendMgf",

#' @importMethodsFrom Spectra backendInitialize spectraData<- $<- $
#'
-#' @importFrom BiocParallel bpparam
+#' @importFrom BiocParallel SerialParam
#'
#' @importMethodsFrom BiocParallel bplapply
#'
@@ -157,13 +163,16 @@ setClass("MsBackendMgf",
#' @rdname MsBackendMgf
setMethod("backendInitialize", signature = "MsBackendMgf",
function(object, files, mapping = spectraVariableMapping(object),
-..., BPPARAM = bpparam()) {
+nlines = -1L, ..., BPPARAM = SerialParam()) {
if (missing(files) || !length(files))
stop("Parameter 'files' is mandatory for ", class(object))
if (!is.character(files))
stop("Parameter 'files' is expected to be a character vector",
" with the files names from where data should be",
" imported")
+if (!is.numeric(nlines))
+stop("'nlines' needs to be an integer")
+nlines <- as.integer(nlines)
files <- normalizePath(files)
if (any(!file.exists(files)))
stop("file(s) ",
@@ -172,11 +181,17 @@ setMethod("backendInitialize", signature = "MsBackendMgf",
## Import data and rbind.
message("Start data import from ", length(files), " files ... ",
appendLF = FALSE)
-res <- bplapply(files, FUN = readMgf,
-mapping = mapping,
-BPPARAM = BPPARAM)
+if (nlines > 0)
+FUN <- readMgfSplit
+else FUN <- readMgf
+if (length(files) > 1) {
+res <- bplapply(files, FUN = FUN, mapping = mapping,
+nlines = nlines, BPPARAM = BPPARAM)
+res <- do.call(rbindFill, res)
+} else
+res <- FUN(files, mapping = mapping, nlines = nlines,
+BPPARAM = BPPARAM)
message("done")
-res <- do.call(rbindFill, res)
spectraData(object) <- res
object$dataStorage <- "<memory>"
object$centroided <- TRUE
20 changes: 14 additions & 6 deletions man/MsBackendMgf.Rd

Generated file; diff not rendered by default.

17 changes: 10 additions & 7 deletions tests/testthat/test_MsBackendMgf.R
@@ -12,8 +12,12 @@ test_that("backendInitialize,MsBackendMgf works", {
expect_identical(res1$msLevel, rep(2L, n1))
expect_identical(lengths(res1$mz), c(14L, 21L, 14L))

+res1_b <- backendInitialize(be, fls[1], nlines = 30)
+expect_equal(res1, res1_b)
+
res2 <- backendInitialize(be, fls[2])
n2 <- length(res2) ## 4
+expect_equal(n2, 4)

## Import multiple files.
res_all <- backendInitialize(be, fls)
@@ -26,13 +30,12 @@ test_that("backendInitialize,MsBackendMgf works", {
rep(normalizePath(fls[2]), n2)))
expect_true(is.integer(res_all@spectraData$msLevel))

-## TODO: Import with failing file.
-## TODO: Import with failing file and nonStop = TRUE

## errors
expect_error(backendInitialize(be), "'files' is mandatory")
expect_error(backendInitialize(be, 4), "expected to be a character")
-expect_error(backendInitialize(be, "a"), "a not found")
+expect_error(backendInitialize(be, fls[1], nlines = "a"), "integer")
+expect_error(backendInitialize(be, "a"), "not found")
})

test_that("spectraVariableMapping works", {
@@ -44,22 +47,22 @@ test_that("spectraVariableMapping works", {
})

test_that("mixed MS level import works", {

fls <- dir(system.file("extdata", package = "MsBackendMgf"),
full.names = TRUE, pattern = "mgf$")[4]

custom_mapping <- c(rtime = "RTINSECONDS",
acquisitionNum = "SCANS",
precursorMz = "PEPMASS",
precursorIntensity = "PEPMASSINT",
precursorCharge = "CHARGE",
msLevel = "MSLEVEL")

res <- Spectra(fls,
source = MsBackendMgf(),
backend = MsBackendDataFrame(),
mapping = custom_mapping)

expect_identical(length(res), 2L)
expect_identical(res$msLevel, c(1L, 2L))
})
29 changes: 19 additions & 10 deletions vignettes/MsBackendMgf.Rmd
@@ -30,10 +30,10 @@ library(BiocStyle)
The `r Biocpkg("Spectra")` package provides a central infrastructure for the
handling of Mass Spectrometry (MS) data. The package supports
interchangeable use of different *backends* to import MS data from a
-variety of sources (such as mzML files). The `MsBackendMgf` package
-allows the import of MS/MS data from mgf ([Mascot Generic
+variety of sources (such as mzML files). The `r Biocpkg("MsBackendMgf")`
+package allows the import of MS/MS data from mgf ([Mascot Generic
Format](http://www.matrixscience.com/help/data_file_help.html))
-files. This vignette illustrates the usage of the `MsBackendMgf`
+files. This vignette illustrates the usage of the *MsBackendMgf*
package.

# Installation
@@ -69,15 +69,9 @@ fls
MS data can be accessed and analyzed through `Spectra` objects. Below
we create a `Spectra` with the data from these mgf files. To this end
we provide the file names and specify to use a `MsBackendMgf()`
-backend as *source* to enable data import. Note that below we also disable
-parallel processing by specifically *registering* the serial processing as the
-default. See `?bpparam` for more details on parallel processing options with the
-`r Biocpkg("BiocParallel")` package.
+backend as *source* to enable data import.

```{r import}
-library(BiocParallel)
-register(SerialParam())
sps <- Spectra(fls, source = MsBackendMgf())
```

@@ -188,6 +182,21 @@ export(sps_ex, backend = MsBackendMgf(), file = fl, exportTitle = FALSE)
readLines(fl)[1:12]
```

+# Parallel processing
+
+The *MsBackendMgf* package supports parallel processing for data import. It is
+enabled by passing a parallel processing setup to the `backendInitialize`
+function (or the `readMgf` function) with the `BPPARAM` parameter. By default
+(`BPPARAM = SerialParam()`) parallel processing is disabled. If it is enabled
+and data is imported from a single file, the spectra information is extracted
+from the imported MGF file in parallel. If data is imported from multiple
+files, the import is performed in parallel on a per-file basis (i.e., the
+files are read in parallel). The performance gain from parallel processing is
+generally only moderate, so it is recommended only if a large number of files
+needs to be processed or if the MGF file is very large (e.g., containing over
+100,000 spectra).
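
A minimal sketch of the multi-file case described above, assuming the *Spectra*, *MsBackendMgf* and *BiocParallel* packages are installed. The choice of `SnowParam(workers = 2)` is an assumption made only for illustration; any `BiocParallel` parameter object could be passed through `BPPARAM` instead. The variable names `be_serial` and `be_par` are likewise hypothetical and do not appear in the package.

```r
library(Spectra)
library(MsBackendMgf)
library(BiocParallel)

## MGF test files shipped with the package (same call as in the
## package examples).
fls <- dir(system.file("extdata", package = "MsBackendMgf"),
           full.names = TRUE, pattern = "mgf$")

## Default behaviour after this commit: serial import, because
## BPPARAM defaults to SerialParam().
be_serial <- backendInitialize(MsBackendMgf(), fls)

## More than one file plus a parallel setup: files are imported in
## parallel on a per-file basis. The number of workers (2) is an
## arbitrary, illustrative choice.
be_par <- backendInitialize(MsBackendMgf(), fls,
                            BPPARAM = SnowParam(workers = 2))

## Both imports should yield the same number of spectra.
length(be_serial) == length(be_par)
```

Given the moderate gains noted above, the serial default is likely the better choice for the handful of small test files used here; the parallel call is shown only to illustrate where `BPPARAM` is supplied.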

# Session information

```{r}