-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new class SpectrumList #361
Conversation
- Add SpectrumList object with related unit tests and documentation.
Note: the Travis CI builds failed because of a timeout - not an error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dear @jotsetung, thanks for the PR and your effort.
I am not sure what the aim of this new class should be. It is somewhere in the middle between a base list
of Spectrum
objects (e.g. returned by spectra
/spectrapply
) and an (OnDisk)MSnExp
.
I guess an MSnExp
object is too heavy and a list
contains not enough metadata for your approach?
Currently I don't see a real advantage of SpectrumList
. In fact it is just a as(spectra(MSnExpObject), "SimpleList")
that provides some accessors to avoid lapply
calls (e.g. lapply(list, mz)
). In my opinion this creates a lot of overhead and much more code to maintain. An idea would be to replace the environment
in MSnExp
by such a SpectrumList
. But that would be a big change.
Could you please explain a little more in detail why you want this new data structure. I read sneumann/xcms#302 but still don't understand why this could not be handled in an MSnExp
/classical list
object.
R/SpectrumList-methods.R
Outdated
#' @section Accessing spectrum attributes: | ||
#' | ||
#' These methods allow to access the attributes and values of the individual | ||
#' [Spectrum] ([Spectrum1] or [Spectrum2]) objects within the list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use [Spectrum-class]
([Spectrum1-class]
, [Spectrum2-class]
) here (and everywhere else) to force roxygen2
to create \linkS4class{Spectrum}
links. Currently the manual page contains a lot of [Spectrum]
references (they are not treated/converted as links).
IMHO it is not clear from https://cran.r-project.org/web/packages/roxygen2/vignettes/markdown.html#links that you notation isn't allowed but it seems not to work properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the hint. I changed it.
R/SpectrumList-methods.R
Outdated
if (is(z, "Spectrum2")) | ||
if (length(z@precursorMz)) z@precursorMz else NA_real_ | ||
else NA_real_ | ||
}, numeric(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the nested if
call here (and in the methods below). What do you think about (I know it is essential the same):
if (is(z, "Spectrum2") && length(z@precursorMz)) z@precursorMz else NA_real_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I've changed it.
R/SpectrumList-methods.R
Outdated
#' ## Get the number of peaks per spectrum. | ||
#' peaksCount(spl) | ||
setMethod("peaksCount", "SpectrumList", function(object) { | ||
vapply(object, peaksCount, integer(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why you access the slots of Spectrum
objects directly in all the methods above (mz
, intensity
, rtime
, ...) and start to use the accessor methods for the slots here/below? I think we should use a consistent way for all of the slots (where possible).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For performance reasons I tend to directly access slots (when I know it's save). S4 method dispatching is quite slow. In some cases I do however use the accessor methods, especially if they have more functionality than just returning the slot value (I think the peaksCount
is such a case, that can calculate the number of peaks based on the number of mz values.
R/SpectrumList-methods.R
Outdated
#' | ||
#' @examples | ||
#' | ||
#' ## Extract the collision energy for spectrum. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for each spectrum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Thanks.
R/SpectrumList-methods.R
Outdated
#' | ||
#' @examples | ||
#' | ||
#' ## Extract the file index for the spectrum. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for each spectrum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed too. Thanks.
R/SpectrumList.R
Outdated
.show_SpectrumList <- function(x, margin = "", print.classinfo = FALSE) { | ||
cat("SpectrumList with", length(x), "spectra and", length(mcols(x)), | ||
"metadata column(s):\n") | ||
if (length(x) > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While if (length(x) > 0)
may be easier to understand if(length(x))
would be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right. Changed to if(length(x))
R/SpectrumList.R
Outdated
x, .COL2CLASS) | ||
out <- rbind(classinfo, out) | ||
} | ||
if (nrow(out) != 0L) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nrow(out)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks; I changed that too.
R/methods-MSnSetList.R
Outdated
@@ -64,6 +64,7 @@ setMethod("[[", c("MSnSetList", "ANY", "missing"), | |||
|
|||
setMethod("split", c("MSnSet", "character"), | |||
function(x, f) { | |||
cat("split,MSnSet,character\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is a "left-over" from debugging. I think we should remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes indeed. Thanks for pointing out.
R/methods-MSnSetList.R
Outdated
@@ -78,6 +79,7 @@ setMethod("split", c("MSnSet", "character"), | |||
|
|||
setMethod("split", c("MSnSet", "factor"), | |||
function(x, f) { | |||
cat("split,MSnSet,factor\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is a "left-over" from debugging. I think we should remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. I removed it now
Dear @sgibb , thanks for your review! I appreciate your critical feedback! So, why a new class. That's a valid question. For Can this be done with an I would not replace the environment in In any rate. Happy to discuss further. Eventually we can also get the opinion from Laurent, @lgatto? |
Another thing worth mentioning: we do also need (for the GNPS workflow) to write MGF files with additional fields (the feature ID or chrom peak ID to which a MS2 spectrum can be associated). I had planned to implement a Thinking it all over and testing some stuff I still very much believe that using an |
@jotsetung thanks for the great explanation. You are right that an
I don't think it really add much additional useless stuff. In fact it is just an more or less empty A few points I would like to see if we decide to implement
|
@sgibb thanks for the reply. Regarding maintainance and implementation of additional methods such as |
I have to say I'm not convinced (yet?) by the need for a list of spectra class. I certainly agree that a @jotsetung - what can you tell me about performance of the Also, in your use cases, I understand that the spectra in a |
Thanks @lgatto for the feedback. So summarizing, we're not going to add the object. There are no real strong arguments in favour of Regarding performance of And yes, |
No, that's not what I wanted to imply. I'm just trying to get the bigger picture and understand the wider implications. In proteomics, spectral libraries (i.e. collections of MS2 spectra) can become really big, which is why it might be worth thinking about it already now. Can you read the full GNPS in a And how would you feel about having a |
Haven's tried to read GNPS yet. But I have implemented the I'm currently testing to use MSnExp/OnDiskMSnExp as an alternative. Only, for Eventually, it helps to explain the history of the From the design I see also the |
That is an important piece of information indeed. What format is the data in, and how, in general, do you create the
Yes, that is also how I see it, hence the idea to be able to use it as a way to populate an experiment |
Currently I generate What could make sense in the long run is to replace the Just realized a potential shortcoming of the |
So for now at least, the new class isn't meant to be exposed, only used as part of your You say you create the |
For now it was implemented in Indeed, the underlying data in |
Ok, thanks. I'll do a quick code review later today. |
R/SpectrumList.R
Outdated
#' @rdname SpectrumList | ||
NULL | ||
|
||
.SpectrumList <- setClass("SpectrumList", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about defining the class in DataClasses.R
and the other functions in functions-SpectumList.R
, as is generally the convention (although I'm sure I must have broken it at some point)? This might also save us from having a collate
field in the DESCRIPTION
file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course we will add it to DataClasses.R
. I simply copied the files over from CompoundDb.R
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the end it depends if we agree to add the object or not. If so I would add it to DataClasses.R
(I would also call it Spectra
instead of SpectrumList
) and would address also all points raised by Sebastian.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine then. Go ahead.
- Rename SpectrumList into Spectra. - Address all comments from @sgibb
With the latest commit I have reorganized the files, renamed |
Just pushed this to Bioc. |
@jotsetung - I see you added |
I tried that, but then |
ok, thanks. |
Really? I just moved spl <- Spectra(sp1, sp2, elementMetadata = DataFrame(id = c("a", "b")))
Error in DataFrame(id = c("a", "b")) :
could not find function "DataFrame" We could also export the |
I just find it strange that we export functions from other packages that we're only importing. If it helps we could indeed move |
Well, documenting doesn't really mean writing much about it. An alias and a short line to mention that once can use |
The
SpectrumList
class extendsS4Vectors::SimpleList
and allows to storeSpectrum1
andSpectrum2
objects. The advantage over a conventional list:mcols
and thus allows to add arbitrary information to each individual spectrum.mz
,msLevel
etc to extract the respective information from the storedSpectrum
objects.This object will be needed in future as we plan to add functionality in xcms that allows to identify MS2 spectra to chromatographic peaks and features. Thus we need additional annotation fields to be assigned to a spectrum, such as chromatographic peak ID or feature ID.
SpectrumList
allows this without having to add additional slots to the already existinsSpectrum
objects.