Minor inconsistencies in spec #3
RE Glycan formula parsing, I thought that spaces were required already. Otherwise, without constructing an unambiguous longest-to-shortest testing order, it wouldn't be possible to solve in the general case without extreme look-ahead. It's still doable with a fixed list of monosaccharides. For multiple global modifications, they should be in separate angle brackets, following the example in 4.6.1?
I think this fits similarly to how curly-brace syntax specifies one labile modification, though in that case it takes the place of the square braces. It would make the angle bracket section really laborious to parse if we had to overload |
And formula rule 1 includes:
Maybe this should be revisited?
|
Additionally, I have the following comments about the specification draft 13: Minor comments:
Suggestions / questions:
|
Thanks a lot, Wout, for all your minor corrections. I think all of them are correct apart from NeuAc, which, as far as I can see, is a valid glycan. I also considered your previous comments on draft 12. |
In which position should labile modifications be specified? Section 4.3.2 does not explicitly mention this, although the examples all place the labile modification in the beginning. However, how does it relate to modifications with an unknown position (section 4.4.1) and global modifications (section 4.6)? Section 4.6 specifies that global modifications should be written before ambiguous modifications and N-terminal modifications, but the position of labile modifications is not mentioned.
A: I have added in the specification document (Section 4.3.2): "Labile modification MUST be located before the first amino acid sequence and before N-terminal modifications, if applicable". |
Right, this does seem to be a glycan (shows that I don't know much about it). It failed my validation though because apparently it's listed as a synonym of Neu5Ac in the monosaccharides OBO and I was only considering the default names. |
All right, so the proper order is like this?
|
Clearly I didn't submit my note about Neu5Ac last night. Neu5Ac is synonymous with NeuAc. Mincing the monosaccharide apart to determine where the acetyl group is attached is also impossible with the current dissociation methods available. You can find NeuAc with additional O-acetyl groups (though they are pretty fragile and are easily lost in sample processing), but GNOme doesn't index them. The OBO and the generated JSON file list all the synonyms for each monosaccharide, though most monosaccharides aren't listed in the ProForma spec, and a very restricted subset are actually indexed in GNOme. My parser isn't handling this properly either. I just wrote the common names from memory. |
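A validator that only checks default names will reject synonyms such as NeuAc, as described above. A minimal sketch of a synonym-aware lookup follows. The `ENTRIES` table and the `build_name_index` helper are hypothetical and only illustrate the idea; they do not reflect the actual layout of the monosaccharides OBO or the generated JSON file.

```python
# Hypothetical table: canonical name -> list of synonyms.
# Only the Neu5Ac/NeuAc pair comes from the discussion; the rest is illustrative.
ENTRIES = {
    "Neu5Ac": ["NeuAc"],
    "Hex": [],
    "HexNAc": [],
}


def build_name_index(entries):
    """Map every accepted name (canonical or synonym) to its canonical name."""
    index = {}
    for canonical, synonyms in entries.items():
        for name in (canonical, *synonyms):
            existing = index.setdefault(name, canonical)
            if existing != canonical:
                # The same label must not point at two different entries.
                raise ValueError(f"{name!r} maps to both {existing!r} and {canonical!r}")
    return index


NAME_INDEX = build_name_index(ENTRIES)


def canonical_name(name):
    """Return the canonical name, or raise KeyError for an unknown label."""
    return NAME_INDEX[name]
```

With this index, `canonical_name("NeuAc")` and `canonical_name("Neu5Ac")` resolve to the same entry, so validation no longer depends on which spelling the writer used.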
It is my recollection that a {labile} modification can appear anywhere that a [non-labile modification] can appear. The only difference is that the writer is making the statement that there is not (or there is not expected to be) any evidence of the mod in a particular location because it is completely labile. So the peptidoform SMALLS{Sulfo}NACK simply means that the writer believes that the sulfo is on the second S, but there is no trace of that in the associated evidence because the mod is (or is expected to be) completely labile. And thus it counts when computing the precursor m/z, but it can be ignored when computing abcxyz ions because it is labile. Therefore I don't think it is confined to a specific location. {} is equivalent to [] but with a "labile" meaning. Does anyone else remember that or am I confused? |
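The interpretation in the comment above (a labile modification counts toward the precursor mass but is ignored for fragment ions) can be sketched as follows. This is only an illustration of that reading, not of the final wording of the specification. The helper names are hypothetical, and the mass tables cover just the residues and the modification in the SMALLS{Sulfo}NACK example.

```python
import re

# Monoisotopic residue masses (Da) for the residues used in the example only.
RESIDUE_MASS = {
    "S": 87.03203, "M": 131.04049, "A": 71.03711, "L": 113.08406,
    "N": 114.04293, "C": 103.00919, "K": 128.09496,
}
MOD_MASS = {"Sulfo": 79.95682}  # SO3, monoisotopic
WATER = 18.01056
PROTON = 1.00728

LABILE = re.compile(r"\{([^{}]+)\}")


def split_labile(proforma):
    """Return (sequence without {...} groups, list of labile modification names)."""
    return LABILE.sub("", proforma), LABILE.findall(proforma)


def precursor_neutral_mass(sequence, labile_mods=()):
    """Neutral peptide mass; labile modifications are included here."""
    residues = sum(RESIDUE_MASS[aa] for aa in sequence)
    return residues + WATER + sum(MOD_MASS[m] for m in labile_mods)


def b_ion_mz(sequence, length):
    """Singly charged b-ion m/z; labile modifications are deliberately ignored."""
    return sum(RESIDUE_MASS[aa] for aa in sequence[:length]) + PROTON


sequence, labile = split_labile("SMALLS{Sulfo}NACK")
with_labile = precursor_neutral_mass(sequence, labile)
without_labile = precursor_neutral_mass(sequence)
```

Here `with_labile` exceeds `without_labile` by the Sulfo mass, while `b_ion_mz` gives the same values whether or not the labile group is present.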
There are some minor inaccuracies in some of the examples in the specification draft 12:
EM[R: Methionine sulfone]EVEES[O-phospho-L-serine]PEK -> This term doesn't appear in RESID. Note the leading space, but even without that the name is incorrect. Probably it should be L-methionine sulfone (RESID:AA0251)?
EM[UNIMOD:15]EVEES[UNIMOD:56]PEK -> accession UNIMOD:15 does not exist. In case consistency with the previous examples is desired, UNIMOD:35 corresponds to Oxidation. Same for the invalid example with U:15 just underneath.
EVTSEKC[half-cystine]LEMSC[half-cystine]EFD -> half-cystine should be half cystine (no hyphen).
242.0096 as the mass with four decimals.
More conceptual question:
Q: page 14: Parsing glycan compositions is somewhat non-trivial because some labels overlap. It would be easier if spaces between monosaccharides are used (split on space) or cardinality is always specified (split on [a-zA-Z]+\d+). Maybe this can be a bit more strongly recommended in section 4.2.8?
A: Parsing is possible without enforcing spaces or cardinality by checking for only defined monosaccharides rather than any string.
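Matching against a fixed list of defined monosaccharides, as the answer suggests, could look roughly like this. The `MONOSACCHARIDES` list is an illustrative subset, and treating a missing cardinality as 1 is an assumption. Names are tried longest first so that HexNAc is not read as Hex followed by an unknown label; greedy longest-first matching is sufficient for this list but is not guaranteed to be unambiguous for an arbitrary vocabulary.

```python
import re

# Illustrative subset; a real parser would load the full list of defined names.
MONOSACCHARIDES = ["Hex", "HexNAc", "HexN", "Fuc", "NeuAc", "NeuGc", "Neu5Ac", "Pen"]

# Longest names first so that "HexNAc" wins over "HexN" and "Hex".
_names = sorted(MONOSACCHARIDES, key=len, reverse=True)
_TOKEN = re.compile(r"\s*(" + "|".join(re.escape(n) for n in _names) + r")(\d+)?")


def parse_glycan_composition(text):
    """Parse e.g. 'HexNAc2Hex5' or 'HexNAc2 Hex5' into {'HexNAc': 2, 'Hex': 5}."""
    text = text.strip()
    counts = {}
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unknown monosaccharide at position {pos}: {text[pos:]!r}")
        name, number = match.group(1), match.group(2)
        # Assumption: a missing cardinality means one unit.
        counts[name] = counts.get(name, 0) + (int(number) if number else 1)
        pos = match.end()
    return counts
```

The same function accepts both the space-separated and the compact form, so a writer who follows the stricter recommendation is still parsed correctly.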
Q: page 18: I'm a bit confused how parsers should interpret that global modifications are isotopes? The examples (13C, 15N, D) don't seem to be specified using a controlled vocabulary, whereas this is the case throughout the rest of the document. Is it that when no @ is used in the global modification part, as specified in section 4.6.2, it should always be considered an isotope instead?
A: Yes, I currently interpret global modifications of the form INT* LETTER+ SIGNED_INT* as an isotope and global modifications of the form "[" mod "]@" (AA ",")* AA as global amino acid modifications (so square brackets and "@" sign).
Q: page 19: How should multiple global modifications on different amino acids be specified? I guess the following example, with a comma separating the global modifications within the angular brackets, would lie in line with the spec, but this is not explicitly detailed: <[Carbamidomethyl]@C,[Oxidation]@M>MTPEILTCNSIGCLK.
A: Multiple global modifications are each specified in their own block between angled brackets.
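The two answers above (isotope versus residue-targeted form, and one modification per angle-bracket block) can be combined into a small sketch. The regular expressions are a loose reading of the informal grammar quoted in the answer, not the normative grammar, and the function name and the tuple layout are hypothetical.

```python
import re

GLOBAL_BLOCK = re.compile(r"<([^<>]+)>")
# Loose reading of INT* LETTER+ SIGNED_INT*, e.g. 13C, 15N, D.
ISOTOPE = re.compile(r"^\d*[A-Za-z]+(?:[+-]?\d+)?$")
# "[" mod "]@" (AA ",")* AA, e.g. [Carbamidomethyl]@C or [Oxidation]@M,W.
RESIDUE_MOD = re.compile(r"^\[([^\[\]]+)\]@([A-Z](?:,[A-Z])*)$")


def parse_global_modifications(proforma):
    """Consume leading <...> blocks; return (list of modifications, remaining text)."""
    mods = []
    pos = 0
    while True:
        block = GLOBAL_BLOCK.match(proforma, pos)
        if block is None:
            break
        body = block.group(1)
        residue_mod = RESIDUE_MOD.match(body)
        if residue_mod:
            mods.append(("residue", residue_mod.group(1), residue_mod.group(2).split(",")))
        elif ISOTOPE.match(body):
            mods.append(("isotope", body, []))
        else:
            # Covers several modifications packed into one block, among other errors.
            raise ValueError(f"unsupported global modification: <{body}>")
        pos = block.end()
    return mods, proforma[pos:]
```

Under these assumptions, `<[Carbamidomethyl]@C><[Oxidation]@M>MTPEILTCNSIGCLK` parses into two residue modifications, while the comma-joined form from the question is rejected, which keeps the comma reserved for listing target amino acids.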