Chemical formula expansion and performance #41
Comments
We are somewhat interested in performance, but our main concern is whether the chemical formula interface gives the same results as mzLib. We would eventually depend on the mass calculations and such to give the same results. You can find some tests for the mzLib implementation here. From skimming the code, it looks promising; your implementation looks similar to mzLib, e.g. using the NIST database. I'm not sure how we have handled Unimod shorthand for glycans. @rmillikin, do you know about that?
Gotcha. What format for chemical formulas do you use, i.e. is there a name? Looks very similar to what we use at NU, but it has some custom stuff for isotopes. Unimod has a composition format and RESID/PSI-MOD uses something else. Here's an example for Label:13C(9)15N(1):
Given all of these differences, I plan to have multiple parsers/writers that work with a generic IChemicalFormula interface. This means, however, that a simple ToString() on a chemicalFormula doesn't make sense unless we adopt one of the notations as a standard ...
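A minimal sketch of the "generic formula plus per-notation writers" idea, written in Python only for brevity. Every name here (`ChemicalFormula`, `FormulaWriter`, `UnimodStyleWriter`, `BracketStyleWriter`) is hypothetical and not taken from the project, the bracket notation is invented for illustration, and the Unimod-style writer assumes a count of 1 may be left implicit.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol, Tuple

# Key is (symbol, isotope mass number), with None meaning the natural element.
AtomKey = Tuple[str, Optional[int]]


@dataclass(frozen=True)
class ChemicalFormula:
    """Notation-neutral formula: atom counts only, no preferred string form."""
    counts: Dict[AtomKey, int]


class FormulaWriter(Protocol):
    """Anything that can render a formula in one specific notation."""
    def write(self, formula: ChemicalFormula) -> str: ...


class UnimodStyleWriter:
    """Space-separated tokens such as 'C(-9) 13C(9) N(-1) 15N'."""

    def write(self, formula: ChemicalFormula) -> str:
        parts = []
        for (symbol, isotope), count in formula.counts.items():
            if count == 0:
                continue
            name = f"{isotope}{symbol}" if isotope is not None else symbol
            # Assumption: a count of 1 is written without parentheses.
            parts.append(name if count == 1 else f"{name}({count})")
        return " ".join(parts)


class BracketStyleWriter:
    """Hypothetical second notation, e.g. 'C-9[13C9]N-1[15N1]'."""

    def write(self, formula: ChemicalFormula) -> str:
        parts = []
        for (symbol, isotope), count in formula.counts.items():
            if count == 0:
                continue
            if isotope is not None:
                parts.append(f"[{isotope}{symbol}{count}]")
            else:
                parts.append(f"{symbol}{count}")
        return "".join(parts)


label = ChemicalFormula(
    {("C", None): -9, ("C", 13): 9, ("N", None): -1, ("N", 15): 1}
)
unimod_text = UnimodStyleWriter().write(label)
bracket_text = BracketStyleWriter().write(label)
```

The same `label` object renders differently under each writer, which is why a single ToString() has no obvious answer until one notation is picked as the default.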
Wow, that's an unfortunate mess, isn't it? I think Unimod's is the most readable.
Indeed, messy. As best I can tell, there is no standard way to write chemical formulas ... shall we start a ProFormula manuscript? :) Unimod is probably the best and it is what ProForma chose as the default, so we can lean towards that format as appropriate.
Ha! ProFormula would be something. Yes, I think we should lean towards Unimod's format, but writing multiple parsers would allow us to read all of those formats. That makes me wonder how the parser will distinguish the formula formats...
Here's where my head is at presently:
ProForma standardized on Unimod format and will always assume the chemical formulas are written using that format (and throw errors accordingly). Does that help at all or am I missing your point?
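To make "assume Unimod format and throw errors accordingly" concrete, here is a rough Python sketch of a strict parser. The function name, the exception type, and the token grammar (optional isotope number, a symbol made of letters, optional signed count in parentheses, tokens separated by whitespace) are assumptions for illustration; it does not check symbols against an element table.

```python
import re
from typing import Dict, Optional, Tuple

# Key is (symbol, isotope mass number), with None meaning the natural element.
AtomKey = Tuple[str, Optional[int]]

# One token: optional isotope number, a symbol, optional signed count.
_TOKEN = re.compile(r"(\d+)?([A-Za-z]+)(?:\((-?\d+)\))?")


class FormulaFormatError(ValueError):
    """Raised when the input is not in the expected Unimod-style notation."""


def parse_unimod_style(text: str) -> Dict[AtomKey, int]:
    """Parse e.g. 'C(-9) 13C(9) N(-1) 15N(1)'; reject anything else."""
    tokens = text.split()
    if not tokens:
        raise FormulaFormatError("empty formula")
    counts: Dict[AtomKey, int] = {}
    for token in tokens:
        match = _TOKEN.fullmatch(token)
        if match is None:
            # No attempt to guess another notation: fail loudly instead.
            raise FormulaFormatError(f"unrecognized token: {token!r}")
        isotope, symbol, count = match.groups()
        key = (symbol, int(isotope) if isotope else None)
        counts[key] = counts.get(key, 0) + (int(count) if count else 1)
    return counts
```

With this shape, a string in some other notation such as `C9H12` is rejected rather than silently misread, so the caller never has to detect the format.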
That helps, thanks! I'm on board.
From @rfellers:
I am curious what requirements others might have for a chemical formula interface. I was only focused on ProForma, but that still means that we need to handle regular elements, pure isotopes of elements (e.g. C13), and Unimod "atoms" (which can additionally represent glycan residues and common molecules). Should we extend the benchmarking app to include chemical formulas? How important is performance?
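The three cases listed above (regular elements, pure isotopes, and Unimod "atoms" that stand for whole residues) can be sketched as follows. This is an illustrative Python sketch, not the project's implementation: the names `MONOISOTOPIC`, `BRICKS`, `expand`, and `monoisotopic_mass` are hypothetical, the mass table is a tiny rounded excerpt, and Hex is taken to be a hexose residue, C6H10O5.

```python
from typing import Dict, Optional, Tuple

# Key is (symbol, isotope mass number), with None meaning the natural element.
AtomKey = Tuple[str, Optional[int]]

# Monoisotopic masses in Da, rounded; a real table would cover all elements.
MONOISOTOPIC: Dict[AtomKey, float] = {
    ("H", None): 1.00782503,
    ("C", None): 12.0,
    ("N", None): 14.00307400,
    ("O", None): 15.99491462,
    ("C", 13): 13.00335484,
    ("N", 15): 15.00010890,
}

# Unimod-style "atoms" that expand to plain elements (only one shown here).
BRICKS: Dict[str, Dict[AtomKey, int]] = {
    "Hex": {("C", None): 6, ("H", None): 10, ("O", None): 5},
}


def expand(counts: Dict[AtomKey, int]) -> Dict[AtomKey, int]:
    """Replace each brick with its elemental composition, summing counts."""
    expanded: Dict[AtomKey, int] = {}
    for (symbol, isotope), n in counts.items():
        if isotope is None and symbol in BRICKS:
            for atom, per_brick in BRICKS[symbol].items():
                expanded[atom] = expanded.get(atom, 0) + n * per_brick
        else:
            key = (symbol, isotope)
            expanded[key] = expanded.get(key, 0) + n
    return expanded


def monoisotopic_mass(counts: Dict[AtomKey, int]) -> float:
    """Sum mass times count over the expanded formula."""
    return sum(MONOISOTOPIC[atom] * n for atom, n in expand(counts).items())


# Label:13C(9)15N(1) swaps nine C for 13C and one N for 15N.
label_shift = monoisotopic_mass(
    {("C", None): -9, ("C", 13): 9, ("N", None): -1, ("N", 15): 1}
)
hex_mass = monoisotopic_mass({("Hex", None): 1})
```

Comparing values like `label_shift` against another implementation within a small tolerance is one way to check that two libraries agree on mass calculations.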