Chemical formula expansion and performance #41

Open
acesnik opened this issue Apr 6, 2018 · 7 comments
Labels
Enhancement New feature or request Performance
Milestone

Comments

@acesnik
Contributor

acesnik commented Apr 6, 2018

From @rfellers:

I am curious what requirements others might have for a chemical formula interface. I was only focused on ProForma, but that still means that we need to handle regular elements, pure isotopes of elements (e.g. C13), and Unimod "atoms" (which can additionally represent glycan residues and common molecules). Should we add chemical formulas to the benchmarking app? How important is performance?

@acesnik acesnik added Enhancement New feature or request Performance labels Apr 6, 2018
@acesnik
Contributor Author

acesnik commented Apr 6, 2018

We are somewhat interested in performance, but our main concern is whether the chemical formula interface gives the same results as mzLib, since we would eventually depend on its mass calculations and the like. You can find some tests for the mzLib implementation here. From skimming the code, it looks promising; your implementation looks similar to mzLib's, e.g. in using the NIST database.
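A minimal sketch of what "same results" could mean in a test, assuming masses are compared with an absolute tolerance. The names `monoisotopic_mass` and `masses_agree`, the tiny mass table, and the tolerance are all illustrative assumptions, not code from mzLib or the SDK; the reference value stands in for whatever the other implementation returns.

```python
import math

# Monoisotopic masses in Da for a few elements (NIST values, to the best of
# my knowledge); a real implementation would load the full table.
MONOISOTOPIC = {
    "H": 1.00782503223,
    "C": 12.0,
    "N": 14.00307400443,
    "O": 15.99491461957,
}


def monoisotopic_mass(counts):
    """Sum element masses weighted by their (possibly negative) counts."""
    return sum(MONOISOTOPIC[symbol] * n for symbol, n in counts.items())


def masses_agree(a, b, tol=1e-6):
    """True when two masses differ by at most tol Da (hypothetical tolerance)."""
    return math.isclose(a, b, rel_tol=0.0, abs_tol=tol)


water = {"H": 2, "O": 1}
reference = 18.010565  # stand-in for the value reported by another library
print(masses_agree(monoisotopic_mass(water), reference))
```

An absolute tolerance is used here because a relative one would loosen the check for large molecules; which is appropriate depends on how the masses are consumed downstream.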

I'm not sure how we have handled Unimod shorthand for glycans. @rmillikin, do you know about that?

@rfellers
Member

rfellers commented Apr 6, 2018

Gotcha. What format for chemical formulas do you use, i.e. is there a name? Looks very similar to what we use at NU, but it has some custom stuff for isotopes. Unimod has a composition format and RESID/PSI-MOD uses something else.

Here's an example for Label:13C(9)15N(1):
https://www.ebi.ac.uk/ols/ontologies/mod/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FMOD_00589

  • PSI-MOD: (12)C -9 (13)C 9 (14)N -1 (15)N 1
  • Unimod: C(-9) 13C(9) N(-1) 15N
  • NU: we don't handle isotopes so we can't handle this
  • mzLib: C-9C{13}9N-1N{15} (I'm guessing based on your unit tests)

Given all of these differences, I plan to have multiple parsers/writers that work with a generic IChemicalFormula interface. This means, however, that a simple ToString() on a chemicalFormula doesn't make sense unless we adopt one of the notations as a standard ...
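The idea of one generic interface with several writers could be sketched roughly as follows. This is Python rather than the SDK's own language, and `ChemicalFormula`, `SimpleFormula`, `write_unimod`, and `write_psimod` are hypothetical names. The PSI-MOD writer is modeled only on the single example above and assumes an unlabeled element maps to its most abundant isotope.

```python
from typing import Dict, Optional, Protocol, Tuple

# (element symbol, mass number, or None for the unlabeled element)
Atom = Tuple[str, Optional[int]]


class ChemicalFormula(Protocol):
    """Hypothetical stand-in for the IChemicalFormula interface."""

    def counts(self) -> Dict[Atom, int]: ...


class SimpleFormula:
    """Smallest possible implementation: a wrapper around a dict."""

    def __init__(self, counts: Dict[Atom, int]) -> None:
        self._counts = dict(counts)

    def counts(self) -> Dict[Atom, int]:
        return dict(self._counts)


# Assumption: notations that require a mass number use the most abundant
# isotope for unlabeled elements.
DEFAULT_MASS_NUMBER = {"H": 1, "C": 12, "N": 14, "O": 16, "S": 32}


def write_unimod(formula: ChemicalFormula) -> str:
    parts = []
    for (symbol, isotope), n in formula.counts().items():
        name = symbol if isotope is None else f"{isotope}{symbol}"
        # A count of 1 is left implicit, as in "15N" above.
        parts.append(name if n == 1 else f"{name}({n})")
    return " ".join(parts)


def write_psimod(formula: ChemicalFormula) -> str:
    parts = []
    for (symbol, isotope), n in formula.counts().items():
        mass_number = DEFAULT_MASS_NUMBER[symbol] if isotope is None else isotope
        parts.append(f"({mass_number}){symbol} {n}")
    return " ".join(parts)


label = SimpleFormula({("C", None): -9, ("C", 13): 9, ("N", None): -1, ("N", 15): 1})
print(write_unimod(label))
print(write_psimod(label))
```

Because the notation lives in the writer rather than in the formula object, the same object renders differently depending on which writer is called, which is why a bare `ToString()` has no obvious answer.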

@acesnik
Contributor Author

acesnik commented Apr 6, 2018

Wow, that's an unfortunate mess, isn't it? I think Unimod's is the most readable.

@rfellers
Member

rfellers commented Apr 9, 2018

Indeed, messy. As best I can tell, there is no standard way to write chemical formulas ... shall we start a ProFormula manuscript? :) Unimod is probably the best and it is what ProForma chose as the default, so we can lean towards that format as appropriate.

@acesnik
Contributor Author

acesnik commented Apr 9, 2018

Ha! ProFormula would be something.

Yes, I think we should lean towards Unimod's format, but writing multiple parsers would allow us to read all of those formats. That makes me wonder how the parser will distinguish the formula formats...

@rfellers
Member

rfellers commented Apr 9, 2018

Here's where my head is at presently:

  • A dedicated Unimod format parser (which we have now)
  • A RESID/PSI-MOD format parser that is more baked into the Resid/PsiMod Modification classes (which don't exist)
  • UW and NU formats are not supported directly in TopDownSDK. Each group would have the option to bring the TopDownSDK into our respective codebases and implement any SDK interfaces as needed. This is why I want to rely so heavily on interfaces, so, for example, your ChemicalFormula in mzLib can implement the IChemicalFormula from the SDK if it wants/makes sense.

ProForma standardized on Unimod format and will always assume the chemical formulas are written using that format (and throw errors accordingly).
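A rough sketch of "assume Unimod and throw otherwise", written in Python for brevity. `parse_unimod` and its token pattern are assumptions based on the Unimod example earlier in this thread, not the SDK's actual parser. It does not check symbols against a list of known elements or Unimod atoms, which a real parser would need to do.

```python
import re
from typing import Dict, Optional, Tuple

# (symbol, mass number, or None for the unlabeled element)
Atom = Tuple[str, Optional[int]]

# One whitespace-separated token: optional mass number, a symbol, and an
# optional "(count)". Letters-only symbols also admit names such as "Hex".
_TOKEN = re.compile(r"(\d+)?([A-Za-z]+)(?:\((-?\d+)\))?")


def parse_unimod(text: str) -> Dict[Atom, int]:
    """Parse a Unimod-style composition; raise ValueError on anything else."""
    counts: Dict[Atom, int] = {}
    for token in text.split():
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"not a Unimod-style formula token: {token!r}")
        isotope, symbol, count = match.groups()
        atom = (symbol, int(isotope) if isotope else None)
        # A missing "(count)" means 1, as in "15N".
        counts[atom] = counts.get(atom, 0) + (int(count) if count else 1)
    return counts


print(parse_unimod("C(-9) 13C(9) N(-1) 15N"))

try:
    parse_unimod("C-9C{13}9N-1N{15}")  # the guessed mzLib-style notation
except ValueError as error:
    print(error)
```

With this approach the parser never tries to guess the notation; input in another format fails fast instead of being silently misread.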

Does that help at all or am I missing your point?

@acesnik
Contributor Author

acesnik commented Apr 9, 2018

That helps, thanks! I'm on board.

@acesnik acesnik added this to the ProForma 2.0 milestone Nov 25, 2018

2 participants