msfragger pepxml reader support for c-term modifications #85
if site < cterm_position:
    mod_mass = mod_mass - AA_ASCII_MASS[ord(sequence[site-1])]
else:
    mod_mass -= (MASS_H + MASS_O + MASS_PROTON)
Removing a proton does not make sense here, as only neutral masses are involved
I double checked my results with this file:
20190131_QExHFX3_Ogris_MFPL_gel_PP2A_EV_90p.zip
With the current code from the pull request, for the peptide RTPDYFL with Methyl@cterm, we get the following fragment masses, where both b ions (without the cterm mod) and y ions (with the cterm mod) match the calculated fragment masses from http://db.systemsbiology.net/proteomicsToolkit/FragIonServlet?sequence=RTPDYFL&massType=monoRB&charge=1&bCB=1&yCB=1&nterm=0&cterm=14.015650&addModifType=&addModifVal=
b_z1 y_z1 b_z2 y_z2
157.108387 769.376683 79.057832 385.191980
258.156066 668.329005 129.581671 334.668140
The same convention is present in the MSFragger source code, where an extra proton mass is added when reporting the mass of a cterm mod, but not an nterm mod. It is also evident in the pepxml entry:
<search_hit peptide="RTPDYFL" massdiff="0.00225830078125" calc_neutral_pep_mass="924.47046" peptide_next_aa="-" num_missed_cleavages="1" num_tol_term="2" protein_descr="Serine/threonine-protein phosphatase 2A catalytic subunit beta isoform OS=Mus musculus OX=10090 GN=Ppp2cb PE=1 SV=1" num_tot_proteins="2" tot_num_ions="12" hit_rank="1" num_matched_ions="7" protein="sp|P62715|PP2AB_MOUSE" peptide_prev_aa="R" is_rejected="0">
<alternative_protein protein_descr="Serine/threonine-protein phosphatase 2A catalytic subunit alpha isoform OS=Mus musculus OX=10090 GN=Ppp2ca PE=1 SV=1" protein="sp|P63330|PP2AA_MOUSE" peptide_prev_aa="R" peptide_next_aa="-" num_tol_term="2"/>
<modification_info mod_cterm_mass="32.025665" modified_peptide="RTPDYFLc[32]">
</modification_info>
where the cterm_mass is the sum of methylation mass (14.02 Da) + the masses of hydrogen, oxygen, and proton.
Can I provide any other tests to show that the mass of the proton should be included here?
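The arithmetic behind this claim can be checked with a minimal sketch. The constants below are standard monoisotopic masses rounded to eight decimals, and the function and variable names are made up for illustration; they are not taken from alphabase or MSFragger.

```python
# Monoisotopic masses in Da (rounded; illustrative constants, not alphabase's).
MASS_H = 1.00782503
MASS_O = 15.99491462
MASS_PROTON = 1.00727647
METHYL = 14.015650        # methylation delta mass quoted in the thread
REPORTED = 32.025665      # mod_cterm_mass from the pepxml entry above


def cterm_mod_delta(reported, use_proton=True):
    """Recover the modification delta mass from a reported mod_cterm_mass.

    use_proton=True  subtracts H + O + proton
    use_proton=False subtracts H + O + H (that is, H2O)
    """
    offset = MASS_H + MASS_O + (MASS_PROTON if use_proton else MASS_H)
    return reported - offset


for label, flag in (("H + O + proton", True), ("H2O", False)):
    delta = cterm_mod_delta(REPORTED, use_proton=flag)
    print(f"{label}: delta = {delta:.6f}, error vs methyl = {delta - METHYL:+.6f}")
```

Under these assumptions, subtracting H + O + proton reproduces the methyl mass to within the rounding of the reported value, while subtracting H2O leaves a residual of about one electron mass.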
I will try it by myself. If this is indeed the case, I will merge this PR.
PS: it is still weird to me: removing a proton would retain the electron, leading to a negative charge...
Hi @yangkl96, I think this is an H2O, not HO + proton. Please double check this and then I will merge this PR, thanks.
I highly doubt that a proton should be subtracted here, as it would result in a negative ion. The mod_cterm_mass value in pepxml depends on its definition in the pepxml schema: whether this value is a residue mass, a residue mass plus an H2O, or something plus a proton. However, I cannot find the definition.
Thanks for the explanation @jalew188. I talked to Fengchao and we agree that it was weird that our pepxml writer was using proton mass instead of another hydrogen mass. We may change this in the future MSFragger code, but it should work fine now since the ppm difference is so small. I have pushed the correction.
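To put a number on "so small", here is a rough sketch of the size of the proton-versus-hydrogen discrepancy relative to the peptide mass from the pepxml entry quoted earlier. The constants are standard monoisotopic masses rounded to eight decimals, and the helper name is hypothetical.

```python
# Illustrative constants in Da (rounded standard values).
MASS_H = 1.00782503
MASS_PROTON = 1.00727647
CALC_NEUTRAL_PEP_MASS = 924.47046  # calc_neutral_pep_mass of RTPDYFL in the thread


def ppm_shift(delta_mass, reference_mass):
    """Express a mass difference in parts per million of a reference mass."""
    return delta_mass / reference_mass * 1e6


# Using a proton where a hydrogen atom belongs is off by about one electron mass.
delta = MASS_H - MASS_PROTON
shift = ppm_shift(delta, CALC_NEUTRAL_PEP_MASS)
print(f"mass difference: {delta:.6f} Da")
print(f"relative to peptide mass: {shift:.2f} ppm")
```

For this peptide the shift stays below 1 ppm, which is consistent with the fragment masses matching regardless of which convention is used.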
Yes, I agree. I just wanted to make sure the subtracted value is correct and not confusing.
if mod_name.endswith('C-term'):
    _mod = mod_name
else:
    _mod = mod_name.split('@')[0]+'@Any C-term' #what if only Protein C-term is listed?
We can check whether modname@Any C-term or modname@Protein C-term is in the MOD_MASS dict.
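One way this lookup could look, as a minimal sketch: the helper name `resolve_cterm_mod` and the toy dictionary are hypothetical stand-ins, and the real MOD_MASS dict in alphabase may be structured differently.

```python
def resolve_cterm_mod(mod_name, mod_mass):
    """Map a modification name to a C-terminal entry present in mod_mass.

    Names already ending in 'C-term' are returned unchanged. Otherwise
    'Any C-term' is tried first and 'Protein C-term' second; None is
    returned when neither variant is listed.
    """
    if mod_name.endswith('C-term'):
        return mod_name
    base = mod_name.split('@')[0]
    for site in ('Any C-term', 'Protein C-term'):
        candidate = f'{base}@{site}'
        if candidate in mod_mass:
            return candidate
    return None


# Toy stand-in for the MOD_MASS dict (illustrative entries only).
toy_mod_mass = {
    'Methyl@Any C-term': 14.01565,
    'Amidated@Protein C-term': -0.984016,
}

print(resolve_cterm_mod('Methyl@E', toy_mod_mass))
print(resolve_cterm_mod('Amidated@G', toy_mod_mass))
print(resolve_cterm_mod('Phospho@S', toy_mod_mass))
```

Returning None for an unlisted modification leaves it to the caller to decide whether to skip the PSM or raise an error.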
Thanks for waiting for these changes. I added 'Methyl@E' to psm_reader.yaml to test that fragment m/z calculations are correct for c-term mods. It can be deleted, but I am wondering why only a handful of PTMs are added, rather than putting all the entries in modification.tsv into it?
Tested using pepxml files generated from msfragger search of data from PXD014879, including c-terminal methyl as a variable mod. Ran code below from Python terminal in PyCharm IDE to ensure it works:
# Explicit imports instead of wildcard imports
from alphabase.psm_reader import psm_reader_provider
from alphabase.peptide import fragment

psm_reader = psm_reader_provider.get_reader("msfragger_pepxml")
msf_df = psm_reader.import_file("20190131_QExHFX3_Ogris_MFPL_gel_PP2A_EV_90p.pepXML")
# Keep only PSMs that carry a methyl modification
methyl_df = msf_df[msf_df['mods'].str.contains('Methyl')].copy()
# Compute fragment m/z values for the listed fragment types and charge states
fragment.create_fragment_mz_dataframe_by_sort_precursor(methyl_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2', 'b_modloss_z1', 'y_modloss_z1', 'b_modloss_z2', 'y_modloss_z2'])
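The reference fragment masses quoted earlier in the thread for RTPDYFL with a C-terminal methyl can also be cross-checked without alphabase. The sketch below assumes standard monoisotopic residue masses rounded to six decimals; the function name `by_ions` is hypothetical and this is not how alphabase computes fragments internally.

```python
# Monoisotopic residue masses in Da (rounded standard values; only the
# residues needed for RTPDYFL are listed).
AA_MASS = {
    'R': 156.101111, 'T': 101.047679, 'P': 97.052764, 'D': 115.026943,
    'Y': 163.063329, 'F': 147.068414, 'L': 113.084064,
}
MASS_H2O = 18.010565
MASS_PROTON = 1.007276


def by_ions(sequence, cterm_mod=0.0, charge=1):
    """Return (b, y) m/z pairs for each cleavage site, N- to C-terminus.

    The C-terminal modification mass is added to y ions only, because
    b ions do not contain the C-terminus.
    """
    residues = [AA_MASS[aa] for aa in sequence]
    total = sum(residues)
    ions = []
    prefix = 0.0
    for mass in residues[:-1]:
        prefix += mass
        b = (prefix + charge * MASS_PROTON) / charge
        y = (total - prefix + MASS_H2O + cterm_mod + charge * MASS_PROTON) / charge
        ions.append((b, y))
    return ions


z1 = by_ions('RTPDYFL', cterm_mod=14.015650, charge=1)
z2 = by_ions('RTPDYFL', cterm_mod=14.015650, charge=2)
for (b1, y1), (b2, y2) in zip(z1[:2], z2[:2]):
    print(f"{b1:.6f} {y1:.6f} {b2:.6f} {y2:.6f}")
```

The first two cleavage sites should agree with the b_z1, y_z1, b_z2 and y_z2 values listed in the thread to within rounding of the input masses.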