SAGE search engine score is missing after psm re-scoring using percolator #288

Closed
WangHong007 opened this issue Sep 19, 2023 · 22 comments · Fixed by #289
Labels: bug (Something isn't working), sage-support

Comments

@WangHong007
Contributor

Description of the bug

SAGE's search engine score is the hyperscore. pyopenms can extract it from the idXML files written after the search engine step, but after PSM re-scoring with Percolator it is missing from the idXML files.

idxml before psm re-scoring:

<PeptideIdentification score_type="hyperscore" higher_score_better="true" significance_threshold="0.0" MZ="988.485223533228918" RT="2776.47440000000006" spectrum_reference="controllerType=0 controllerNumber=1 scan=16570" >
	<PeptideHit score="4.372861652440132" sequence="LLGPSLTSTTPASSSSGSSSR" charge="2" aa_before="R" aa_after="G" start="363" end="383" protein_refs="PH_0" >
		<UserParam type="string" name="target_decoy" value="target"/>
		<UserParam type="string" name="ln(-poisson)" value="3.14389626700959"/>
		<UserParam type="string" name="ln(delta_best)" value="0.0"/>
		<UserParam type="string" name="ln(delta_next)" value="3.819337728782414"/>
		<UserParam type="string" name="ln(matched_intensity_pct)" value="3.5826106"/>
		<UserParam type="string" name="longest_b" value="9"/>
		<UserParam type="string" name="longest_y" value="18"/>
		<UserParam type="string" name="longest_y_pct" value="0.85714287"/>
		<UserParam type="string" name="matched_peaks" value="27"/>
		<UserParam type="string" name="scored_candidates" value="8222"/>
		<UserParam type="string" name="protein_references" value="unique"/>
	</PeptideHit>
	<UserParam type="string" name="PinSpecId" value="312"/>
</PeptideIdentification>

idxml after psm re-scoring:

<PeptideIdentification score_type="Posterior Error Probability" higher_score_better="false" significance_threshold="0.0" MZ="988.485223533228918" RT="2776.47440000000006" spectrum_reference="controllerType=0 controllerNumber=1 scan=16570" >
	<PeptideHit score="4.70852e-08" sequence="LLGPSLTSTTPASSSSGSSSR" charge="2" aa_before="R" aa_after="G" start="363" end="383" protein_refs="PH_7718" >
		<UserParam type="string" name="target_decoy" value="target"/>
		<UserParam type="string" name="ln(-poisson)" value="3.14389626700959"/>
		<UserParam type="string" name="ln(delta_best)" value="0.0"/>
		<UserParam type="string" name="ln(delta_next)" value="3.819337728782414"/>
		<UserParam type="string" name="ln(matched_intensity_pct)" value="3.5826106"/>
		<UserParam type="string" name="longest_b" value="9"/>
		<UserParam type="string" name="longest_y" value="18"/>
		<UserParam type="string" name="longest_y_pct" value="0.85714287"/>
		<UserParam type="string" name="matched_peaks" value="27"/>
		<UserParam type="string" name="scored_candidates" value="8222"/>
		<UserParam type="string" name="protein_references" value="unique"/>
		<UserParam type="float" name="MS:1001492" value="2.65249"/>
		<UserParam type="float" name="MS:1001491" value="7.304600000000001e-04"/>
		<UserParam type="float" name="MS:1001493" value="4.70852e-08"/>
	</PeptideHit>
	<UserParam type="string" name="PinSpecId" value="312"/>
</PeptideIdentification>
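
The difference between the two records can be checked programmatically. The following is a minimal sketch that uses only Python's standard library rather than pyopenms, on abbreviated copies of the two records above (most UserParam entries are omitted). The helper names `summarize` and `main_score_lost` are hypothetical and not part of any OpenMS API.

```python
import xml.etree.ElementTree as ET

# Abbreviated copies of the two records above (most UserParams omitted).
BEFORE = """
<PeptideIdentification score_type="hyperscore" higher_score_better="true">
  <PeptideHit score="4.372861652440132" sequence="LLGPSLTSTTPASSSSGSSSR" charge="2">
    <UserParam type="string" name="matched_peaks" value="27"/>
  </PeptideHit>
</PeptideIdentification>
"""

AFTER = """
<PeptideIdentification score_type="Posterior Error Probability" higher_score_better="false">
  <PeptideHit score="4.70852e-08" sequence="LLGPSLTSTTPASSSSGSSSR" charge="2">
    <UserParam type="string" name="matched_peaks" value="27"/>
    <UserParam type="float" name="MS:1001492" value="2.65249"/>
    <UserParam type="float" name="MS:1001491" value="7.304600000000001e-04"/>
    <UserParam type="float" name="MS:1001493" value="4.70852e-08"/>
  </PeptideHit>
</PeptideIdentification>
"""


def summarize(xml_text):
    """Return (score_type, main score, UserParam names) for the first PeptideHit."""
    pep_id = ET.fromstring(xml_text)
    hit = pep_id.find("PeptideHit")
    names = {param.get("name") for param in hit.findall("UserParam")}
    return pep_id.get("score_type"), float(hit.get("score")), names


def main_score_lost(before_xml, after_xml):
    """True if the old main score is neither the main score nor a UserParam afterwards."""
    old_type, _, _ = summarize(before_xml)
    new_type, _, new_names = summarize(after_xml)
    return old_type != new_type and old_type not in new_names


print(main_score_lost(BEFORE, AFTER))
```

For the two records shown, `hyperscore` is the main score before re-scoring and appears nowhere afterwards, which is the behaviour reported here.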

Command used and terminal output

No response

Relevant files

No response

System information

No response

@jpfeuffer
Collaborator

jpfeuffer commented Sep 19, 2023

How is this with other search engines?

It might be because PSMFeatureExtractor can be and is skipped with Sage.

@ypriverol
Member

I guess we are not taking the SAGE output but the pin file from percolator?

@WangHong007
Contributor Author

@jpfeuffer @ypriverol It should be the SAGE search output. Comet and MSGF+ have their search scores stored as a MetaValue on every PeptideHit, but SAGE does not.

@jpfeuffer
Collaborator

What does an idXML for Comet look like after PSMFeatureExtractor?

@WangHong007
Contributor Author

WangHong007 commented Sep 19, 2023

The Comet search engine score is xcorr, stored as MetaValue MS:1002252. It already exists before PSM re-scoring.

<PeptideIdentification score_type="Posterior Error Probability" higher_score_better="false" significance_threshold="0.0" MZ="474.761474031899979" RT="1815.299999999999955" spectrum_reference="controllerType=0 controllerNumber=1 scan=3727" >
	<PeptideHit score="0.990159" sequence="LSGATLQMK" charge="2" aa_before="K" aa_after="R" start="48" end="56" protein_refs="PH_1080" >
		<UserParam type="string" name="target_decoy" value="decoy"/>
		<UserParam type="string" name="MS:1002258" value="6"/>
		<UserParam type="string" name="MS:1002259" value="16"/>
		<UserParam type="string" name="num_matched_peptides" value="1060"/>
		<UserParam type="int" name="isotope_error" value="0"/>
		<UserParam type="float" name="MS:1002252" value="1.116"/>
		<UserParam type="float" name="MS:1002253" value="1.0"/>
		<UserParam type="float" name="MS:1002254" value="0.0"/>
		<UserParam type="float" name="MS:1002255" value="113.900000000000006"/>
		<UserParam type="float" name="MS:1002256" value="11.0"/>
		<UserParam type="float" name="MS:1002257" value="2.89"/>
		<UserParam type="string" name="protein_references" value="unique"/>
		<UserParam type="float" name="COMET:deltCn" value="1.0"/>
		<UserParam type="float" name="COMET:deltLCn" value="0.0"/>
		<UserParam type="float" name="COMET:lnExpect" value="1.061256502124341"/>
		<UserParam type="float" name="COMET:lnNumSP" value="6.966024187106113"/>
		<UserParam type="float" name="COMET:lnRankSP" value="2.397895272798371"/>
		<UserParam type="float" name="COMET:IonFrac" value="0.375"/>
		<UserParam type="float" name="MS:1001492" value="-0.641415"/>
		<UserParam type="float" name="MS:1001491" value="0.270715"/>
		<UserParam type="float" name="MS:1001493" value="0.990159"/>
	</PeptideHit>
</PeptideIdentification>

@jpfeuffer
Collaborator

But this is after rescoring. I need to see before.

@WangHong007
Contributor Author

<PeptideIdentification score_type="expect" higher_score_better="false" significance_threshold="0.0" MZ="474.761474031899979" RT="1815.299999999999955" spectrum_reference="controllerType=0 controllerNumber=1 scan=3727" >
	<PeptideHit score="2.89" sequence="LSGATLQMK" charge="2" aa_before="K" aa_after="R" start="48" end="56" protein_refs="PH_9943" >
		<UserParam type="string" name="MS:1002258" value="6"/>
		<UserParam type="string" name="MS:1002259" value="16"/>
		<UserParam type="string" name="num_matched_peptides" value="1060"/>
		<UserParam type="int" name="isotope_error" value="0"/>
		<UserParam type="float" name="MS:1002252" value="1.116"/>
		<UserParam type="float" name="MS:1002253" value="1.0"/>
		<UserParam type="float" name="MS:1002254" value="0.0"/>
		<UserParam type="float" name="MS:1002255" value="113.900000000000006"/>
		<UserParam type="float" name="MS:1002256" value="11.0"/>
		<UserParam type="float" name="MS:1002257" value="2.89"/>
		<UserParam type="string" name="target_decoy" value="decoy"/>
		<UserParam type="string" name="protein_references" value="unique"/>
	</PeptideHit>
</PeptideIdentification>

@jpfeuffer
Collaborator

Yes so the problem is that we actually use the Comet e-value as main score.
So you are just lucky that you picked a score that is not a main score for the other search engines.
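
To illustrate the point: a score stored as a UserParam survives re-scoring, while the main score is overwritten. The sketch below shows one way the old main score could be kept, by copying it into a UserParam named after `score_type` before it is replaced. It is an assumption for illustration only, not necessarily what the linked fix does; `keep_main_score` and `toy_rescore` are hypothetical helpers, and `toy_rescore` merely stands in for the real re-scoring step.

```python
import xml.etree.ElementTree as ET

# Abbreviated Sage record from the issue description (UserParams omitted).
SAGE_HIT = """
<PeptideIdentification score_type="hyperscore" higher_score_better="true">
  <PeptideHit score="4.372861652440132" sequence="LLGPSLTSTTPASSSSGSSSR" charge="2"/>
</PeptideIdentification>
"""


def keep_main_score(pep_id):
    """Copy each hit's main score into a UserParam named after score_type."""
    score_type = pep_id.get("score_type")
    for hit in pep_id.findall("PeptideHit"):
        exists = any(p.get("name") == score_type for p in hit.findall("UserParam"))
        if not exists:
            ET.SubElement(
                hit,
                "UserParam",
                {"type": "float", "name": score_type, "value": hit.get("score")},
            )


def toy_rescore(pep_id, pep):
    """Stand-in for re-scoring: overwrite the main score with a PEP value."""
    pep_id.set("score_type", "Posterior Error Probability")
    pep_id.set("higher_score_better", "false")
    for hit in pep_id.findall("PeptideHit"):
        hit.set("score", str(pep))


pep_id = ET.fromstring(SAGE_HIT)
keep_main_score(pep_id)           # hyperscore is now also a UserParam
toy_rescore(pep_id, 4.70852e-08)  # main score replaced, UserParam untouched
print(ET.tostring(pep_id, encoding="unicode"))
```

This mirrors the Comet case above, where xcorr is already a UserParam before re-scoring and is therefore still present afterwards.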

@ypriverol ypriverol linked a pull request Sep 25, 2023 that will close this issue
10 tasks
@ypriverol
Member

I think this problem is solved. @WangHong007, you should take the data from the SAGE id folder.

@ypriverol
Member

@timosachsenberg I'm re-opening this issue because, after testing http://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/quantms-benchmark/PXD004683/percolator/ the error remains. Can you double-check why the hyperscore is not included in the Percolator SAGE output?

Percolator SAGE output: http://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/quantms-benchmark/PXD004683/percolator/20150820_Haura-Pilot-TMT1-bRPLC01-2_sage_perc.idXML

@ypriverol ypriverol reopened this Sep 27, 2023
@jpfeuffer
Collaborator

according to pipeline_info it is still using the old container

@ypriverol
Member

This is percolator no?

@jpfeuffer
Collaborator

yes PercolatorAdapter

@jpfeuffer
Collaborator

I honestly think we should just override the containers for all openms labelled processes until the release. I.e. make the dev profile active by default. Otherwise someone will always forget to change a process.

@ypriverol
Member

ypriverol commented Sep 27, 2023

I actually think we should make the containers used in every process a variable. Would that be possible? Something like:

openms_conda_string = "bioconda::openms=2.9.1"
openms_singularity_string = "ghcr.io/openms/openms-executables-sif:latest"
openms_docker_string = "ghcr.io/openms/openms-executables:latest"

@jpfeuffer
Collaborator

I don't like it very much. You will just get confused because suddenly conda uses something different from docker etc. It also confuses users with yet another THREE parameters.
The only thing you will ever want is dev or latest. Nothing else.

@ypriverol
Member

I have no idea what you have in mind. How can you make a profile the default? Can you send me an example so I can do it?

@ypriverol
Member

@jpfeuffer
Collaborator

yes. just put it in base.config. The thing is just to remember to remove it when releasing

@ypriverol
Member

ypriverol commented Sep 27, 2023

I was actually thinking of leaving it there and then, in nextflow.config, including it or not depending on the release cycle. For example, in nextflow.config:

includeConfig 'conf/dev.config'

What do you think?

@jpfeuffer
Collaborator

Yes but you need to find out if and how nextflow knows about its release cycle ;)
If it cannot know about it, then having to change one line every release is not much better than just changing 3 lines.
