combined_fdr_1.2.mzid validation issues #35

Open
edeutsch opened this issue Jun 9, 2016 · 15 comments

@edeutsch
Contributor

edeutsch commented Jun 9, 2016

My CV term validator finds these issues with this file:
ERROR: cvParam distinct peptide-level q-value should have units, but it does not!
WARNING: MS:1001062 should be 'Mascot MGF format' instead of 'Mascot MGF file'
WARNING: MS:1001400 should be 'OMSSA xml format' instead of 'OMSSA xml file'
WARNING: MS:1002439 should be 'final PSM list' instead of 'final PSM list UNDER DISCUSSION'
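The name warnings above amount to comparing each cvParam's `name` against the official term name for its accession. A minimal sketch of that comparison follows; it is not the validator's actual code. The `CV_NAMES` table and the `check_cv_param_name` function are hypothetical, and the table holds only the three names quoted in the warnings (a real check would load them from psi-ms.obo).

```python
# Hypothetical reference table: accession -> official term name.
# Only the three names quoted in the warnings are listed here; in
# practice the table would be built from psi-ms.obo.
CV_NAMES = {
    "MS:1001062": "Mascot MGF format",
    "MS:1001400": "OMSSA xml format",
    "MS:1002439": "final PSM list",
}


def check_cv_param_name(accession, name, cv_names=CV_NAMES):
    """Return a warning string if `name` differs from the CV name, else None."""
    expected = cv_names.get(accession)
    if expected is None:
        return f"WARNING: {accession} is not in the reference table"
    if name != expected:
        return f"WARNING: {accession} should be '{expected}' instead of '{name}'"
    return None


print(check_cv_param_name("MS:1001062", "Mascot MGF file"))
```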

The first error may be an error in the CV itself. I don't think we want units on q-value terms. Should we remove units from all q-value terms? This issue affects several of them.

[Term]
id: MS:1001868
name: distinct peptide-level q-value
def: "Estimation of the q-value for distinct peptides once redundant identifications of the same peptide have been removed (id est multiple PSMs, possibly with different mass modifications, mapping to the same sequence have been collapsed to one entry)." [PSI:PI]
xref: value-type:xsd:double "The allowed value-type for this CV term."
is_a: MS:1002484 ! peptide-level statistical threshold
relationship: has_units UO:0000166 ! parts per notation unit
relationship: has_units UO:0000187 ! percent
relationship: has_domain MS:1002305 ! value between 0 and 1 inclusive
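To find the other q-value terms that carry units, one option is to scan the OBO file for `[Term]` stanzas with `has_units` relationships. The sketch below is a rough illustration only: `parse_obo_terms` and `terms_with_units` are hypothetical helpers, the parser handles just simple `tag: value` lines, and `SAMPLE` is a shortened copy of the stanza quoted above.

```python
def parse_obo_terms(text):
    """Split OBO text into a list of {tag: [values]} dicts, one per [Term]."""
    terms, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            current.setdefault(tag, []).append(value)
    return terms


def terms_with_units(terms, keyword="q-value"):
    """Return (id, name, unit relationships) for matching terms that declare units."""
    hits = []
    for term in terms:
        name = term.get("name", [""])[0]
        units = [r for r in term.get("relationship", []) if r.startswith("has_units ")]
        if keyword in name and units:
            hits.append((term.get("id", ["?"])[0], name, units))
    return hits


SAMPLE = """\
[Term]
id: MS:1001868
name: distinct peptide-level q-value
is_a: MS:1002484 ! peptide-level statistical threshold
relationship: has_units UO:0000166 ! parts per notation unit
relationship: has_units UO:0000187 ! percent
relationship: has_domain MS:1002305 ! value between 0 and 1 inclusive
"""

for term_id, name, units in terms_with_units(parse_obo_terms(SAMPLE)):
    print(term_id, name, len(units))
```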

@andrewrobertjones
Contributor

Agreed: no FDR or q-value term should have units. @germa, can you check whether other similar terms (e.g. e-values, p-values, PEP) have units? I don't think any of these should.
Thanks

@fawazghali

I have updated the example file. @edeutsch, can you please re-run the validator? Thanks. Fawaz

@germa
Collaborator

germa commented Jun 14, 2016

Removed the units from the FDR and q-value terms in version 3.90.0 of psi-ms.obo.

@edeutsch
Contributor Author

These issues remain today:
WARNING: MS:1001062 should be 'Mascot MGF format' instead of 'Mascot MGF file'
WARNING: MS:1001400 should be 'OMSSA xml format' instead of 'OMSSA xml file'
WARNING: MS:1002439 should be 'final PSM list' instead of 'final PSM list UNDER DISCUSSION'

@fawazghali

I don't see these issues in the file.

@germa
Collaborator

germa commented Jun 23, 2016

Message 1:
Level: ERROR
--> Non-fatal XML Parsing error detected on line 102973
Error message: cvc-pattern-valid: Value '*' is not facet-valid with respect to pattern '[ABCDEFGHIJKLMNOPQRSTUVWXYZ?-]{1}' for type '#AnonType_postPeptideEvidenceType'.

Message 2:
Level: ERROR
--> Non-fatal XML Parsing error detected on line 102973
Error message: cvc-attribute.3: The value '*' of attribute 'post' on element 'PeptideEvidence' is not valid with respect to its type, 'null'.

This means that, according to the schema file mzIdentML1.2.0-candidate.xsd, '*' is not allowed in the post attribute of PeptideEvidence.
[Image: mzid_peptideevidence_post_star_not_allowed]
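The pattern quoted in the error message can be tried directly. This is a small sketch, not part of any validator: `is_valid_flanking_residue` is a hypothetical helper, and `fullmatch` is used because XSD patterns must match the whole attribute value.

```python
import re

# Pattern copied from the schema error message for the `post` attribute.
FLANKING_RESIDUE = re.compile(r"[ABCDEFGHIJKLMNOPQRSTUVWXYZ?-]{1}")


def is_valid_flanking_residue(value):
    """True if `value` is a single character allowed by the schema pattern."""
    return FLANKING_RESIDUE.fullmatch(value) is not None


for candidate in ("K", "-", "?", "*"):
    print(repr(candidate), is_valid_flanking_residue(candidate))
```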

@edeutsch
Contributor Author

Regarding fghali's "I don't see these issues in the file", perhaps there is confusion about which file we are talking about. There are two similarly named "combined" files. Here are the issues I see:

peptide_level_stats_examples/combined_fdr_1.2.mzid.gz
WARNING: MS:1001062 should be 'Mascot MGF format' instead of 'Mascot MGF file'
WARNING: MS:1001400 should be 'OMSSA xml format' instead of 'OMSSA xml file'
WARNING: MS:1002439 should be 'final PSM list' instead of 'final PSM list UNDER DISCUSSION'

multi_search/combined_1.2.mzid.gz
ERROR: cvParam unknown modification should have a value, but it does not!

@andrewrobertjones
Contributor

@fghali Hi Fawaz, can you please check these again?
Thanks,
Andy

@fawazghali

fawazghali commented Jun 30, 2016

I have updated the example file (peptide_level_stats_examples/combined_fdr_1.2.mzid.gz). @edeutsch, can you please re-run the validator? Thanks. Fawaz

@andrewrobertjones
Contributor

@fghali There are also errors in the file multi_search/combined_1.2.mzid.gz, see Gerhard's and Eric's messages above.

The main parsing error relates to these types of error:

AND

The star should be replaced with “-”, assuming this is caused by the peptide being at the N- or C-terminus of the protein (rather than by stars in the sequence, which shouldn’t happen). For now, you can just do a find and replace, but it would be useful if you could trace back which of the file format parsers is getting this wrong, so we can fix it.
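As an alternative to a plain text find and replace, the substitution can be limited to the flanking-residue attributes. The following is only a sketch under assumptions: `fix_flanking_stars` is a hypothetical helper, it also touches `pre` because the N-terminus is mentioned alongside the C-terminus, and the `SAMPLE` fragment with its ids is invented for illustration.

```python
import xml.etree.ElementTree as ET


def fix_flanking_stars(root):
    """Replace '*' with '-' in pre/post of every PeptideEvidence; return the count changed."""
    changed = 0
    for elem in root.iter():
        # Compare the local name so this works with or without an XML namespace.
        if elem.tag.rsplit("}", 1)[-1] != "PeptideEvidence":
            continue
        for attr in ("pre", "post"):
            if elem.get(attr) == "*":
                elem.set(attr, "-")
                changed += 1
    return changed


# Invented fragment; real files carry the mzIdentML namespace and more attributes.
SAMPLE = (
    "<SequenceCollection>"
    '<PeptideEvidence id="PE_1" pre="K" post="*"/>'
    '<PeptideEvidence id="PE_2" pre="*" post="R"/>'
    "</SequenceCollection>"
)

root = ET.fromstring(SAMPLE)
n_changed = fix_flanking_stars(root)
print(n_changed, ET.tostring(root, encoding="unicode"))
```

Restricting the change to these two attributes avoids rewriting a "*" that appears elsewhere in the file for a legitimate reason.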

@fawazghali

I have updated both files, replacing "*" with "-". I'll check the parsers to see where it's happening.

@edeutsch
Contributor Author

This is fine according to my validators.

@germa
Collaborator

germa commented Jul 20, 2016

Message 1:
Rule ID: ProteinDetectionList_must_rule
Level: ERROR
Context(/cvParam/@accession ) in 2 locations
--> None of the given CvTerms were found at '/MzIdentML/DataCollection/AnalysisData/ProteinDetectionList/cvParam/@accession' because no values were found:

  • The sole term MS:1002404 (count of identified proteins) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.

Message 2:
Level: WARN
--> unanticipated terms for XPath '/MzIdentML/DataCollection/AnalysisData/SpectrumIdentificationList/cvParam/@accession' : [MS:1002439]

We got rid of "final PSM list"; see GitHub issue #5.
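The ProteinDetectionList rule in Message 1 can be approximated with a few lines. This is a simplified sketch, not the Java validator's logic: `lists_missing_term` is a hypothetical helper, it checks only for MS:1002404 itself (child terms would need the CV hierarchy), and the `SAMPLE` fragment and its ids are invented.

```python
import xml.etree.ElementTree as ET

REQUIRED = "MS:1002404"  # count of identified proteins


def local(tag):
    """Strip an XML namespace prefix such as '{uri}' from a tag name."""
    return tag.rsplit("}", 1)[-1]


def lists_missing_term(root, required=REQUIRED):
    """Return ids of ProteinDetectionList elements without a direct cvParam `required`."""
    missing = []
    for elem in root.iter():
        if local(elem.tag) != "ProteinDetectionList":
            continue
        accessions = {c.get("accession") for c in elem if local(c.tag) == "cvParam"}
        if required not in accessions:
            missing.append(elem.get("id"))
    return missing


# Invented fragment: the first list carries the term, the second does not.
SAMPLE = (
    "<AnalysisData>"
    '<ProteinDetectionList id="PDL_1">'
    '<cvParam accession="MS:1002404" name="count of identified proteins" value="12"/>'
    "</ProteinDetectionList>"
    '<ProteinDetectionList id="PDL_2"/>'
    "</AnalysisData>"
)

print(lists_missing_term(ET.fromstring(SAMPLE)))
```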

@fawazghali

Fixed.

@edeutsch
Contributor Author

This file seems valid to my validators. Not sure about the Java validator issue above.
