48 controlled vocabulary issues revealed #628

Closed
wraff opened this issue Jan 5, 2022 · 2 comments · Fixed by #647
@wraff
wraff commented Jan 5, 2022

Dear all,
after mentioning a few times that vocabulary usage is not always consistent, I finally read all SDRF tables and compared all labels/column names for consistency (using my R package wrProteo). This revealed that 48 of the 189 SDRF annotations currently have some inconsistencies (in some cases affecting more than 10 columns). Based on the most frequently used (lowercase) terms, I created 48 separate issues specifying precisely which column names should be changed to which controlled-vocabulary terms. There may be even more issues (like MS2 vs MS-MS); here I focused on the obvious upper/lowercase issues.
I was surprised by the high number of inconsistent entries, so I suggest regularly checking all entries for a consistent format.
Best greetings,
Wolfgang Raffelsberger

@StSchulze
Collaborator

Hi Wolfgang,

Based on the description of the file format (https://github.com/bigbio/proteomics-metadata-standard/blob/master/sdrf-proteomics/README.adoc#81-sdrf-proteomics-format-rules), SDRF files are case insensitive:

Case sensitivity: By specification the SDRF is case insensitive, but we RECOMMEND using lowercase characters throughout all the text (Column names and values).

I agree that it would be nice to have consistency between the files (I guess that's also why lowercase is recommended), but since this is not enforced in the file format, I don't think that it needs to be checked or modified.
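
To illustrate the kind of check discussed here, the following is a minimal sketch that groups column names differing only by letter case and shows the lowercase normalization that the specification recommends. It assumes tab-separated SDRF text; the file names, the example headers, and the helper names (`read_header`, `find_case_variants`, `normalize_header`) are hypothetical, and this is not how wrProteo performs its comparison.

```python
import csv
import io
from collections import defaultdict


def read_header(tsv_text):
    """Return the column names from the first row of tab-separated text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return next(reader)


def find_case_variants(headers_by_file):
    """Map each lowercase column name to its spellings, if more than one exists."""
    variants = defaultdict(set)
    for header in headers_by_file.values():
        for name in header:
            cleaned = name.strip()
            variants[cleaned.lower()].add(cleaned)
    return {key: sorted(names) for key, names in variants.items() if len(names) > 1}


def normalize_header(header):
    """Apply the recommended convention: lowercase column names."""
    return [name.strip().lower() for name in header]


# Hypothetical example headers, not taken from any real SDRF annotation.
headers = {
    "a.sdrf.tsv": read_header("source name\tcharacteristics[organism]\tcomment[data file]\n"),
    "b.sdrf.tsv": read_header("Source Name\tCharacteristics[Organism]\tcomment[data file]\n"),
}

for lowercase_name, spellings in find_case_variants(headers).items():
    print(lowercase_name, "->", spellings)
```

Because the format is case insensitive by specification, both example files are valid; the report only highlights where the recommended lowercase convention is not followed.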

@ypriverol
Member

Hi @wraff, I have been extremely busy with PRIDE-related topics. Thanks a lot for your comments; I will review these inconsistencies in all SDRFs.

As @StSchulze commented, most of these inconsistencies come from RECOMMENDED behaviors of the format rather than from actual rules. Some of them also come from the evolution of the examples. But, as you said, they must be reviewed more often.

I will create a PR updating and correcting some of these issues and I will add you to the loop for review.

@ypriverol ypriverol self-assigned this May 1, 2022
@ypriverol ypriverol linked a pull request May 1, 2022 that will close this issue