update HACCP_term regex to require FOODON, add multivalue example #802
base: main
 keywords:
 - food
 - term
 slot_uri: MIXS:0001215
 multivalued: true
 range: string
-pattern: ^([^\s-]{1,2}|[^\s-]+.+[^\s-]+) \[[a-zA-Z]{2,}:[a-zA-Z0-9]\d+\]$
+pattern: ^.+\s*\[FOODON:\d+\](;\s*\[FOODON:\d+\])*$
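To see what the pattern change does in practice, here is a minimal Python sketch that runs both patterns from the diff above against a few sample values. It assumes Python's re module behaves like the regex engine the LinkML validators use for these particular patterns; the ENVO value is only an illustrative non-FOODON term.

```python
import re

# The two HACCP_term patterns from the diff: the generic MIxS term pattern
# (removed) and the FOODON-only pattern (added).
OLD_PATTERN = re.compile(r"^([^\s-]{1,2}|[^\s-]+.+[^\s-]+) \[[a-zA-Z]{2,}:[a-zA-Z0-9]\d+\]$")
NEW_PATTERN = re.compile(r"^.+\s*\[FOODON:\d+\](;\s*\[FOODON:\d+\])*$")

# Sample values: the FOODON CURIE comes from this thread; the ENVO one is
# an illustrative non-FOODON term.
SAMPLES = [
    "lead poisoning [FOODON:03530243]",
    "lead poisoning[FOODON:03530243]",
    "soil [ENVO:00001998]",
]

def check(value: str) -> tuple[bool, bool]:
    """Return (matches old pattern, matches new pattern) for one value."""
    return bool(OLD_PATTERN.match(value)), bool(NEW_PATTERN.match(value))

if __name__ == "__main__":
    for value in SAMPLES:
        old_ok, new_ok = check(value)
        print(f"{value!r}: old={old_ok} new={new_ok}")
```

The new pattern rejects non-FOODON IDs, but because of \s* it also accepts a label with no space before the bracket, which the old pattern did not.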
This regex requires the FOODON ontology. Is this what we want?
Thanks! Please use the pattern ^(\S[^\r\n]*) [FOODON:\d{7,8}]$
instead of ^.+\s*\[FOODON:\d+\](;\s*\[FOODON:\d+\])*$
or see my notes on dynamic enumerations.
Did you intentionally remove the whitespace between the label and the term ID? I don't think that's consistent with other ontology term patterns in MIxS.
@turbomam
For the whitespace, do you mean whether it should be "lead poisoning [FOODON:03530243]" vs. "lead poisoning[FOODON:03530243]"?
So the whitespace is supposed to be there? I thought I had set it to be valid with or without it... does it matter? If so, I'll make sure I correct it. Just tell me which is correct.
Looking at the submission schema, the whitespace should be there, so I can make that update to the regex.
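The whitespace question comes down to \s* (zero or more whitespace characters, so the space is optional) versus a single literal space (required). A minimal Python sketch comparing the two; the strict pattern here is a variant written for illustration, with both square brackets escaped.

```python
import re

# Optional whitespace: \s* also matches zero characters, so the label may
# run straight into the bracket.
LENIENT = re.compile(r"^.+\s*\[FOODON:\d{7,8}\]$")
# Required whitespace: exactly one literal space before "[", and the label
# must start with a non-whitespace character (\S).
STRICT = re.compile(r"^\S[^\r\n]* \[FOODON:\d{7,8}\]$")

WITH_SPACE = "lead poisoning [FOODON:03530243]"
WITHOUT_SPACE = "lead poisoning[FOODON:03530243]"

for name, pattern in (("lenient", LENIENT), ("strict", STRICT)):
    # Prints whether each sample value matches this pattern.
    print(name, bool(pattern.match(WITH_SPACE)), bool(pattern.match(WITHOUT_SPACE)))
```

If the submission schema expects the space, the literal-space form is the one that enforces it.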
From your comment here: #802 (comment)
^(\S[^\r\n]*) [FOODON:\d{7,8}]$
If we want to use pattern-only validation, I suggest we go with that.
That regex ^(\S[^\r\n]*) [FOODON:\d{7,8}]$
is showing me that "lead poisoning [FOODON:03530243]" is invalid... :(
... are you sure that's right?? Or am I missing something about the formatting of the value "lead poisoning [FOODON:03530243]"?
I think it needs to be ^.+\s*\[FOODON:\d{7,8}]$
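Why the suggested regex rejects a well-formed value: with the brackets unescaped, [FOODON:\d{7,8}] is read as a character class that matches exactly one character. A minimal sketch using Python's re module (other regex engines may differ, for example in how they treat an unescaped closing bracket):

```python
import re

# As quoted above: "[FOODON:\d{7,8}]" is NOT escaped, so it is a character
# class matching ONE character from F, O, D, N, ':', a digit, '{', '7', ',', '8', '}'.
UNESCAPED = re.compile(r"^(\S[^\r\n]*) [FOODON:\d{7,8}]$")
# The alternative proposed above: "\[" is escaped, and Python's re treats the
# lone trailing "]" as a literal character.
PROPOSED = re.compile(r"^.+\s*\[FOODON:\d{7,8}]$")

VALUE = "lead poisoning [FOODON:03530243]"

print(bool(UNESCAPED.match(VALUE)))  # False
print(bool(PROPOSED.match(VALUE)))   # True
# The character class instead accepts a single stray character after the space:
print(bool(UNESCAPED.match("lead poisoning F")))  # True
```

Escaping both brackets (\[ and \]) avoids depending on how a particular engine handles a lone ].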
Decision: the regex in the second image is good.
I'll test this and confirm, then finish this PR.
Discussed 12/03
pattern vs. structured_pattern: we already use structured_pattern for some of the more generic term label and term ID patterns.
Look for the "settings" section in the schema.
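In LinkML, a structured_pattern can reuse named regexes from the schema's settings section through {name} placeholders in its syntax when interpolated is true. The Python sketch below only imitates that substitution to show the idea; the setting names termLabel and foodonID and their regexes are hypothetical, not the actual MIxS settings, and the real LinkML tooling may differ in details.

```python
import re

# Hypothetical settings, loosely modeled on a schema "settings" section.
# Names and regexes are illustrative assumptions, not the MIxS values.
SETTINGS = {
    "termLabel": r"\S[^\r\n]*",
    "foodonID": r"FOODON:\d{7,8}",
}

def interpolate(syntax: str, settings: dict[str, str]) -> str:
    """Replace each {name} placeholder with that setting's regex.

    Simplified sketch: an unknown name raises KeyError. Because the
    replacement is a function, its return value is inserted literally
    (backslashes in the setting values are not reinterpreted).
    """
    return re.sub(r"\{(\w+)\}", lambda m: settings[m.group(1)], syntax)

# Hypothetical structured_pattern syntax for HACCP_term.
HACCP_SYNTAX = r"^{termLabel} \[{foodonID}\]$"
HACCP_PATTERN = re.compile(interpolate(HACCP_SYNTAX, SETTINGS))

if __name__ == "__main__":
    print(interpolate(HACCP_SYNTAX, SETTINGS))
    print(bool(HACCP_PATTERN.match("lead poisoning [FOODON:03530243]")))
```

The benefit of this style is that the label and ID sub-patterns are defined once and shared across slots, so a fix like escaping the brackets or adjusting the digit count happens in one place.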
I forgot to escape the square brackets around FOODON with backslashes \[F
... etc
I didn't include an example. I am not at all familiar with the FoodAnimalAndAnimalFeed extension. Before committing time to getting familiar with it and making an example, I wanted to check that this was a good change.
Thanks @mslarae13. This is good progress. We can refine it a little. First of all, how long are the numeric portions of FOODON URIs? I used ChatGPT 4 to help me with that SPARQL query: 7 or 8 digits, after subtracting the 38 characters in the base portion of the URIs, "http://purl.obolibrary.org/obo/FOODON_". Next I asked ChatGPT 4
after a little testing with regexr, we came up with
If we want to use pattern-only validation, I suggest we go with that.
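The digit-count arithmetic above can be checked for a single IRI: strip the 38-character base and what remains is the numeric local ID. This minimal sketch does not reproduce the SPARQL query over all of FOODON; it uses the "haccp guide food safety term" IRI cited in the schema description, and the helper names are hypothetical.

```python
# Base IRI prefix quoted in the comment above.
FOODON_BASE = "http://purl.obolibrary.org/obo/FOODON_"

def local_id(uri: str) -> str:
    """Strip the FOODON base IRI and return the numeric local part."""
    if not uri.startswith(FOODON_BASE):
        raise ValueError(f"not a FOODON IRI: {uri}")
    return uri[len(FOODON_BASE):]

def to_curie(uri: str) -> str:
    """Convert a FOODON IRI to the FOODON:NNNNNNNN form used in the pattern."""
    return "FOODON:" + local_id(uri)

if __name__ == "__main__":
    haccp_root = "http://purl.obolibrary.org/obo/FOODON_03530221"
    print(len(FOODON_BASE))        # 38
    print(local_id(haccp_root))    # 03530221 (8 digits)
    print(to_curie(haccp_root))    # FOODON:03530221
```

An 8-digit local ID like this one falls within the \d{7,8} range used in the suggested pattern.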
That doesn't check that the label and ID portions match, etc., and it doesn't limit the choices to subclasses of "haccp guide food safety term". A better LinkML validation strategy for this might be a dynamic enumeration. Dynamic enumerations are expressed with logic, but can be expanded into an enumeration with explicit permissible values. A limitation right now is that the permissible values won't include the label, and the ID won't be enclosed in square brackets. But I would like to use this case to motivate improvements to LinkML dynamic enumerations in support of MIxS.
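Checking that the label and ID agree needs a lookup, not just a regex. A minimal sketch with a hypothetical in-memory table built from the three permissible values in the vskit expansion quoted in this thread; a real check would need every term reachable from FOODON:03530221, which is what the dynamic enumeration would supply.

```python
import re

# Hypothetical lookup table: CURIE -> label. These three entries are the
# permissible values from the vskit expansion in this thread; they stand in
# for the full set of HACCP terms.
KNOWN_TERMS = {
    "FOODON:03530231": "hazard 3",
    "FOODON:03530244": "sodium tripolyphosphate",
    "FOODON:03530237": "hazard 9",
}

# Split "label [CURIE]" into its two parts (both brackets escaped).
VALUE_RE = re.compile(r"^(\S[^\r\n]*) \[(FOODON:\d{7,8})\]$")

def check_term(value: str) -> str:
    """Return 'ok', 'bad format', 'unknown id', or 'label mismatch'."""
    m = VALUE_RE.match(value)
    if m is None:
        return "bad format"
    label, curie = m.groups()
    expected = KNOWN_TERMS.get(curie)
    if expected is None:
        return "unknown id"
    return "ok" if label == expected else "label mismatch"

if __name__ == "__main__":
    for v in ("sodium tripolyphosphate [FOODON:03530244]",
              "hazard 3 [FOODON:03530244]"):
        print(v, "->", check_term(v))
```

Restricting KNOWN_TERMS to the expanded subclasses also covers the second gap: any ID outside the HACCP branch comes back as "unknown id".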
Running vskit expand -s schema.yaml -o schema_expanded.yaml expands this

enums:
  HaccpTerm:
    reachable_from:
      source_ontology: bioregistry:foodon
      source_nodes:
        - FOODON:03530221 ## haccp guide food safety term
      is_direct: false
      relationship_types:
        - rdfs:subClassOf

into this:

enums:
  HaccpTerm:
    reachable_from:
      source_ontology: bioregistry:foodon
      source_nodes:
        - FOODON:03530221 ## haccp guide food safety term
      is_direct: false
      relationship_types:
        - rdfs:subClassOf
    permissible_values:
      FOODON:03530231:
        text: FOODON:03530231
        meaning: FOODON:03530231
        title: hazard 3
      FOODON:03530244:
        text: FOODON:03530244
        meaning: FOODON:03530244
        title: sodium tripolyphosphate
      FOODON:03530237:
        text: FOODON:03530237
        meaning: FOODON:03530237
        title: hazard 9
If using this mechanism sounds promising to you, and you want the OAK code to be modified to emit "sodium tripolyphosphate [FOODON:03530244]" instead of "FOODON:03530244", please upvote this
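The output format requested above is a simple transformation of each permissible value's title and meaning. A minimal sketch on a plain dict mirroring the expanded YAML; parsing the YAML file and any change to OAK itself are out of scope, so this is not OAK code, and the helper name is hypothetical.

```python
# Plain-dict mirror of the expanded permissible_values (no YAML parsing here).
PERMISSIBLE_VALUES = {
    "FOODON:03530231": {"meaning": "FOODON:03530231", "title": "hazard 3"},
    "FOODON:03530244": {"meaning": "FOODON:03530244", "title": "sodium tripolyphosphate"},
    "FOODON:03530237": {"meaning": "FOODON:03530237", "title": "hazard 9"},
}

def labeled_values(pvs: dict[str, dict[str, str]]) -> list[str]:
    """Render each permissible value as 'title [meaning]', sorted alphabetically."""
    return sorted(f"{pv['title']} [{pv['meaning']}]" for pv in pvs.values())

if __name__ == "__main__":
    for text in labeled_values(PERMISSIBLE_VALUES):
        print(text)
```

Strings in this form would satisfy the label-plus-bracketed-ID patterns discussed in this PR, so an enumeration emitted this way could back the slot directly.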
I agree that the change is suitable. As for the actual pattern being used, I bow to @turbomam's greater expertise on that!
The changes seem reasonable to me, and I trust the combination of @turbomam and @mslarae13 to get the patterns correct (I don't have the expertise to know what's right).
Address syntax match to examples
Update regexes for MIxS
Based on the description for HACCP, this requires the FOODON ontology.
description: Hazard Analysis Critical Control Points (HACCP) food safety terms;
This field accepts terms listed under HACCP guide food safety term (http://purl.obolibrary.org/obo/FOODON_03530221)
While this doesn't validate that what's been entered is really in FOODON, it does perform some string checking.
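To make that limitation concrete: a pattern check accepts any well-formed FOODON CURIE, whether or not it names a real HACCP term. This minimal sketch assumes the pattern ^.+\s*\[FOODON:\d{7,8}]$ proposed earlier in the thread as a stand-in for the merged pattern; FOODON:12345678 is an arbitrary ID chosen only for illustration.

```python
import re

# Assumed stand-in for the final HACCP_term pattern (taken from the thread).
PATTERN = re.compile(r"^.+\s*\[FOODON:\d{7,8}]$")

def is_well_formed(value: str) -> bool:
    """Syntactic check only: says nothing about whether the term exists in FOODON."""
    return PATTERN.match(value) is not None

if __name__ == "__main__":
    print(is_well_formed("lead poisoning [FOODON:03530243]"))   # True
    print(is_well_formed("anything at all [FOODON:12345678]"))  # True: format only
    print(is_well_formed("lead poisoning [ENVO:00001998]"))     # False: wrong prefix
    print(is_well_formed("lead poisoning [FOODON:123]"))        # False: too few digits
```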