Document the default language in OBO ontologies #128
Comments
I support this, but dcterms only, no dce. What is the expected cardinality? Presumably 0..1? What is the expected behavior of robot merge and extract? Can we come up with some validation rules? Strawperson:
The objective here is to have predictable behavior for retrieving single-valued properties like label, definition, etc. in a multivalued context |
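To make the 0..1 cardinality and the "predictable retrieval" goal concrete, here is a minimal sketch in Python. It is not ROBOT behavior and not the rule set proposed above: the tuple-based data model, the function names, and the `EX:1` identifier are all hypothetical, and `dcterms:language` is assumed to be the property carrying the default language.

```python
# Hypothetical model: ontology-level annotations are (property, value) pairs;
# term annotations are (subject, property, value, lang), with lang=None
# for a literal that carries no language tag.
DEFAULT_LANGUAGE_PROPERTY = "dcterms:language"  # assumption, per this thread


def default_language(ontology_annotations):
    """Return the declared default language, or None if none is declared.

    Raises ValueError when more than one distinct value is declared,
    which enforces the 0..1 cardinality discussed above.
    """
    values = {v for (p, v) in ontology_annotations if p == DEFAULT_LANGUAGE_PROPERTY}
    if len(values) > 1:
        raise ValueError(f"expected at most one default language, found {sorted(values)}")
    return next(iter(values), None)


def single_value(annotations, subject, prop, lang, default_lang):
    """Return the single value of `prop` on `subject` in language `lang`.

    Untagged literals are treated as being in `default_lang`.
    Returns None when there is no match, raises ValueError when ambiguous.
    """
    matches = {
        value
        for (s, p, value, tag) in annotations
        if s == subject and p == prop
        and (tag if tag is not None else default_lang) == lang
    }
    if len(matches) > 1:
        raise ValueError(f"{subject} has several {prop} values for '{lang}': {sorted(matches)}")
    return next(iter(matches), None)


ontology = [("dcterms:language", "en")]
terms = [
    ("EX:1", "rdfs:label", "cell", None),    # untagged, falls back to the default
    ("EX:1", "rdfs:label", "Zelle", "de"),   # explicitly tagged
]
lang = default_language(ontology)
label_en = single_value(terms, "EX:1", "rdfs:label", "en", lang)
label_de = single_value(terms, "EX:1", "rdfs:label", "de", lang)
```

With a check like this, a merge that brings together two ontologies declaring different default languages would fail loudly instead of silently picking one.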
I agree with all that you say! 0..1 cardinality.
Is it important to discuss this here and now? I guess the point I want to make is: the proposal is to document a common practice. If we start tying this to the difficulty of implementing tool support it will become harder to push this issue. |
Hello, MOD suggests using dct:language to identify the languages in which labels can be found inside the ontology. Note that there is no "default" natural language property in MOD. In fact, I never really thought about the need to express one (and only one) default language and possibly multiple other ones, probably because a situation can occur where none of the declared natural languages covers the full ontology. So if the group decides to have a property for the "default language", it also needs to decide on a property for all the other ones. In that case, I would suggest: 2 notes: |
The DataCite approach is that if it only has one language, that is the only one declared; they do not provide any mechanism to indicate that one language is 'default' more than the others. But I think it is useful to declare a primary language if that is the case (and it is for OBO ontologies). For our metadata files on one project we followed your pattern of generating a 'defaultLanguage' property; that feels like a good solution to me. ("Primary language used to present the data file (if multiple languages are present, the Other Languages field may be used to add additional languages).") |
Thank you @graybeal and @jonquet; it seems like a property of this kind would be universally useful. If we make it a child of dc:language, I am worried that people will start crying about the range violation; do you think, @jonquet, this would be a problem (maybe we have enough tissues)? I would be fine with it. It remains to be seen what the right home for it is; mod and omo are certainly possibilities. Any opinions here? I would have thought that skos or perhaps skosxl would have been good homes too, as languages seem to be a universal concern in these domains as well? |
(comment to self: Protégé has had that for many years:
) |
Any further thoughts here? |
The easiest way to move forward here is creating an OMO property (I can do it in 5 min), but since we may want to use this for other kinds of semantic artefacts like schemas, I guess the remaining question is: where should we request this property to be added? |
I really recommend going for dct:language or extending it in the mod namespace. MOD is here waiting for maturity, adopters and contributors. |
I'm not fond of assuming, in released ontologies, that xsd:string means @en. It seems like a good idea to announce the policy with a property as suggested, but I wonder whether builds could incorporate a step where they change xsd:strings in annotations known to have language-specific values (definitions, comments, editorial notes, maybe labels) to language-tagged literals? |
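A build step of the kind suggested here could look roughly like the following Python sketch. It is illustrative only: it does not use ROBOT or any OWL library, the tuple model and the function name are hypothetical, and the set of "language-specific" properties is an assumption (rdfs:label, rdfs:comment, IAO:0000115 for definition, IAO:0000116 for editor note).

```python
# Hypothetical model: each annotation is (subject, property, value, lang),
# with lang=None for a plain xsd:string literal.
LANGUAGE_SPECIFIC = frozenset({
    "rdfs:label",
    "rdfs:comment",
    "IAO:0000115",  # definition
    "IAO:0000116",  # editor note
})


def tag_untagged_literals(annotations, default_lang, properties=LANGUAGE_SPECIFIC):
    """Return a new list in which untagged literals on `properties` carry `default_lang`.

    Literals that already have a language tag, and literals on other
    properties (identifiers, cross-references), are left unchanged.
    """
    if not default_lang:
        raise ValueError("a default language is required to tag literals")
    return [
        (s, p, v, default_lang if tag is None and p in properties else tag)
        for (s, p, v, tag) in annotations
    ]


source = [
    ("EX:1", "rdfs:label", "cell", None),
    ("EX:1", "rdfs:label", "Zelle", "de"),
    ("EX:1", "oboInOwl:hasDbXref", "EX:9", None),
]
released = tag_untagged_literals(source, "en")
```

Restricting the rewrite to an explicit property list keeps values that are not natural language, such as cross-references, as plain strings.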
It is an option for the OWL formats - not sure what it will do to the other serialisations like OBO, but for OWL this is definitely an option! |
@jonquet I don't mind adding the property to MOD due to its wide applicability beyond OBO, but re "extension": are you not concerned about the range restriction on dct:language? It is supposed to be https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#LinguisticSystem, which according to their spec is a class. To make things easier for us, I really think the value of "defaultLanguage" should be an ISO language string, like
The DCT spec declares ranges with something more flexible than in RDF: "Range Includes", which is defined here: https://www.dublincore.org/specifications/dublin-core/dcmi-terms/ So to me it is OK to extend (rdfs:subPropertyOf) a DCT property and refine the range, as the range of the super property is "flexible". Aside from this discussion: AgroPortal (which for backward compatibility uses omv:naturalLanguage) enforces the use of URIs from Lexvo with ISO-639-1 values ... we have tried ISO-639-3 but it's too much stuff, not really used.
In OBO we make the assumption that "no language tag means English". This is fine internally and practical, due to our English language label requirement as part of the admission process, but it would be prudent to explicitly document the default language (i.e. the language that should be assumed for all literals without a language tag) at the ontology metadata level.
Looking at MOD 2.0,
I think the suggestion is to do this:
The real range of dcterms:language is http://purl.org/dc/terms/LinguisticSystem, which I don't know how to use, but I think a simple language code should be fine and much easier than some convoluted IRI representing the language. If people are tripped up by this we can do: as well. But I prefer the former to help fade out the dce namespace.