Document the default language in OBO ontologies #128

matentzn · 2023-04-07T18:26:49Z

In OBO we assume that "no language tag means English". This is fine and practical internally, given that an English-language label is required as part of the admission process, but it would be prudent to explicitly document the default language (i.e. the language that should be assumed for all literals without a language tag) at the ontology metadata level.

Looking at MOD 2.0, I think the suggestion is to do this:

<http://purl.obolibrary.org/obo/mondo.owl> dcterms:language "en".

The formal range of dcterms:language is http://purl.org/dc/terms/LinguisticSystem, which I don't know how to use, but I think a simple language code should be fine and much easier than some convoluted IRI representing the language. If people are tripped up by this we can do:

<http://purl.obolibrary.org/obo/mondo.owl> dce:language "en".

as well. But I prefer the former, to help phase out the dce namespace.


cmungall · 2023-04-07T19:02:04Z

I support this, but dcterms only. no dce.

what is the expected cardinality? presumably 0..1?

What is the expected behavior of robot merge and extract?

Can we come up with some validation rules?

strawperson:

max 1 language declaration
if absent, assume en
if present, labels etc MAY(?) (? SHOULD NOT) tag literals with the default
if present, labels etc SHOULD include a langless literal corresponding to the default
if present, then there MUST NOT be both a langless and a default-tagged literal with the same value
...
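
To make the strawperson concrete, here is a minimal Python sketch of how the cardinality and duplicate rules might be checked. The in-memory model (tuples of subject, property, value, language) and the function name `check_default_language` are hypothetical and not part of robot or any existing tool. The sketch also applies the duplicate check when the declaration is absent, using the assumed default "en", which goes slightly beyond the rules as written.

```python
from collections import defaultdict


def check_default_language(declared_languages, annotations):
    """Return a list of violations of the strawperson rules.

    declared_languages: language codes declared on the ontology (hypothetical input).
    annotations: iterable of (subject, property, value, lang); lang is None
    for a literal without a language tag.
    """
    problems = []
    # Rule: max 1 language declaration.
    if len(declared_languages) > 1:
        problems.append("more than one default language declared")
        return problems
    # Rule: if absent, assume "en".
    default = declared_languages[0] if declared_languages else "en"

    untagged = defaultdict(set)
    tagged_default = defaultdict(set)
    for subject, prop, value, lang in annotations:
        if lang is None:
            untagged[(subject, prop)].add(value)
        elif lang == default:
            tagged_default[(subject, prop)].add(value)

    # Rule: MUST NOT have both a langless and a default-tagged literal
    # with the same value on the same subject and property.
    for key, values in tagged_default.items():
        for value in sorted(values & untagged[key]):
            problems.append(
                f"{key[0]} {key[1]}: {value!r} appears both untagged and tagged @{default}"
            )
    return problems
```

The MAY/SHOULD rules are left out on purpose, since their wording is still open.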

the objective here is to have predictable behavior for retrieving single-valued properties like label, definition, etc in a multivalued context
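
A rough sketch of what "predictable retrieval" could look like, assuming literals are available as (value, lang) pairs and that an untagged literal is read as being in the default language. The function name `pick_literal` and its parameters are made up for illustration; this is not how any existing library does it.

```python
def pick_literal(literals, wanted=None, default_language="en"):
    """Pick a single value from (value, lang) pairs.

    lang is None for a literal without a language tag; such literals are
    treated as being in default_language. Returns None if nothing matches.
    """
    wanted = wanted or default_language
    for value, lang in literals:
        # An untagged literal counts as the default language.
        effective = lang if lang is not None else default_language
        if effective == wanted:
            return value
    return None
```

For example, with `[("Herz", "de"), ("heart", None)]`, asking for no particular language returns the untagged value, and asking for "de" returns the German one. The first match in input order wins, so the result is only unambiguous if the ontology has at most one literal per language.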

matentzn · 2023-04-07T19:10:30Z

I agree with all that you say! 0..1 cardinality.

> What is the expected behavior of robot merge and extract?

Is it important to discuss this here and now? merge is going to be super problematic to get right, but I don't see why we need to deal with extract specifically right now. Another hard part is robot report in this context.

I guess the point I want to make is: the proposal is to document a common practice. If we start tying this to the difficulty of implementing tool support it will become harder to push this issue.

jonquet · 2023-04-07T21:00:59Z

Hello, MOD suggests using dct:language to identify the languages in which labels can be found inside the ontology.
We have also identified doap:language, omv:naturalLanguage, and schema:inLanguage.
In AgroPortal we use URI values from Lexvo, e.g., http://lexvo.org/id/iso639-3/eng

Note that there is no "default" natural language property in MOD. In fact, I never really thought about the need to express one (and only one) default language and maybe multiple other ones, probably because the situation can occur where none of the declared natural languages covers the full ontology.

So if the group decides to have a property for the "default language", it also needs to decide on a property for all the other ones. In that case, I would suggest:
mod:defaultLanguage subProperty of dct:language => to encode the 0..1 default language (new property in MOD or IAO)
dct:language => to encode the other languages
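
As a small illustration of how a consumer might read the two properties together, here is a Python sketch. It assumes a hypothetical metadata dict keyed by property name, and it assumes mod:defaultLanguage is indeed made a subproperty of dct:language, so that the default also counts as one of the ontology's languages.

```python
def declared_languages(metadata):
    """Return (default, all_languages) from a hypothetical metadata dict.

    metadata maps property names to lists of language values, e.g.
    {"mod:defaultLanguage": ["en"], "dct:language": ["fr", "de"]}.
    """
    defaults = metadata.get("mod:defaultLanguage", [])
    if len(defaults) > 1:
        raise ValueError("expected at most one default language (cardinality 0..1)")
    default = defaults[0] if defaults else None
    # A value of the subproperty is also a value of dct:language, so the
    # default is included in the full list; duplicates are dropped while
    # keeping the original order.
    languages = list(dict.fromkeys(defaults + metadata.get("dct:language", [])))
    return default, languages
```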

Two notes:
In AgroPortal, we need to know all the natural languages of an ontology to implement the multilingual capability (currently being developed => agroportal/project-management#307).
Also, as it is not multilingual yet, we have set up the portal with a default language (en).

graybeal · 2023-04-09T03:16:22Z

The DataCite approach is that if a resource only has one language, that is the only one declared; they do not provide any mechanism to indicate that one language is 'default' more than the others. But I think it is useful to declare a primary language if that is the case (and it is for OBO ontologies). For our metadata files on one project we followed your pattern of generating a 'defaultLanguage' property; that feels like a good solution to me. ("Primary language used to present the data file (if multiple languages are present, the Other Languages field may be used to add additional languages).")

matentzn · 2023-04-09T10:39:52Z

Thank you @graybeal and @jonquet ; it seems like a property of this kind would be universally useful. If we make it a child of dc:language, I am worried that people start crying about the range violation; do you think @jonquet this would be a problem (maybe we have enough tissues)? I would be fine with it.

It remains to be seen what the right home for it is; mod and omo are certainly possibilities. Any opinions here? I would have thought that skos or perhaps skosxl would have been good homes too, as languages seem to be a universal concern in these domains as well?

matentzn · 2023-04-10T11:04:58Z

(comment to self: Protégé has had that for many years:

DataPropertyAssertion(<http://protege.stanford.edu/plugins/owl/protege#defaultLanguage> <http://www.co-ode.org/ontologies/pizza/2005/10/18/pizza.owl> "en"^^xsd:string)

)

cmungall · 2024-02-26T21:53:22Z

Any further thoughts here?

matentzn · 2024-02-27T05:55:29Z

The easiest way to move forward here is creating an OMO property (I can do it in 5 min), but since we may want to use this for other kinds of semantic artefacts like schemas, I guess the remaining question is: where should we request this property to be added?

jonquet · 2024-02-27T07:01:56Z

I really recommend going for dct:language or extending it in the mod namespace. MOD is here waiting for maturity, adopters and contributors.
Also, on another track: since December 2023 (my last post was in April) AgroPortal has been multilingual (see: https://doc.jonquetlab.lirmm.fr/share/e6158eda-c109-4385-852c-51a42de9a412/doc/release-notes-btKjZk5tU2) and we rely on http://omv.ontoware.org/2005/05/ontology#naturalLanguage (which in our case was chosen to stay consistent with BioPortal's historical choice to rely on OMV).

alanruttenberg · 2024-02-27T16:13:24Z

I'm not fond of assuming, in released ontologies, that xsd:string means @en. It seems like a good idea to announce policy with a property as suggested but I wonder whether builds could incorporate a step where they change xsd:strings in annotations known to have language-specific values (definitions, comments, editorial notes, maybe labels) to language tagged literals?
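
A minimal Python sketch of what such a build step might do, working on plain tuples rather than a real OWL API. The set of properties treated as language-specific is an assumption (rdfs:label, rdfs:comment, IAO:0000115 for definitions, IAO:0000116 for editor notes), and the function name `tag_plain_literals` is hypothetical.

```python
# Properties assumed to carry language-specific text; the exact list is an
# assumption and would need to be agreed on.
LANGUAGE_SPECIFIC = frozenset(
    {"rdfs:label", "rdfs:comment", "IAO:0000115", "IAO:0000116"}
)


def tag_plain_literals(annotations, default_language="en", properties=LANGUAGE_SPECIFIC):
    """Tag untagged literals on the given properties with default_language.

    annotations: iterable of (subject, property, value, lang); lang is None
    for a literal without a language tag. Other annotations pass through
    unchanged. Exact duplicates created by the tagging are dropped.
    """
    result = []
    for subject, prop, value, lang in annotations:
        if lang is None and prop in properties:
            lang = default_language
        result.append((subject, prop, value, lang))
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(result))
```

Dropping duplicates matters when an ontology already carries both an untagged and an @en copy of the same label, since tagging would otherwise produce the same literal twice.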

matentzn · 2024-02-28T09:29:33Z

> I wonder whether builds could incorporate a step where they change xsd:strings in annotations known to have language-specific values (definitions, comments, editorial notes, maybe labels) to language tagged literals?

It is an option for the OWL formats - not sure what it will do to the other serialisations like OBO, but for OWL this is definitely an option!

matentzn · 2024-02-28T11:38:55Z

@jonquet I don't mind adding the property to MOD due to its wide applicability beyond OBO, but re "extension" - are you not concerned about the range restriction on dct:language? It is supposed to be https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#LinguisticSystem, which according to their spec is supposed to be a class. To make things easier for us I really think the value of "defaultLanguage" should be an ISO language string, like en, fr, etc.

jonquet · 2024-02-28T13:48:16Z

The DCT spec declares the range with something more flexible than an RDF(S) range: "Range Includes"
https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/language

"Range Includes" is defined here: https://www.dublincore.org/specifications/dublin-core/dcmi-terms/

So to me it is OK to extend (rdfs:subPropertyOf) a DCT property and refine the range, as the range of the super property is "flexible".

Aside from this discussion: AgroPortal (which for backward compatibility uses omv:naturalLanguage) enforces the use of URIs from Lexvo with ISO-639-1 values ... we have tried ISO-639-3 but it's too much stuff, not really used.

cmungall mentioned this issue Apr 7, 2023

WIP: multilingual support. INCATools/ontology-access-kit#522

Merged

matentzn mentioned this issue Apr 24, 2024

What to do with multiple labels / definitions etc? geneontology/obographs#107

Open

matentzn mentioned this issue Jul 3, 2024

Add lang="en" tag to international version pref labels obophenotype/human-phenotype-ontology#10559

Closed
