Document the default language in OBO ontologies #128

Open
matentzn opened this issue Apr 7, 2023 · 13 comments

Comments

@matentzn
Contributor

matentzn commented Apr 7, 2023

In OBO we make the assumption that "no language tag means English". This is fine internally and practical, given our requirement for English-language labels as part of the admission process, but it would be prudent to explicitly document the default language (i.e. the language that should be assumed for all literals without a language tag) at the ontology metadata level.

Looking at MOD 2.0, I think the suggestion is to do this:

@prefix dcterms: <http://purl.org/dc/terms/> .
<http://purl.obolibrary.org/obo/mondo.owl> dcterms:language "en" .

The actual range of dcterms:language is http://purl.org/dc/terms/LinguisticSystem, which I don't know how to use. I think a simple language code should be fine, and much easier than some convoluted IRI representing the language. If people are tripped up by this, we can do:

@prefix dce: <http://purl.org/dc/elements/1.1/> .
<http://purl.obolibrary.org/obo/mondo.owl> dce:language "en" .

as well. But I prefer the former, to help phase out the dce namespace.
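A minimal sketch of how a consumer might read this declaration. It assumes the ontology header has already been parsed into plain (subject, predicate, object) string tuples. The function name `declared_language` and the `allow_dce` switch are hypothetical, and a real tool would use an RDF library instead.

```python
from typing import Iterable, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), all plain strings here

DCTERMS_LANGUAGE = "http://purl.org/dc/terms/language"
DCE_LANGUAGE = "http://purl.org/dc/elements/1.1/language"


def declared_language(ontology: str, triples: Iterable[Triple],
                      allow_dce: bool = False) -> Optional[str]:
    """Return the language declared on the ontology header, or None if absent.

    dcterms:language is checked first; dce:language is only consulted
    when allow_dce is True (hypothetical switch for legacy headers).
    """
    triples = list(triples)
    predicates = [DCTERMS_LANGUAGE] + ([DCE_LANGUAGE] if allow_dce else [])
    for predicate in predicates:
        for s, p, o in triples:
            if s == ontology and p == predicate:
                return o
    return None


MONDO = "http://purl.obolibrary.org/obo/mondo.owl"
header = [(MONDO, DCTERMS_LANGUAGE, "en")]
print(declared_language(MONDO, header))  # en
```

Keeping dce behind an explicit flag reflects the preference for dcterms while still letting older headers be read.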

@cmungall
Contributor

cmungall commented Apr 7, 2023

I support this, but dcterms only, no dce.

What is the expected cardinality? Presumably 0..1?

What is the expected behavior of robot merge and extract?

Can we come up with some validation rules?

Strawperson:

  • max 1 language declaration
  • if absent, assume en
  • if present, labels etc. MAY (or SHOULD NOT?) tag literals with the default language
  • if present, labels etc. SHOULD include a langless literal corresponding to the default
  • if present, then there MUST NOT be both a langless and a default-tagged literal with the same value
  • ...

The objective here is to have predictable behavior when retrieving single-valued properties like label, definition, etc. in a multivalued context.
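The strawperson rules, and the retrieval goal behind them, could be checked roughly as follows. This is a sketch under several assumptions:

- literals are modeled as (value, language tag) pairs, with None for a langless literal;
- "if absent, assume en" is the fallback;
- the SHOULD and MUST NOT rules are only checked when a declaration is present.

The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Literal = Tuple[str, Optional[str]]  # (lexical value, language tag or None if langless)


def check_language_rules(declared: List[str], literals: List[Literal]) -> List[str]:
    """Return the strawperson-rule problems found for one property's literals."""
    problems = []
    if len(declared) > 1:  # max 1 language declaration
        problems.append(f"expected at most 1 language declaration, found {len(declared)}")
    if not declared:  # if absent, assume en; the remaining rules apply only when present
        return problems
    default = declared[0]
    langless = {value for value, tag in literals if tag is None}
    default_tagged = {value for value, tag in literals if tag == default}
    if not langless:  # SHOULD include a langless literal for the default
        problems.append("no langless literal for the default language")
    for value in sorted(langless & default_tagged):  # MUST NOT have both forms
        problems.append(f"'{value}' is both langless and tagged @{default}")
    return problems


def pick_value(literals: List[Literal], lang: str, declared: List[str]) -> Optional[str]:
    """Pick one value for lang, treating langless literals as the default language."""
    default = declared[0] if declared else "en"
    for value, tag in literals:
        if tag == lang:
            return value
    if lang == default:
        for value, tag in literals:
            if tag is None:
                return value
    return None


labels = [("heart", None), ("cœur", "fr")]
print(check_language_rules(["en"], labels))  # []
print(pick_value(labels, "en", ["en"]))      # heart
```

The MUST NOT rule is what makes `pick_value` deterministic: a single-valued lookup never has to choose between "heart" and "heart"@en.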

@matentzn
Contributor Author

matentzn commented Apr 7, 2023

I agree with all that you say! 0..1 cardinality.

What is expected behavior or robot merge and extract?

Is it important to discuss this here and now? Merge is going to be super problematic to get right, but I don't see why we need to deal with extract specifically right now. Another hard part in this context is robot report.

I guess the point I want to make is: the proposal is to document a common practice. If we start tying this to the difficulty of implementing tool support it will become harder to push this issue.

@jonquet
Copy link

jonquet commented Apr 7, 2023

Hello, MOD suggests using dct:language to identify the languages in which labels can be found inside the ontology.
We have also identified doap:language, omv:naturalLanguage and schema:inLanguage.
In AgroPortal we use URI values from Lexvo, e.g., http://lexvo.org/id/iso639-3/eng
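Because several properties can carry the same information, a consumer may want to take the union across them. The sketch below makes a few assumptions:

- predicates are written as CURIEs (dct:, doap:, omv:, schema:) for brevity;
- `ex:onto` is a hypothetical ontology identifier;
- a Lexvo IRI is reduced to its last path segment.

Nothing here maps ISO 639-3 codes such as eng to ISO 639-1 codes such as en.

```python
from typing import Iterable, Set, Tuple

# Properties listed above; CURIEs used for brevity instead of full IRIs.
LANGUAGE_PROPERTIES = {"dct:language", "doap:language", "omv:naturalLanguage", "schema:inLanguage"}
LEXVO_PREFIX = "http://lexvo.org/id/"


def language_code(value: str) -> str:
    """Reduce a Lexvo IRI to its trailing code; return other values unchanged."""
    if value.startswith(LEXVO_PREFIX):
        return value.rsplit("/", 1)[-1]
    return value


def declared_languages(ontology: str, triples: Iterable[Tuple[str, str, str]]) -> Set[str]:
    """Union of language values declared on the ontology via any known property."""
    return {language_code(o) for s, p, o in triples
            if s == ontology and p in LANGUAGE_PROPERTIES}


triples = [
    ("ex:onto", "dct:language", "http://lexvo.org/id/iso639-3/eng"),
    ("ex:onto", "omv:naturalLanguage", "http://lexvo.org/id/iso639-3/fra"),
    ("ex:onto", "rdfs:label", "Example ontology"),
]
print(sorted(declared_languages("ex:onto", triples)))  # ['eng', 'fra']
```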

Note that there is no "default" natural language property in MOD. In fact, I had never really thought about the need to express one (and only one) default language plus possibly several others. That is probably because it can happen that none of the declared natural languages covers the full ontology.

So if the group decides to have a property for the "default language", it also needs to decide on a property for all the other ones. In that case, I would suggest:
  • mod:defaultLanguage, a subproperty of dct:language => to encode the 0..1 default language (new property in MOD or IAO)
  • dct:language => to encode the other languages
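The effect of declaring the proposed mod:defaultLanguage as a subproperty of dct:language can be sketched as follows. After one step of rdfs:subPropertyOf entailment, the default language also appears as a dct:language value, so the "other languages" are whatever remains once the default is removed. Note that mod:defaultLanguage is only the name proposed here, not an existing MOD term. The helper names and `ex:onto` are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Proposed axiom: mod:defaultLanguage rdfs:subPropertyOf dct:language
SUBPROPERTY_OF = {"mod:defaultLanguage": "dct:language"}


def entail_superproperties(triples: Iterable[Triple]) -> Set[Triple]:
    """Add (s, super, o) for every (s, sub, o), following one subPropertyOf step."""
    closed = set(triples)
    for s, p, o in list(closed):  # iterate over a snapshot while adding
        if p in SUBPROPERTY_OF:
            closed.add((s, SUBPROPERTY_OF[p], o))
    return closed


def default_and_other_languages(ontology: str,
                                triples: Iterable[Triple]) -> Tuple[Optional[str], Set[str]]:
    """Split declared languages into the 0..1 default and the remaining ones."""
    closed = entail_superproperties(triples)
    defaults = {o for s, p, o in closed if s == ontology and p == "mod:defaultLanguage"}
    if len(defaults) > 1:
        raise ValueError("mod:defaultLanguage should have cardinality 0..1")
    all_languages = {o for s, p, o in closed if s == ontology and p == "dct:language"}
    return next(iter(defaults), None), all_languages - defaults


header = [("ex:onto", "mod:defaultLanguage", "en"), ("ex:onto", "dct:language", "fr")]
print(default_and_other_languages("ex:onto", header))  # ('en', {'fr'})
```

Because of the subproperty link, tools that only know dct:language still see "en" among the ontology's languages.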

Two notes:
In AgroPortal, we need to know all the natural languages of an ontology to implement the multilingual capability (currently being developed => agroportal/project-management#307).
Also, as it is not multilingual yet, we have set up the portal with a default language (en).

@graybeal
Copy link

graybeal commented Apr 9, 2023

The DataCite approach is that if a resource has only one language, that is the only one declared; DataCite does not provide any mechanism to indicate that one language is 'default' more than the others. But I think it is useful to declare a primary language if there is one (and there is for OBO ontologies). For our metadata files on one project, we followed your pattern of creating a 'defaultLanguage' property, and that feels like a good solution to me. ("Primary language used to present the data file (if multiple languages are present, the Other Languages field may be used to add additional languages).")

@matentzn
Copy link
Contributor Author

matentzn commented Apr 9, 2023

Thank you @graybeal and @jonquet; it seems like a property of this kind would be universally useful. If we make it a child of dc:language, I am worried that people will start crying about the range violation. @jonquet, do you think this would be a problem (maybe we have enough tissues)? I would be fine with it.

It remains to be seen what the right home for it is; MOD and OMO are certainly possibilities. Any opinions here? I would have thought that SKOS or perhaps SKOS-XL would be good homes too, as language seems to be a universal concern in these domains as well?

@matentzn
Contributor Author

(Comment to self: Protégé has had this for many years:

DataPropertyAssertion(<http://protege.stanford.edu/plugins/owl/protege#defaultLanguage> <http://www.co-ode.org/ontologies/pizza/2005/10/18/pizza.owl> "en"^^xsd:string)

)

@cmungall
Contributor

Any further thoughts here?

@matentzn
Contributor Author

The easiest way to move forward here is to create an OMO property (I can do it in 5 min). But since we may want to use this for other kinds of semantic artefacts, like schemas, I guess the remaining question is: where should we request that this property be added?

@jonquet

jonquet commented Feb 27, 2024

I really recommend going with dct:language, or extending it in the MOD namespace. MOD is here, waiting for maturity, adopters and contributors.
Also, on another track: since December 2023 (my last post was in April), AgroPortal has been multilingual
(see: https://doc.jonquetlab.lirmm.fr/share/e6158eda-c109-4385-852c-51a42de9a412/doc/release-notes-btKjZk5tU2). We rely on http://omv.ontoware.org/2005/05/ontology#naturalLanguage, which in our case was chosen to stay consistent with BioPortal's historical choice of relying on OMV.

@alanruttenberg
Collaborator

I'm not fond of assuming, in released ontologies, that xsd:string means @en. It seems like a good idea to announce the policy with a property, as suggested. But I wonder whether builds could incorporate a step where they change xsd:strings in annotations known to have language-specific values (definitions, comments, editorial notes, maybe labels) to language tagged literals?
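Such a build step could look roughly like the following sketch. It makes several assumptions:

- annotations are held as (subject, property, literal) tuples;
- the language-specific properties are IAO:0000115 (definition), rdfs:comment, IAO:0000116 (editor note) and, optionally, rdfs:label;
- the target language comes from the ontology's declared default.

The names and the `ex:` identifiers are hypothetical. A real pipeline would operate on the OWL serialization rather than on Python tuples.

```python
from typing import List, NamedTuple, Optional, Tuple


class Lit(NamedTuple):
    value: str
    lang: Optional[str] = None  # None stands for a plain xsd:string literal


# Annotation properties assumed to carry language-specific text.
LANGUAGE_SPECIFIC = {"IAO:0000115", "rdfs:comment", "IAO:0000116"}

Annotation = Tuple[str, str, Lit]


def tag_plain_literals(annotations: List[Annotation], default_lang: str = "en",
                       include_labels: bool = True) -> List[Annotation]:
    """Give untagged literals of language-specific annotations the default language tag."""
    properties = LANGUAGE_SPECIFIC | ({"rdfs:label"} if include_labels else set())
    result = []
    for subject, prop, literal in annotations:
        if prop in properties and literal.lang is None:
            literal = literal._replace(lang=default_lang)
        result.append((subject, prop, literal))
    return result


annotations = [
    ("ex:0001", "rdfs:label", Lit("heart disease")),
    ("ex:0001", "rdfs:label", Lit("cardiopathie", "fr")),
    ("ex:0001", "oboInOwl:hasDbXref", Lit("ex:9999")),
]
for row in tag_plain_literals(annotations):
    print(row)
```

Identifiers such as database cross-references are deliberately left untagged, since they are not natural-language text.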

@matentzn
Copy link
Contributor Author

I wonder whether builds could incorporate a step where they change xsd:strings in annotations known to have language-specific values (definitions, comments, editorial notes, maybe labels) to language tagged literals?

It is an option for the OWL formats - not sure what it will do to the other serialisations like OBO, but for OWL this is definitely an option!

@matentzn
Contributor Author

@jonquet I don't mind adding the property to MOD due to its wide applicability beyond OBO, but re "extension": are you not concerned about the range restriction on dct:language? Its range is supposed to be https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#LinguisticSystem, which according to their spec is a class. To make things easier for us, I really think the value of "defaultLanguage" should be an ISO language string, like en, fr, etc.
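If the value is meant to be a bare ISO language string, a tool could at least check its shape. The sketch below only tests the two-letter lowercase pattern of ISO 639-1 codes. It does not check that the code is actually assigned. IRIs, such as a LinguisticSystem or Lexvo IRI, are reported separately. The function name and the category names are hypothetical.

```python
import re

ISO_639_1_SHAPE = re.compile(r"[a-z]{2}")


def classify_language_value(value: str) -> str:
    """Classify a defaultLanguage value as 'code', 'iri' or 'other' (shape check only)."""
    if ISO_639_1_SHAPE.fullmatch(value):
        return "code"
    if value.startswith(("http://", "https://")):
        return "iri"
    return "other"


for candidate in ["en", "fr", "eng", "EN", "http://lexvo.org/id/iso639-3/eng"]:
    print(candidate, classify_language_value(candidate))
```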

@jonquet

jonquet commented Feb 28, 2024

The DCT spec declares ranges with something more flexible than rdfs:range: "Range Includes"
https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/language

and defines "Range Includes" here: https://www.dublincore.org/specifications/dublin-core/dcmi-terms/
(Screenshot, 2024-02-28 14:44:03)

So to me it is OK to extend (rdfs:subPropertyOf) a DCT property and refine the range, as the range of the superproperty is "flexible".

Aside from this discussion: AgroPortal (which uses omv:naturalLanguage for backward compatibility) enforces the use of Lexvo URIs with ISO 639-1 values. We tried ISO 639-3, but it has too much stuff that is not really used.
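Converting between the bare codes discussed above and the Lexvo IRIs AgroPortal expects could be as simple as the sketch below. It assumes that Lexvo ISO 639-1 IRIs follow the pattern http://lexvo.org/id/iso639-1/{code}, by analogy with the iso639-3 IRI quoted earlier in the thread; this should be checked against Lexvo before relying on it. The helper names are hypothetical.

```python
LEXVO_ISO_639_1 = "http://lexvo.org/id/iso639-1/"  # assumed IRI prefix


def code_to_lexvo(code: str) -> str:
    """Turn a bare ISO 639-1 code such as 'en' into an (assumed) Lexvo IRI."""
    return LEXVO_ISO_639_1 + code.strip().lower()


def lexvo_to_code(iri: str) -> str:
    """Recover the bare code from a Lexvo ISO 639-1 IRI; reject anything else."""
    if not iri.startswith(LEXVO_ISO_639_1):
        raise ValueError(f"not a Lexvo ISO 639-1 IRI: {iri}")
    return iri[len(LEXVO_ISO_639_1):]


print(code_to_lexvo("en"))                 # http://lexvo.org/id/iso639-1/en
print(lexvo_to_code(code_to_lexvo("fr")))  # fr
```

With a round trip like this, the ontology metadata can keep the simple "en" string while a portal still works with IRIs internally.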


5 participants