Multilingual support in DCAT profiles #318

Merged
merged 10 commits into from
Oct 31, 2024


amercader
Member

This builds on the excellent code started by @stefina and @JVickery-TBS in #124 and #240 respectively, adapting it to the current profiles and generalizing it for maximum compatibility.

Multilingual support is provided via integration with ckanext-fluent, the supported way of implementing translations for CKAN fields.

At the serialization level, a new triple will be added for each of the defined languages (if the translation is present):

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://example.org/dataset/0112cf32-bce0-4071-9504-923375f9f2ad> a dcat:Dataset ;
    dct:title "Conjunt de dades de prova DCAT"@ca,
        "Test DCAT dataset"@en,
        "Conjunto de datos de prueba DCAT"@es ;
    dct:description "Una descripció qualsevol"@ca,
        "Some description"@en,
        "Una descripción cualquiera"@es ;
    dct:language "ca",
        "en",
        "es" ;
    dct:provenance [ a dct:ProvenanceStatement ;
        rdfs:label "Una declaració sobre la procedència"@ca,
            "Statement about provenance"@en,
            "Una declaración sobre la procedencia"@es ] .

When parsing, properties from DCAT serializations are imported in the format fluent expects, provided the field is defined as a fluent field in the schema:

{
    "name": "test-dataset",
    "provenance": {
        "en": "Statement about provenance",
        "ca": "Una declaració sobre la procedència",
        "es": "Una declaración sobre la procedencia"
    }
}

As implemented in #124, if one of the languages is missing from the DCAT serialization, an empty string will be returned for that language. Also, if the DCAT serialization does not define the language used for a value, the default CKAN language (`ckan.locale_default`) will be used.
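
The two fallback rules above can be sketched in plain Python. This is only an illustration, not the code in this PR: `to_fluent_value`, its parameters, and the `(value, lang)` tuple input are hypothetical names, and silently ignoring languages that are not in the configured list is an assumption.

```python
def to_fluent_value(literals, languages, default_lang="en"):
    """Group (value, lang) pairs into a fluent-style dict.

    `literals` stands in for the language-tagged literals found in the
    graph; `lang` is None when the serialization does not tag the value.
    `default_lang` plays the role of ckan.locale_default.
    """
    # Every configured language gets a key; missing ones stay as "".
    result = {lang: "" for lang in languages}
    for value, lang in literals:
        # Untagged values fall back to the default language.
        lang = lang or default_lang
        # Assumption: languages outside the configured list are ignored.
        if lang in result:
            result[lang] = value
    return result


provenance = to_fluent_value(
    [
        ("Statement about provenance", "en"),
        ("Una declaració sobre la procedència", "ca"),
    ],
    languages=["en", "ca", "es"],
)
```

Here `provenance["es"]` would be an empty string, because no Spanish literal was supplied.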

@JVickery-TBS this covers most of your changes in #240 except for the handling of translated fields in publishers / organizations. As it's difficult to come up with logic that works across the many different scenarios, this is best handled in a small custom profile. But let me know if I missed anything else besides this issue.

cc @seitenbau-govdata

`_add_triple_from_dict()` will check if the value is a dict and assume
it's a fluent field (i.e. `{"lang1": "value_lang1", "lang2":
"value_lang2"}`). `URIRefOrLiteral` also supports a lang parameter.
Created multilingual versions of `_object_value()` and
`_object_value_list()` that store the different translations in the format
expected by the fluent fields, e.g.:

{
    "en": "Dataset title",
    "es": "Título del conjunto de datos"
}

and for tags:

{
    "en": ["Oaks", "Pines"],
    "es": ["Robles", "Pinos"]
}
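
For list-valued fields such as tags, the grouping could look roughly like the sketch below. `to_fluent_list` is a hypothetical name, not the multilingual `_object_value_list()` itself, and returning an empty list for a language with no values is an assumption.

```python
def to_fluent_list(literals, languages, default_lang="en"):
    """Group (value, lang) pairs into {"lang": [values, ...]}."""
    # Assumption: languages with no values get an empty list.
    result = {lang: [] for lang in languages}
    for value, lang in literals:
        lang = lang or default_lang  # untagged values use the default language
        if lang in result:
            result[lang].append(value)
    return result


tags = to_fluent_list(
    [("Oaks", "en"), ("Robles", "es"), ("Pines", "en"), ("Pinos", "es")],
    languages=["en", "es"],
)
```

With this input, `tags` matches the shape shown above for tags.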

Core fields (those ending in `_translated`) are handled separately.
@amercader amercader merged commit ac1c34b into master Oct 31, 2024
8 checks passed
@amercader amercader deleted the multilingual branch October 31, 2024 13:51