Clarification on language(s) #195

Closed
m-mohr opened this issue Dec 12, 2022 · 32 comments

@m-mohr
Contributor

m-mohr commented Dec 12, 2022

As far as I can see, hreflang is meant to follow RFC 5646 (Language-Tag). For the language property the format seems undefined. I'd propose to clarify that it uses the same format as hreflang.

Additionally, I'm wondering whether it would be helpful to define a list of available/supported languages, e.g. as a property languages, which is an array of languages.

Also, how should alternative representations in other languages be communicated in (static) catalogs? Maybe multiple self links with different hreflangs?

I'm asking because I'm writing this up for STAC and would like to align as much as possible.
See also https://github.com/stac-extensions/language and https://github.com/stac-api-extensions/language

@pvretano
Contributor

12-DEC-2022: Discussed in the SWG. @pvretano will implement the following:

  1. Change "language" to "languages" which will be an array of available languages.
  2. The language representation will be as per RFC-5646.
  3. In a static record, you would point to the other "language" representations of the record using "alternate" links with the appropriate "hreflang".
  4. In an API you would just use standard language negotiation (i.e. Accept-Language).

@pvretano pvretano self-assigned this Dec 12, 2022
@m-mohr
Contributor Author

m-mohr commented Dec 12, 2022

This was a very quick turnaround, thanks.

I'm confused on point 1: Why replace language with languages? I think both should exist:

  1. language specifies the actual language you've received.
  2. languages specifies which languages are available (this is more or less a shortcut for checking language + all alternate links for the set of available hreflangs)

I agree with all other points and will align to use alternate instead of self.

@m-mohr
Contributor Author

m-mohr commented Dec 12, 2022

Example from STAC:

{
  "stac_version": "1.0.0",
  "stac_extensions": [
    "https://stac-extensions.github.io/language/v1.0.0/schema.json"
  ],
  "type": "Feature",
  "id": "item",
  "bbox": [...],
  "geometry": {
    "type": "Polygon",
    "coordinates": [...]
  },
  "properties": {
    "datetime": "2020-12-11T22:38:32Z",
    "example": "An example product",
    "languages": [
      "de",
      "en"
    ],
    "language": "en"
  },
  "links": [
    {
      "href": "https://raw.githubusercontent.com/stac-extensions/language/main/examples/item.json",
      "rel": "self",
      "hreflang": "en"
    },
    {
      "href": "https://raw.githubusercontent.com/stac-extensions/language/main/examples/de/item.json",
      "rel": "alternate",
      "hreflang": "de"
    },
    {
      "href": "catalog.json",
      "rel": "parent",
      "title": "Example STAC Catalog",
      "hreflang": "en"
    },
    {
      "href": "catalog.json",
      "rel": "root",
      "title": "Example STAC Catalog",
      "hreflang": "en"
    }
  ],
  "assets": {
    "data": {
      "href": "https://cloud.example.com/examples/file.tif"
    },
    "metadata": {
      "href": "https://cloud.example.com/examples/metadata.xml",
      "type": "application/xml",
      "hreflang": "en"
    },
    "metadata_de": {
      "href": "https://cloud.example.com/examples/metatdata_DE.xml",
      "type": "application/xml",
      "hreflang": "de"
    }
  }
}
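
For a client reading a static catalog, the self and alternate links in the example above are enough to find the document in a preferred language. The following is only a minimal Python sketch under stated assumptions: the function name `href_for_language` and the shortened example URLs are hypothetical, and the exact-match comparison is a simplification of RFC 5646 matching.

```python
from typing import Optional


def href_for_language(links: list[dict], wanted: str) -> Optional[str]:
    """Return the href of the self/alternate link whose hreflang equals `wanted`.

    Hypothetical helper: compares full tags case-insensitively and does not
    fall back from e.g. "de-CH" to "de".
    """
    for link in links:
        if link.get("rel") not in ("self", "alternate"):
            continue  # parent, root, etc. do not point to this document
        if link.get("hreflang", "").lower() == wanted.lower():
            return link.get("href")
    return None


# Illustrative links, shaped like the STAC example above.
links = [
    {"href": "https://example.com/item.json", "rel": "self", "hreflang": "en"},
    {"href": "https://example.com/de/item.json", "rel": "alternate", "hreflang": "de"},
    {"href": "catalog.json", "rel": "parent", "hreflang": "en"},
]

print(href_for_language(links, "de"))
print(href_for_language(links, "fr"))
```

A language without a matching link yields `None`, so the caller can decide whether to stay on the current document.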

@cnreediii

Just an FYI: In a CDB 2.0 datastore, there is a mandatory element 'language' (aka dct:language, PT_Locale) whose content is based on BCP 47 (RFC 5646). From the language perspective, OGC API - Records, the STAC API and CDB 2.0 are consistent.

@ycespb

ycespb commented Dec 19, 2022

FYI: In the Testbed-18 ER Secure and Async Catalog (OGC 22-018) section 2.2.2, there is also the following note:

NOTE INSPIRE requires the Discovery Service to advertise the default language in the CSW GetCapabilities response. Proposing a similar mechanism to advertise the default language is further work. Possible approaches include:

  • Advertise default language in the OGC API-Records Landing Page response
  • Advertise default language in the OGC API-Records API Definition response
  • Use of hreflang in the link object referring to the search endpoint (/collections/{id}/items) in a Collection response.

@m-mohr
Contributor Author

m-mohr commented Jan 9, 2023

@pvretano Can you confirm that #195 (comment) makes sense to you, too? I'd like to release this behavior into STAC soon and it would be really great to have this aligned between Records and STAC!

Here's the corresponding STAC extension: https://github.com/stac-extensions/language#fields-for-catalogs-collections-and-item-properties

@pvretano
Contributor

@m-mohr looking at it today. Will update the comment once I have reviewed it.

@m-mohr
Contributor Author

m-mohr commented Jan 10, 2023

Thanks @pvretano. While you are at it, do you think it makes sense to allow more than just the language codes in languages?

So for example, instead of just "languages": ["de", "en-US", "gr"], we could provide a bit more information, which could be helpful for clients. For example:

"languages": [
  { "code": "de", "name": "German", "native": "Deutsch", "dir": "ltr" },
  { "code": "en-US", "name": "English (US)", "native": "English (US)", "dir": "ltr" },
  { "code": "gr", "name": "Greek", "native": "Ελληνικά", "dir": "ltr" }
]

Only code would be required.
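
A client could accept both shapes during a transition. The sketch below is an assumption on my side, not part of either specification: `normalize_languages` is a hypothetical helper that turns plain codes into language objects and rejects objects that lack the required code.

```python
def normalize_languages(entries: list) -> list[dict]:
    """Turn a mixed list of codes and language objects into language objects.

    Hypothetical helper: accepting plain strings is an assumption made for
    illustration; the proposal above only requires "code" in each object.
    """
    normalized = []
    for entry in entries:
        if isinstance(entry, str):
            normalized.append({"code": entry})
        elif isinstance(entry, dict) and "code" in entry:
            normalized.append(dict(entry))  # copy, so the input is not mutated
        else:
            raise ValueError(f"language entry needs a 'code': {entry!r}")
    return normalized


print(normalize_languages(["de", {"code": "en-US", "name": "English (US)"}]))
```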

@pvretano
Contributor

@m-mohr my original comment was perhaps not as clear as it should have been because it did not distinguish clearly the language of the resource versus the language of the record.

The previous "language" tag was meant to encode the language of the resource that the record describes (if there was an associated language). So, changing it to an array allows a set of languages to be associated with the resource (e.g. the resource described by the record is available in English, German, Greek, etc.).

The language of the record itself (i.e. the language in which the record is presented to the client) is requested using the "Accept-Language" header when the record is retrieved. That language, however, is currently not explicitly encoded in the record with a specific tag. Rather a "rel=self" link can be included that includes an "hreflang" attribute to indicate the language of the retrieved record. Additional links with "rel=alternate" and "hreflang" attributes can point to additional language representations of the record.

Does this all make sense?

I am mocking up an example record with language information which I will add to the issue later today.
I like the encoding of "languages" that you present above so I will use that.

If you think there would be value in explicitly encoding the language of the record in the record itself then I would not be opposed to reintroducing the "language" tag for that purpose ...

@m-mohr
Contributor Author

m-mohr commented Jan 11, 2023

Thank you, @pvretano. This clarifies what the difference between STAC and Records is currently.

First and foremost, it is 100% clear and aligned between STAC and Records that in an API context content negotiation is used to request specific languages and report the language of a response. We are also aligned with regard to the hreflang property. Unfortunately, there are also static catalogs, both in STAC and Records. Here content negotiation is often not available, so we need an alternative. Also, it is often useful to replicate important headers (e.g. the content language) in the body, because if you store a response in a (local) file, you lose the (language) headers, but it could still be useful to have this information. Thus, my aim was to find a solution that works without headers for static catalogs and can also be useful in the context of APIs.

For the language you may want to encode multiple things:

  • current language of the metadata (i.e. Record, STAC Item/Catalog)
  • all available languages of the metadata
  • language of a resource (e.g. STAC Assets)

To encode the language of a resource we use the hreflang property in links and assets.
Now the difference comes up:

  • STAC currently uses the language property for the current language of the metadata and the languages property for all available languages of the metadata. This is probably due to the fact that for STAC the metadata language is often not necessarily the language of the resource (imagery usually doesn't have a language).
  • Records uses these properties to describe the resources though!

In theory, you are right, we don't need these properties at all because it could all be handled through hreflang in links. self link + hreflang could describe the language of the metadata, alternate links + hreflang could describe other available languages, link to data file (resource) + hreflang could describe the language(s) of the resources.

This is pretty cumbersome though as you'd need to wade through links to figure this out. Also, in STAC self links are not required as catalogs can be portable and the location may not be known upfront. Also, I'm not overly happy with overloading "alternate" for alternative languages, alternative media types, alternative ... (but that's a different discussion). In the end, the language and languages properties are often just a "summary" and for convenience.

Still, I think it would be good to declare this directly without having to look through links with hreflangs.

Ultimately, we could also allow for a very verbose solution:

  • current language of the metadata: language (shall equal the hreflang in the self link, just the language code)
  • all available languages of the metadata: languages (shall correspond to the hreflangs of the alternate + self links, but may contain additional properties)
  • language of a resource: resourceLanguages (shall correspond to the hreflangs of the resource links, but may contain additional properties)
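
The first two rules could be checked mechanically. This is only a rough Python sketch under assumptions: `check_language_fields` is a hypothetical name, the fields are read from the top level of the document (in a STAC Item they would sit under properties), and languages entries are taken to be objects with a code.

```python
def check_language_fields(doc: dict) -> list[str]:
    """Return human-readable inconsistencies between the fields and the links.

    Hypothetical helper for the rules sketched above: `language` equals the
    self hreflang, `languages` corresponds to the self + alternate hreflangs.
    """
    links = doc.get("links", [])

    def hreflangs(rel: str) -> set[str]:
        return {
            link["hreflang"]
            for link in links
            if link.get("rel") == rel and "hreflang" in link
        }

    self_langs = hreflangs("self")
    expected = self_langs | hreflangs("alternate")
    declared = {entry["code"] for entry in doc.get("languages", [])}

    problems = []
    # self links are optional, so only compare when one carries an hreflang.
    if self_langs and doc.get("language") not in self_langs:
        problems.append("language does not match the hreflang of the self link")
    if declared != expected:
        problems.append("languages does not match the self + alternate hreflangs")
    return problems


doc = {
    "language": "en",
    "languages": [{"code": "en"}, {"code": "de"}],
    "links": [
        {"href": "item.json", "rel": "self", "hreflang": "en"},
        {"href": "de/item.json", "rel": "alternate", "hreflang": "de"},
    ],
}
print(check_language_fields(doc))
```

The strict set comparison treats every alternate link as a metadata language, which is exactly the assumption that may need relaxing if alternate is also used for other media types.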

While "language" and "languages" could be aligned between Records and STAC, I'm not so sure about the "resourceLanguages". STAC doesn't need that in many cases and I wasn't able to come up with a good name that describes both cases (assetLanguages vs. resourceLanguages), so we may just have different properties here that don't conflict but share the same structure (as described above). An alternative could be recordLanguage, recordLanguages and languages, but then we'd be less aligned between STAC and Records because record doesn't fit into the STAC terminology. So I'd prefer the first variant, but I'm happy to discuss other ideas and alternatives.

What do you think? Would you be open to that?

@pvretano
Contributor

pvretano commented Jan 11, 2023

@m-mohr just to make sure I understand ...

  • language is the language of the record in hand and is equal to the hreflang value of the self link if it exists and has an hreflang specified
  • languages is the list of other languages that the record can be requested in; if there are alternate links in the record with hreflang attributes, the hreflang values must exist in this languages list
  • resourceLanguages is the list of languages in which the resource being described by the record is available.
  • the structure of the languages and resourceLanguages properties shall be as you presented in this comment
  • all language codes shall conform to RFC-5646

Is this correct? If yes, then I think I am OK with that. If you verify that my understanding is correct, then I will present it to the SWG and report back in this issue. (NOTE: the next SWG meeting is on 23-JAN-2023 ... I hope that is not too late for you).

@m-mohr
Contributor Author

m-mohr commented Jan 12, 2023

Thank you for taking the time, @pvretano. Yes, this is generally correct.

I have one concern, though, about the requirement in the second bullet. You are saying:

if there are alternate links in the record with hreflang attributes, the hreflang values must exist in this languages list

I see potential issues here, which I mentioned above, due to the overloading of the alternate relation type (alternate type vs. alternate language). Here's an example of some links that would not be unusual to see in STAC, and I could imagine that it also occurs in Records (although I think you require the type, right?):

Let's say the links are in a metadata document in Greek (i.e. contains "language": "gr")

{
  "href": "../de/item.json",
  "rel": "alternate",
  "hreflang": "de"
},
{
  "href": "../item.json",
  "rel": "alternate",
  "hreflang": "en"
},
{
  "href": "https://stacindex.org/browser/example/de/item.json?uiLanguage=de",
  "rel": "alternate",
  "type": "text/html",
  "hreflang": "de"
},
{
  "href": "https://stacindex.org/browser/example/item.json?uiLanguage=en",
  "rel": "alternate",
  "type": "text/html",
  "hreflang": "en"
},
{
  "href": "https://stacindex.org/browser/example/item.json?uiLanguage=fr",
  "rel": "alternate",
  "type": "text/html",
  "hreflang": "fr"
},
{
  "href": "https://stacindex.org/browser/example/gr/item.json?uiLanguage=gr",
  "rel": "alternate",
  "type": "text/html",
  "hreflang": "gr"
}

You see that there are more languages available in the UI than for the metadata. I'd expect that languages would be something like the following (i.e. not include French):

"languages": [
  { "code": "de", "name": "German", "native": "Deutsch" },
  { "code": "en", "name": "English", "native": "English" },
  { "code": "gr", "name": "Greek", "native": "Ελληνικά" }
]

So either we make the relationship between languages and the alternate type less demanding or we have to clearly specify the corresponding media types, but that would (at least in STAC) be JSON + GeoJSON (+ missing type as type is not required in STAC yet).
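
The second option could look roughly like this in a client. It is only a sketch under assumptions: the set of metadata media types and the name `metadata_languages` are mine, and links without a type are treated as metadata because type is not required in STAC yet.

```python
# Assumed media types of the source metadata; adjust to the actual catalog.
METADATA_TYPES = {"application/json", "application/geo+json"}


def metadata_languages(links: list[dict], current_language: str) -> list[str]:
    """Collect the languages in which the metadata itself is available.

    Hypothetical helper: alternate links with a non-metadata type (e.g.
    text/html) are skipped, untyped alternate links are counted.
    """
    codes = {current_language}
    for link in links:
        if link.get("rel") != "alternate" or "hreflang" not in link:
            continue
        media_type = link.get("type")
        if media_type is None or media_type in METADATA_TYPES:
            codes.add(link["hreflang"])
    return sorted(codes)


# Shortened version of the links above, from the Greek document.
links = [
    {"href": "../de/item.json", "rel": "alternate", "hreflang": "de"},
    {"href": "../item.json", "rel": "alternate", "hreflang": "en"},
    {"href": "https://stacindex.org/browser/example/item.json?uiLanguage=fr",
     "rel": "alternate", "type": "text/html", "hreflang": "fr"},
]

print(metadata_languages(links, "gr"))  # ['de', 'en', 'gr']
```

French is left out because it is only offered by the HTML representation.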

Thank you for bringing it to the SWG. Jan 23 is fine for me. If it helps I could also join the meeting. I'll also prepare an update for the STAC extension that follows this proposal.

@m-mohr
Contributor Author

m-mohr commented Jan 12, 2023

I just had another idea to "merge" resourceLanguages and languages into languages and just add boolean properties as follows:

"languages": [
  { "code": "de", "name": "German", "native": "Deutsch", "record": true, "resource": true },
  { "code": "en", "name": "English", "native": "English", "record": true, "resource": true },
  { "code": "gr", "name": "Greek", "native": "Ελληνικά", "record": true, "resource": false },
  { "code": "fr", "name": "French", "native": "Français", "record": false, "resource": true }
]

I'm not sure whether this is a good idea and whether this mixes separate concerns too much so looking for thoughts of others.

@pvretano
Contributor

@m-mohr my feeling is that it mixes separate concerns too much, but let's give others a chance to chime in with their thoughts ...

@m-mohr
Contributor Author

m-mohr commented Jan 12, 2023

Yeah, happy with that, too.

An addition to #195 (comment): Should the languages list contain the current language itself? I'd say for clients it would be good so it would just not be alternate, but alternate + self.

@pvretano
Contributor

@m-mohr yes, I suppose the languages list should contain the current language as well, although that is slightly redundant. Perhaps we can get rid of the language tag and simply say the first item in the languages list is the language of the record in hand.

About this comment ... I hadn't considered that, but I would say that the list of languages should include all the available languages independent of their media type representation. If there is a type dependency, that can be represented in the alternate links via the type attribute and/or negotiated between the client and server using the normal HTTP content type and language negotiation handshake. Your thoughts?

@m-mohr
Contributor Author

m-mohr commented Jan 12, 2023

@pvretano Interesting idea about putting the current language first. While I like having everything in one place, I don't like that it is not very explicit, and "the average user" may get confused about what the actual language is. It requires good knowledge of the spec. Alternatively, we could also remove the current language from languages and, instead of just providing a code for language, use the "language object" from above there as well. Phew... no strong preference right now.

Example:

"language": { "code": "gr", "name": "Greek", "native": "Ελληνικά" },
"languages": [
  { "code": "de", "name": "German", "native": "Deutsch" },
  { "code": "en", "name": "English", "native": "English" }
]

I'm not sure about adding e.g. the "UI languages" to the languages list. It feels a bit weird to me as it mixes separate concerns. For example, I'm currently making STAC Browser multilingual with 6+ planned languages right now, while the metadata is only available in 2 languages. So the languages list would have 6 entries and that seems a bit excessive...

(but of course I'm relatively biased right now towards the usecase I'm working on)

@m-mohr
Contributor Author

m-mohr commented Jan 12, 2023

I updated the STAC extension to reflect what you proposed here: https://github.com/stac-extensions/language

@pvretano
Contributor

@m-mohr I have no strong preference. However, if I had to pick I would say ... language for the current language. languages for the list of other available languages. So, the current language is NOT in the list of other languages.
I still think that the list of other languages should contain all the available other languages regardless of the representation. The HTML representation is as valid as any other and likely one of the more common representations ... no?
I'll review the STAC extension write up later today ...

@m-mohr
Contributor Author

m-mohr commented Jan 12, 2023

The HTML representation is as valid as any other and likely one of the more common representations ... no?

No, not in my eyes. For me, languages is the list of languages in which the source metadata files are available. The STAC clients usually only work with the source metadata (JSON) variants and all others are just spit out or ignored. But I guess I could filter the languages somehow...

@pvretano
Contributor

@m-mohr I could be wrong about the HTML representation ... I'll present to the SWG and see what the others think.

@pvretano
Contributor

23-JAN-2023: In STAC, the asset language is represented using hreflang in the asset section, and there is a rule that basically says that if a STAC record is requested in a specific language AND the asset has associated languages, only the requested language is represented in the asset section. So, if the STAC item is requested in Greek and there is a "Greek" asset, only that link will be listed in the asset section. Of course, all this only applies to the API; static records would probably include the links to all the available languages.

@pvretano
Contributor

@m-mohr with regard to the language parameter in the STAC API language proposal, why is it only a single language? Can't its value be the same string as that used for the Accept-Language header with the same semantics (e.g. `language=de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7,fr;q=0.3`)?
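
To illustrate what "the same semantics" would mean for a server, here is a minimal Python sketch that parses such a value into language tags ordered by quality. The function name `parse_language_preferences` is hypothetical, and the parsing is simplified (it does not handle wildcards specially, and a malformed q value is treated as 0).

```python
def parse_language_preferences(value: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language style string into (tag, q) pairs, best first.

    Simplified sketch: a missing q defaults to 1.0, an unparsable q becomes 0.0.
    """
    preferences = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        params = params.strip()
        quality = 1.0
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        preferences.append((tag.strip(), quality))
    # sorted() is stable, so tags with equal q keep their original order.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


print(parse_language_preferences("de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7,fr;q=0.3"))
```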

@m-mohr
Contributor Author

m-mohr commented Feb 20, 2023

@pvretano This was just meant as a very simple alternative for "tinkering" in "simpler" environments, e.g. in the Browser where it's not easily possible to send HTTP headers. So I kept it simple. Recently, I've actually thought about removing the parameter altogether and just relying on the header. What do you think? What's the general direction OGC APIs go for? I've often seen e.g. ?f=json in OGC API implementations as an alternative to Accept headers, which would somewhat align with the current specification of ?language=de, it seems.

@pvretano
Contributor

@m-mohr the usual thinking at OGC is to "recommend" that implementations have a mechanism to mint URLs that need to be embedded, or for situations where the client does not have easy access to HTTP headers. So, take f for example. That is not part of the specification per se. It is just an example for creating URLs where the output format can be specified. I guess it would be the same thing with a language parameter. It would not be "standard" but only a suggestion that implementations create a mechanism for requesting records in a specific language when access to the HTTP headers is not feasible. In all cases the HTTP way is the normative way.

@m-mohr
Contributor Author

m-mohr commented Feb 20, 2023

@pvretano Then I'd suggest following the same pattern. As I can't find anything about f in the specs (features, records), I'd also remove it from the STAC API - Languages extension.

@pvretano
Contributor

pvretano commented Feb 20, 2023

@m-mohr here is the reference to f in Features ... https://docs.opengeospatial.org/is/17-069r4/17-069r4.html#encodings
It's in the NOTE in that section ...

@m-mohr
Contributor Author

m-mohr commented Feb 20, 2023

@pvretano Thanks, I did not find that (but "f" is also not an ideal search term ;-) ). So you'd add a similar wording for language or accept-language into Records? Then I'd just refer back to that in the STAC API extension.

@pvretano
Contributor

@m-mohr yes ... that is my plan.

@pvretano
Contributor

PR #211 created to align language handling as per the discussion in this issue.

@m-mohr
Contributor Author

m-mohr commented Feb 23, 2023

@pvretano Added a comment in the PR, thanks.

@pvretano
Contributor

pvretano commented May 1, 2023

01-MAY-2023: Resolved by #211. Closing.

@pvretano pvretano closed this as completed May 1, 2023