Clarification on language(s) #195
12-DEC-2022: Discussed in the SWG. @pvretano will implement the following:
|
This was a very quick turnaround, thanks. I'm confused on point 1: Why replace language with languages? I think both should exist:
I agree with all other points and will align to use alternate instead of self. |
Example from STAC: {
"stac_version": "1.0.0",
"stac_extensions": [
"https://stac-extensions.github.io/language/v1.0.0/schema.json"
],
"type": "Feature",
"id": "item",
"bbox": [...],
"geometry": {
"type": "Polygon",
"coordinates": [...]
},
"properties": {
"datetime": "2020-12-11T22:38:32Z",
"example": "An example product",
"languages": [
"de",
"en"
],
"language": "en"
},
"links": [
{
"href": "https://raw.githubusercontent.com/stac-extensions/language/main/examples/item.json",
"rel": "self",
"hreflang": "en"
},
{
"href": "https://raw.githubusercontent.com/stac-extensions/language/main/examples/de/item.json",
"rel": "alternate",
"hreflang": "de"
},
{
"href": "catalog.json",
"rel": "parent",
"title": "Example STAC Catalog",
"hreflang": "en"
},
{
"href": "catalog.json",
"rel": "root",
"title": "Example STAC Catalog",
"hreflang": "en"
}
],
"assets": {
"data": {
"href": "https://cloud.example.com/examples/file.tif"
},
"metadata": {
"href": "https://cloud.example.com/examples/metadata.xml",
"type": "application/xml",
"hreflang": "en"
},
"metadata_de": {
"href": "https://cloud.example.com/examples/metatdata_DE.xml",
"type": "application/xml",
"hreflang": "de"
}
}
} |
Just an FYI: In a CDB 2.0 datastore, there is a mandatory element 'language' (aka dct:language, PT_Locale) whose content is based on BCP 47 (RFC 5646). From the language perspective, OGC API - Records and the STAC API and CDB 2.0 are consistent. |
FYI: In the Testbed-18 ER Secure and Async Catalog (OGC 22-018) section 2.2.2, there is also the following note: NOTE INSPIRE requires the Discovery Service to advertise the default language in the CSW GetCapabilities response. Proposing a similar mechanism to advertise the default language is further work. Possible approaches include:
|
@pvretano Can you confirm that #195 (comment) makes sense to you, too? I'd like to release this behavior into STAC soon and it would be really great to have this aligned between Records and STAC! Here's the corresponding STAC extension: https://github.com/stac-extensions/language#fields-for-catalogs-collections-and-item-properties |
@m-mohr looking at it today. Will update the comment once I have reviewed it. |
Thanks @pvretano. While you are at it, do you think it makes sense to allow more than just the language codes in languages? So for example instead of just
Only code would be required. |
@m-mohr my original comment was perhaps not as clear as it should have been because it did not distinguish clearly the language of the resource versus the language of the record. The previous "language" tag was meant to encode the language of the resource that the record describes (if there was an associated language). So, changing it to an array allows a set of languages to be associated with the resource (e.g. the resource described by the record is available in English, German, Greek, etc.). The language of the record itself (i.e. the language in which the record is presented to the client) is requested using the "Accept-Language" header when the record is retrieved. That language, however, is currently not explicitly encoded in the record with a specific tag. Rather, a "rel=self" link can be included that includes an "hreflang" attribute to indicate the language of the retrieved record. Additional links with "rel=alternate" and "hreflang" attributes can point to additional language representations of the record. Does this all make sense? I am mocking up an example record with language information which I will add to the issue later today. If you think there would be value in explicitly encoding the language of the record in the record itself then I would not be opposed to reintroducing the "language" tag for that purpose ... |
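For illustration only, here is a minimal Python sketch of the link-based approach described in the comment above: the record's own language is read from the "rel=self" link and the other available languages from the "rel=alternate" links. The function names and the example.com URLs are hypothetical and not part of either specification.

```python
from typing import List, Optional


def record_language(record: dict) -> Optional[str]:
    """Return the hreflang of the first rel=self link, or None if there is none."""
    for link in record.get("links", []):
        if link.get("rel") == "self" and link.get("hreflang"):
            return link["hreflang"]
    return None


def alternate_languages(record: dict) -> List[str]:
    """Return the distinct hreflang values of all rel=alternate links, in order."""
    codes: List[str] = []
    for link in record.get("links", []):
        code = link.get("hreflang")
        if link.get("rel") == "alternate" and code and code not in codes:
            codes.append(code)
    return codes


# Hypothetical record; only the link structure matters here.
example_record = {
    "links": [
        {"href": "https://example.com/records/r1", "rel": "self", "hreflang": "en"},
        {"href": "https://example.com/de/records/r1", "rel": "alternate", "hreflang": "de"},
        {"href": "https://example.com/catalog", "rel": "parent", "hreflang": "en"},
    ]
}

print(record_language(example_record))
print(alternate_languages(example_record))
```

Note that a record without a self link (as can happen in a portable static catalog) yields None here, which is one reason an explicit property may be more convenient for clients.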
Thank you, @pvretano. This clarifies what the difference between STAC and Records is currently. First and foremost, it is 100% clear and aligned between STAC and Records that in an API context content negotiation is used to request specific languages and report the language of a response. We are also aligned with regards to the For the language you may want to encode multiple things:
To encode the language of a resource we use the hreflang property in links and assets.
In theory, you are right, we don't need these properties at all because it could all be handled through hreflang in links. self link + hreflang could describe the language of the metadata, alternate links + hreflang could describe other available languages, link to data file (resource) + hreflang could describe the language(s) of the resources. This is pretty cumbersome though as you'd need to wade through links to figure this out. Also, in STAC self links are not required as catalogs can be portable and the location may not be known upfront. Also, I'm not overly happy with overloading "alternate" for alternative languages, alternative media types, alternative ... (but that's a different discussion). In the end, the language and languages properties are often just a "summary" and for convenience. Still, I think it would be good to declare this directly without having to look through links with hreflangs. Ultimately, we could also allow for a very verbose solution:
While "language" and "languages" could be aligned between Records and STAC, I'm not so sure about the "resourceLanguages". STAC doesn't need that in many cases and I wasn't able to come up with a good name that describes both cases (assetLanguages vs. resourceLanguages), so we may just have different properties here that don't conflict but share the same structure (as described above). An alternative could be recordLanguage, recordLanguages and languages, but then we'd be less aligned between STAC and Records because record doesn't fit into the STAC terminology. So I'd prefer the first variant, but happy to discuss other ideas and alternatives. What do you think? Would you be open to that? |
@m-mohr just to make sure I understand ...
Is this correct? If yes, then I think I am OK with that. If you verify that my understanding is correct then I will present to the SWG and report back in this issue. (NOTE: next SWG meeting is on the 23-JAN-2023 ... I hope that is not too late for you). |
Thank you for taking the time, @pvretano. Yes, this is generally correct. I have one concern though about the requirement in the second bullet. You are saying:
I see potential issues here which I mentioned above due to the overloading of the alternate relation type (alternate type vs. alternate language). Here's an example for some links that would not be unusual to see in STAC and I could imagine that it also occurs in Records (although I think you require the Let's say the links are in a metadata document in Greek (i.e. contains {
"href": "../de/item.json",
"rel": "alternate",
"hreflang": "de"
},
{
"href": "../item.json",
"rel": "alternate",
"hreflang": "en"
},
{
"href": "https://stacindex.org/browser/example/de/item.json?uiLanguage=de",
"rel": "alternate",
"type": "text/html",
"hreflang": "de"
},
{
"href": "https://stacindex.org/browser/example/item.json?uiLanguage=en",
"rel": "alternate",
"type": "text/html",
"hreflang": "en"
},
{
"href": "https://stacindex.org/browser/example/item.json?uiLanguage=fr",
"rel": "alternate",
"type": "text/html",
"hreflang": "fr"
},
{
"href": "https://stacindex.org/browser/example/gr/item.json?uiLanguage=gr",
"rel": "alternate",
"type": "text/html",
"hreflang": "gr"
} You see that there are more languages available in the UI than for the metadata. I'd expect that "languages": [
{ "code": "de", "name": "German", "native": "Deutsch" },
{ "code": "en", "name": "English", "native": "English" },
{ "code": "gr", "name": "Greek", "native": "Ελληνικά" }
] So either we make the relationship between languages and the alternate type less demanding or we have to clearly specify the corresponding media types, but that would (at least in STAC) be JSON + GeoJSON (+ missing Thank you for bringing it to the SWG. Jan 23 is fine for me. If it helps I could also join the meeting. I'll also prepare an update for the STAC extension that follows this proposal. |
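To make the concern above concrete, here is a rough Python sketch that derives the list of alternate languages from links twice: once from every "rel=alternate" link, and once restricted to a set of metadata media types. The media type set, the function name, and the rule that a link without a "type" counts as metadata are all assumptions made for this example, not something either specification states.

```python
from typing import Iterable, List, Optional

# Assumed set of media types that count as "metadata" representations.
METADATA_TYPES = {"application/json", "application/geo+json"}


def languages_from_alternates(
    links: Iterable[dict], media_types: Optional[set] = None
) -> List[str]:
    """Collect distinct hreflang values from rel=alternate links.

    If media_types is given, links that declare a type outside that set are
    skipped. Links with no type are kept (an assumption of this sketch).
    """
    codes: List[str] = []
    for link in links:
        if link.get("rel") != "alternate" or "hreflang" not in link:
            continue
        link_type = link.get("type")
        if media_types is not None and link_type is not None and link_type not in media_types:
            continue
        if link["hreflang"] not in codes:
            codes.append(link["hreflang"])
    return codes


# The links from the example above, abbreviated to the relevant fields.
links = [
    {"href": "../de/item.json", "rel": "alternate", "hreflang": "de"},
    {"href": "../item.json", "rel": "alternate", "hreflang": "en"},
    {"rel": "alternate", "type": "text/html", "hreflang": "de"},
    {"rel": "alternate", "type": "text/html", "hreflang": "en"},
    {"rel": "alternate", "type": "text/html", "hreflang": "fr"},
    {"rel": "alternate", "type": "text/html", "hreflang": "gr"},
]

print(languages_from_alternates(links))
print(languages_from_alternates(links, METADATA_TYPES))
```

Without the media type filter the HTML-only language "fr" leaks into the result, which is the mismatch between UI languages and metadata languages raised in the comment.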
I just had another idea to "merge" resourceLanguages and languages into languages and just add boolean properties as follows: "languages": [
{ "code": "de", "name": "German", "native": "Deutsch", "record": true, "resource": true },
{ "code": "en", "name": "English", "native": "English", "record": true, "resource": true },
{ "code": "gr", "name": "Greek", "native": "Ελληνικά", "record": true, "resource": false },
{ "code": "fr", "name": "French", "native": "Française", "record": false, "resource": true }
] I'm not sure whether this is a good idea and whether this mixes separate concerns too much so looking for thoughts of others. |
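As a quick way to evaluate the merged structure proposed above, the following Python sketch shows what a client would have to do to recover the two separate lists from it. The "record" and "resource" flags come from the proposal; the helper name is hypothetical.

```python
from typing import Dict, List

languages = [
    {"code": "de", "name": "German", "native": "Deutsch", "record": True, "resource": True},
    {"code": "en", "name": "English", "native": "English", "record": True, "resource": True},
    {"code": "gr", "name": "Greek", "native": "Ελληνικά", "record": True, "resource": False},
    {"code": "fr", "name": "French", "native": "Française", "record": False, "resource": True},
]


def split_languages(entries: List[dict]) -> Dict[str, List[str]]:
    """Split a merged languages list into record and resource language codes."""
    return {
        "record": [e["code"] for e in entries if e.get("record")],
        "resource": [e["code"] for e in entries if e.get("resource")],
    }


print(split_languages(languages))
```

Every client that only cares about one of the two concerns has to filter the list this way, which is one way to read the "mixes separate concerns" objection.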
@m-mohr my feeling is that it mixes separate concerns too much but let's give others a chance to chime in with their thoughts ... |
Yeah, happy with that, too. An addition to #195 (comment): Should the languages list contain the current language itself? I'd say for clients it would be good so it would just not be alternate, but alternate + self. |
@m-mohr yes I suppose the languages list should contain the current language as well although that is slightly redundant. Perhaps we can get rid of About this comment ... I hadn't considered that but I would say that the list of languages should include all the available languages independent of their media type representation. If there is a type dependency, that can be represented in the |
@pvretano Interesting idea about putting the current language first. While I like having all in one place I don't like that it is not very explicit and "the average user" may get confused what the actual language is. It just needs good knowledge of the spec. Alternatively, we could also remove the current language from Example: "language": { "code": "gr", "name": "Greek", "native": "Ελληνικά" },
"languages": [
{ "code": "de", "name": "German", "native": "Deutsch" },
{ "code": "en", "name": "English", "native": "English" }
] I'm not sure about adding e.g. the "UI languages" to the languages list. It feels a bit weird to me as it mixes separate concerns. For example, I'm currently making STAC Browser multilingual with right now 6+ planned languages and the metadata only has 2 metadata languages. So the languages list would have 6 entries and that seems a bit excessive to have in the languages list... (but of course I'm relatively biased right now towards the use case I'm working on) |
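A small Python sketch of how a client might consume the "language" plus "languages" variant from the example above, where "languages" lists only the other available languages. The helper names are hypothetical, and accepting both plain codes and objects is an assumption made so the sketch also handles the earlier string-based example.

```python
from typing import List, Union

Entry = Union[str, dict]


def code_of(entry: Entry) -> str:
    """Return the language code of an entry given as a plain code or as an object."""
    return entry if isinstance(entry, str) else entry["code"]


def all_languages(properties: dict) -> List[str]:
    """Current language first, followed by the other languages, without duplicates."""
    codes: List[str] = []
    if "language" in properties:
        codes.append(code_of(properties["language"]))
    for entry in properties.get("languages", []):
        code = code_of(entry)
        if code not in codes:
            codes.append(code)
    return codes


properties = {
    "language": {"code": "gr", "name": "Greek", "native": "Ελληνικά"},
    "languages": [
        {"code": "de", "name": "German", "native": "Deutsch"},
        {"code": "en", "name": "English", "native": "English"},
    ],
}

print(all_languages(properties))
```

The de-duplication means the same helper works whether or not "languages" also repeats the current language.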
I updated the STAC extension to reflect what you proposed here: https://github.com/stac-extensions/language |
@m-mohr I have no strong preference. However if I had to pick I would say ... |
No, not in my eyes. For me |
@m-mohr I could be wrong about the HTML representation ... I'll present to the SWG and see what the others think. |
23-JAN-2023: In STAC, asset language is represented using hreflang in the asset section, and there is a rule that basically says that if a STAC record is requested in a specific language AND the asset has associated languages, only the requested language is represented in the asset section. So, if the STAC item is requested in Greek and there is a "Greek" asset, only that link will be listed in the asset section. Of course, all this only applies to the API; static records would probably include the links to all the available languages. |
@m-mohr with regard to the |
@pvretano This was just meant as a very simple alternative for "tinkering" in "simpler" environments, e.g. in the Browser where it's not easily possible to send HTTP headers. So I kept it simple. Recently, I've actually thought about removing the parameter altogether and just relying on the header. What do you think? What's the general direction OGC APIs go for? I've often seen e.g. |
@m-mohr the usual thinking at OGC is to "recommend" that implementations have a mechanism to mint URLs that need to be embedded or for situations where the client does not have easy access to the use of HTTP headers. So, take |
@pvretano Then I'd suggest following the same pattern. As I can't find anything about |
@m-mohr here is the reference to |
@pvretano Thanks, I did not find that (but "f" is also not an ideal search term ;-) ). So you'd add a similar wording for |
@m-mohr yes ... that is my plan. |
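The pattern discussed in the last few comments (a query parameter as a convenience alongside header-based negotiation) could look roughly like the Python sketch below on the server side. The parameter name "lang", the function names, and the rule that the parameter takes precedence over the "Accept-Language" header are assumptions for this example; the q-value handling is deliberately simplified.

```python
from typing import List, Mapping, Optional, Sequence


def parse_accept_language(header: str) -> List[str]:
    """Return language tags from an Accept-Language value, best first.

    Simplified: only the q parameter is read, and tags with q=0 are dropped.
    """
    prefs = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            prefs.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(prefs)]


def choose_language(
    query: Mapping[str, str],
    accept_language: Optional[str],
    available: Sequence[str],
    default: str,
) -> str:
    """Pick a response language: query parameter first, then header, then default."""
    requested = query.get("lang")  # hypothetical parameter name
    if requested and requested.lower() in available:
        return requested.lower()
    for tag in parse_accept_language(accept_language or ""):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # fall back from e.g. de-ch to de
        if primary in available:
            return primary
    return default


print(choose_language({"lang": "de"}, "en", ["de", "en"], "en"))
print(choose_language({}, "fr;q=0.9, de-CH;q=0.8, en;q=0.5", ["de", "en"], "en"))
```

Whichever language is chosen would then be reported back to the client, for example through the "Content-Language" response header.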
PR #211 created to align language handling as per this discussion in this issue. |
@pvretano Added a comment in the PR, thanks. |
01-MAY-2023: Resolved by #211. Closing. |
As far as I can see, hreflang is meant to follow RFC 5646 (Language-Tag). For the language property the format seems undefined. I'd propose to clarify that it uses the same format as hreflang.
Additionally, I'm wondering whether it would be helpful to define a list of available/supported languages, e.g. as a property languages, which is an array of languages.
Also, how should alternative representations in other languages be communicated in (static) catalogs? Maybe multiple self links with different hreflangs?
I'm asking because I'm writing this up for STAC and would like to align as much as possible.
See also https://github.com/stac-extensions/language and https://github.com/stac-api-extensions/language
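If language and hreflang end up sharing the RFC 5646 format, a client could apply one check to both. The Python sketch below is a deliberately simplified well-formedness test that only covers the common language[-script][-region] shape; it does not implement the full RFC 5646 grammar (no variants, extensions, private use, or grandfathered tags), and the names are hypothetical.

```python
import re

# Simplified pattern: 2-3 letter language, optional 4-letter script,
# optional 2-letter or 3-digit region. This is NOT the full RFC 5646 grammar.
SIMPLE_LANGUAGE_TAG = re.compile(
    r"[A-Za-z]{2,3}(-[A-Za-z]{4})?(-(?:[A-Za-z]{2}|[0-9]{3}))?"
)


def looks_like_language_tag(value: str) -> bool:
    """Return True if value matches the simplified language[-script][-region] shape."""
    return SIMPLE_LANGUAGE_TAG.fullmatch(value) is not None


for candidate in ["en", "de-CH", "zh-Hant-TW", "es-419", "en_US"]:
    print(candidate, looks_like_language_tag(candidate))
```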