Feature request: add 'language' fields for assets #161

Closed
billgeo opened this issue Nov 15, 2021 · 4 comments · Fixed by #173
Labels
enhancement New feature or request

Comments

@billgeo
Contributor

billgeo commented Nov 15, 2021

Is your feature request related to a problem? Please describe.

To give users additional information about the assets, we want to add the language used in the dataset and the character set encoding of the data.

Note: after discussion on Slack, we decided to leave out the character set information for now, as all LINZ data is assumed to be in UTF-8. The Data Management COP / practice lead will enforce this. We also decided to use IETF codes for languages/locales rather than ISO standards.

Describe the solution you'd like

Language

Location: Assets
Name: linz:language
Description: "The language code and locale of the language used in the data. For example, en-NZ for New Zealand English."
Type: "Recommended for internal data. Mandatory for published data."
Restrictions: "Use IETF RFC 5646 tags for the enum list. Precompiled lists 1 and 2."
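As a rough illustration of the RFC 5646 restriction, here is a minimal Python sketch that checks whether a value looks like a language tag. It is a deliberate simplification and an assumption, not the LINZ schema: it accepts only a primary language subtag with optional script and region subtags, rejects extlang, variant, extension and private-use subtags, and checks syntax only, not whether the subtags are registered in the IANA Language Subtag Registry. The names `_SIMPLE_LANGTAG` and `is_simple_language_tag` are hypothetical.

```python
import re

# Hypothetical, simplified subset of the RFC 5646 "langtag" grammar:
# language[-script][-region]. Anything else is rejected.
_SIMPLE_LANGTAG = re.compile(
    r"[A-Za-z]{2,3}"                   # primary language subtag, e.g. "en", "mi"
    r"(?:-[A-Za-z]{4})?"               # optional script subtag, e.g. "Hant"
    r"(?:-(?:[A-Za-z]{2}|[0-9]{3}))?"  # optional region subtag, e.g. "NZ", "419"
)


def is_simple_language_tag(tag: str) -> bool:
    """Return True if `tag` matches the simplified language[-script][-region] form.

    This checks syntax only; it does not consult the IANA subtag registry.
    """
    return _SIMPLE_LANGTAG.fullmatch(tag) is not None


if __name__ == "__main__":
    for candidate in ["en", "en-NZ", "mi-NZ", "zh-Hant-TW", "en_NZ"]:
        print(candidate, is_simple_language_tag(candidate))
```

Note that en_NZ, the POSIX locale spelling, fails this check because RFC 5646 separates subtags with hyphens.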

Character set

~~Location: inside the asset object (can either be in item or collection)
Name: linz:character_set
Description: "Character encoding standard used for the data asset."
Type: "Recommended for internal data. Mandatory for published data."
Restrictions: UTF-8 only ~~


@billgeo billgeo added the enhancement New feature or request label Nov 15, 2021
@billgeo billgeo changed the title "Feature request: add character set and language fields for assets" to "Feature request: add 'character set' and 'language' fields for assets" Nov 15, 2021
@l0b0
Contributor

l0b0 commented Nov 15, 2021

Why those specific standards?

  • In my experience IETF language codes are much more common than ISO 639-2.
  • IETF language codes allow (but do not mandate) specifying languages as written within different locales, so you can use "en" for English or "en-NZ" for New Zealand English. It doesn't look like ISO 639-2 supports this.
  • UTF-8 is much more common than ISO/IEC 10646:2020.
  • ISO/IEC 10646:2020 has a full seven encoding schemes.
  • ISO standards are expensive and not publicly available.
  • RFC 8259: "JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8."
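The last point is about JSON's UTF-8 requirement. A minimal Python sketch, assuming asset metadata is handled as plain dicts, of writing asset JSON as UTF-8 and checking that a byte payload is valid UTF-8. The helper names and the sample asset are hypothetical; only the standard `json` module and `bytes.decode` are used.

```python
import json


def encode_asset_json(asset: dict) -> bytes:
    """Serialise a (hypothetical) asset dict to UTF-8 encoded JSON bytes.

    ensure_ascii=False keeps non-ASCII characters, such as macrons in
    te reo Maori place names, as literal characters rather than escapes.
    """
    return json.dumps(asset, ensure_ascii=False).encode("utf-8")


def is_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    asset = {"title": "Ōtautahi", "linz:language": "mi-NZ"}  # sample data only
    payload = encode_asset_json(asset)
    print(is_utf8(payload))                      # valid UTF-8 JSON
    print(is_utf8("Café".encode("latin-1")))     # Latin-1 bytes are rejected
```

The Latin-1 check fails because the byte 0xE9 on its own is not a valid UTF-8 sequence, which is the kind of non-portable JSON the thread goes on to discuss.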

@l0b0
Contributor

l0b0 commented Nov 15, 2021

If we really want to support non-portable JSON, we might want to use the word "encoding" rather than "character set".

@billgeo billgeo changed the title "Feature request: add 'character set' and 'language' fields for assets" to "Feature request: add 'language' fields for assets" Nov 17, 2021
@kodiakhq kodiakhq bot closed this as completed in #173 Nov 22, 2021
@billgeo billgeo reopened this Nov 22, 2021
@billgeo
Contributor Author

billgeo commented Nov 22, 2021

@l0b0 I just noticed that your PR for this made linz:language a required property. It's supposed to be optional. Can you please open another PR to fix that?
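In JSON Schema, a property is optional when it is declared under "properties" but left out of the "required" list. A minimal Python sketch of that distinction, using a hypothetical, simplified asset schema (not the actual LINZ schema) and a helper that mirrors only the "required" keyword; a full validator such as the jsonschema package would also check types and patterns.

```python
# Hypothetical, simplified asset schema: "linz:language" is declared so its
# type can be constrained, but it is deliberately absent from "required".
ASSET_SCHEMA = {
    "type": "object",
    "properties": {
        "href": {"type": "string"},
        "linz:language": {"type": "string"},
    },
    "required": ["href"],
}


def missing_required(instance: dict, schema: dict) -> list[str]:
    """Return the names in the schema's "required" list that `instance` lacks.

    Only the "required" keyword is evaluated here; types are not checked.
    """
    return [name for name in schema.get("required", []) if name not in instance]


if __name__ == "__main__":
    print(missing_required({"href": "./data.tiff"}, ASSET_SCHEMA))
    print(missing_required({}, ASSET_SCHEMA))
```

An asset without linz:language passes this check, while one without href does not.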

@billgeo
Contributor Author

billgeo commented Nov 24, 2021

Closing

@billgeo billgeo closed this as completed Nov 24, 2021