This repository has been archived by the owner on Mar 29, 2024. It is now read-only.

Data updates cohesion and policies #4

Closed
zbraniecki opened this issue Oct 21, 2019 · 3 comments

Comments

@zbraniecki
Member

With a number of crates that use data from Unicode and CLDR tables, it would be beneficial to design a policy and scripting around cohesive updates of those crates, to avoid the scenario where one crate uses Unicode 12 while another is stuck on Unicode 11.

Some open questions:

  • Should a CLDR data update be considered a minor or major update?
  • Is there value in a meta-crate that would collect the subcrates around a particular version of Unicode/CLDR (so rust-icu 65 would depend on all crates in versions using Unicode 12 and CLDR 36)?
  • Can we design some basic tooling to make updating the data of all Unicode-related crates easier: point at a CLDR directory, update the code for all crates, release?
  • ... ?
@sffc sffc mentioned this issue Oct 21, 2019
@sffc
Member

sffc commented Oct 21, 2019

Having thought about this for a while, I think the best approach for consuming the data is for the library (like i18n-concept) to define and maintain its own data schema. Pros and cons:

  1. Pro: Decouples the library from breaking changes in CLDR
  2. Pro: Enables us to put the data in a form more suitable for use in code
    • CLDR data is optimized for maintainability, not for ease of consumption
  3. Pro: Several examples of prior art
    • ICU itself converts CLDR data into its own format
    • Google Closure
    • Globalize.js
  4. Con: Requires maintaining CLDR-to-i18n-concept code
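
To make the trade-off concrete, here is a minimal sketch of what "CLDR-to-i18n-concept code" could look like. Everything in it is an assumption for illustration: the `NumberSymbols` struct, the `from_cldr` function, and the flat key/value view of the CLDR input are hypothetical and not part of any existing crate.

```rust
use std::collections::HashMap;

/// Hypothetical library-owned schema: only the fields the formatting code needs,
/// already in a form that is cheap to use at runtime.
#[derive(Debug, PartialEq)]
pub struct NumberSymbols {
    pub decimal: char,
    pub group: char,
}

/// Converts a flat key/value view of CLDR-style data (an assumed input shape)
/// into the library's own schema. Returns `None` if a required key is missing
/// or its value is not exactly one character.
pub fn from_cldr(raw: &HashMap<&str, &str>) -> Option<NumberSymbols> {
    let single_char = |key: &str| -> Option<char> {
        let mut chars = raw.get(key)?.chars();
        let first = chars.next()?;
        if chars.next().is_some() {
            None
        } else {
            Some(first)
        }
    };
    Some(NumberSymbols {
        decimal: single_char("decimal")?,
        group: single_char("group")?,
    })
}
```

If CLDR renames or restructures its keys, only `from_cldr` has to change; code that consumes `NumberSymbols` is unaffected, which is the decoupling described in point 1.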

In terms of semantic versioning of i18n-concept, CLDR and Unicode data change so often that I think it is impractical to bump the major version for every update. That would result in users being "locked in" to an old version simply because their package system refuses to update automatically to a new major version. (I am speaking based on my knowledge of how npm modules work with semantic versioning; let me know if this is different in the Rust ecosystem.)
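
As a point of comparison, Cargo treats a bare version requirement such as `"1.2.0"` as a caret requirement, much like npm's `^`, so the same lock-in concern would apply. The sketch below models that rule in plain Rust; the `Version` type and `caret_compatible` function are hypothetical helpers written for this illustration (not Cargo's implementation), and pre-release versions are ignored.

```rust
/// A simplified semantic version: (major, minor, patch).
/// The derived ordering compares major first, then minor, then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version(pub u64, pub u64, pub u64);

/// Returns true if `candidate` would be accepted as an automatic update
/// for a caret requirement on `required`: the candidate must not be older,
/// and the left-most non-zero component must stay the same.
pub fn caret_compatible(required: Version, candidate: Version) -> bool {
    if candidate < required {
        return false;
    }
    match required {
        // ^0.0.z accepts only exactly 0.0.z
        Version(0, 0, _) => candidate == required,
        // ^0.y.z accepts 0.y.* at or above the required patch
        Version(0, minor, _) => candidate.0 == 0 && candidate.1 == minor,
        // ^x.y.z accepts x.*.* at or above the required version
        Version(major, _, _) => candidate.0 == major,
    }
}
```

Under this rule, `caret_compatible(Version(1, 2, 0), Version(1, 3, 0))` is true while `caret_compatible(Version(1, 2, 0), Version(2, 0, 0))` is false, so shipping data updates as minor versions would let them reach users automatically.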

@sffc
Member

sffc commented Oct 21, 2019

I could see something like:

  • Crate i18n-concept-data-file
    • Default data provider, reading from a data file
    • Major version: breaking changes in data schema (infrequent)
    • Minor version: updates to CLDR/Unicode
    • Patch version: bug fixes
  • Crate i18n-concept-data-http
    • Data provider pulling from a web service
    • Same semantic versioning as i18n-concept-data-file
  • Crate i18n-concept-numberformat
    • Contains no data, only code
    • Major version: bumped whenever the API changes in a breaking way or when upgrading to a new major data version
    • Minor version: new API, behavior-changing bug fixes
    • Patch version: internal bug fixes

(Note: I am using "i18n-concept" as a placeholder name; this is not intended to be the final name)
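
A rough sketch of how that split could look in code, assuming a shared provider trait. All names here (`DataProvider`, `InMemoryProvider`, `SUPPORTED_SCHEMA_MAJOR`, `decimal_separator`) are hypothetical, and the in-memory provider only stands in for the file and HTTP providers so the example stays self-contained.

```rust
use std::collections::HashMap;

/// Hypothetical interface that every data provider crate would implement.
pub trait DataProvider {
    /// Major version of the data schema this provider serves.
    fn schema_major(&self) -> u32;
    /// Looks up a value by locale and key.
    fn get(&self, locale: &str, key: &str) -> Option<String>;
}

/// Stand-in for a file- or HTTP-backed provider; keeps its data in memory.
pub struct InMemoryProvider {
    schema_major: u32,
    entries: HashMap<(String, String), String>,
}

impl InMemoryProvider {
    pub fn new(schema_major: u32) -> Self {
        InMemoryProvider { schema_major, entries: HashMap::new() }
    }

    pub fn insert(&mut self, locale: &str, key: &str, value: &str) {
        self.entries
            .insert((locale.to_string(), key.to_string()), value.to_string());
    }
}

impl DataProvider for InMemoryProvider {
    fn schema_major(&self) -> u32 {
        self.schema_major
    }

    fn get(&self, locale: &str, key: &str) -> Option<String> {
        self.entries
            .get(&(locale.to_string(), key.to_string()))
            .cloned()
    }
}

/// Schema major version that the code-only crate was written against (assumed).
pub const SUPPORTED_SCHEMA_MAJOR: u32 = 1;

/// Code-only function: contains no data and accepts any provider whose
/// schema major version matches the one it supports.
pub fn decimal_separator(
    provider: &dyn DataProvider,
    locale: &str,
) -> Result<String, String> {
    if provider.schema_major() != SUPPORTED_SCHEMA_MAJOR {
        return Err(format!(
            "unsupported data schema version {}",
            provider.schema_major()
        ));
    }
    provider
        .get(locale, "decimal")
        .ok_or_else(|| format!("no decimal separator for locale {}", locale))
}
```

With this shape, a CLDR or Unicode refresh changes only what the provider returns, while a schema change shows up as a different `schema_major` and therefore a major bump of the code crate.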

@sffc
Member

sffc commented Jun 30, 2020

The doc data-pipeline.md in the ICU4X repo covers these issues well. Closing.
