Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new Datagen API and CLI with FallbackOptions #4710

Merged
merged 42 commits into from
Mar 27, 2024
Merged
Show file tree
Hide file tree
Changes from 40 commits
Commits
Show all changes
42 commits
Select commit Hold shift + click to select a range
a6bdfce
Checkpoint: new APIs plumbed through. Need to test.
sffc Mar 19, 2024
1f19ba4
Make tests pass
sffc Mar 19, 2024
5b2dcd7
Don't impl Display on public types unless we mean it
sffc Mar 19, 2024
6f93495
Fix defaults; make DeduplicationStrategy optional
sffc Mar 19, 2024
ace5856
fmt
sffc Mar 19, 2024
d323114
clippy and docs
sffc Mar 19, 2024
5e13943
Rename and simplify some things
sffc Mar 21, 2024
ec2807a
Refactor code to remove coerce functions
sffc Mar 22, 2024
b2352e3
Add more DatagenDriver options tests
sffc Mar 22, 2024
ce406c8
Fix UndInclusion and wildcard
sffc Mar 22, 2024
4002457
Add empty list test
sffc Mar 24, 2024
d69a9ec
fmt
sffc Mar 24, 2024
1c477c2
'full' instead of '*'
sffc Mar 24, 2024
b25f974
Consolidate all-locales handling into main loop
sffc Mar 24, 2024
24efc92
Add docs and export the types
sffc Mar 24, 2024
2b5268a
Add 1.5 API to tests and mark old API deprecated
sffc Mar 24, 2024
a70b929
LocaleFamily and docs
sffc Mar 24, 2024
7e1cb29
Add PreferredForExporter test
sffc Mar 24, 2024
0ee50e6
Add the 1.5 default options test
sffc Mar 24, 2024
60f78e1
Merge branch 'main' into datagen-neo-driver
sffc Mar 26, 2024
3f50f09
Fix merge
sffc Mar 26, 2024
6a92fb7
Always include und
sffc Mar 26, 2024
777d25d
Remove UndInclusion
sffc Mar 26, 2024
cf8ed69
Make helper function to reduce verbosity of call site in test file
sffc Mar 26, 2024
98f9ed3
Remove RetainBaseLanguages for now and add TODOs
sffc Mar 26, 2024
e691c30
Add CLI for the new options
sffc Mar 26, 2024
883ac51
Reduce usage of old API and options; clippy
sffc Mar 26, 2024
88571b3
Update more call sites to new API
sffc Mar 26, 2024
ed497a0
Fix test failure in blob provider. Needs investigation
sffc Mar 27, 2024
527d4c0
doc
sffc Mar 27, 2024
378fb93
Better fix for the blob provider test failure
sffc Mar 27, 2024
62e75b5
Refactoring
sffc Mar 27, 2024
4f51fa9
Consolidate types and other nits
sffc Mar 27, 2024
28c0ef5
Maybe fix CI
sffc Mar 27, 2024
4a76bf6
oops
sffc Mar 27, 2024
a245cbb
Improve logging
sffc Mar 27, 2024
78e29ac
Fix CLI message
sffc Mar 27, 2024
c5ddea3
NoDeduplication > None
sffc Mar 27, 2024
8460810
LocaleFamily serialization and stuff
sffc Mar 27, 2024
5474401
Fix build?
sffc Mar 27, 2024
1cccda7
Testing
sffc Mar 27, 2024
5bebfbc
Remap LocaleFamilies to LanguageIdentifiers in without-fallback
sffc Mar 27, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion ffi/dart/tools/datagen/build.rs
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ fn main() {
} else {
Default::default()
})
.with_all_locales()
.with_locales_and_fallback([LocaleFamily::full()], Default::default())
sffc marked this conversation as resolved.
Show resolved Hide resolved
.export(
&DatagenProvider::new_latest_tested(),
BlobExporter::new_with_sink(Box::new(
Expand Down
7 changes: 5 additions & 2 deletions ffi/dart/tools/datagen/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -78,12 +78,15 @@ fn main() -> eyre::Result<()> {
let locales = matches
.locales
.iter()
.map(|l| l.parse::<LanguageIdentifier>())
.map(|l| {
l.parse::<LanguageIdentifier>()
.map(LocaleFamily::with_descendants)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you're doing this in a follow-up: here, as on the main CLI, we should parse directly to LocaleFamily.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder. Working on it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you fixed it in datagen but not here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no you didn't change it in datagen, that's why CI is failing

})
.collect::<Result<Vec<_>, _>>()?;

DatagenDriver::new()
.with_keys(keys)
.with_locales(locales)
.with_locales_and_fallback(locales, Default::default())
.export(
&ReexportableBlobDataProvider(BlobDataProvider::try_new_from_static_blob(
include_bytes!(concat!(core::env!("OUT_DIR"), "/all.blob")),
Expand Down
2 changes: 1 addition & 1 deletion provider/blob/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ log = { version = "0.4", optional = true }

[dev-dependencies]
icu_locid = { path = "../../components/locid", features = ["serde"] }
icu_datagen = { path = "../../provider/datagen", features = ["networking"] }
icu_datagen = { path = "../../provider/datagen", default-features = false, features = ["networking"] }

[target.'cfg(not(target_arch = "wasm32"))'.dev-dependencies]
criterion = "0.4"
Expand Down
2 changes: 1 addition & 1 deletion provider/blob/tests/test_versions.rs
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ const BLOB_V2: &[u8] = include_bytes!("data/v2.postcard");
fn run_driver(exporter: BlobExporter) -> Result<(), DataError> {
DatagenDriver::new()
.with_keys([icu_provider::hello_world::HelloWorldV1Marker::KEY])
.with_all_locales()
.with_locales_and_fallback([LocaleFamily::full()], Default::default())
.export(&DatagenProvider::new_custom(), exporter)
}

Expand Down
110 changes: 101 additions & 9 deletions provider/datagen/src/bin/datagen/args.rs
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ enum CollationTable {
SearchAll,
}

// Mirrors crate::options::FallbackMode
// Mirrors crate::FallbackMode
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, ValueEnum, Debug)]
enum Fallback {
Auto,
Expand All @@ -56,6 +56,20 @@ enum Fallback {
Preresolved,
}

// Mirrors crate::DeduplicationStrategy
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, ValueEnum, Debug)]
enum DeduplicationStrategy {
Maximal,
None,
}

// Mirrors crate::RuntimeFallbackLocation
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, ValueEnum, Debug)]
enum RuntimeFallbackLocation {
Internal,
External,
}

impl CollationTable {
fn to_datagen_value(self) -> &'static str {
match self {
Expand Down Expand Up @@ -231,8 +245,32 @@ pub struct Cli {

// TODO(#2856): Change the default to Auto in 2.0
#[arg(short, long, value_enum, default_value_t = Fallback::Hybrid)]
#[arg(hide = true, help = "Deprecated: use --deduplication-strategy, --runtime-fallback-location, or --without-fallback")]
fallback: Fallback,

#[arg(long)]
#[arg(
help = "disables locale fallback, instead exporting exactly the locales specified in --locales. \
Cannot be used with --deduplication-strategy, --runtime-fallback-location"
)]
without_fallback: bool,

#[arg(long, value_enum)]
#[arg(help = "configures where runtime fallback should take place in code. \
If not set, determined by the exporter: \
internal fallback is used if the exporter supports it. \
Cannot be used with --without-fallback")]
runtime_fallback_location: Option<RuntimeFallbackLocation>,

#[arg(long, value_enum)]
#[arg(
help = "configures the deduplication of locales for exported data payloads. \
If not set, determined by `runtime_fallback_location`: \
if internal fallback is enabled, a more aggressive deduplication strategy is used. \
Cannot be used with --without-fallback"
)]
deduplication_strategy: Option<DeduplicationStrategy>,

#[arg(long, num_args = 0.., default_value = "recommended")]
#[arg(
help = "Include these segmenter models in the output. Accepts multiple arguments. \
Expand All @@ -246,6 +284,9 @@ impl Cli {
Ok(config::Config {
keys: self.make_keys()?,
locales: self.make_locales()?,
without_fallback: self.make_without_fallback(),
deduplication_strategy: self.make_deduplication_strategy()?,
runtime_fallback_location: self.make_runtime_fallback_location()?,
cldr: self.make_path(&self.cldr_root, &self.cldr_tag, "cldr-root")?,
icu_export: self.make_path(
&self.icuexport_root,
Expand All @@ -272,13 +313,6 @@ impl Cli {
.collect(),
segmenter_models: self.make_segmenter_models()?,
export: self.make_exporter()?,
fallback: match self.fallback {
Fallback::Auto => config::FallbackMode::PreferredForExporter,
Fallback::Hybrid => config::FallbackMode::Hybrid,
Fallback::Runtime => config::FallbackMode::Runtime,
Fallback::RuntimeManual => config::FallbackMode::RuntimeManual,
Fallback::Preresolved => config::FallbackMode::Preresolved,
},
overwrite: self.overwrite,
})
}
Expand Down Expand Up @@ -347,7 +381,7 @@ impl Cli {
self.locales
.iter()
.map(|s| {
s.parse::<LanguageIdentifier>()
s.parse::<LocaleFamily>()
.with_context(|| s.to_string())
})
.collect::<Result<_, eyre::Error>>()?,
Expand All @@ -372,6 +406,64 @@ impl Cli {
})
}

fn make_without_fallback(&self) -> bool {
self.without_fallback || matches!(self.fallback, Fallback::Preresolved)
}

fn make_deduplication_strategy(&self) -> eyre::Result<Option<config::DeduplicationStrategy>> {
match (
self.deduplication_strategy,
self.fallback,
self.without_fallback,
) {
(None, _, true) => Ok(None),
(Some(_), _, true) => {
eyre::bail!("cannot combine --without-fallback and --deduplication-strategy")
}
(Some(x), _, false) => Ok(match x {
DeduplicationStrategy::Maximal => Some(config::DeduplicationStrategy::Maximal),
DeduplicationStrategy::None => Some(config::DeduplicationStrategy::None),
}),
(None, fallback_mode, false) => Ok(match fallback_mode {
Fallback::Auto => None,
Fallback::Hybrid => Some(config::DeduplicationStrategy::None),
Fallback::Runtime => Some(config::DeduplicationStrategy::None),
Fallback::RuntimeManual => Some(config::DeduplicationStrategy::None),
Fallback::Preresolved => None,
}),
}
}

fn make_runtime_fallback_location(
&self,
) -> eyre::Result<Option<config::RuntimeFallbackLocation>> {
match (
self.runtime_fallback_location,
self.fallback,
self.without_fallback,
) {
(None, _, true) => Ok(None),
(Some(_), _, true) => {
eyre::bail!("cannot combine --without-fallback and --runtime-fallback-location")
}
(Some(x), _, false) => Ok(match x {
RuntimeFallbackLocation::Internal => {
Some(config::RuntimeFallbackLocation::Internal)
}
RuntimeFallbackLocation::External => {
Some(config::RuntimeFallbackLocation::External)
}
}),
(None, fallback_mode, false) => Ok(match fallback_mode {
Fallback::Auto => None,
Fallback::Hybrid => Some(config::RuntimeFallbackLocation::External),
Fallback::Runtime => Some(config::RuntimeFallbackLocation::Internal),
Fallback::RuntimeManual => Some(config::RuntimeFallbackLocation::External),
Fallback::Preresolved => None,
}),
}
}

fn make_segmenter_models(&self) -> eyre::Result<config::SegmenterModelInclude> {
Ok(if self.segmenter_models.as_slice() == ["none"] {
config::SegmenterModelInclude::None
Expand Down
12 changes: 9 additions & 3 deletions provider/datagen/src/bin/datagen/config.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,11 @@
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

pub use icu_datagen::{CollationHanDatabase, CoverageLevel, FallbackMode, TrieType};
use icu_datagen::LocaleFamily;
pub use icu_datagen::{
CollationHanDatabase, CoverageLevel, DeduplicationStrategy, FallbackMode,
RuntimeFallbackLocation, TrieType,
};
pub use icu_locid::LanguageIdentifier;
use icu_provider::prelude::*;
use std::collections::{BTreeSet, HashSet};
Expand All @@ -12,8 +16,10 @@ use std::path::{Path, PathBuf};
#[serde(rename_all = "camelCase")]
pub struct Config {
pub keys: KeyInclude,
pub fallback: FallbackMode,
pub locales: LocaleInclude,
pub without_fallback: bool,
pub deduplication_strategy: Option<DeduplicationStrategy>,
pub runtime_fallback_location: Option<RuntimeFallbackLocation>,
#[serde(
default,
skip_serializing_if = "is_default",
Expand Down Expand Up @@ -81,7 +87,7 @@ pub enum LocaleInclude {
Recommended,
All,
None,
Explicit(#[serde(serialize_with = "sorted_set")] HashSet<LanguageIdentifier>),
Explicit(#[serde(serialize_with = "sorted_set")] HashSet<LocaleFamily>),
CldrSet(#[serde(serialize_with = "sorted_set")] HashSet<CoverageLevel>),
}

Expand Down
61 changes: 45 additions & 16 deletions provider/datagen/src/bin/datagen/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -86,23 +86,52 @@ fn main() -> eyre::Result<()> {
config::KeyInclude::Explicit(set) => driver.with_keys(set),
config::KeyInclude::ForBinary(path) => driver.with_keys(icu_datagen::keys_from_bin(path)?),
};
driver = driver.with_fallback_mode(config.fallback);
driver = driver.with_additional_collations(config.additional_collations);
driver = match config.locales {
config::LocaleInclude::All => driver.with_all_locales(),
config::LocaleInclude::None => driver.with_locales([]),
config::LocaleInclude::Explicit(set) => driver.with_locales(set),
config::LocaleInclude::CldrSet(levels) => {
driver.with_locales(provider.locales_for_coverage_levels(levels.iter().copied())?)
}
config::LocaleInclude::Recommended => {
driver.with_locales(provider.locales_for_coverage_levels([
CoverageLevel::Modern,
CoverageLevel::Moderate,
CoverageLevel::Basic,
])?)
}
enum LanguageIdentifiersOrLocaleFamilies {
LanguageIdentifiers(Vec<LanguageIdentifier>),
LocaleFamilies(Vec<LocaleFamily>),
}
use LanguageIdentifiersOrLocaleFamilies::*;
let locales_intermediate: LanguageIdentifiersOrLocaleFamilies = match config.locales {
robertbastian marked this conversation as resolved.
Show resolved Hide resolved
config::LocaleInclude::All => LocaleFamilies(vec![LocaleFamily::full()]),
config::LocaleInclude::None => LanguageIdentifiers(vec![]),
config::LocaleInclude::Explicit(set) => LocaleFamilies(set.into_iter().collect()),
config::LocaleInclude::CldrSet(levels) => LanguageIdentifiers(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: I think we've been treating cldr sets as families

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This locales_intermediate just preserves the type things come in as. They get converted to LocaleFamilies in with_fallback and they stay as LanguageIdentifiers in without_fallback.

provider
.locales_for_coverage_levels(levels.iter().copied())?
.into_iter()
.collect(),
),
config::LocaleInclude::Recommended => LanguageIdentifiers(
provider
.locales_for_coverage_levels([
CoverageLevel::Modern,
CoverageLevel::Moderate,
CoverageLevel::Basic,
])?
.into_iter()
.collect(),
),
};
if config.without_fallback {
let locale_families = match locales_intermediate {
LanguageIdentifiers(lids) => lids,
LocaleFamilies(lfs) => eyre::bail!("--without-fallback needs an explicit locale list"),
};
driver = driver.with_locales_no_fallback(locale_families, Default::default());
} else {
let locale_families = match locales_intermediate {
LanguageIdentifiers(lids) => lids
.into_iter()
.map(LocaleFamily::with_descendants)
.collect(),
LocaleFamilies(lfs) => lfs,
};
let mut options: FallbackOptions = Default::default();
options.deduplication_strategy = config.deduplication_strategy;
options.runtime_fallback_location = config.runtime_fallback_location;
driver = driver.with_locales_and_fallback(locale_families, options);
}
driver = driver.with_additional_collations(config.additional_collations);
driver = match config.segmenter_models {
config::SegmenterModelInclude::None => driver.with_segmenter_models([]),
config::SegmenterModelInclude::Recommended => driver.with_segmenter_models([
Expand Down
Loading
Loading