-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BaseLanguageHandling option to Datagen #4606
base: main
Are you sure you want to change the base?
Conversation
@@ -370,23 +385,32 @@ impl DatagenDriver { | |||
.iter() | |||
.try_for_each(|(locale, (payload, _duration))| { | |||
let mut iter = fallbacker_with_config.fallback_for(locale.clone()); | |||
let mut first = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The effect of first
is that it prevents non-base locales such as tlh-001
from being retained. I'm not sure if this is the behavior that we desire; it only comes up when an unsupported locale is requested and that locale is not a base language. I lean toward removing the check on first
, which would cause tlh-001
to be retained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to remove first
, but we should probably print a warning alerting the user that the requested locale is not supported; if that's not the current behaviour. I think that should be enough indication that using the provider as is could return incorrect results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Today I'm thinking that we probably should not include either tlh-001
or arc
since there is no CLDR data for them, and therefore they are not really "supported". I'll work on that tweak.
/// Sets the base language handling in runtime fallback mode. | ||
/// | ||
/// For more details, see [`BaseLanguageHandling`]. | ||
pub fn with_base_language_handling(self, base_language_handling: BaseLanguageHandling) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: this option is only valid for a specific fallback mode. I'd prefer modelling it with a new fallback mode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it applies for both Runtime
and RuntimeManual
. It also applies for PreferredForExporter
in certain cases; an user who wants to enable resolvedLocale queries but doesn't want to pick a specific fallback type. This means we would need to add 3 more fallback variants + any other variant that is added in the future that is similar to Runtime
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point about PreferredForExporter
. With that option, the exporter can state its preferred fallback mode, but that question is completely separate from whether to include the base languages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An exporter does not state its preferred fallback mode, it states supports_built_in_fallback
. Datagen translates this to an actual mode, which we are free to change: true
can be RuntimeThin
, and false
HybridThin
, as you are currently making this the default already.
/// By default, locales that inherit directly from "und" will be retained even if their data | ||
/// is equal to "und". To override this behavior, see [`BaseLanguageHandling`]. | ||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: I'm not convinced this is a good default. This increases baked data size, but supported locale queries are not a universal use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to discuss the default. Given that it's needed for 402 and we've seen enough examples of use cases where people may want to do these queries out of the box, and the data size increase is fairly small (no regional variants), I lean toward including these by default. It's easy for power users to get the extra 5% data size benefit if they know what they are doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slightly lean towards the default of Strip. I think either way we will need careful documentation of supported locale queries and how they can work. We should do that documentation and clearly document the defaults.
Current:
Proposed to add:
Deduplicated / Runtime2 {
use_internal_fallback: bool,
include_base_languages: bool,
} Now: #[non_exhaustive]
pub enum FallbackMode {
#[default]
DefaultForProvider,
// same as Deduplicated { RuntimeFallbackLocation::Internal, BaseLanguageHandling::Strip }
Runtime,
// same as Deduplicated { RuntimeFallbackLocation::External, BaseLanguageHandling::Strip }
RuntimeManual,
Hybrid,
Preresolved,
Deduplicated(Deduplicated),
} 2.0: #[non_exhaustive]
pub enum LocaleExpansionMode {
Deduplicated(Deduplicated),
Exhaustive(Exhaustive),
Preresolved(Preresolved),
}
#[non_exhaustive]
pub enum RuntimeFallbackLocation {
Internal,
External,
}
#[non_exhaustive]
pub enum BaseLanguageHandling {
#[default]
Retain,
Strip,
}
#[non_exhaustive]
#[derive(Default)]
pub struct Deduplicated {
runtime_fallback_location: Option<RuntimeFallbackLocation>,
base_language_handling: BaseLanguageHandling,
}
// resolve this against the exporter like
foo.runtime_fallback_location.unwrap_or_else(||
if exporter.supports_runtime_fallback() {
RuntimeFallbackLocation::Internal
} else {
RuntimeFallbackLocation::External
}
)
#[non_exhaustive]
pub struct Exhaustive {}
#[non_exhaustive]
pub struct Preresolved {} LGTM: @Manishearth @sffc @robertbastian |
@robertbastian What do you want the CLI to look like in 1.5 and in 2.0? |
I guess for now we flatten the enum like |
If we are going to end up with a |
It doesn't, because then someone will set |
We have a lot of flags that are only used if some other flag is set. This would be the same. |
@robertbastian I would like to merge this as-is because:
|
With my engine maintainer hat on, I have to agree with @sffc. Boa will require all engine embedders to use this feature, so from a docs perspective it's easier to guide our users to activate a simple flag than to explain to them which fallback variants are supported and how to activate them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still consider this additional flag tech debt.
#[arg(long, value_enum, default_value_t = BaseLanguageHandling::Auto)] | ||
#[arg( | ||
help = "--fallback=runtime[-manual] only: Whether to retain or strip base languages. \ | ||
For details, read the Rust docs for BaseLanguageHandling." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's call the local one BaseLanguageHandlingArg, to avoid confusion when looking at the code. The one above has no docs.
Alternatively give it docs saying to look at the type in the config module.
/// By default, locales that inherit directly from "und" will be retained even if their data | ||
/// is equal to "und". To override this behavior, see [`BaseLanguageHandling`]. | ||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slightly lean towards the default of Strip. I think either way we will need careful documentation of supported locale queries and how they can work. We should do that documentation and clearly document the defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of discussion going on in #4629 which I'd like to resolve before merging anything.
Need to re-do this on top of #4710 |
Probably super seeded by #4836. |
See #58
I implemented the new option as
BaseLanguageHandling::Retain
orBaseLanguageHandling::Strip
.