Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix baked fallback to work with auxiliary keys #4412

Merged
merged 4 commits into from
Dec 23, 2023

Conversation

sffc
Copy link
Member

@sffc sffc commented Dec 6, 2023

I noticed that Baked Data Fallback does not work with auxiliary keys.

LocaleFallbackProvider has the following algorithm:

  1. Initialize fallback iterator with request locale
  2. Repeat:
    1. Load for iterator.get()
    2. If error other than missing locale, return result (success or error)
    3. If iterator.get().is_und(), break
    4. Run iterator.step()
  3. Return missing locale error

Baked Data fallback when "und" is not present has the following algorithm:

  1. Search for request locale; return if found
  2. Initialize fallback iterator with request locale
  3. Repeat:
    1. If iterator.get().is_und(), break
    2. Load for iterator.get()
    3. Search for request locale; return if found
    4. Run iterator.step()

(When "und" is present, it doesn't include the line that breaks out of the loop.)

This breaks because "en-x-3" falls back to "und-x-3" but not all the way to "und". Example:

failures:

---- format::neo::tests::test_basic_pattern_interpolator stdout ----
ICU4X data error: Missing data for locale (key: datetime/symbols/weekdays@1, request: en-x-3)
thread 'format::neo::tests::test_basic_pattern_interpolator' panicked at components/datetime/src/format/neo.rs:988:14:
called `Result::unwrap()` on an `Err` value: Data(DataError { kind: MissingLocale, key: Some(DataKey{datetime/symbols/weekdays@1}), str_context: None, silent: false })

I fixed this by doing the following:

  1. Evaluate the iterator locale before breaking from the loop (this fixes the bug)
  2. Use the same generated code whether or not "und" is present (improves consistency and prevents aux-key-related bugs with no super clear downside: .is_und() is cheap)

@sffc sffc removed request for a team, zbraniecki, gregtatum and nordzilla December 6, 2023 08:23
@sffc sffc changed the title Baked fallback Fix baked fallback to work with auxiliary keys Dec 6, 2023
@@ -525,17 +525,6 @@ impl BakedExporter {
.insert("icu_locid_transform/compiled_data");
let search_direct = search(quote!(req.locale));
let search_iterator = search(quote!(fallback_iterator.get()));
let maybe_err = if keys.contains(&String::from("und")) {
// The loop will terminate on its own
Copy link
Member

@robertbastian robertbastian Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this property of the generated code. It's nice because then it's clear that the loop always terminates with break, and therefore load is infallible. This is useful for the reader, and might even be useful for the compiler (load probably gets inlined). I'd like to keep this property where possible.

The way to do this for keys that have aux keys would be:

  • find all aux keys in keys
  • introduce a check before the loop that throws away a request if the aux key is unknown (no need to run fallback)
  • if every aux key has a und-x-foo, then the loop terminates on its own
    • if only some aux keys have a und-x-foo, that's super weird and should probably be a warning

The first step could also be done by adding static information about allowed aux keys to key metadata. I think for neo datetime the aux keys are a static set?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, adding auxiliary keys breaks the invariant that any DataLocale will eventually fall back to und. Even worse, users can break this invariant because we have DataLocale on all our constructors instead of having it as an internal type. This is something we need to consider before stabilizing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users can break the invariant whether or not we have DataLocale on our constructors by invoking DataProvider directly. It's trivial to trigger an infinite loop today by using aux keys.

I don't see how making the function have a break changes the inlining properties of the function.

I'm not excited about making BakedExporter have special logic about aux keys. It should treat locales as opaque except for calling .is_und() to terminate the loop.

The UTS 35 locale fallback algorithm concerns itself only with language identifiers, and we still have the invariant that the language identifier eventually falls back to und. ICU4X additionally has the invariant that Unicode extension keywords eventually are removed. I will open a new PR documenting these invariants.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see a future where the DataKeyMetadata specifies behavior regarding aux key fallback for that specific DataKey, but I do think the default behavior should be to not fall back on the aux key.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's trivial to trigger an infinite loop today by using aux keys.

It's impossible to do this with stable code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and this PR makes it impossible on experimental, too.

Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine to me but I'll defer to Rob

@sffc sffc added the discuss-priority Discuss at the next ICU4X meeting label Dec 8, 2023
@sffc
Copy link
Member Author

sffc commented Dec 12, 2023

  • @robertbastian - I don't want to make DataLocale a non-subset of Locale
  • @sffc - icu_preferences will remove DataLocale from the public API of constructors
  • @robertbastian - But LocaleFallbacker still uses DataLocale. And auxiliary subtags will be blocked on releasing icu_preferences
  • @sffc - LocaleFallbacker can just say it ignores private use subtags
  • @robertbastian - If we keep them in DataLocale, we should name them just private use subtags instead of auxiliary subtags
  • @robertbastian - But people might start generating data with private use subtags and then we replace them with auxiliary subtags, which seems brittle
  • @sffc - I would not mind adding a field to DataRequest but we can't do that until 2.0
  • @robertbastian - It could go in DataRequestMetadata in the meantime
  • @sffc - Or we keep it where it is in the meantime
  • @robertbastian - But I don't like the private use subtags either being passed through or replaced with our aux keys; both behaviors are wrong. I don't want to stabilize DataLocale in its current form
  • @Manishearth - I kinda like where we ended up; it seems okay to use private use subtags. We shouldn't accept private use subtags; we should throw them away.
  • @robertbastian - But users give us a DataLocale which can contain auxiliary subtags
  • @Manishearth - Then we just document the behavior. "private use subtags are treated as auxiliary subtags"
  • @sffc - We can't put it on DataRequestMetadata because aux subtags are not Copy
  • @robertbastian - What do you think about this:

Possible solution for 1.5: impl FromStr for DataLocale and impl Writeable for DataLocale (which implies impl Display for DataLocale) throws away aux keys, as does impl From<Locale> for DataLocale and its siblings. We add some other mechanism for stringifying the DataLocale with the auxiliary subtags, public but well documented. Auxiliary subtag functions are public but clear about their semantics. Alternatively, keep these as experimental in 1.5 and don't stabilize things that depend on them.

  • @sffc - This is too thorny. We also have PartialEq, Eq, Ord, .... I don't like the situation already of ToString ignoring them. The auxiliary subtags are part of the data model, period. All the APIs on DataLocale should behave as if they exist.
  • @robertbastian - We shouldn't have stabilized the concept of an auxiliary key on a DataLocale since it is not on a Locale. We should have two concepts of DataLocale, one which is user-facing and one which is provider-facing.
  • @sffc - No matter what we do in 1.5, DataLocale needs to have fields, which may or may not be public, that contain an auxiliary key. Since auxiliary keys are part of the data model, it is simply not true that two instances are equal in Eq if one has auxiliary keys and the other doesn't.
  • @robertbastian - You can have an internal wrapper thing that exposes the other behavior.
  • @sffc - I don't think I like that but I don't have time to discuss further.

Conclusions:

  1. Okay to merge this PR since it is internal
  2. Don't commit or advertise or document experimental features in non-experimental code (i.e. close Document invariants in LocaleFallbacker #4414)
  3. Keep the status quo in 1.5 and don't stabilize things that depend on them in 1.5
  4. In 2.0, put the auxiliary subtags on DataRequest.

LGTM: @sffc @robertbastian

@sffc sffc removed the discuss-priority Discuss at the next ICU4X meeting label Dec 14, 2023
sffc added a commit to sffc/omnicu that referenced this pull request Dec 22, 2023
@sffc
Copy link
Member Author

sffc commented Dec 23, 2023

Rebased and ran datagen again. Given the above agreement, I will proceed to merge this PR since it is increasingly blocking my datetime formatter work.

@sffc sffc merged commit 2caa236 into unicode-org:main Dec 23, 2023
29 checks passed
@sffc sffc deleted the baked-fallback branch December 23, 2023 06:17
sffc added a commit that referenced this pull request Jan 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants