Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove auto-deserialize implementations from {Blob,Static,Fs}DataProvider and restructure testdata #2364

Merged
merged 50 commits into from
Sep 6, 2022

Conversation

robertbastian
Copy link
Member

@robertbastian robertbastian commented Aug 11, 2022

These are very rarely used and hide the deserialization. If client has a BufferProvider they should use _with_buffer_provider functions, or explicitly deserialize with .as_deserializing() to use _unstable (although this is rarely the right thing to do).

Test data was restructured to match the constructors, and to do more fallbacks:

  • unstable() + unstable_no_fallback()
  • any() + any_no_fallback()
  • buffer() + buffer_no_fallback(), requires the buffer feature.

Most tests have been updated to use icu_testdata::unstable() with the appropriate constructor. This wasn't possible in some cases due to rust-lang/rust#79381, those cases are using icu_testdata::buffer().as_deserializing(), which requires activating the serde feature

@robertbastian
Copy link
Member Author

I'm also thinking of renaming the icu_testdata methods to

  • icu_testdata::postcard() -> impl BufferProvider
  • icu_testdata::json() -> impl BufferProvider
  • icu_testdata::unstable -> ...
  • icu_testdata::unstable_baked -> ...
  • icu_tesdata::unstable_no_fallback -> ...

The unstable_* providers would guarantee compatibility with all _unstable constructors at the same version, but their return types would also not be stable (ideally they would be impl DataProvider<A> + DataProvider<B> + ..., but I don't want to write that out).

Usage would look like this

ListFormatter::try_new_and_unstable(&icu_testdata::unstable(), Width::Wide);
ListFormatter::try_new_and_with_buffer_provider(&icu_testdata::json(), Width::Wide);

@sffc
Copy link
Member

sffc commented Aug 11, 2022

I support this change.

I suggest making a _no_fallback version of all of the functions, like

  • icu_testdata::postcard()
  • icu_testdata::postcard_no_fallback()
  • icu_testdata::json()
  • icu_testdata::json_no_fallback()
  • ...

@sffc
Copy link
Member

sffc commented Aug 11, 2022

For most examples, I think we should continue to use the (newly renamed) icu_testdata::postcard() and try_new_with_buffer_provider(), primarily due to the build time considerations with BakedDataProvider

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(some comments that I hadn't sent yet)

components/collator/fuzz/Cargo.toml Outdated Show resolved Hide resolved
components/datetime/tests/datetime.rs Show resolved Hide resolved
components/normalizer/src/lib.rs Outdated Show resolved Hide resolved
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally got all the way through. Once you address my remaining comments and get CI to be green, we can merge this.

provider/testdata/Cargo.toml Show resolved Hide resolved
// The function is documented to allow panics.
#[allow(clippy::panic)]
#[cfg(feature = "fs")]
pub fn get_json_provider() -> FsDataProvider {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: There's nothing to replace FsDataProvider and get_json_provider.

We have the data, and we should expose a canonical provider for it. Please add it back.

Perhaps icu_testdata::json() and icu_testdata::json_no_fallback()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to have JSON only as a human-readable postcard version for code reviews, and don't see the need to expose it just because we have it. It's the exact same data as postcard, and nobody is using it (why would they, it's inefficient and might require extra features).


/// A [`BufferProvider`] backed by a Postcard blob.
#[cfg(feature = "buffer")]
pub fn buffer() -> impl BufferProvider {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussion: I would still prefer icu_testdata::postcard() over icu_testdata::buffer(). Even icu_testdata::blob() would be more precise.

Postcard is a special case of Blob* which is a special case of Buffer.

We have two different types of buffer providers we can expose: the FsDataProvider JSON one and the BlobDataProvider Postcard one.

I also think naming the functions with their actual data source has educational benefit.

I feel less strongly about icu_testdata::unstable() vs icu_testdata::baked() but I would slightly prefer ::baked() for the same reasons. It's also 3 characters shorter.

* I realize that Blob supports only Postcard at this time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here is to make them symmetric with the constructors. First time users will not know that ::baked() implements the traits required by unstable, or that ::blob() is a BufferProvider. We're exposing fn buffer() -> impl BufferProvider, and I don't think anyone needs to care what the buffer format is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the idea; I just don't agree with the weighting of priorities.

But I won't block this PR on it. I'd like to discuss later but it'll be an easy refactor-rename to change it.

components/datetime/Cargo.toml Outdated Show resolved Hide resolved
sffc
sffc previously approved these changes Sep 6, 2022
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @robertbastian 😃

@sffc sffc merged commit 03f8e10 into unicode-org:main Sep 6, 2022
@sffc
Copy link
Member

sffc commented Sep 6, 2022

I'm merging this because it is green and it won't be green if I don't merge it

robertbastian added a commit to robertbastian/icu4x that referenced this pull request Sep 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants