Adding BlobDataProvider for dynamically loaded data blobs #1084

Merged: 11 commits merged on Sep 23, 2021
1 change: 1 addition & 0 deletions Cargo.lock


1 change: 1 addition & 0 deletions provider/blob/Cargo.toml
@@ -34,6 +34,7 @@ postcard = { version = "0.7.0" }
erased-serde = { version = "0.3", default-features = false, features = ["alloc"] }
litemap = { version = "0.2.0", path = "../../utils/litemap/", features = ["serde"] }
writeable = { path = "../../utils/writeable" }
+yoke = { path = "../../utils/yoke" }

# For the export feature
log = { version = "0.4", optional = true }
8 changes: 6 additions & 2 deletions provider/blob/README.md
@@ -3,8 +3,10 @@
`icu_provider_blob` contains implementations of the [`ICU4X`] [`DataProvider`] interface
that load data from a single blob.

-Currently, this crate supports only static blobs, but it will soon support blobs loaded
-dynamically at runtime (see [#848](https://github.com/unicode-org/icu4x/issues/848)).
+There are two exports:
+
+1. [`BlobDataProvider`] supports data blobs loaded dynamically at runtime.
+2. [`StaticDataProvider`] supports data blobs baked into the binary at compile time.

To build blob data, use the `--format blob` option of [`icu4x-datagen`]. For example, to build
"hello world" data, run:
@@ -26,6 +28,8 @@ Create a [`StaticDataProvider`] from pre-built test data:
let _ = icu_testdata::get_static_provider();
```

+For more examples, see the specific data providers.
+
[`ICU4X`]: ../icu/index.html
[`DataProvider`]: icu_provider::prelude::DataProvider
[`icu4x-datagen`]: https://github.com/unicode-org/icu4x/tree/main/tools/datagen#readme
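A minimal std-only sketch of the two ownership models behind these exports: `StaticDataProvider` reads bytes compiled into the binary as a `&'static [u8]`, while `BlobDataProvider::new_from_rc_blob` takes an `Rc<[u8]>` that can be filled at runtime. `STATIC_BLOB` and `load_rc_blob` are hypothetical names, the byte-string literal stands in for `include_bytes!`, and no ICU4X API is called.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::rc::Rc;

/// Hypothetical baked-in blob. Real code would typically use
/// `include_bytes!("...")`; a byte-string literal stands in for it here.
pub static STATIC_BLOB: &[u8] = b"static blob bytes";

/// Hypothetical loader: reads a blob from disk at runtime and returns it as a
/// reference-counted slice, the shape `BlobDataProvider::new_from_rc_blob` accepts.
pub fn load_rc_blob(path: &Path) -> io::Result<Rc<[u8]>> {
    let bytes: Vec<u8> = fs::read(path)?;
    // `Rc<[u8]>` implements `From<Vec<u8>>`; this copies the bytes into a single
    // reference-counted allocation.
    Ok(Rc::from(bytes))
}
```

Cloning the returned `Rc<[u8]>` only increments a reference count, so several values can share one loaded blob without copying it again.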
136 changes: 136 additions & 0 deletions provider/blob/src/blob_data_provider.rs
@@ -0,0 +1,136 @@
// This file is part of ICU4X. For terms of use, please see the file
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

use crate::blob_schema::BlobSchema;
use crate::path_util;
use alloc::rc::Rc;
use alloc::string::String;
use icu_provider::prelude::*;
use icu_provider::serde::{SerdeDeDataProvider, SerdeDeDataReceiver};
use serde::de::Deserialize;
use yoke::trait_hack::YokeTraitHack;
use yoke::*;

/// A data provider loading data from blobs dynamically created at runtime.
///
/// This enables data blobs to be read from the filesystem or from an HTTP request dynamically
/// at runtime, so that the code and data can be shipped separately.
///
/// If you prefer to bake the data into your binary, see [`StaticDataProvider`].
///
/// # Examples
///
/// ```
/// use icu_locid_macros::langid;
/// use icu_provider::prelude::*;
/// use icu_provider::hello_world::*;
/// use icu_provider_blob::BlobDataProvider;
/// use std::fs::File;
/// use std::io::Read;
/// use std::rc::Rc;
///
/// // Read an ICU4X data blob dynamically:
/// let mut blob: Vec<u8> = Vec::new();
/// let filename = concat!(
///     env!("CARGO_MANIFEST_DIR"),
///     "/tests/data/hello_world.postcard",
/// );
/// File::open(filename)
///     .expect("File should exist")
///     .read_to_end(&mut blob)
///     .expect("Reading pre-computed postcard buffer");
///
/// // Create a DataProvider from it:
/// let provider = BlobDataProvider::new_from_rc_blob(Rc::from(blob))
///     .expect("Deserialization should succeed");
///
/// // Check that it works:
/// let response: DataPayload<HelloWorldV1Marker> = provider.load_payload(
///     &DataRequest {
///         resource_path: ResourcePath {
///             key: key::HELLO_WORLD_V1,
///             options: langid!("la").into(),
///         }
///     })
///     .expect("Data should be valid")
///     .take_payload()
///     .expect("Data should be present");
///
/// assert_eq!(response.get().message, "Ave, munde");
/// ```
///
/// [`StaticDataProvider`]: crate::StaticDataProvider
pub struct BlobDataProvider {
    blob: Yoke<BlobSchema<'static>, Rc<[u8]>>,
}

impl BlobDataProvider {
    /// Create a [`BlobDataProvider`] from an `Rc` blob of ICU4X data.
    pub fn new_from_rc_blob(blob: Rc<[u8]>) -> Result<Self, DataError> {
        Ok(BlobDataProvider {
            blob: Yoke::try_attach_to_cart_badly(blob, |bytes| {
                BlobSchema::deserialize(&mut postcard::Deserializer::from_bytes(bytes))
            })
            .map_err(DataError::new_resc_error)?,
        })
    }

    /// Gets the buffer for the given DataRequest out of the BlobSchema and returns it yoked
    /// to the buffer backing the BlobSchema.
    fn get_file(&self, req: &DataRequest) -> Result<Yoke<&'static [u8], Rc<[u8]>>, DataError> {
        let path = path_util::resource_path_to_string(&req.resource_path);
        self.blob
            .try_project_cloned_with_capture::<&'static [u8], String, ()>(
                path,
                move |blob, path, _| {
                    let BlobSchema::V001(blob) = blob;
                    blob.resources.get(&*path).ok_or(()).map(|v| *v)
                },
            )
            .map_err(|_| DataError::MissingResourceKey(req.resource_path.key))
    }
}

impl<'data, M> DataProvider<'data, M> for BlobDataProvider
where
    M: DataMarker<'data>,
    // Actual bound:
    // for<'de> <M::Yokeable as Yokeable<'de>>::Output: serde::de::Deserialize<'de>,
    // Necessary workaround bound (see `yoke::trait_hack` docs):
    for<'de> YokeTraitHack<<M::Yokeable as Yokeable<'de>>::Output>: serde::de::Deserialize<'de>,
{
    fn load_payload(&self, req: &DataRequest) -> Result<DataResponse<'data, M>, DataError> {
        let file = self.get_file(req)?;
        let payload =
            DataPayload::try_from_yoked_buffer::<(), DataError>(file, (), |bytes, _, _| {
                let mut d = postcard::Deserializer::from_bytes(bytes);
                let data = YokeTraitHack::<<M::Yokeable as Yokeable>::Output>::deserialize(&mut d)
                    .map_err(DataError::new_resc_error)?;
                Ok(data.0)
            })?;
        Ok(DataResponse {
            metadata: DataResponseMetadata {
                data_langid: req.resource_path.options.langid.clone(),
            },
            payload: Some(payload),
        })
    }
}

impl SerdeDeDataProvider for BlobDataProvider {
    fn load_to_receiver(
        &self,
        req: &DataRequest,
        receiver: &mut dyn SerdeDeDataReceiver,
    ) -> Result<DataResponseMetadata, DataError> {
        let file = self.get_file(req)?;
        receiver.receive_yoked_buffer(file, |bytes, f2| {
            let mut d = postcard::Deserializer::from_bytes(bytes);
            f2(&mut <dyn erased_serde::Deserializer>::erase(&mut d))
        })?;
        Ok(DataResponseMetadata {
            data_langid: req.resource_path.options.langid.clone(),
        })
    }
}
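A rough std-only analogue of `get_file`. A `Yoke<&'static [u8], Rc<[u8]>>` keeps a borrowed slice together with the `Rc` that owns it; safe Rust without `yoke` cannot store that self-referential borrow directly, so this sketch stores a byte range instead and re-slices on access. `Blob`, `RcSlice`, `MissingResource`, and the resource keys used below are hypothetical; the real provider deserializes a postcard-encoded `BlobSchema` and reports `DataError::MissingResourceKey`.

```rust
use std::collections::BTreeMap;
use std::ops::Range;
use std::rc::Rc;

/// Hypothetical stand-in for `Yoke<&'static [u8], Rc<[u8]>>`: a sub-slice of a shared
/// buffer that keeps the buffer alive. It stores a byte range instead of a borrow.
#[derive(Clone, Debug)]
pub struct RcSlice {
    cart: Rc<[u8]>,
    range: Range<usize>,
}

impl RcSlice {
    /// Returns the bytes this handle points at.
    pub fn get(&self) -> &[u8] {
        &self.cart[self.range.clone()]
    }
}

/// Hypothetical error mirroring `DataError::MissingResourceKey`.
#[derive(Debug, PartialEq)]
pub struct MissingResource(pub String);

/// Hypothetical simplified blob: one shared buffer plus an index from resource path
/// to the byte range holding that resource's serialized data.
pub struct Blob {
    cart: Rc<[u8]>,
    resources: BTreeMap<String, Range<usize>>,
}

impl Blob {
    pub fn new(cart: Rc<[u8]>, resources: BTreeMap<String, Range<usize>>) -> Self {
        // Validate once up front so `RcSlice::get` can never index out of bounds.
        assert!(
            resources.values().all(|r| r.start <= r.end && r.end <= cart.len()),
            "resource range out of bounds"
        );
        Blob { cart, resources }
    }

    /// Looks up `path` and returns its bytes tied to a clone of the shared buffer,
    /// similar in spirit to `get_file` projecting the schema onto one resource.
    pub fn get_file(&self, path: &str) -> Result<RcSlice, MissingResource> {
        let range = self
            .resources
            .get(path)
            .cloned()
            .ok_or_else(|| MissingResource(path.to_string()))?;
        Ok(RcSlice {
            // Cloning the `Rc` bumps a reference count; the bytes are not copied.
            cart: Rc::clone(&self.cart),
            range,
        })
    }
}
```

Each returned `RcSlice` holds its own `Rc` clone, so it stays valid even after the `Blob` is dropped, which is the property the yoked return type of `get_file` provides.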
2 changes: 1 addition & 1 deletion provider/blob/src/blob_schema.rs
@@ -5,7 +5,7 @@
use litemap::LiteMap;

/// A versioned Serde schema for ICU4X data blobs.
-#[derive(serde::Serialize, serde::Deserialize)]
+#[derive(serde::Serialize, serde::Deserialize, yoke::Yokeable)]
pub enum BlobSchema<'data> {
    #[serde(borrow)]
    V001(BlobSchemaV1<'data>),
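`get_file` relies on `BlobSchema` having exactly one variant: `let BlobSchema::V001(blob) = blob;` is an irrefutable pattern, so no `match` arm for other versions is needed. The std-only sketch below illustrates that versioning pattern with hypothetical `Schema`/`SchemaV1` types and a linear lookup; it leaves out Serde, the `LiteMap`, and the `yoke::Yokeable` derive this hunk adds (the derive is what allows `BlobDataProvider` to hold a `Yoke<BlobSchema<'static>, Rc<[u8]>>`).

```rust
/// Hypothetical versioned container shaped like `BlobSchema`: one enum variant per
/// on-disk format version.
pub enum Schema<'data> {
    V001(SchemaV1<'data>),
}

/// Hypothetical version-1 payload: resource paths mapped to borrowed byte slices.
pub struct SchemaV1<'data> {
    pub resources: Vec<(&'data str, &'data [u8])>,
}

/// Finds the bytes stored under `path`, borrowing from the schema's backing data.
pub fn lookup<'data>(schema: &Schema<'data>, path: &str) -> Option<&'data [u8]> {
    // With a single variant this pattern is irrefutable, so no `match` is needed.
    let Schema::V001(v1) = schema;
    v1.resources
        .iter()
        .find(|(p, _)| *p == path)
        .map(|(_, bytes)| *bytes)
}
```

If a `V002` variant were added later, the `let` would stop compiling (a refutable pattern in a local binding), which flags every lookup that needs a version-aware `match`.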
10 changes: 8 additions & 2 deletions provider/blob/src/lib.rs
@@ -5,8 +5,10 @@
//! `icu_provider_blob` contains implementations of the [`ICU4X`] [`DataProvider`] interface
//! that load data from a single blob.
//!
-//! Currently, this crate supports only static blobs, but it will soon support blobs loaded
-//! dynamically at runtime (see [#848](https://github.com/unicode-org/icu4x/issues/848)).
+//! There are two exports:
+//!
+//! 1. [`BlobDataProvider`] supports data blobs loaded dynamically at runtime.
+//! 2. [`StaticDataProvider`] supports data blobs baked into the binary at compile time.
//!
//! To build blob data, use the `--format blob` option of [`icu4x-datagen`]. For example, to build
//! "hello world" data, run:
@@ -28,6 +30,8 @@
//! let _ = icu_testdata::get_static_provider();
//! ```
//!
+//! For more examples, see the specific data providers.
+//!
//! [`ICU4X`]: ../icu/index.html
//! [`DataProvider`]: icu_provider::prelude::DataProvider
//! [`icu4x-datagen`]: https://github.com/unicode-org/icu4x/tree/main/tools/datagen#readme
@@ -36,11 +40,13 @@

extern crate alloc;

+mod blob_data_provider;
mod blob_schema;
mod path_util;
mod static_data_provider;

#[cfg(feature = "export")]
pub mod export;

+pub use blob_data_provider::BlobDataProvider;
pub use static_data_provider::StaticDataProvider;
13 changes: 7 additions & 6 deletions provider/blob/src/static_data_provider.rs
@@ -4,15 +4,14 @@

use crate::blob_schema::BlobSchema;
use crate::path_util;
-use icu_provider::{
-    prelude::*,
-    serde::{SerdeDeDataProvider, SerdeDeDataReceiver},
-};
+use icu_provider::prelude::*;
+use icu_provider::serde::{SerdeDeDataProvider, SerdeDeDataReceiver};
use serde::de::Deserialize;

/// A data provider loading data statically baked in to the binary.
///
-/// Although static data is convenient and highly portable, it also increases binary size.
+/// Although static data is convenient and highly portable, it also increases binary size. To
+/// load the data files dynamically at runtime, see [`BlobDataProvider`].
///
/// To bake blob data into your binary, use [`include_bytes!`](std::include_bytes), as shown in
/// the example below.
@@ -49,6 +48,8 @@ use serde::de::Deserialize;
///
/// assert_eq!(response.get().message, "Ave, munde");
/// ```
+///
+/// [`BlobDataProvider`]: crate::BlobDataProvider
pub struct StaticDataProvider {
    blob: BlobSchema<'static>,
}
@@ -76,7 +77,7 @@ impl<'data, M> DataProvider<'data, M> for StaticDataProvider
where
    M: DataMarker<'data>,
    // 'static is what we want here, because we are deserializing from a static buffer.
-    M::Yokeable: serde::de::Deserialize<'static>,
+    M::Yokeable: Deserialize<'static>,
{
    fn load_payload(&self, req: &DataRequest) -> Result<DataResponse<'data, M>, DataError> {
        let file = self.get_file(req)?;
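In `StaticDataProvider` the bound is `M::Yokeable: Deserialize<'static>` because the source buffer is itself `&'static`, so borrowed fields can take the `'static` lifetime directly. `BlobDataProvider`, by contrast, needs the higher-ranked `for<'de>` bound (via `YokeTraitHack`) because its borrowed data is only valid while the `Rc` buffer is alive. A std-only analogue of the static case, with UTF-8 validation standing in for Serde and hypothetical `Message`, `parse`, and `BAKED` names:

```rust
/// Hypothetical zero-copy message: it borrows its text from the input buffer.
#[derive(Debug, PartialEq)]
pub struct Message<'data> {
    pub text: &'data str,
}

/// Validates the buffer as UTF-8 without copying it; the result borrows `bytes`.
pub fn parse(bytes: &[u8]) -> Result<Message<'_>, std::str::Utf8Error> {
    Ok(Message {
        text: std::str::from_utf8(bytes)?,
    })
}

/// Hypothetical baked-in buffer (real code would typically use `include_bytes!`).
pub static BAKED: &[u8] = b"Ave, munde";

/// Because `BAKED` is `&'static`, the parsed value is `Message<'static>` and needs no
/// owner to keep it alive, which is the situation the `Deserialize<'static>` bound describes.
pub fn baked_message() -> Message<'static> {
    parse(BAKED).expect("baked data is valid UTF-8")
}
```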
47 changes: 46 additions & 1 deletion provider/core/src/data_provider.rs
@@ -297,7 +297,6 @@ where
    /// use icu_provider::prelude::*;
    /// use icu_provider::hello_world::*;
    /// use std::rc::Rc;
-    /// use icu_provider::yoke::Yokeable;
    ///
    /// let json_text = "{\"message\":\"Hello World\"}";
    /// let json_rc_buffer: Rc<[u8]> = json_text.as_bytes().into();
@@ -324,6 +323,52 @@ where
        })
    }

+    /// Convert a byte buffer into a [`DataPayload`]. A function must be provided to perform the
+    /// conversion. This can often be a Serde deserialization operation.
+    ///
+    /// This function is similar to [`DataPayload::try_from_rc_buffer`], but it accepts a buffer
+    /// that is already yoked to an Rc buffer cart.
+    ///
+    /// # Examples
+    ///
+    /// ```
+    /// # #[cfg(feature = "provider_serde")] {
+    /// use icu_provider::prelude::*;
+    /// use icu_provider::hello_world::*;
+    /// use std::rc::Rc;
+    /// use icu_provider::yoke::Yoke;
+    ///
+    /// let json_text = "{\"message\":\"Hello World\"}";
+    /// let json_rc_buffer: Rc<[u8]> = json_text.as_bytes().into();
+    ///
+    /// let payload = DataPayload::<HelloWorldV1Marker>::try_from_yoked_buffer(
+    ///     Yoke::attach_to_rc_cart(json_rc_buffer),
+    ///     (),
+    ///     |bytes, _, _| {
+    ///         serde_json::from_slice(bytes)
+    ///     }
+    /// )
+    /// .expect("JSON is valid");
+    ///
+    /// assert_eq!("Hello World", payload.get().message);
+    /// # } // feature = "provider_serde"
+    /// ```
+    #[allow(clippy::type_complexity)]
+    pub fn try_from_yoked_buffer<T, E>(

Review comment (Member): issue: docs

Review comment (Member): (and elsewhere) no need for examples, just useful to have a one-liner about what's going on

+        yoked_buffer: Yoke<&'static [u8], Rc<[u8]>>,
+        capture: T,
+        f: for<'de> fn(
+            <&'static [u8] as yoke::Yokeable<'de>>::Output,
+            T,
+            PhantomData<&'de ()>,
+        ) -> Result<<M::Yokeable as Yokeable<'de>>::Output, E>,
+    ) -> Result<Self, E> {
+        let yoke = yoked_buffer.try_project_with_capture(capture, f)?;
+        Ok(Self {
+            inner: DataPayloadInner::RcBuf(yoke),
+        })
+    }
+
    /// Convert a fully owned (`'static`) data struct into a DataPayload.
    ///
    /// This constructor creates `'static` payloads.
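`try_from_yoked_buffer` takes its transform as a `for<'de> fn(...)` pointer plus a separate `capture: T` argument. Plain `fn` pointers cannot capture their environment, so any state the transform needs is passed explicitly (the callers in this PR pass `()` or `f1`). A simplified std-only sketch of that shape, with hypothetical `try_map_with_capture` and `take_prefix` functions; it drops `Yoke` and the higher-ranked lifetime, and keeps `PhantomData<&'de ()>` only to mirror the PR's signature (here it carries no information).

```rust
use std::marker::PhantomData;

/// Hypothetical helper shaped like `try_from_yoked_buffer`: the transform is a plain
/// `fn` pointer, which cannot capture variables, so extra state travels through `capture`.
pub fn try_map_with_capture<'de, T, R, E>(
    bytes: &'de [u8],
    capture: T,
    f: fn(&'de [u8], T, PhantomData<&'de ()>) -> Result<R, E>,
) -> Result<R, E> {
    f(bytes, capture, PhantomData)
}

/// Example transform: decode the first `n` bytes as UTF-8, where `n` is the captured value.
pub fn take_prefix<'de>(
    bytes: &'de [u8],
    n: usize,
    _marker: PhantomData<&'de ()>,
) -> Result<&'de str, String> {
    let prefix = bytes
        .get(..n)
        .ok_or_else(|| format!("buffer has fewer than {} bytes", n))?;
    std::str::from_utf8(prefix).map_err(|e| e.to_string())
}
```

A closure that read `n` from its surrounding scope would not coerce to a `fn` pointer (only non-capturing closures do), which is why the value is threaded through the `capture` parameter instead.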
41 changes: 41 additions & 0 deletions provider/core/src/serde.rs
@@ -69,6 +69,18 @@ pub trait SerdeDeDataReceiver {
        ),
    ) -> Result<(), Error>;

+    /// Receives a yoked byte buffer.
+    ///
+    /// This function has behavior identical to that of [`SerdeDeDataReceiver::receive_rc_buffer`].
+    fn receive_yoked_buffer(
+        &mut self,
+        yoked_buffer: Yoke<&'static [u8], Rc<[u8]>>,
+        f1: for<'de> fn(
+            bytes: &'de [u8],
+            f2: &mut dyn FnMut(&mut dyn erased_serde::Deserializer<'de>),
+        ),
+    ) -> Result<(), Error>;
+
    /// Receives a `&'static` byte buffer via an [`erased_serde::Deserializer`].
    ///
    /// Note: Since the purpose of this function is to handle zero-copy deserialization of static
@@ -133,6 +145,35 @@ where
        Ok(())
    }

+    fn receive_yoked_buffer(
+        &mut self,
+        yoked_buffer: Yoke<&'static [u8], Rc<[u8]>>,
+        f1: for<'de> fn(
+            bytes: &'de [u8],
+            f2: &mut dyn FnMut(&mut dyn erased_serde::Deserializer<'de>),
+        ),
+    ) -> Result<(), Error> {
+        self.replace(DataPayload::try_from_yoked_buffer(
+            yoked_buffer,
+            f1,
+            move |bytes, f1, _| {
+                let mut holder = None;
+                f1(bytes, &mut |deserializer| {
+                    holder.replace(
+                        erased_serde::deserialize::<YokeTraitHack<<M::Yokeable as Yokeable>::Output>>(
+                            deserializer,
+                        )
+                        .map(|w| w.0),
+                    );
+                });
+                // The holder is guaranteed to be populated so long as the lambda function was invoked,
+                // which is in the contract of `receive_rc_buffer`.
+                holder.unwrap()
+            },
+        )?);
+        Ok(())
+    }
+
    fn receive_static(
        &mut self,
        deserializer: &mut dyn erased_serde::Deserializer<'static>,
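The `receive_yoked_buffer` implementation gets its data through nested callbacks, so it cannot simply return the deserialized value; it records the result in a local `holder` from inside the closure and unwraps it after `f1` returns. A std-only sketch of the same pattern, with hypothetical `feed`/`receive` functions and UTF-8 decoding standing in for `erased_serde` deserialization:

```rust
/// Hypothetical producer in the role of `f1`: instead of returning a value, it hands
/// the bytes to a callback. (In the PR, `f1` wraps them in an erased Serde deserializer.)
pub fn feed<'de>(bytes: &'de [u8], f2: &mut dyn FnMut(&'de [u8])) {
    f2(bytes);
}

/// Extracts a result from a callback-style API: the closure stores what it computes in
/// `holder`, and the caller reads it back after `f1` returns.
pub fn receive(
    bytes: &[u8],
    f1: for<'a> fn(&'a [u8], &mut dyn FnMut(&'a [u8])),
) -> Result<usize, String> {
    let mut holder: Option<Result<usize, String>> = None;
    f1(bytes, &mut |b| {
        // Stand-in for deserialization: decode UTF-8 and record the string length.
        holder = Some(
            std::str::from_utf8(b)
                .map(|s| s.len())
                .map_err(|e| e.to_string()),
        );
    });
    // Unlike the PR's `holder.unwrap()`, report a broken contract as an error
    // instead of panicking.
    holder.unwrap_or_else(|| Err("callback was never invoked".to_string()))
}
```

The PR's `unwrap()` is reasonable when every `f1` is trusted to invoke its callback; returning an error, as in this sketch, is an alternative if that contract might be violated.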