Decoupling data schema from data format #196
Adding additional comments from @Manishearth to this thread, originally shared by @zbraniecki in #198 (review):
@dtolnay, thanks for maintaining Serde!
I don't completely follow the scenario and the existing setup of data providers and requests, since I don't know anything about this crate. Would you be able to put together a minimized, compilable code snippet that shows the problem being solved and the traits/components involved?
Thanks! I'll do my best at a minimal explanation. If you want more color, see data-pipeline.md.

The `DataProvider` trait is defined as:

```rust
pub trait DataProvider<'d> {
    /// Query the provider for data. Returns Ok if the request successfully loaded data. If data
    /// failed to load, returns an Error with more information.
    fn load<'a>(&'a self, req: &DataRequest) -> Result<DataResponse<'d>, Error>;
}
```

In other words, it's a pretty basic request-response pattern.

```rust
#[derive(Debug, Clone)]
pub struct DataResponse<'d> {
    payload: Cow<'d, dyn CloneableAny>,
}
```

Multiple `DataProvider`s can be chained together, each providing specific functionality like filtering, caching, routing, etc. Each chain has a source (the upstream data provider that ultimately receives and fulfills the request) and a sink (the downstream agent that initiated the request).

The problem is that the source knows the data format (e.g., JSON, Bincode, CBOR), while the sink knows the data structure (the thing implementing Serde's `Deserialize`). Both of those pieces of information need to converge somewhere in order for Serde to do its job. The options in the OP are different ways of making them converge.
Does this make more sense? I think I'm leaning toward option 2.
- In order to parse a JSON blob in Serde, one needs to know the data schema (struct definition).
- Currently, the data provider passes Rust structs (encoded as `Any`s) in the `Response` objects through the pipeline.

Together, these two statements mean that the source data provider (the one reading the JSON blob from the file system) needs to know ahead of time the mapping from JSON files to structs.
This is undesirable because the source provider then needs some kind of dispatch from JSON files to structs (a `match` statement, table lookup, etc.). This dispatch needs to be maintained and could be a source of failures or performance bottlenecks.

I've considered a few solutions.
I think it may be useful to pass around structs, because if we get to a point where we can pre-build data into *.rs files (#78), we'd like to pass those verbatim through the pipeline.