-
-
Notifications
You must be signed in to change notification settings - Fork 786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
What's serde zero-copy story so far? #492
Comments
That kind of deserialization is only possible when the original serialized state outlives the deserialized data, and the data is in the correct format. It's probably nothing more than spraying a few lifetimes around, but it might break the existing use cases, so we have to take care here. |
Here is a proof of concept of what @oli-obk meant by "spraying a few lifetimes around." The remaining work on this issue will be figuring out how to support this without polluting every usage of Deserialize and Deserializer with lifetimes. trait Deserialize<'a>: Sized {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where D: Deserializer<'a>;
}
trait Deserializer<'a>: Sized {
type Error: Error;
fn deserialize_str<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'a>;
fn deserialize_string<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'a>;
}
trait Error {
fn whatever() -> Self;
}
trait Visitor<'a>: Sized {
type Value;
fn visit_str_from_input<E>(self, v: &'a str) -> Result<Self::Value, E>
where E: Error
{
self.visit_str(v)
}
fn visit_str<E>(self, _v: &str) -> Result<Self::Value, E>
where E: Error
{
Err(Error::whatever())
}
fn visit_string<E>(self, v: String) -> Result<Self::Value, E>
where E: Error
{
self.visit_str(&v)
}
}
impl<'a> Deserialize<'a> for String {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where D: Deserializer<'a>
{
struct StringVisitor;
impl<'a> Visitor<'a> for StringVisitor {
type Value = String;
fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
where E: Error
{
Ok(v.to_owned())
}
fn visit_string<E>(self, v: String) -> Result<Self::Value, E>
where E: Error
{
Ok(v)
}
}
deserializer.deserialize_string(StringVisitor)
}
}
impl<'a> Deserialize<'a> for &'a str {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where D: Deserializer<'a>
{
struct StrVisitor;
impl<'a> Visitor<'a> for StrVisitor {
type Value = &'a str;
fn visit_str_from_input<E>(self, v: &'a str) -> Result<Self::Value, E>
where E: Error
{
Ok(v)
}
}
deserializer.deserialize_str(StrVisitor)
}
} |
One unfortunate limitation here that would require some sort of lifetime dispatch is that we can't have |
#819 would be a way to preserve the current Deserialize and Deserializer traits without lifetimes. |
It makes me sad that the approach in #492 (comment) is so simple but would require HRTB to accomplish most things that a person would want to do with Deserialize. pub fn from_reader<R, T>(rdr: R) -> Result<T>
where R: Read,
T: for<'a> Deserialize<'a> I would not want an understanding of HRTB as a prerequisite for using Serde. @maciejhirsz have you found this to be a problem with bitsparrow::BitDecode<'src>? Maybe if we ever get type aliases in trait bounds, we could somehow have Deserialize be an alias for |
Maybe we want this sort of setup, but unclear whether this is less or more complicated than HRTB. pub trait DeserializeBorrow<'a>;
pub trait Deserialize: for<'a> DeserializeBorrow<'a>;
impl<T> Deserialize for T where T: for<'a> DeserializeBorrow<'a>;
// #[derive(Deserialize)]
struct A;
impl<'a> DeserializeBorrow<'a> for A;
// #[derive(DeserializeBorrow)]
struct B<'b> { b: &'b str }
impl<'b> DeserializeBorrow<'b> for B<'b>;
pub fn from_reader<R, T>(rdr: R) -> Result<T>
where R: Read,
T: Deserialize;
pub fn from_str<'a, T>(s: &'a str) -> Result<T>
where T: DeserializeBorrow<'a>; |
Didn't run into any issues with HRTB for sparrow. For the most part, the fact that there is a lifetime on the trait is completely opaque to the end user when deriving the impl, which - I imagine - is the vast majority of cases. Even when you implement it manually you just need to stick two That said, I didn't have backwards compatibility to consider. |
Right, deriving is not a problem and I am also not concerned about handwritten impls because that is an advanced use case anyway. Look at these 220 places. Almost all of those would require HRTB and practically all users will need to interact with a function like that to use Serde, for example serde_json::from_reader. |
Right now I am leaning toward Deserialize<'a> along with an explicit setting to opt into borrowing particular fields to address #492 (comment). #[derive(Deserialize)]
struct S<'a, 'b, 'c, 'd, 'e, 'f, 'g> {
// &str and &[u8] are the only types automatically borrowed
a: &'a str,
b: &'b [u8],
// everything else must use serde(borrow)
#[serde(borrow)]
c: Cow<'c, str>,
#[serde(borrow)]
d: T<'d>,
#[serde(borrow = "'e + 'f")]
x: X<'e, 'f, 'g>,
} Along with possibly a helper trait to simplify bounds. I think pub trait DeserializeOwned: for<'a> Deserialize<'a> {}
impl<T> DeserializeOwned for T where T: for<'a> Deserialize<'a> {}
pub fn from_reader<R, T>(rdr: R) -> Result<T>
where R: Read,
T: DeserializeOwned;
pub fn from_str<'a, T>(s: &'a str) -> Result<T>
where T: Deserialize<'a>; |
#837 has a working zero-copy implementation. @arthurprs and others, please check whether this addresses all of your use cases. struct S<'a, 'b, 'c, 'd, 'e> {
// copied from the input
w: String,
// &str and &[u8] are always implicitly borrowed from the input
x: &'a str,
// other types require an attribute to opt into borrowing
#[serde(borrow)]
y: Cow<'b, str>,
// can narrow down which lifetimes should be borrowed
#[serde(borrow = "'c + 'd")]
z: Z<'c, 'd, 'static, 'e>,
} The union of all borrowed lifetimes becomes the bound on the impl<'de: 'a+'b+'c+'d, 'a, 'b, 'c, 'd, 'e> Deserialize<'de> for S<'a, 'b, 'c, 'd, 'e> |
This is very impressive work @dtolnay 🎉 I don't have a test right now but my use-case would be unpacking bincode at ludicrous speeds. |
This feature has been merged and will be in the next major release. |
Hi all, first of all huge kudos to all contributors! Serde is already a tried and proven framework for de/serializing!
I was wondering what's the current state or plans to support zero-copy deserialization, aka deserializing into &str and &[u8].
The text was updated successfully, but these errors were encountered: