Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What's serde zero-copy story so far? #492

Closed
arthurprs opened this issue Aug 15, 2016 · 12 comments
Closed

What's serde zero-copy story so far? #492

arthurprs opened this issue Aug 15, 2016 · 12 comments

Comments

@arthurprs
Copy link
Contributor

Hi all, first of all huge kudos to all contributors! Serde is already a tried and proven framework for de/serializing!

I was wondering what's the current state or plans to support zero-copy deserialization, aka deserializing into &str and &[u8].

@oli-obk
Copy link
Member

oli-obk commented Aug 15, 2016

That kind of deserialization is only possible when the original serialized state outlives the deserialized data, and the data is in the correct format.

It's probably nothing more than spraying a few lifetimes around, but it might break the existing use cases, so we have to take care here.

@dtolnay
Copy link
Member

dtolnay commented Mar 16, 2017

Here is a proof of concept of what @oli-obk meant by "spraying a few lifetimes around." The remaining work on this issue will be figuring out how to support this without polluting every usage of Deserialize and Deserializer with lifetimes.

trait Deserialize<'a>: Sized {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
        where D: Deserializer<'a>;
}

trait Deserializer<'a>: Sized {
    type Error: Error;

    fn deserialize_str<V>(self, visitor: V) -> Result<V::Value, Self::Error>
        where V: Visitor<'a>;
    fn deserialize_string<V>(self, visitor: V) -> Result<V::Value, Self::Error>
        where V: Visitor<'a>;
}

trait Error {
    fn whatever() -> Self;
}

trait Visitor<'a>: Sized {
    type Value;

    fn visit_str_from_input<E>(self, v: &'a str) -> Result<Self::Value, E>
        where E: Error
    {
        self.visit_str(v)
    }

    fn visit_str<E>(self, _v: &str) -> Result<Self::Value, E>
        where E: Error
    {
        Err(Error::whatever())
    }

    fn visit_string<E>(self, v: String) -> Result<Self::Value, E>
        where E: Error
    {
        self.visit_str(&v)
    }
}

impl<'a> Deserialize<'a> for String {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
        where D: Deserializer<'a>
    {
        struct StringVisitor;

        impl<'a> Visitor<'a> for StringVisitor {
            type Value = String;

            fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
                where E: Error
            {
                Ok(v.to_owned())
            }

            fn visit_string<E>(self, v: String) -> Result<Self::Value, E>
                where E: Error
            {
                Ok(v)
            }
        }

        deserializer.deserialize_string(StringVisitor)
    }
}

impl<'a> Deserialize<'a> for &'a str {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
        where D: Deserializer<'a>
    {
        struct StrVisitor;

        impl<'a> Visitor<'a> for StrVisitor {
            type Value = &'a str;

            fn visit_str_from_input<E>(self, v: &'a str) -> Result<Self::Value, E>
                where E: Error
            {
                Ok(v)
            }
        }

        deserializer.deserialize_str(StrVisitor)
    }
}

@dtolnay
Copy link
Member

dtolnay commented Mar 19, 2017

One unfortunate limitation here that would require some sort of lifetime dispatch is that we can't have Cow<'a, T> borrow the data if the input outlives the output, and copy the data if the input does not outlive the output. We either need to impl<'a, 'd, T> Deserialize<'d> for Cow<'a, T> and have it always copy, or impl<'a, T> Deserialize<'a> for Cow<'a, T> and have it always borrow even if you would otherwise want it to copy.

@dtolnay
Copy link
Member

dtolnay commented Mar 20, 2017

#819 would be a way to preserve the current Deserialize and Deserializer traits without lifetimes.

@dtolnay
Copy link
Member

dtolnay commented Mar 21, 2017

It makes me sad that the approach in #492 (comment) is so simple but would require HRTB to accomplish most things that a person would want to do with Deserialize.

pub fn from_reader<R, T>(rdr: R) -> Result<T>
    where R: Read,
          T: for<'a> Deserialize<'a>

I would not want an understanding of HRTB as a prerequisite for using Serde. @maciejhirsz have you found this to be a problem with bitsparrow::BitDecode<'src>?

Maybe if we ever get type aliases in trait bounds, we could somehow have Deserialize be an alias for for<'a> Deserialize<'a>.

@dtolnay
Copy link
Member

dtolnay commented Mar 21, 2017

Maybe we want this sort of setup, but unclear whether this is less or more complicated than HRTB.

pub trait DeserializeBorrow<'a>;
pub trait Deserialize: for<'a> DeserializeBorrow<'a>;
impl<T> Deserialize for T where T: for<'a> DeserializeBorrow<'a>;


// #[derive(Deserialize)]
struct A;
impl<'a> DeserializeBorrow<'a> for A;

// #[derive(DeserializeBorrow)]
struct B<'b> { b: &'b str }
impl<'b> DeserializeBorrow<'b> for B<'b>;


pub fn from_reader<R, T>(rdr: R) -> Result<T>
    where R: Read,
          T: Deserialize;

pub fn from_str<'a, T>(s: &'a str) -> Result<T>
    where T: DeserializeBorrow<'a>;

@maciejhirsz
Copy link
Contributor

Didn't run into any issues with HRTB for sparrow. For the most part, the fact that there is a lifetime on the trait is completely opaque to the end user when deriving the impl, which - I imagine - is the vast majority of cases. Even when you implement it manually you just need to stick two <'a>s on your impl and you are done.

That said, I didn't have backwards compatibility to consider.

@dtolnay
Copy link
Member

dtolnay commented Mar 22, 2017

Right, deriving is not a problem and I am also not concerned about handwritten impls because that is an advanced use case anyway.

Look at these 220 places. Almost all of those would require HRTB and practically all users will need to interact with a function like that to use Serde, for example serde_json::from_reader.

@dtolnay
Copy link
Member

dtolnay commented Mar 23, 2017

Right now I am leaning toward Deserialize<'a> along with an explicit setting to opt into borrowing particular fields to address #492 (comment).

#[derive(Deserialize)]
struct S<'a, 'b, 'c, 'd, 'e, 'f, 'g> {
    // &str and &[u8] are the only types automatically borrowed
    a: &'a str,
    b: &'b [u8],

    // everything else must use serde(borrow)
    #[serde(borrow)]
    c: Cow<'c, str>,
    #[serde(borrow)]
    d: T<'d>,
    #[serde(borrow = "'e + 'f")]
    x: X<'e, 'f, 'g>,
}

Along with possibly a helper trait to simplify bounds. I think T: DeserializeOwned is clearer in intent and meaning than T: for<'a> Deserialize<'a>.

pub trait DeserializeOwned: for<'a> Deserialize<'a> {}
impl<T> DeserializeOwned for T where T: for<'a> Deserialize<'a> {}

pub fn from_reader<R, T>(rdr: R) -> Result<T>
    where R: Read,
          T: DeserializeOwned;

pub fn from_str<'a, T>(s: &'a str) -> Result<T>
    where T: Deserialize<'a>;

@dtolnay
Copy link
Member

dtolnay commented Apr 3, 2017

#837 has a working zero-copy implementation. @arthurprs and others, please check whether this addresses all of your use cases.

struct S<'a, 'b, 'c, 'd, 'e> {
    // copied from the input
    w: String,

    // &str and &[u8] are always implicitly borrowed from the input
    x: &'a str,

    // other types require an attribute to opt into borrowing
    #[serde(borrow)]
    y: Cow<'b, str>,

    // can narrow down which lifetimes should be borrowed
    #[serde(borrow = "'c + 'd")]
    z: Z<'c, 'd, 'static, 'e>,
}

The union of all borrowed lifetimes becomes the bound on the 'de lifetime of the Deserialize impl.

impl<'de: 'a+'b+'c+'d, 'a, 'b, 'c, 'd, 'e> Deserialize<'de> for S<'a, 'b, 'c, 'd, 'e>

@arthurprs
Copy link
Contributor Author

This is very impressive work @dtolnay 🎉

I don't have a test right now but my use-case would be unpacking bincode at ludicrous speeds.

@dtolnay
Copy link
Member

dtolnay commented Apr 4, 2017

This feature has been merged and will be in the next major release.

@dtolnay dtolnay closed this as completed Apr 4, 2017
@dtolnay dtolnay self-assigned this Apr 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

4 participants