Skip to content

Latest commit

 

History

History
569 lines (456 loc) · 14.4 KB

README.md

File metadata and controls

569 lines (456 loc) · 14.4 KB

Serde Rust Serialization Framework

Build Status Coverage Status Latest Version

Serde is a powerful framework that enables serialization libraries to generically serialize Rust data structures without the overhead of runtime type information. In many situations, the handshake protocol between serializers and serializees can be completely optimized away, leaving Serde to perform roughly the same speed as a hand written serializer for a specific type.

Documentation is available at:

Using Serde

Here is a simple example that demonstrates how to use Serde by serializing and deserializing to JSON. Serde comes with some powerful code generation libraries that work with Stable and Nightly Rust that eliminate much of the complexity of hand rolling serialization and deserialization for a given type. First lets see how we would use Nightly Rust, which is currently a bit simpler than Stable Rust:

Cargo.toml:

[package]
name = "serde_example_nightly"
version = "0.1.0"
authors = ["Erick Tryzelaar <erick.tryzelaar@gmail.com>"]

[dependencies]
serde = "*"
serde_json = "*"
serde_macros = "*"

src/main.rs

#![feature(custom_derive, plugin)]
#![plugin(serde_macros)]

extern crate serde;
extern crate serde_json;

#[derive(Serialize, Deserialize, Debug)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let point = Point { x: 1, y: 2 };
    let serialized = serde_json::to_string(&point).unwrap();

    println!("{}", serialized);

    let deserialized: Point = serde_json::from_str(&serialized).unwrap();

    println!("{:?}", deserialized);
}

When run, it produces:

% cargo run
{"x":1,"y":2}
Point { x: 1, y: 2 }

Stable Rust is a little more complicated because it does not yet support compiler plugins. Instead we need to use the code generation library syntex for this:

[package]
name = "serde_example"
version = "0.1.0"
authors = ["Erick Tryzelaar <erick.tryzelaar@gmail.com>"]
build = "build.rs"

[build-dependencies]
serde_codegen = "*"
syntex = "*"

[dependencies]
serde = "*"
serde_json = "*"

src/main.rs:

extern crate serde;
extern crate serde_json;

include!(concat!(env!("OUT_DIR"), "/main.rs"));

src/main.rs.in:

#[derive(Serialize, Deserialize, Debug)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let point = Point { x: 1, y: 2 };
    let serialized = serde_json::to_string(&point).unwrap();

    println!("{}", serialized);

    let deserialized: Point = serde_json::from_str(&serialized).unwrap();

    println!("{:?}", deserialized);
}

This also produces:

% cargo run
{"x":1,"y":2}
Point { x: 1, y: 2 }

While this works well with Stable Rust, be aware that the error locations currently are reported in the generated file instead of in the source file. You may find it easier to develop with Nightly Rust and serde\_macros, then deploy with Stable Rust and serde_codegen. It's possible to combine both approaches in one setup:

Cargo.toml:

[package]
name = "serde_example"
version = "0.1.0"
authors = ["Erick Tryzelaar <erick.tryzelaar@gmail.com>"]
build = "build.rs"

[features]
default = ["serde_codegen"]
nightly = ["serde_macros"]

[build-dependencies]
serde_codegen = { version = "*", optional = true }
syntex = "*"

[dependencies]
serde = "*"
serde_json = "*"
serde_macros = { version = "*", optional = true }

build.rs:

#[cfg(not(feature = "serde_macros"))]
mod inner {
    extern crate syntex;
    extern crate serde_codegen;

    use std::env;
    use std::path::Path;

    pub fn main() {
        let out_dir = env::var_os("OUT_DIR").unwrap();

        let src = Path::new("src/main.rs.in");
        let dst = Path::new(&out_dir).join("main.rs");

        let mut registry = syntex::Registry::new();

        serde_codegen::register(&mut registry);
        registry.expand("", &src, &dst).unwrap();
    }
}

#[cfg(feature = "serde_macros")]
mod inner {
    pub fn main() {}
}

fn main() {
    inner::main();
}

src/main.rs:

#![cfg_attr(feature = "serde_macros", feature(custom_derive, plugin))]
#![cfg_attr(feature = "serde_macros", plugin(serde_macros))]

extern crate serde;
extern crate serde_json;

#[cfg(feature = "serde_macros")]
include!("main.rs.in");

#[cfg(not(feature = "serde_macros"))]
include!(concat!(env!("OUT_DIR"), "/main.rs"));

The src/main.rs.in is the same as before.

Serialization without Macros

Under the covers, Serde extensively uses the Visitor pattern to thread state between the Serializer and Serialize without the two having specific information about each other's concrete type. This has many of the same benefits as frameworks that use runtime type information without the overhead. In fact, when compiling with optimizations, Rust is able to remove most or all the visitor state, and generate code that's nearly as fast as a hand written serializer format for a specific type.

To see it in action, lets look at how a simple type like i32 is serialized. The Serializer is threaded through the type:

impl serde::Serialize for i32 {
    fn serialize<S>(&self, serializer: &mut S) -> Result<(), S::Error>
        where S: serde::Serializer,
    {
        serializer.visit_i32(*self)
    }
}

As you can see it's pretty simple. More complex types like BTreeMap need to pass a MapVisitor to the Serializer in order to walk through the type:

impl<K, V> Serialize for BTreeMap<K, V>
    where K: Serialize + Ord,
          V: Serialize,
{
    #[inline]
    fn serialize<S>(&self, serializer: &mut S) -> Result<(), S::Error>
        where S: Serializer,
    {
        serializer.visit_map(MapIteratorVisitor::new(self.iter(), Some(self.len())))
    }
}

pub struct MapIteratorVisitor<Iter> {
    iter: Iter,
    len: Option<usize>,
}

impl<K, V, Iter> MapIteratorVisitor<Iter>
    where Iter: Iterator<Item=(K, V)>
{
    #[inline]
    pub fn new(iter: Iter, len: Option<usize>) -> MapIteratorVisitor<Iter> {
        MapIteratorVisitor {
            iter: iter,
            len: len,
        }
    }
}

impl<K, V, I> MapVisitor for MapIteratorVisitor<I>
    where K: Serialize,
          V: Serialize,
          I: Iterator<Item=(K, V)>,
{
    #[inline]
    fn visit<S>(&mut self, serializer: &mut S) -> Result<Option<()>, S::Error>
        where S: Serializer,
    {
        match self.iter.next() {
            Some((key, value)) => {
                let value = try!(serializer.visit_map_elt(key, value));
                Ok(Some(value))
            }
            None => Ok(None)
        }
    }

    #[inline]
    fn len(&self) -> Option<usize> {
        self.len
    }
}

Serializing structs follow this same pattern. In fact, structs are represented as a named map. Its visitor uses a simple state machine to iterate through all the fields:

struct Point {
    x: i32,
    y: i32,
}

impl serde::Serialize for Point {
    fn serialize<S>(&self, serializer: &mut S) -> Result<(), S::Error>
        where S: serde::Serializer
    {
        serializer.visit_struct("Point", PointMapVisitor {
            value: self,
            state: 0,
        })
    }
}

struct PointMapVisitor<'a> {
    value: &'a Point,
    state: u8,
}

impl<'a> serde::ser::MapVisitor for PointMapVisitor<'a> {
    fn visit<S>(&mut self, serializer: &mut S) -> Result<Option<()>, S::Error>
        where S: serde::Serializer
    {
        match self.state {
            0 => {
                self.state += 1;
                Ok(Some(try!(serializer.visit_struct_elt("x", &self.value.x))))
            }
            1 => {
                self.state += 1;
                Ok(Some(try!(serializer.visit_struct_elt("y", &self.value.y))))
            }
            _ => {
                Ok(None)
            }
        }
    }
}

Deserialization without Macros

Deserialization is a little more complicated since there's a bit more error handling that needs to occur. Let's start with the simple i32 Deserialize implementation. It passes a Visitor to the Deserializer. The Visitor can create the i32 from a variety of different types:

impl Deserialize for i32 {
    fn deserialize<D>(deserializer: &mut D) -> Result<i32, D::Error>
        where D: serde::Deserializer,
    {
        deserializer.visit(I32Visitor)
    }
}

struct I32Visitor;

impl serde::de::Visitor for I32Visitor {
    type Value = i32;

    fn visit_i16<E>(&mut self, value: i16) -> Result<i16, E>
        where E: Error,
    {
        self.visit_i32(value as i32)
    }

    fn visit_i32<E>(&mut self, value: i32) -> Result<i32, E>
        where E: Error,
    {
        Ok(value)
    }

    ...

Since it's possible for this type to get passed an unexpected type, we need a way to error out. This is done by way of the Error trait, which allows a Deserialize to generate an error for a few common error conditions. Here's how it could be used:

    ...

    fn visit_string<E>(&mut self, _: String) -> Result<i32, E>
        where E: Error,
    {
        Err(serde::de::Error::syntax("expect a string"))
    }

    ...

Maps follow a similar pattern as before, and use a MapVisitor to walk through the values generated by the Deserializer.

impl<K, V> serde::Deserialize for BTreeMap<K, V>
    where K: serde::Deserialize + Eq + Ord,
          V: serde::Deserialize,
{
    fn deserialize<D>(deserializer: &mut D) -> Result<BTreeMap<K, V>, D::Error>
        where D: serde::Deserializer,
    {
        deserializer.visit(BTreeMapVisitor::new())
    }
}

pub struct BTreeMapVisitor<K, V> {
    marker: PhantomData<BTreeMap<K, V>>,
}

impl<K, V> BTreeMapVisitor<K, V> {
    pub fn new() -> Self {
        BTreeMapVisitor {
            marker: PhantomData,
        }
    }
}

impl<K, V> serde::de::Visitor for BTreeMapVisitor<K, V>
    where K: serde::de::Deserialize + Ord,
          V: serde::de::Deserialize
{
    type Value = BTreeMap<K, V>;

    fn visit_unit<E>(&mut self) -> Result<BTreeMap<K, V>, E>
        where E: Error,
    {
        Ok(BTreeMap::new())
    }

    fn visit_map<V_>(&mut self, mut visitor: V_) -> Result<BTreeMap<K, V>, V_::Error>
        where V_: MapVisitor,
    {
        let mut values = BTreeMap::new();

        while let Some((key, value)) = try!(visitor.visit()) {
            values.insert(key, value);
        }

        try!(visitor.end());

        Ok(values)
    }
}

Deserializing structs goes a step further in order to support not allocating a String to hold the field names. This is done by custom field enum that deserializes an enum variant from a string. So for our Point example from before, we need to generate:

enum PointField {
    X,
    Y,
}

impl serde::Deserialize for PointField {
    fn deserialize<D>(deserializer: &mut D) -> Result<PointField, D::Error>
        where D: serde::de::Deserializer
    {
        struct PointFieldVisitor;

        impl serde::de::Visitor for PointFieldVisitor {
            type Value = PointField;

            fn visit_str<E>(&mut self, value: &str) -> Result<PointField, E>
                where E: serde::de::Error
            {
                match value {
                    "x" => Ok(PointField::X),
                    "y" => Ok(PointField::Y),
                    _ => Err(serde::de::Error::syntax("expected x or y")),
                }
            }
        }

        deserializer.visit(PointFieldVisitor)
    }
}

This is then used in our actual deserializer:

impl serde::Deserialize for Point {
    fn deserialize<D>(deserializer: &mut D) -> Result<Point, D::Error>
        where D: serde::de::Deserializer
    {
        static FIELDS: &'static [&'static str] = &["x", "y"];
        deserializer.visit_struct("Point", FIELDS, PointVisitor)
    }
}

struct PointVisitor;

impl serde::de::Visitor for PointVisitor {
    type Value = Point;

    fn visit_map<V>(&mut self, mut visitor: V) -> Result<Point, V::Error>
        where V: serde::de::MapVisitor
    {
        let mut x = None;
        let mut y = None;

        loop {
            match try!(visitor.visit_key()) {
                Some(PointField::X) => { x = Some(try!(visitor.visit_value())); }
                Some(PointField::Y) => { y = Some(try!(visitor.visit_value())); }
                None => { break; }
            }
        }

        let x = match x {
            Some(x) => x,
            None => try!(visitor.missing_field("x")),
        };

        let y = match y {
            Some(y) => y,
            None => try!(visitor.missing_field("y")),
        };

        try!(visitor.end());

        Ok(Point{ x: x, y: y })
    }
}

Serialization Formats Using Serde

Format Name
Bincode bincode
JSON serde_json
MessagePack rmp
XML serde_xml
YAML serde_yaml