Improve support for serialization/deserialization #19

jhugard · 2016-02-04T16:32:14Z

Fix issues with Froto.Core.IO

Restructure to handle deserialization
Improve performance
Reduce (or eliminate) making multiple copies of strings and byte strings
Report errors on invalid encoding
Handle big-endian architectures
Unit test everything

The current Froto.Core.IO code, which does serialization/deserialization to/from the Protobuf Wire format, requires knowing the final datatype of a field before deserializing a field.

As background, the Wire Format only supports four data types: varint, fixed32, fixed64, and length-delimited. (There is also an "group" format, but that is deprecated, not supported outside google, and not officially documented). These get mapped into more specific data types, such as int32, sint32, byte string, and text string by a message definition, based on the field number.

The problem is that the final datatype isn't known until after parsing the field number and fields can arrive out-of-order. Therefore, one approach could be to parse all fields in a message into the above four data types, then flow these into the final field representation by looking up the datatype (and field name) using the field number, then doing the appropriate type converstion. This could also be done lazy via a sequence.

Another issue with the Core.IO code is that every varint requires creating a byte array and copying the underlying bytes, reducing performance and increasing the amount of GC pressure.

jhugard · 2016-02-04T16:41:02Z

Started work on the above, and have it largely done on the refactor-and-complete-wireformat branch.

I've pushed this branch to github mainly for review of work-in-progress. Final code should support serialization/deserialization from/to class-based objects, and form basis for generated code.

Comments welcome!

Still need to implement:

Full decoding of an entire message into a container of some sort (probably just a list of WireFields... should be easy)
Helpers to map from wire-varint to object-field type; e.g, varint to sint32, length-delim to string, etc.
Helpers to map from field number to object-field type (then to set the actual field value), as well as the reverse (from field-type to wire-type).
Helpers to calculate the final size - of a value, field, and hence an entire message - so that a proper-size serialization buffer can be instantiated.

- Rewrite wire-format encoding and decoding to use zero-copy buffer (based on ArraySegment) to minimize memory copies and limit object allocations. - Add full WireFormat unit test suite. - Rename Froto.Core.IO to Froto.Core.WireFormat.

jhugard · 2016-02-08T02:24:53Z

Going to refactor the ZeroCopyBuffer to something much closer to MemoryTributory.

The use case is avoiding memory copies when receiving multiple frames in a single websocket message. Using a stream would also afford the flexibility to serialize off the wire for stream-based protocols. Just need to figure out how to efficiently decode utf8 into a string without making a copy of the utf8 stream bytes or underlying buffer bytes.

ctaggart · 2016-02-08T19:34:01Z

I really like what I see so far! 👍

jhugard · 2016-02-17T07:33:06Z

Abandoned the move to something like MemoryTributory for now; too much work to ensure all the edge conditions are correct, lol. Might make a good optimization later.

…deserializer, with sample serializable class (TestProto.fs). Todo: add dehydrate functions for signed types & floats. Todo: figure out how to support repeated fields (non-packed). Should be able to add a helper that composes with the existing hydrate and dehydrate methods. Todo: Add unit tests for all serialization & deserialization code. Consider using binary files encoded with stock protobuf code.

jhugard · 2016-02-17T07:51:17Z

Added a bunch of code for Serialization (in Core/Serializer.fs), as well as a sample of a class which implements serialization (Froto.Core.Test/TestProto.fs).

The code isn't 100% complete: it needs support for non-packed repeated fields and a few other things, there are no unit tests, and I have not even tested by hand. However, would very much like feedback on the overall direction before investing more (of my very limited) time on this approach.

Here's what a serializable class might look like (yes, this compiles and might even work with the code from the last commit, above).

type Test () =

    inherit MessageBase()

    let mutable m_id = 0
    let mutable m_name = ""
    let mutable m_option = false
    let mutable m_test = ETest.Nada
    let mutable m_packedFixed32 = Array.empty

    override x.DecoderRing =
        [
            1, ref m_id             |> Serializer.hydrateInt32
            2, ref m_name           |> Serializer.hydrateString
            3, ref m_option         |> Serializer.hydrateBool
            4, ref m_test           |> Serializer.hydrateEnum
            5, ref m_packedFixed32  |> Serializer.hydratePackedFixed32
        ]
        |> Map.ofList

    override x.EncoderRing =
        [
            m_id            |> Serializer.dehydrateVarint 1
            m_name          |> Serializer.dehydrateString 2
            m_option        |> Serializer.dehydrateBool 3
            m_test          |> Serializer.dehydrateVarint 4
            m_packedFixed32 |> Serializer.dehydratePackedFixed32 5
        ]

    static member Deserialize buf =
        let self = Test()
        self.Deserialize(buf) |> ignore
        self

    static member DeserializeLengthDelimited buf =
        let self = Test()
        self.DeserializeLengthDelimited(buf) |> ignore
        self


    member x.ID = m_id
    member x.Name = m_name
    member x.bOption = m_option
    member x.Test = m_test
    member x.PackedFixed32 = m_packedFixed32

and ETest =
    | Nada = 0
    | One = 1
    | Two = 2

Add serialization framework, with sample serializable class. Todo: add dehydrate functions for signed types & floats. Todo: figure out how to support repeated fields (non-packed). Should be able to add a helper that composes with the existing hydrate and dehydrate methods. Todo: Add unit tests for all serialization & deserialization code. Consider using binary files encoded with stock protobuf code.

jhugard · 2016-02-17T08:11:37Z

Should have done the above as another commit; I thought a forced commit would update the previous comment. Just fixed an obvious bug (missing parens on Utility.tagLen).

jhugard · 2016-02-17T19:00:43Z

Repeated field support added:

type Test () =
    ...
    let mutable m_repeatedInt32 = List.empty
        ...
        6, ref m_repeatedInt32  |> Serializer.hydrateRepeated Serializer.hydrateInt32
        ...
        m_repeatedInt32 |> Serializer.dehydrateRepeated Serializer.dehydrateVarint 6

ctaggart · 2016-02-17T20:30:12Z

@jhugard, I would recommend just adding a PR where the title has [WIP] prefix while it is work in progress. That will make is easier to review and comment. Looks awesome!

- Rewrite wire-format encoding and decoding to use zero-copy buffer (based on ArraySegment) to minimize memory copies and limit object allocations. - Add full WireFormat unit test suite. - Rename Froto.Core.IO to Froto.Core.WireFormat.

jhugard · 2016-02-18T05:18:22Z

Done :)

- Rewrite wire-format encoding and decoding to use zero-copy buffer (based on ArraySegment) to minimize memory copies and limit object allocations. - Add full WireFormat unit test suite. - Rename Froto.Core.IO to Froto.Core.WireFormat.

ctaggart · 2016-03-03T04:42:36Z

PR #20 was merged and resolves this.

jhugard self-assigned this Feb 4, 2016

jhugard added a commit to jhugard/froto that referenced this issue Feb 17, 2016

Req ctaggart#19 (add2): Add repeated field support

f6473c8

jhugard added a commit to jhugard/froto that referenced this issue Feb 17, 2016

Req ctaggart#19: Remove unnecessary type decls from example

cccc096

jhugard mentioned this issue Feb 18, 2016

Implement Serialization Framework and rework WireFormat #20

Merged

ctaggart closed this as completed Mar 3, 2016

ctaggart added this to the 0.2.1 milestone Apr 8, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve support for serialization/deserialization #19

Improve support for serialization/deserialization #19

jhugard commented Feb 4, 2016

jhugard commented Feb 4, 2016

jhugard commented Feb 8, 2016

ctaggart commented Feb 8, 2016

jhugard commented Feb 17, 2016

jhugard commented Feb 17, 2016

jhugard commented Feb 17, 2016

jhugard commented Feb 17, 2016

ctaggart commented Feb 17, 2016

jhugard commented Feb 18, 2016

ctaggart commented Mar 3, 2016