Skip to content
This repository has been archived by the owner on Feb 8, 2022. It is now read-only.

Improve support for serialization/deserialization #19

Closed
jhugard opened this issue Feb 4, 2016 · 10 comments
Closed

Improve support for serialization/deserialization #19

jhugard opened this issue Feb 4, 2016 · 10 comments
Assignees
Milestone

Comments

@jhugard
Copy link
Collaborator

jhugard commented Feb 4, 2016

Fix issues with Froto.Core.IO

  • Restructure to handle deserialization
  • Improve performance
  • Reduce (or eliminate) making multiple copies of strings and byte strings
  • Report errors on invalid encoding
  • Handle big-endian architectures
  • Unit test everything

The current Froto.Core.IO code, which does serialization/deserialization to/from the Protobuf Wire format, requires knowing the final datatype of a field before deserializing a field.

As background, the Wire Format only supports four data types: varint, fixed32, fixed64, and length-delimited. (There is also an "group" format, but that is deprecated, not supported outside google, and not officially documented). These get mapped into more specific data types, such as int32, sint32, byte string, and text string by a message definition, based on the field number.

The problem is that the final datatype isn't known until after parsing the field number and fields can arrive out-of-order. Therefore, one approach could be to parse all fields in a message into the above four data types, then flow these into the final field representation by looking up the datatype (and field name) using the field number, then doing the appropriate type converstion. This could also be done lazy via a sequence.

Another issue with the Core.IO code is that every varint requires creating a byte array and copying the underlying bytes, reducing performance and increasing the amount of GC pressure.

@jhugard
Copy link
Collaborator Author

jhugard commented Feb 4, 2016

Started work on the above, and have it largely done on the refactor-and-complete-wireformat branch.

I've pushed this branch to github mainly for review of work-in-progress. Final code should support serialization/deserialization from/to class-based objects, and form basis for generated code.

Comments welcome!

Still need to implement:

  • Full decoding of an entire message into a container of some sort (probably just a list of WireFields... should be easy)
  • Helpers to map from wire-varint to object-field type; e.g, varint to sint32, length-delim to string, etc.
  • Helpers to map from field number to object-field type (then to set the actual field value), as well as the reverse (from field-type to wire-type).
  • Helpers to calculate the final size - of a value, field, and hence an entire message - so that a proper-size serialization buffer can be instantiated.

@jhugard jhugard self-assigned this Feb 4, 2016
jhugard added a commit that referenced this issue Feb 4, 2016
- Rewrite wire-format encoding and decoding to use zero-copy buffer
(based on ArraySegment) to minimize memory copies and limit object allocations.
- Add full WireFormat unit test suite.
- Rename Froto.Core.IO to Froto.Core.WireFormat.
@jhugard
Copy link
Collaborator Author

jhugard commented Feb 8, 2016

Going to refactor the ZeroCopyBuffer to something much closer to MemoryTributory.

The use case is avoiding memory copies when receiving multiple frames in a single websocket message. Using a stream would also afford the flexibility to serialize off the wire for stream-based protocols. Just need to figure out how to efficiently decode utf8 into a string without making a copy of the utf8 stream bytes or underlying buffer bytes.

@ctaggart
Copy link
Owner

ctaggart commented Feb 8, 2016

I really like what I see so far! 👍

@jhugard
Copy link
Collaborator Author

jhugard commented Feb 17, 2016

Abandoned the move to something like MemoryTributory for now; too much work to ensure all the edge conditions are correct, lol. Might make a good optimization later.

jhugard added a commit that referenced this issue Feb 17, 2016
…deserializer, with sample serializable class (TestProto.fs).

Todo: add dehydrate functions for signed types & floats.

Todo: figure out how to support repeated fields (non-packed).  Should be able to add a helper that composes with the existing hydrate and dehydrate methods.

Todo: Add unit tests for all serialization & deserialization code.  Consider using binary files encoded with stock protobuf code.
@jhugard
Copy link
Collaborator Author

jhugard commented Feb 17, 2016

Added a bunch of code for Serialization (in Core/Serializer.fs), as well as a sample of a class which implements serialization (Froto.Core.Test/TestProto.fs).

The code isn't 100% complete: it needs support for non-packed repeated fields and a few other things, there are no unit tests, and I have not even tested by hand. However, would very much like feedback on the overall direction before investing more (of my very limited) time on this approach.

Here's what a serializable class might look like (yes, this compiles and might even work with the code from the last commit, above).

type Test () =

    inherit MessageBase()

    let mutable m_id = 0
    let mutable m_name = ""
    let mutable m_option = false
    let mutable m_test = ETest.Nada
    let mutable m_packedFixed32 = Array.empty

    override x.DecoderRing =
        [
            1, ref m_id             |> Serializer.hydrateInt32
            2, ref m_name           |> Serializer.hydrateString
            3, ref m_option         |> Serializer.hydrateBool
            4, ref m_test           |> Serializer.hydrateEnum
            5, ref m_packedFixed32  |> Serializer.hydratePackedFixed32
        ]
        |> Map.ofList

    override x.EncoderRing =
        [
            m_id            |> Serializer.dehydrateVarint 1
            m_name          |> Serializer.dehydrateString 2
            m_option        |> Serializer.dehydrateBool 3
            m_test          |> Serializer.dehydrateVarint 4
            m_packedFixed32 |> Serializer.dehydratePackedFixed32 5
        ]

    static member Deserialize buf =
        let self = Test()
        self.Deserialize(buf) |> ignore
        self

    static member DeserializeLengthDelimited buf =
        let self = Test()
        self.DeserializeLengthDelimited(buf) |> ignore
        self


    member x.ID = m_id
    member x.Name = m_name
    member x.bOption = m_option
    member x.Test = m_test
    member x.PackedFixed32 = m_packedFixed32

and ETest =
    | Nada = 0
    | One = 1
    | Two = 2

jhugard added a commit that referenced this issue Feb 17, 2016
Add serialization framework, with sample serializable class.

Todo: add dehydrate functions for signed types & floats.

Todo: figure out how to support repeated fields (non-packed).  Should be able to add a helper that composes with the existing hydrate and dehydrate methods.

Todo: Add unit tests for all serialization & deserialization code.  Consider using binary files encoded with stock protobuf code.
@jhugard
Copy link
Collaborator Author

jhugard commented Feb 17, 2016

Should have done the above as another commit; I thought a forced commit would update the previous comment. Just fixed an obvious bug (missing parens on Utility.tagLen).

jhugard added a commit to jhugard/froto that referenced this issue Feb 17, 2016
@jhugard
Copy link
Collaborator Author

jhugard commented Feb 17, 2016

Repeated field support added:

type Test () =
    ...
    let mutable m_repeatedInt32 = List.empty
        ...
        6, ref m_repeatedInt32  |> Serializer.hydrateRepeated Serializer.hydrateInt32
        ...
        m_repeatedInt32 |> Serializer.dehydrateRepeated Serializer.dehydrateVarint 6

jhugard added a commit to jhugard/froto that referenced this issue Feb 17, 2016
@ctaggart
Copy link
Owner

@jhugard, I would recommend just adding a PR where the title has [WIP] prefix while it is work in progress. That will make is easier to review and comment. Looks awesome!

jhugard added a commit that referenced this issue Feb 18, 2016
- Rewrite wire-format encoding and decoding to use zero-copy buffer
(based on ArraySegment) to minimize memory copies and limit object allocations.
- Add full WireFormat unit test suite.
- Rename Froto.Core.IO to Froto.Core.WireFormat.
@jhugard
Copy link
Collaborator Author

jhugard commented Feb 18, 2016

Done :)

jhugard added a commit that referenced this issue Feb 20, 2016
- Rewrite wire-format encoding and decoding to use zero-copy buffer
(based on ArraySegment) to minimize memory copies and limit object allocations.
- Add full WireFormat unit test suite.
- Rename Froto.Core.IO to Froto.Core.WireFormat.
@ctaggart
Copy link
Owner

ctaggart commented Mar 3, 2016

PR #20 was merged and resolves this.

@ctaggart ctaggart closed this as completed Mar 3, 2016
@ctaggart ctaggart added this to the 0.2.1 milestone Apr 8, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants