Improve support for serialization/deserialization #19
Comments
Started work on the above, and have it largely done on the refactor-and-complete-wireformat branch. I've pushed this branch to github mainly for review of work-in-progress. Final code should support serialization/deserialization from/to class-based objects, and form basis for generated code. Comments welcome! Still need to implement:
- Rewrite wire-format encoding and decoding to use zero-copy buffer (based on ArraySegment) to minimize memory copies and limit object allocations. - Add full WireFormat unit test suite. - Rename Froto.Core.IO to Froto.Core.WireFormat.
Going to refactor the ZeroCopyBuffer to something much closer to MemoryTributory. The use case is avoiding memory copies when receiving multiple frames in a single websocket message. Using a stream would also afford the flexibility to serialize off the wire for stream-based protocols. Just need to figure out how to efficiently decode UTF-8 into a string without making a copy of the UTF-8 stream bytes or underlying buffer bytes.
I really like what I see so far! 👍
Abandoned the move to something like MemoryTributory for now; too much work to ensure all the edge conditions are correct, lol. Might make a good optimization later.
…deserializer, with sample serializable class (TestProto.fs). Todo: add dehydrate functions for signed types & floats. Todo: figure out how to support repeated fields (non-packed). Should be able to add a helper that composes with the existing hydrate and dehydrate methods. Todo: Add unit tests for all serialization & deserialization code. Consider using binary files encoded with stock protobuf code.
Added a bunch of code for Serialization (in Core/Serializer.fs), as well as a sample of a class which implements serialization (Froto.Core.Test/TestProto.fs). The code isn't 100% complete: it needs support for non-packed repeated fields and a few other things, there are no unit tests, and I have not even tested by hand. However, would very much like feedback on the overall direction before investing more (of my very limited) time on this approach. Here's what a serializable class might look like (yes, this compiles and might even work with the code from the last commit, above).
Add serialization framework, with sample serializable class. Todo: add dehydrate functions for signed types & floats. Todo: figure out how to support repeated fields (non-packed). Should be able to add a helper that composes with the existing hydrate and dehydrate methods. Todo: Add unit tests for all serialization & deserialization code. Consider using binary files encoded with stock protobuf code.
Should have done the above as another commit; I thought a forced commit would update the previous comment. Just fixed an obvious bug (missing parens on Utility.tagLen).
Repeated field support added:
@jhugard, I would recommend just adding a PR where the title has
Done :)
PR #20 was merged and resolves this.
Fix issues with Froto.Core.IO
The current Froto.Core.IO code, which does serialization/deserialization to/from the Protobuf Wire format, requires knowing the final datatype of a field before deserializing a field.
As background, the wire format supports only four data types: varint, fixed32, fixed64, and length-delimited. (There is also a "group" format, but that is deprecated, not supported outside Google, and not officially documented.) A message definition maps these, based on the field number, into more specific data types such as int32, sint32, byte string, and text string.
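To make the background concrete, here is a minimal sketch of how a field key splits into a field number and a wire type. It is written in Python as a language-neutral illustration and is not Froto code; the names `WIRE_TYPES` and `split_key` are made up for this example. The wire-type numbers (0, 1, 2, 5) are the ones used by the Protocol Buffers encoding, and the key is assumed to have already been decoded from its varint form.

```python
from typing import Tuple

# Wire-type numbers used by the Protocol Buffers encoding.
# Types 3 and 4 (group start/end) are deliberately left out.
WIRE_TYPES = {
    0: "varint",
    1: "fixed64",
    2: "length-delimited",
    5: "fixed32",
}

def split_key(key: int) -> Tuple[int, str]:
    """Split an already-decoded field key into (field number, wire type name)."""
    wire_type = key & 0x07    # low three bits
    field_number = key >> 3   # remaining bits
    if wire_type not in WIRE_TYPES:
        raise ValueError(f"unsupported wire type {wire_type}")
    return field_number, WIRE_TYPES[wire_type]

# 0x08 is the key for field 1 as a varint; 0x12 is field 2, length-delimited.
print(split_key(0x08))  # (1, 'varint')
print(split_key(0x12))  # (2, 'length-delimited')
```

Note that the key says nothing about whether a varint is an int32, a sint32, or a bool; that information lives only in the message definition.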
The problem is that the final datatype isn't known until after the field number has been parsed, and fields can arrive out of order. Therefore, one approach could be to parse all fields in a message into the above four data types, then flow these into the final field representation by looking up the datatype (and field name) using the field number and doing the appropriate type conversion. This could also be done lazily via a sequence.
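A rough sketch of this two-phase approach, again in Python as a language-neutral illustration rather than Froto's API. Everything named here (`raw_fields`, `build_message`, `SCHEMA`, the two example fields) is hypothetical. Phase 1 is a generator, which stands in for the lazy sequence; phase 2 looks each raw field up by number and converts it. For brevity the sketch does no bounds checking and copies length-delimited payloads.

```python
import struct
from typing import Dict, Iterator, Tuple, Union

RawValue = Union[int, bytes]

def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def raw_fields(buf: bytes) -> Iterator[Tuple[int, int, RawValue]]:
    """Phase 1: lazily yield (field number, wire type, raw value) in wire order."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed64, little-endian
            value = struct.unpack_from("<Q", buf, pos)[0]
            pos += 8
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        elif wire_type == 5:  # fixed32, little-endian
            value = struct.unpack_from("<I", buf, pos)[0]
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value

def zigzag_decode(n: int) -> int:
    """Undo the zigzag mapping used by sint32/sint64."""
    return (n >> 1) ^ -(n & 1)

# Hypothetical message definition: field number -> (field name, converter).
SCHEMA = {
    1: ("id", zigzag_decode),                      # sint32
    2: ("name", lambda raw: raw.decode("utf-8")),  # string
}

def build_message(buf: bytes) -> Dict[str, object]:
    """Phase 2: look up each raw field by number and convert it."""
    message = {}
    for field_number, _wire_type, raw in raw_fields(buf):
        if field_number in SCHEMA:  # unknown fields are skipped
            name, convert = SCHEMA[field_number]
            message[name] = convert(raw)
    return message

# Field 2 ("hi") arrives before field 1 (sint32 -3, zigzag-encoded as 5).
data = bytes([0x12, 0x02, 0x68, 0x69, 0x08, 0x05])
print(build_message(data))  # {'name': 'hi', 'id': -3}
```

Because phase 1 never needs the schema, field order on the wire does not matter, and unknown field numbers can be skipped or retained without special handling.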
Another issue with the Core.IO code is that every varint requires creating a byte array and copying the underlying bytes, reducing performance and increasing the amount of GC pressure.
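For comparison, a varint can be decoded in place by walking the buffer with an offset, so no intermediate byte array is needed. The sketch below is Python used as a language-neutral illustration, with `memoryview` standing in for an offset-based view over a shared buffer; `decode_varint` is a hypothetical name and not part of Froto.

```python
from typing import Tuple

def decode_varint(buf: memoryview, offset: int) -> Tuple[int, int]:
    """Decode a varint in place; return (value, offset just past the varint)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]  # reads one byte; nothing is copied out of the buffer
        offset += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:     # high bit clear: this was the last byte
            return value, offset
        shift += 7
        if shift >= 64:
            raise ValueError("varint longer than 10 bytes")

# Two varints back to back in one buffer: 300 (0xAC 0x02) and 1 (0x01).
frame = bytearray([0xAC, 0x02, 0x01])
view = memoryview(frame)
print(decode_varint(view, 0))  # (300, 2)
print(decode_varint(view, 2))  # (1, 3)
```

The returned offset lets the caller continue with the next field in the same buffer, which is the property that matters when several frames arrive in one message.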