Custom serialization #41

aturley · 2016-10-11T18:56:24Z

…tion Also fix typos.

SeanTAllen · 2016-10-11T19:54:39Z

I'm quite excited for this

Praetonus · 2016-10-13T17:36:35Z

How could fun _serialise(bytes: Pointer[U8] tag) and fun _deserialise(bytes: Pointer[U8] tag) be implemented? Currently, user code can't do much with a Pointer[U8] tag.

jemc · 2016-10-13T18:09:03Z

@Praetonus - I think it's assumed that FFI would be involved, in which users can do pretty much anything with a pointer. That is, I imagine any object using a Pointer internally probably needs to do FFI operations to serialize or deserialize it anyway.

aturley · 2016-10-17T13:37:25Z

@Praetonus what @jemc said is correct, the assumption here is that the user will use FFI for most of the implementation. This seems like a reasonable assumption, because the feature is intended to be used with classes that have pointer fields, which implies that FFI is already being used somewhere else in the code.

Praetonus · 2016-10-17T15:18:39Z

Ok, makes sense.

SeanTAllen · 2016-10-20T03:08:48Z

@aturley you ready to take this to final comment?

aturley · 2016-10-20T13:26:47Z

Yes, definitely.

jemc · 2016-10-25T03:54:20Z

text/0000-custom-serialization.md

+All of the following methods must be implemented for custom serialization:
+* `fun _serialise_space(): USize` -- returns the number of bytes to reserve for custom serialization
+* `fun _serialise(bytes: Pointer[U8] tag)` -- takes in a pointer to the location in the serialization buffer that has been reserved for this object's extra data, writes a serialized representation of its data to the buffer
+* `fun _deserialise(bytes: Pointer[U8] tag)` -- takes in a pointer to the location in the deserialization buffer that represents the object's extra data, reads the data out, and modifies the object using that data


Seems like this one needs to be fun ref.

I think this method needs to be passed a second argument - the space: USize that was determined on the other side by _serialise_space. Given that the result of _serialise_space is allowed to not be constant (allowed to vary based on the object instance being serialised), it seems like we need to be conveying it here so the user can know how many bytes are available to read.

aturley · 2016-10-25

@jemc take a look at my other comment about storing the size.

I believe you are right, fun _deserialise(...) should be fun ref _deserialise(...).
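For readers following the thread, here is a minimal sketch of the C side that the three methods might call through FFI, assuming a class whose only pointer field is a NUL-terminated C string. The function names (`cstr_serialise_space`, `cstr_serialise`, `cstr_deserialise`) are hypothetical and are not part of the RFC or of any Pony library. The use of `malloc` is likewise an assumption made to keep the sketch self-contained.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper behind `fun _serialise_space(): USize`:
   bytes to reserve for the string plus its NUL terminator. */
size_t cstr_serialise_space(const char *s) {
    return strlen(s) + 1;
}

/* Hypothetical helper behind `fun _serialise(bytes: Pointer[U8] tag)`:
   copy the string, terminator included, into the reserved region. */
void cstr_serialise(char *bytes, const char *s) {
    memcpy(bytes, s, strlen(s) + 1);
}

/* Hypothetical helper behind `fun ref _deserialise(bytes: Pointer[U8] tag)`:
   return a fresh copy of the string found in the buffer.
   Returns NULL if allocation fails; the caller owns the result. */
char *cstr_deserialise(const char *bytes) {
    size_t n = strlen(bytes) + 1;
    char *copy = malloc(n);
    if (copy != NULL) {
        memcpy(copy, bytes, n);
    }
    return copy;
}
```

The Pony object would store the pointer returned by `cstr_deserialise` in its own field, which is why `_deserialise` has to be `fun ref` rather than the default `box`.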

jemc · 2016-10-25T03:59:01Z

text/0000-custom-serialization.md

+## Methods
+
+All of the following methods must be implemented for custom serialization:
+* `fun _serialise_space(): USize` -- returns the number of bytes to reserve for custom serialization


I think this should be fun tag to make sure the result is a "constant". That is, a fun tag with no arguments should always produce the same result every time, barring any use of capability-insecure ambient authority.

I'm mainly concerned about the user accessing fields to produce the USize result for this method - it seems like this needs to be a constant value so it will be the same on the serialise end and the deserialise end.

Never mind this point - it seems like the ability to read from fields is part of your intention. I suppose as long as the _serialise_space is written as part of the serialised representation it can be read on the other side without calling the method.

aturley · 2016-10-25

The intention is that the API user is responsible for storing the size of the representation if that is required for deserialization. The size would be stored as part of the representation. I believe this is the ideal solution because it gives the developer more control over how much space is used by custom serialization, and also because storing the size and then passing it to the _deserialise function doesn't provide a real benefit in terms of safety, nor in terms of convenience in many cases.

Assume that we have an object with a field that is always serialized into a representation of the same size. In that case, storing the size and passing it as part of the call to _deserialise doesn't provide any useful information, because the API user already knows how many bytes to read and can therefore ignore the passed size. Storing the size in this case is wasted space in the representation.

Assume that we have an object with a field whose representation can vary in size (like an array of integers). In this case, the size could be used by the API user to avoid reading beyond the end of the byte array of the representation. The argument here is that there is a degree of safety given by providing the size and using it to stay within bounds. However, the API user is free to ignore that size parameter, or may make a mistake in implementing the logic that does bounds checking, at which point we are no safer than we were if we made the user responsible for the decisions of whether and how to pass size information.

Assume that we have an object with two pointer fields. Since there is only one serialization area for both objects, the overall size could be passed to _deserialise, but then the function would still have to deal with determining the size of each of the individual representations. As above, the only use I could see for having this size information is to use it to avoid reading off the end of the buffer, but the user could ignore this value or use it incorrectly, in which case it becomes useless.

Having Pony store the size and then pass it to the _deserialise function provides a small amount of convenience (but not safety) in some (but not all) cases; consequently I think that storing information about the size of a representation should be left to the API user.
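To make the "user stores the size" option concrete, here is a minimal C sketch for the variable-size case (an array of integers), assuming the count is written as the first 8 bytes of the custom region. The function names and the layout are illustrative assumptions, not something the RFC prescribes. Values are written in host byte order, so the sketch also assumes both ends share the same layout.

```c
#include <stdint.h>
#include <string.h>

/* Bytes to reserve: one 64-bit element count followed by the elements. */
size_t ints_serialise_space(size_t count) {
    return sizeof(uint64_t) + count * sizeof(int32_t);
}

/* Write the count first, then the elements. */
void ints_serialise(unsigned char *bytes, const int32_t *items, size_t count) {
    uint64_t n = (uint64_t)count;
    memcpy(bytes, &n, sizeof n);
    memcpy(bytes + sizeof n, items, count * sizeof(int32_t));
}

/* Read the stored count, then copy out at most `cap` elements.
   Returns the number of elements recorded in the representation. */
size_t ints_deserialise(const unsigned char *bytes, int32_t *out, size_t cap) {
    uint64_t n;
    memcpy(&n, bytes, sizeof n);
    size_t take = (n < cap) ? (size_t)n : cap;
    memcpy(out, bytes + sizeof n, take * sizeof(int32_t));
    return (size_t)n;
}
```

A fixed-size field would skip the count entirely, which is the space saving the comment above argues for.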

jemc · 2016-10-25

If the serialised representation does not have a way of indicating how many bytes of "user space" were granted (requested by the _serialise_space method), how is the Pony implementation able to deserialise the buffer? You point to the overhead of storing an "extra" size-indicating word, but surely the deserialisation mechanism will have to have some way of knowing where the user space ends and the next serialised object begins.

aturley · 2016-10-28

@jemc it depends on what you mean by "next serialized object begins". In the serialized representation, each field contains either a value (if the field stores a numeric value) or a "pointer" to the place in the buffer that represents the object that goes in the field. So there isn't really a concept of "next" in the representation, only fields that point to other objects in the buffer. For all objects except Strings and Arrays, the size of the object's representation is dictated by the object itself, so there is no need to store the size. In the case of Strings and Arrays, the size is stored because these data structures are inherently variable in length.

Just to be clear here, deserialization (as currently implemented) doesn't work by linearly running through the buffer. Rather, the root object is the first object that appears in the buffer. From then on, the location in the buffer of other objects is determined by the location in the buffer indicated by the field pointer. You can actually stick big chunks of meaningless bytes in the representation and it will be fine as long as there are no fields that point to those bytes. There is no "next serialized object", only objects that were indicated by fields in another object.
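The pointer-following idea can be sketched with a toy buffer in C. The layout below (a root at offset 0 whose single field holds the offset of a child, and a child with a single numeric field) is an assumption made for illustration and is not the actual ponyc serialisation format. All function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read a 64-bit value stored at `offset` in the buffer. */
static uint64_t read_u64(const unsigned char *buf, size_t offset) {
    uint64_t v;
    memcpy(&v, buf + offset, sizeof v);
    return v;
}

/* Write a 64-bit value at `offset` in the buffer. */
static void write_u64(unsigned char *buf, size_t offset, uint64_t v) {
    memcpy(buf + offset, &v, sizeof v);
}

/* Toy layout: root.field at offset 0 points at the child,
   and the child's numeric field lives at child_offset. */
void build_toy_buffer(unsigned char *buf, size_t child_offset, uint64_t value) {
    write_u64(buf, 0, (uint64_t)child_offset);
    write_u64(buf, child_offset, value);
}

/* Deserialise by following the root's field, never by scanning linearly. */
uint64_t read_child_value(const unsigned char *buf) {
    size_t child_offset = (size_t)read_u64(buf, 0);
    return read_u64(buf, child_offset);
}
```

Because the reader only visits offsets that some field points to, any bytes between the root and the child can hold arbitrary data without affecting the result.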

jemc · 2016-10-28

Got it, thanks for explaining.

jemc · 2016-10-25T04:14:14Z

@aturley in my head I'm starting to go back to @Praetonus' original question about Pointer ref.

An Array[U8] can be "downgraded" to a Pointer[U8] tag, but a Pointer[U8] tag cannot become an Array[U8] (only a Pointer[U8] ref can). So if the user wanted to do something non-FFI with the bytes, they could not. Is there any reason not to use ref in the _deserialise call?

aturley · 2016-10-25T13:35:05Z

@jemc The intention for using tag instead of ref was to reinforce the idea that this feature is intended to be used in conjunction with FFI, since that's really the only way to use a Pointer[U8] tag. I can't come up with a good example of something useful that could be done with the Pointer[U8] ref without going through FFI, but that may simply be a lack of imagination on my part. Do you have any examples of things that one might want to do with the array from within Pony?

jemc · 2016-10-25T15:04:57Z

> I can't come up with a good example of something useful that could be done with the Pointer[U8] ref without going through FFI, but that may simply be a lack of imagination on my part. Do you have any examples of things that one might want to do with the array from within Pony?

Let's iron out my other question about storing the size before coming back to this one. I imagine it will affect the answer.

jemc · 2016-10-28T15:40:30Z

Coming back to it, I'm now thinking that giving the user a Pointer[U8] ref is a bad idea. It would allow them to violate memory safety (by creating an Array[U8] with a bogus size and using that to access memory outside of the allocated block) without using FFI. Thus a package could violate memory safety without being in the FFI whitelist, breaking Pony guarantees.

So with that in mind, I think the RFC looks good as is (now that fun _deserialise has become fun ref _deserialise).

Praetonus · 2016-10-31T10:29:45Z

It's been more than a week since the beginning of the final comment period. Should it be marked ready for vote, @aturley?

aturley · 2016-11-01T22:35:31Z

@Praetonus yes, I think this is ready for voting.

aturley added 2 commits October 7, 2016 18:31

Add RFC for custom serialization

d431233

Provide more details about how this interacts with existing serialization. Also fix typos.

70f1596

aturley added the "status - final comment period" label Oct 20, 2016

jemc reviewed Oct 25, 2016

Update _deserialise(...) ref cap

5c6c283

The ref cap for the `deserialise(...)` function should be `ref`, not default (`box`) because the function modifies the object's fields.

aturley added the "status - ready for vote" label and removed the "status - final comment period" label Nov 1, 2016

jemc merged commit 5928e4b into ponylang:master Nov 2, 2016

jemc removed the "status - ready for vote" label Nov 2, 2016
