-
-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Custom serialization #41
Conversation
I'm quite excited for this |
How could |
@Praetonus - I think it's assumed that FFI would be involved, in which users can do pretty much anything with a pointer. That is, I imagine any object using a |
@Praetonus what @jemc said is correct, the assumption here is that the user will use FFI for most of the implementation. This seems like a reasonable assumption, because the feature is intended to be used with classes that have pointer fields, which implies that FFI is already being used somewhere else in the code. |
Ok, makes sense. |
@aturley you ready to take this to final comment? |
Yes, definitely. |
All of the following methods must be implemented for custom serialization: | ||
* `fun _serialise_space(): USize` -- returns the number of bytes to reserve for custom serialization | ||
* `fun _serialise(bytes: Pointer[U8] tag)` -- takes in a pointer to the location in the serialization buffer that has been reserved for this object's extra data, writes a serialized representation of its data to the buffer | ||
* `fun _deserialise(bytes: Pointer[U8] tag)` -- takes in a pointer to the location in the deserialization buffer that represents the object's extra data, reads the data out, and modifies the object using that data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this one needs to be fun ref
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method needs to be passed a second argument - the space: USize
that was determined on the other side by _serialise_space
. Given that the result of _serialise_space
is allowed to not be constant (allowed to vary based on the object instance being serialised), it seems like we need to be conveying it here so the user can know how many bytes are available to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jemc take a look at my other comment about storing the size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you are right, fun _deserialise(...)
should be fun ref _deserialise(...)
.
## Methods | ||
|
||
All of the following methods must be implemented for custom serialization: | ||
* `fun _serialise_space(): USize` -- returns the number of bytes to reserve for custom serialization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be fun tag
to make sure the result is a "constant". That is, a fun tag
with no arguments should always produce the same result every time, barring any use of capability-insecure ambient authority.
I'm mainly concerned about the user accessing fields to produce the USize
result for this method - it seems like this needs to be a constant value so it will be the same on the serialise
end and the deserialise
end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind this point - it seems like the ability to read from fields is part of your intention. I suppose as long as the _serialise_space
is written as part of the serialised representation it can be read on the other side without calling the method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention is that the API user is responsible for storing the size of the representation if that is required for deserialization. The size would be stored as part of the representation. I believe this is the ideal solution because it gives the developer more control over how much space is used by custom serialization, and also because storing the size and then passing it to the _deserialise
function doesn't provide a real benefit in terms of safety, nor in terms of convenience in many cases.
Assume that we have an object with a field that is always serialized into a representation of the same size. In that case, storing the size and passing it as part of the call to _deserialise
doesn't provide any useful information, because the API user already knows how many bytes to read and can therefore ignore the passed size. Storing the size in this case is wasted space in the representation.
Assume that we have an object with a field whose representation can vary in size (like an array of integers). In this case, the size could be used by the API user to avoid reading beyond the end of the byte array of the representation. The argument here is that there is a degree of safety given by providing the size and using it to stay within bounds. However, the API user is free to ignore that size parameter, or may make a mistake in implementing the logic that does bounds checking, at which point we are no safer that we were if we made the user responsible for the decisions of whether and how to pass size information.
Assume that we have an object with two pointer fields. Since there is only one serialization area for both objects, the overall size could be passed to _deserialise
, but then the function would still have to deal with determining the size of each of the individual representations. As above, the only use I could see for having this size information is to use it to avoid reading off the end of the buffer, but the user could ignore this value and use it incorrectly, in which case it becomes useless.
Having Pony store the size and then pass it to the _deserialise
function provides a small amound of convenience (but not safety) in some (but not all) cases; consequently I think that storing information about the size of a representation should be left to the API user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the serialised representation does have a way of indicating how many bytes of "user space" was granted (requested by the _serialise_space
method), how is the Pony implementation able to deserialise the buffer? You point to the overhead of storing an "extra" size-indicating word, but surely the deserialisation mechanism will have to have some way of knowing where the user space ends and the next serialised object begins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jemc it depends on what you mean by "next serialized object begins". In the serialized representation, each field contains either a value (if the field stores a numeric value) or a "pointer" to the place in the buffer that represents object that goes in the field. So there isn't really a concept of "next" in the representation, only fields that point to other objects in the buffer. For all objects except String
s and Array
s, the size of the object's representation is dictated by the object itself, so there is no need to store the size. In the case of String
s and Array
s, the size is stored because these data structures are inherently variable in length.
Just to be clear here, deserialization (as currently implemented) doesn't work by linearly running through the buffer. Rather, the root object is the first object that appears in the buffer. From then on, the location in the buffer of other objects is determined by the location in the buffer indicated by the field pointer. You can actually stick big chunks of meaningless bytes in the representation and it will be fine as long as there are no fields that point to those bytes. There is no "next serialized object", only objects that were indicated by fields in another object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks for explaining.
@aturley in my head I'm starting to go back to @Praetonus' original question about An |
@jemc The intention for using |
The ref cap for the `deserialise(...)` function should be `ref`, not default (`box`) because the function modifies the object's fields.
Let's iron out my other question about storing the size before coming back to this one. I imagine it will affect the answer. |
Coming back to it, I'm now thinking that giving the user a So with that in mind, I think the RFC looks good as is (now that the |
It's been more than a week since the beginning of final comment period, should it be marked ready for vote @aturley? |
@Praetonus yes, I think this is ready for voting. |
Rendered