Rethink the serde `[u8; 16]` binary representation #557
Comments
From @mitsuhiko #553 (comment)
|
From @Icarium-Lifestealer on Reddit:
|
Just a random idea, but how about a feature flag for the different formats. For example, a |
The problem with this is conflicting feature flags downstream, when different libraries decide to use one or other format by default. |
Hmm, I wouldn't consider serializing as If we are going to change the format at all, I think |
So if someone wanted to use a specific format for their UUIDs in their application, how would they go about doing it without manually implementing Serialize or Deserialize? |
@rakshith-ravi They'd have to manually implement |
We could equally well state this problem as an issue with the format, rather than with serde/uuid. I suppose the tuple serializing does not give the information upfront that the fields all have the same type, so perhaps this indicates that serde should have a way to serialize/deserialize fixed length arrays (currently you get either fixed size in tuples or fixed element type in sequences, but not both). |
If you peel the crossed out sticker off that fourth point I was wondering the same thing 🙂 Some kind of |
Unless there’s some clear desire to figure out |
I think the problem is not so much that serde does not have array serialization but that serde does not have a good strategy for bytes. In most formats there is a significant difference between bytes and arrays or vectors of other types. That's why It's true that serde generates tuple serialization code for any fixed size array and I think this is absolutely valid for when you're working with non |
There is no way to support this in serde because of an unfortunate decision to have different trait bounds for |
Serde wouldn't need to explicitly support |
IMO going through |
I still think the most natural approach is to serialize a
At least for the What binary serde formats would be adversely affected by serializing as a |
Any binary JSON-style format: serde-cbor, serde-smile, serde-bson, etc. |
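For a concrete sense of why self-describing formats care, here is a minimal sketch that hand-encodes 16 bytes in CBOR both ways, following the RFC 8949 encoding rules. It assumes a serializer that maps serde tuples to CBOR arrays and `serialize_bytes` to CBOR byte strings. The function names are made up for illustration and nothing here uses serde-cbor itself.

```rust
/// Encode 16 bytes as a CBOR byte string (major type 2).
/// For lengths below 24 the length lives in the initial byte: 0x40 | 16 = 0x50.
fn cbor_byte_string(uuid: &[u8; 16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(17);
    out.push(0x40 | 16);
    out.extend_from_slice(uuid);
    out
}

/// Encode 16 bytes as a CBOR array of unsigned integers (major type 4),
/// which is roughly what a tuple of `u8` turns into in such a format.
fn cbor_array_of_u8(uuid: &[u8; 16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(33);
    out.push(0x80 | 16); // array header announcing 16 elements
    for &b in uuid {
        if b < 24 {
            out.push(b); // small integers fit in the initial byte
        } else {
            out.push(0x18); // marker: a one-byte unsigned integer follows
            out.push(b);
        }
    }
    out
}
```

With these rules the byte string is always 17 bytes, while the array takes between 17 and 33 bytes depending on how many of the UUID's bytes are 24 or larger.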
It’s sounding like
Are there any thoughts on |
I'd be worried about the portability of u128 across formats - for example, serde-cbor can't encode them. |
@sfackler Ah ok, is it that it simply can’t encode them? Or that it needs to use a fallback? I’ve seen some formats route 128bit numbers through I’d kind of expected 128bit number support to be pretty good by now (at least not hitting the error path in the majority of formats), but maybe it’s just not yet. |
Ok, I've had a look at a couple binary formats to see how size compares between
It looks like 128bit numbers actually are rather spottily supported indeed, so I think that rules But
I'd be keen to add other non-human-readable formats to this list and evaluate them, but on balance |
I think @sfackler's original concern about the lazily written tuple performance is also justified though. Writing a struct with a single field containing the various representations on my M1 MacBook with maximum optimizations results in something like:
It's more expensive in all cases except for This still seems to be the case even when we amortize the cost of re-allocating the
|
I started writing up a proposal for supporting fixed-size byte arrays in Since I think that leaves us picking between the trade-offs of
|
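The idea of a fixed-size byte array hook that falls back to the byte-slice path can be sketched without serde at all. This is only an illustration under stated assumptions: `ToySerializer`, `LengthPrefixed` and `Compact` are hypothetical names, the trait is a heavily simplified stand-in rather than serde's real `Serializer`, and the u64 length prefix is an assumed framing, not any particular format's.

```rust
/// Hypothetical, simplified stand-in for a serializer trait (not serde's).
trait ToySerializer {
    /// Variable-length bytes: the format decides how to frame them.
    fn serialize_bytes(&mut self, bytes: &[u8]);

    /// Fixed-size bytes. The default forwards to `serialize_bytes`,
    /// so formats that do nothing special keep their current output.
    fn serialize_byte_array<const N: usize>(&mut self, bytes: &[u8; N]) {
        self.serialize_bytes(bytes);
    }
}

/// Toy format that frames byte slices with a little-endian u64 length.
#[derive(Default)]
struct LengthPrefixed {
    out: Vec<u8>,
}

impl ToySerializer for LengthPrefixed {
    fn serialize_bytes(&mut self, bytes: &[u8]) {
        self.out.extend_from_slice(&(bytes.len() as u64).to_le_bytes());
        self.out.extend_from_slice(bytes);
    }
}

/// Toy format that opts in to the fixed-size hook and omits the length,
/// since the reader already knows N.
#[derive(Default)]
struct Compact {
    out: Vec<u8>,
}

impl ToySerializer for Compact {
    fn serialize_bytes(&mut self, bytes: &[u8]) {
        self.out.extend_from_slice(&(bytes.len() as u64).to_le_bytes());
        self.out.extend_from_slice(bytes);
    }

    fn serialize_byte_array<const N: usize>(&mut self, bytes: &[u8; N]) {
        self.out.extend_from_slice(bytes);
    }
}
```

Calling `serialize_byte_array` with a 16-byte array writes 24 bytes through `LengthPrefixed` (the redundant length plus the payload) and 16 bytes through `Compact`, which is the trade-off such a hook would let each format decide for itself.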
Maybe we can do a bit of a reaction poll to gauge the current temperature. If it's not possible to have everything through explicit fixed-size-byte-array support should we:
(I couldn't think of which option should get the 👍 vs the 👎 so we get rockets and party poppers) |
If we want to hit the |
Yeh, the old implementation would call |
So, based on everything we've learned about the trade-offs of It's been great to really try dig into this, I appreciate everyone's involvement and I'll still give serde-rs/serde#2120 some time to play out, but realistically I expect we'll be back at |
Blocking #553
Following #329
The currently proposed stable `uuid` API changes the `serde` implementation from `&[u8]` to `[u8; 16]`. This is a breaking change to existing serialized data.
It's been pointed out on Reddit that `[u8; 16]` isn't really a universal improvement over `&[u8]` for binary formats, because some can treat byte slices specially.
Before stabilizing, we should try to come to some consensus on this one way or another. If we can't, then I think we should fall back to the status-quo of `&[u8]`. So I think the options are:
- `&[u8]`: the current approach. Can be optimized by some binary formats compared to tuples/sequences. Produces a redundant length field. May be possible to do bytemucking to re-interpret serialized bits directly as a UUID.
- `[u8; 16]`: what's on `main`. Is better for some binary formats that don't have an overhead per field on sequences, but is less optimal for some binary formats than `&[u8]`. Will cause breakage for existing data.
- `u128`: what I'm currently proposing. Can be formatted optimally by any binary format. Isn't necessarily guaranteed to be supported by all serializers/deserializers. The encoded UUID won't look like a UUID. Will cause breakage for existing data.
- `Serializer::serialize_byte_array<const N: usize>(bytes: &[u8; N])` in `serde` that forwards to `&[u8]`?
cc @smarnach @Marwes @mitsuhiko (as having originally provided input on the `serde` format)
Also just to re-iterate, this only affects non-human-readable formats like `bincode`. Other formats like `serde-json` are still going to use hyphenated strings by default.
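To make the `u128` caveat about the encoded UUID not looking like a UUID more tangible, here is a minimal stdlib-only sketch. It assumes the UUID's 16 bytes are packed into the integer most significant byte first, and that some hypothetical binary format writes integers little-endian. The function names are invented for illustration and are not part of `uuid` or `serde`.

```rust
/// Pack the 16 UUID bytes into a `u128`, most significant byte first.
fn uuid_bytes_to_u128(bytes: [u8; 16]) -> u128 {
    u128::from_be_bytes(bytes)
}

/// Recover the original UUID bytes from the integer.
fn u128_to_uuid_bytes(value: u128) -> [u8; 16] {
    value.to_be_bytes()
}

/// What a hypothetical little-endian format would put on the wire
/// for that integer (assumption: no varint or other compression).
fn encode_u128_le(value: u128) -> [u8; 16] {
    value.to_le_bytes()
}
```

The round trip through `u128` is lossless, but under this assumed little-endian encoding the bytes on the wire come out in the reverse order of the UUID's own bytes, so a hex dump of the payload would not match the hyphenated form.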