-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optional padding after the length of arrays without index tables? #38
Comments
Not sure if I understand the problem entirely, but the padding in general solves the following problem when building VelocyPack objects/arrays programmatically:
Upon opening the array, we always add the padding. And actual payload is added at the start byte + 9 bytes offset.
That latter reason is why there is optional padding allowed for array types. And as it is always unclear at start (upon opening) how many array members and payload size will follow, the padding is always considered a valid option. We can of course force memmoving the data, so there will never be padding. I agree this would simplify decoding logic, however, wouldn't that penalize larger arrays performance-wise, when there is no choice whether there can be padding or not? |
Just to clarify, I'm suggesting a third option for this case: Leave the array type as I'm not saying I don't see the value of leaving 8 bytes for the size, and then not moving the data; I just don't see the advantage of using, say, type |
WDYT about making the padding more controllable in the Builder, e.g. via #39? |
Let's use an example, an array consisting of the single number 1. We have seen that, assuming that the encoder doesn't know the (size of the) total size in advance and leaves 8 bytes for the size before encoding the array elements, there are three natural ways to encode this array without an index table:
To answer your question, it sounds like a nice idea to make the encoder more flexible! Perhaps with two configuration parameters:
My suggestion here is unchanged, rather than giving the choice between the 1st and 2nd possibilities I'd give it between the 1st and 3rd, get rid of the size padding for types 0x02 thru 0x04 entirely, and then simplify the decoder accordingly... Of course either configuration parameter can be implemented independent of the other, and one might want to add a further parameter that says "until size x use memmove, above that use the type 0x05 (or the one that was pre-chosen by the other config parameter). Does this make sense? |
I see. However, for downwards-compatibility the |
Sounds good, then the padding with 0x02/3/4 and zero-bytes can be deprecated and at some point removed. However regarding backwards compatibility, the important part is that the padding with 0x02/3/4 and zero-bytes is still recognised and can still be decoded and used, not that it can still be generated? PS: I didn't want to necessarily recommend adding all possible tuning knobs, I was just wondering out loud what else might make sense in the given context. |
Yes, our implementation will recognize the 0x02, 0x03, 0x04 and 0x05 formats. Regarding generating the "new" format: I am pretty sure other implementations will be able to handle 0x05 types with "small" contents, but I am not 100% sure as I haven't tested most of the other implementations so far. |
For arrays without index tables I can see no benefit in allowing the optional padding after the length. It adds complexity to both the encoder (if it wants to use this feature) and the decoder (which always needs to check for this case) without saving any space in the encoding.
In other words, I doubt that decoding a, say, 16bit integer and then checking for whether there is a zero-byte, and if so then checking for 6 zero-bytes, is noticeably slower than simply decoding a 64bit integer...
If using this padding with value types
0x02
,0x03
and0x04
effectively does the same as using0x05
in the first place, could0x05
not be made the only possibility for encoding a length in 8 bytes?(It's not like the case of arrays with index tables where the encoding of other parts of the array depends on the size of the encoding of the length of the array.)
The text was updated successfully, but these errors were encountered: