feat: Binary value type for optimized binary arrays #6
@fangq Have you perhaps had a chance to consider this proposal? Thanks in advance!
@nebkat, thanks for the patch and the rationale above. I understand the need to semantically separate byte from uint8/char. My main concern is that not all programming environments make this distinction. MATLAB, for example, has uint8 and char but no native byte type; in other languages, byte often falls back to uint8 or is an alias for it. Where a language does support it, such as Python's bytearray vs np.uint8, it adds the burden of converting between the two. Other than refined semantics, can you provide some use cases where this distinction offers notably better data handling?
@fangq My particular application is in C++. Currently nlohmann/json will encode its internal "binary" type as a
```cpp
nlohmann::json test = nonstd::json::binary(std::vector<uint8_t> { 1 });
test != json::from_bjdata(json::to_bjdata(test, true, true)); // 🙁
```
If this was changed to perform binary decoding by default then similarly:
```cpp
nlohmann::json test = nonstd::json::array({ 1 });
test != json::from_bjdata(json::to_bjdata(test, true, true)); // 🙁
```
While this allows the binary data to remain efficient (by a factor of 16), it introduces a nasty problem: as the array type in
```cpp
nlohmann::json small = nonstd::json::array({ 1 });
nlohmann::json big = nonstd::json::array({ 1024 });
small != json::from_bjdata(json::to_bjdata(small, true, true)); // 🙁
big == json::from_bjdata(json::to_bjdata(big, true, true));
```
I have not used the Python package as much, but I suspect the same occurs when encoding an As for individual That said, I also see no harm in permitting it as an additional semantic type beside If this was accepted I think the path forward would be to introduce flags in libraries to continue encoding/decoding
I have a patch ready for nebkat/nlohmann-json@37066db, can provide patches for the other libraries if needed, and will hopefully soon have a Dart library ready: nebkat/dart-bjdata.
@fangq Sorry to ping again - we are approaching a production release on an embedded project where we make extensive use of binary arrays. Currently we are using this proposal as an unofficial extension but we would love to avoid further fragmentation in case a different solution is eventually accepted (as we will have to support this indefinitely going forward). As mentioned above the current implementation of nlohmann/json is essentially unusable with binary data, and there is no solution that does not further break encode/decode idempotence. I have since also implemented the necessary changes in nebkat/pybj@afa0a23. Would really appreciate if we could get this or an alternative solution approved before we are locked in with our release!
Thanks, I think this is a meaningful addition to bjdata, and I am happy to merge this into the bjdata spec. When your pybj patch is ready, I am happy to merge it and make a new release. I will also work on my matlab/octave and javascript bjdata parsers.
Introduces a dedicated `B` marker for bytes. This is used as the strong type marker in optimized array format to encode binary data such that it can also be decoded back to binary data (instead of decoding as an integer array). See NeuroJSON/bjdata#6 for further information.
Signed-off-by: Nebojsa Cvetkovic <nebkat@gmail.com>
The original UBJSON solution for binary data was an array of `uint8` values. While this does sufficiently address the encoding of such data in the UBJSON format, it does not allow parsers to differentiate between a generic list of numbers and binary data.
When dealing with large quantities of binary data this can have a significant negative impact on performance, as many languages provide optimized storage for binary data that is much more efficient than a standard array.
In the nlohmann C++ JSON library for example, a standard array can require 16 bytes per byte of data, while an optimized binary format would require exactly one.
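As a rough illustration of where a figure like 16 bytes per element can come from, the sketch below compares a generic tagged value with a raw byte buffer. `TaggedValue`, `generic_array_bytes` and `binary_bytes` are hypothetical names invented for this example; the struct is a stand-in for a generic JSON value and is not nlohmann/json's actual layout.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a generic JSON value: a one-byte type tag plus a
// payload wide enough for any scalar or a pointer to heap data.
// Assumption for illustration only; not the real nlohmann::json definition.
struct TaggedValue {
    std::uint8_t type;
    union {
        bool boolean;
        std::int64_t integer;
        double number;
        void* heap;
    } payload;
};

// Element storage needed when every byte is held as its own generic value.
std::size_t generic_array_bytes(std::size_t n) {
    return n * sizeof(TaggedValue);
}

// Element storage needed when the same data is held as a contiguous buffer.
std::size_t binary_bytes(std::size_t n) {
    return n * sizeof(std::uint8_t);
}
```

On typical 64-bit platforms alignment padding brings `sizeof(TaggedValue)` to 16, so `generic_array_bytes(n)` is about sixteen times `binary_bytes(n)`; the exact ratio depends on the platform and on the real library's layout.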
The introduction of the other unsigned data types in BJData furthers the need for a dedicated byte type. `uint8` is no longer the lone unsigned data type, and for parsers to treat `uint8` arrays differently as suggested in the UBJSON solution would lead to further confusion.
This proposal aims to address this issue with the introduction of a dedicated `byte` (`B`) type. This type would be identical to a `uint8`, but would be explicitly recommended for serializers/parsers to implement as an optimized data format type. Where such a type is not available, or parsers have not been upgraded to support the format, a standard integer array can be used instead.
C++ provides `std::vector<std::byte or uint8_t>`, JavaScript provides `Uint8Array`, Dart provides `Uint8List` and Python provides `bytearray`.
UBJSON also states:
This solution does not fundamentally add any complexity, and without it many may be forced to use these other data formats along with all their baggage in order to achieve the desired efficiency.
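To make the proposal concrete, here is a minimal sketch of how a decoder could use the strong type marker of an optimized array to choose between binary storage and a generic integer array. It assumes the optimized layout `[`, `$`, type marker, `#`, count type, count, payload, and as a simplification only handles counts that fit in a single `U` (uint8) byte. The names `Binary`, `IntArray`, `encode_optimized` and `decode_optimized` are hypothetical and do not come from any existing BJData library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <variant>
#include <vector>

// Hypothetical in-memory types: compact binary vs. a generic list of numbers.
using Binary = std::vector<std::uint8_t>;
using IntArray = std::vector<std::int64_t>;
using Decoded = std::variant<Binary, IntArray>;

// Writes "[$<type>#U<count>" followed by the raw payload bytes.
// Simplification: the count must fit in one uint8 byte.
std::vector<std::uint8_t> encode_optimized(char type_marker,
                                           const std::vector<std::uint8_t>& payload) {
    if (payload.size() > 255) {
        throw std::length_error("sketch only supports counts up to 255");
    }
    std::vector<std::uint8_t> out = {'[', '$', static_cast<std::uint8_t>(type_marker),
                                     '#', 'U', static_cast<std::uint8_t>(payload.size())};
    out.insert(out.end(), payload.begin(), payload.end());
    assert(out.size() == 6 + payload.size());  // 6 header bytes plus payload
    return out;
}

// Reads the header and picks the in-memory type from the type marker:
// 'B' keeps the data as binary, 'U' yields a generic integer array.
Decoded decode_optimized(const std::vector<std::uint8_t>& in) {
    if (in.size() < 6 || in[0] != '[' || in[1] != '$' || in[3] != '#' || in[4] != 'U') {
        throw std::invalid_argument("not an optimized array with a uint8 count");
    }
    const std::size_t count = in[5];
    if (in.size() != 6 + count) {
        throw std::invalid_argument("payload length mismatch");
    }
    const auto first = in.begin() + 6;
    if (in[2] == 'B') {
        return Binary(first, in.end());
    }
    if (in[2] == 'U') {
        return IntArray(first, in.end());
    }
    throw std::invalid_argument("unsupported type marker in this sketch");
}
```

With this sketch, `decode_optimized(encode_optimized('B', data))` holds a `Binary` equal to `data`, while the same bytes encoded with `'U'` come back as an `IntArray`. Both encodings occupy the same number of bytes on the wire, so the marker only changes how the data is interpreted after decoding.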