Skip to content

Commit

Permalink
Merge pull request #2297 from nlohmann/issue2286
Browse files Browse the repository at this point in the history
Add support for high-precision numbers in UBJSON encoding
  • Loading branch information
nlohmann authored Jul 25, 2020
2 parents 40b78d3 + 980f8c6 commit 63e7c40
Show file tree
Hide file tree
Showing 8 changed files with 3,693 additions and 3,469 deletions.
2 changes: 1 addition & 1 deletion doc/mkdocs/docs/features/binary_formats/ubjson.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ number_unsigned | 128..255 | uint8 | `U`
number_unsigned | 256..32767 | int16 | `I`
number_unsigned | 32768..2147483647 | int32 | `l`
number_unsigned | 2147483648..9223372036854775807 | int64 | `L`
number_unsigned | 2147483649..18446744073709551615 | high-precision | `H`
number_float | *any value* | float64 | `D`
string | *with shortest length indicator* | string | `S`
array | *see notes on optimized format* | array | `[`
Expand All @@ -44,7 +45,6 @@ object | *see notes on optimized format* | map | `{`
The following values can **not** be converted to a UBJSON value:

- strings with more than 9223372036854775807 bytes (theoretical)
- unsigned integer numbers above 9223372036854775807

!!! info "Unused UBJSON markers"

Expand Down
14 changes: 14 additions & 0 deletions doc/mkdocs/docs/home/exceptions.md
Original file line number Diff line number Diff line change
Expand Up @@ -279,6 +279,16 @@ The parsing of the corresponding BSON record type is not implemented (yet).
[json.exception.parse_error.114] parse error at byte 5: Unsupported BSON record type 0xFF
```

### json.exception.parse_error.115

A UBJSON high-precision number could not be parsed.

!!! failure "Example message"

```
[json.exception.parse_error.115] parse error at byte 5: syntax error while parsing UBJSON high-precision number: invalid number text: 1A
```

## Iterator errors

This exception is thrown if iterators passed to a library function do not match
Expand Down Expand Up @@ -765,6 +775,10 @@ UBJSON and BSON only support integer numbers up to 9223372036854775807.
number overflow serializing '9223372036854775808'
```

!!! note

Since version 3.9.0, integer numbers beyond int64 are serialized as high-precision UBJSON numbers, and this exception does not further occur.

### json.exception.out_of_range.408

The size (following `#`) of an UBJSON array or object exceeds the maximal capacity.
Expand Down
3 changes: 2 additions & 1 deletion include/nlohmann/detail/exceptions.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,7 @@ json.exception.parse_error.110 | parse error at 1: cannot read 2 bytes from vect
json.exception.parse_error.112 | parse error at 1: error reading CBOR; last byte: 0xF8 | Not all types of CBOR or MessagePack are supported. This exception occurs if an unsupported byte was read.
json.exception.parse_error.113 | parse error at 2: expected a CBOR string; last byte: 0x98 | While parsing a map key, a value that is not a string has been read.
json.exception.parse_error.114 | parse error: Unsupported BSON record type 0x0F | The parsing of the corresponding BSON record type is not implemented (yet).
json.exception.parse_error.115 | parse error at byte 5: syntax error while parsing UBJSON high-precision number: invalid number text: 1A | A UBJSON high-precision number could not be parsed.
@note For an input with n bytes, 1 is the index of the first character and n+1
is the index of the terminating null byte or the end of file. This also
Expand Down Expand Up @@ -285,7 +286,7 @@ json.exception.out_of_range.403 | key 'foo' not found | The provided key was not
json.exception.out_of_range.404 | unresolved reference token 'foo' | A reference token in a JSON Pointer could not be resolved.
json.exception.out_of_range.405 | JSON pointer has no parent | The JSON Patch operations 'remove' and 'add' can not be applied to the root element of the JSON value.
json.exception.out_of_range.406 | number overflow parsing '10E1000' | A parsed number could not be stored as without changing it to NaN or INF.
json.exception.out_of_range.407 | number overflow serializing '9223372036854775808' | UBJSON and BSON only support integer numbers up to 9223372036854775807. |
json.exception.out_of_range.407 | number overflow serializing '9223372036854775808' | UBJSON and BSON only support integer numbers up to 9223372036854775807. (until version 3.8.0) |
json.exception.out_of_range.408 | excessive array size: 8658170730974374167 | The size (following `#`) of an UBJSON array or object exceeds the maximal capacity. |
json.exception.out_of_range.409 | BSON key cannot contain code point U+0000 (at byte 2) | Key identifiers to be serialized to BSON cannot contain code point U+0000, since the key is stored as zero-terminated c-string |
Expand Down
55 changes: 55 additions & 0 deletions include/nlohmann/detail/input/binary_reader.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
#include <nlohmann/detail/exceptions.hpp>
#include <nlohmann/detail/input/input_adapters.hpp>
#include <nlohmann/detail/input/json_sax.hpp>
#include <nlohmann/detail/input/lexer.hpp>
#include <nlohmann/detail/macro_scope.hpp>
#include <nlohmann/detail/meta/is_sax.hpp>
#include <nlohmann/detail/value_t.hpp>
Expand Down Expand Up @@ -2004,6 +2005,11 @@ class binary_reader
return get_number(input_format_t::ubjson, number) && sax->number_float(static_cast<number_float_t>(number), "");
}

case 'H':
{
return get_ubjson_high_precision_number();
}

case 'C': // char
{
get();
Expand Down Expand Up @@ -2180,6 +2186,55 @@ class binary_reader
// Note, no reader for UBJSON binary types is implemented because they do
// not exist

bool get_ubjson_high_precision_number()
{
// get size of following number string
std::size_t size{};
auto res = get_ubjson_size_value(size);
if (JSON_HEDLEY_UNLIKELY(!res))
{
return res;
}

// get number string
std::vector<char> number_vector;
for (std::size_t i = 0; i < size; ++i)
{
get();
if (JSON_HEDLEY_UNLIKELY(!unexpect_eof(input_format_t::ubjson, "number")))
{
return false;
}
number_vector.push_back(static_cast<char>(current));
}

// parse number string
auto number_ia = detail::input_adapter(std::forward<decltype(number_vector)>(number_vector));
auto number_lexer = detail::lexer<BasicJsonType, decltype(number_ia)>(std::move(number_ia), false);
const auto result_number = number_lexer.scan();
const auto number_string = number_lexer.get_token_string();
const auto result_remainder = number_lexer.scan();

using token_type = typename detail::lexer_base<BasicJsonType>::token_type;

if (JSON_HEDLEY_UNLIKELY(result_remainder != token_type::end_of_input))
{
return sax->parse_error(chars_read, number_string, parse_error::create(115, chars_read, exception_message(input_format_t::ubjson, "invalid number text: " + number_lexer.get_token_string(), "high-precision number")));
}

switch (result_number)
{
case token_type::value_integer:
return sax->number_integer(number_lexer.get_number_integer());
case token_type::value_unsigned:
return sax->number_unsigned(number_lexer.get_number_unsigned());
case token_type::value_float:
return sax->number_float(number_lexer.get_number_float(), std::move(number_string));
default:
return sax->parse_error(chars_read, number_string, parse_error::create(115, chars_read, exception_message(input_format_t::ubjson, "invalid number text: " + number_lexer.get_token_string(), "high-precision number")));
}
}

///////////////////////
// Utility functions //
///////////////////////
Expand Down
46 changes: 34 additions & 12 deletions include/nlohmann/detail/output/binary_writer.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -1319,7 +1319,17 @@ class binary_writer
}
else
{
JSON_THROW(out_of_range::create(407, "integer number " + std::to_string(n) + " cannot be represented by UBJSON as it does not fit int64"));
if (add_prefix)
{
oa->write_character(to_char_type('H')); // high-precision number
}

const auto number = BasicJsonType(n).dump();
write_number_with_ubjson_prefix(number.size(), true);
for (std::size_t i = 0; i < number.size(); ++i)
{
oa->write_character(to_char_type(static_cast<std::uint8_t>(number[i])));
}
}
}

Expand Down Expand Up @@ -1373,19 +1383,23 @@ class binary_writer
// LCOV_EXCL_START
else
{
JSON_THROW(out_of_range::create(407, "integer number " + std::to_string(n) + " cannot be represented by UBJSON as it does not fit int64"));
if (add_prefix)
{
oa->write_character(to_char_type('H')); // high-precision number
}

const auto number = BasicJsonType(n).dump();
write_number_with_ubjson_prefix(number.size(), true);
for (std::size_t i = 0; i < number.size(); ++i)
{
oa->write_character(to_char_type(static_cast<std::uint8_t>(number[i])));
}
}
// LCOV_EXCL_STOP
}

/*!
@brief determine the type prefix of container values
@note This function does not need to be 100% accurate when it comes to
integer limits. In case a number exceeds the limits of int64_t,
this will be detected by a later call to function
write_number_with_ubjson_prefix. Therefore, we return 'L' for any
value that does not fit the previous limits.
*/
CharType ubjson_prefix(const BasicJsonType& j) const noexcept
{
Expand Down Expand Up @@ -1415,8 +1429,12 @@ class binary_writer
{
return 'l';
}
// no check and assume int64_t (see note above)
return 'L';
if ((std::numeric_limits<std::int64_t>::min)() <= j.m_value.number_integer && j.m_value.number_integer <= (std::numeric_limits<std::int64_t>::max)())
{
return 'L';
}
// anything else is treated as high-precision number
return 'H'; // LCOV_EXCL_LINE
}

case value_t::number_unsigned:
Expand All @@ -1437,8 +1455,12 @@ class binary_writer
{
return 'l';
}
// no check and assume int64_t (see note above)
return 'L';
if (j.m_value.number_unsigned <= static_cast<std::uint64_t>((std::numeric_limits<std::int64_t>::max)()))
{
return 'L';
}
// anything else is treated as high-precision number
return 'H'; // LCOV_EXCL_LINE
}

case value_t::number_float:
Expand Down
3 changes: 2 additions & 1 deletion include/nlohmann/json.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -7201,6 +7201,7 @@ class basic_json
number_unsigned | 256..32767 | int16 | `I`
number_unsigned | 32768..2147483647 | int32 | `l`
number_unsigned | 2147483648..9223372036854775807 | int64 | `L`
number_unsigned | 2147483649..18446744073709551615 | high-precision | `H`
number_float | *any value* | float64 | `D`
string | *with shortest length indicator* | string | `S`
array | *see notes on optimized format* | array | `[`
Expand All @@ -7211,7 +7212,6 @@ class basic_json
@note The following values can **not** be converted to a UBJSON value:
- strings with more than 9223372036854775807 bytes (theoretical)
- unsigned integer numbers above 9223372036854775807
@note The following markers are not used in the conversion:
- `Z`: no-op values are not created.
Expand Down Expand Up @@ -7687,6 +7687,7 @@ class basic_json
int16 | number_integer | `I`
int32 | number_integer | `l`
int64 | number_integer | `L`
high-precision number | number_integer, number_unsigned, or number_float - depends on number string | 'H'
string | string | `S`
char | string | `C`
array | array (optimized values are supported) | `[`
Expand Down
Loading

0 comments on commit 63e7c40

Please sign in to comment.