Merge pull request #60 from TysonAndre/decode-integer-test
Document differences from json_decode()
crazyxman authored Sep 29, 2022
2 parents 104b90e + 54727fb commit c0e244e
Showing 4 changed files with 141 additions and 13 deletions.
86 changes: 77 additions & 9 deletions README.md
@@ -46,14 +46,7 @@ extension=simdjson.so

## simdjson_php Usage
```php

//Check if a JSON string is valid:
$isValid = simdjson_is_valid($jsonString); //return bool

//Parsing a JSON string. similar to the json_decode() function but without the fourth argument
$parsedJSON = simdjson_decode($jsonString, true, 512); //return array|object|null. "null" string is not a standard json

/*
$jsonString = <<<'JSON'
{
"Image": {
"Width": 800,
@@ -68,7 +61,19 @@ $parsedJSON = simdjson_decode($jsonString, true, 512); //return array|object|nul
"IDs": [116, 943, 234, 38793, {"p": "30"}]
}
}
*/
JSON;

//Check if a JSON string is valid:
$isValid = simdjson_is_valid($jsonString); //return bool
var_dump($isValid); // true

//Parse a JSON string. Similar to the json_decode() function, but without the fourth argument
try {
$parsedJSON = simdjson_decode($jsonString, true, 512); //return array|object|null. "null" string is not a standard json
var_dump($parsedJSON); // PHP array
} catch (RuntimeException $e) {
echo "Failed to parse $jsonString: {$e->getMessage()}\n";
}

//Note: "/" is the path separator. Each segment can be the "key" of an object or the "index" of an array
//E.g. "Image/Thumbnail/Url" is ok.
@@ -97,5 +102,68 @@ var_dump($res) //int(5)

```

## simdjson_php API

```php
<?php
/**
* Similar to json_decode()
*
* @returns array|stdClass|string|float|int|bool|null
* @throws RuntimeException for invalid JSON (or document over 4GB, or out of range integer/float)
*/
function simdjson_decode(string $json, bool $assoc = false, int $depth = 512) {}

/**
* Returns true if json is valid.
*
* @returns ?bool (null if depth is invalid)
*/
function simdjson_is_valid(string $json, int $depth = 512) : ?bool {}

/**
* Parses $json and returns the number of keys in $json matching the JSON pointer $key
*
* @returns ?int (null if depth is invalid)
*/
function simdjson_key_count(string $json, string $key, int $depth = 512) : ?int {}

/**
* Returns true if the JSON pointer $key could be found.
*
* @returns ?bool (null if depth is invalid, false if json is invalid or key is not found)
*/
function simdjson_key_exists(string $json, string $key, int $depth = 512) : ?bool {}

/**
* Returns the value at $key
*
* @returns array|stdClass|string|float|int|bool|null the value at $key
* @throws RuntimeException for invalid JSON (or document over 4GB, or out of range integer/float)
*/
function simdjson_key_value(string $json, string $key, bool $assoc = false, int $depth = 512) {}
```
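
The lookup functions above can be combined into a defensive helper. This is a minimal sketch, not part of simdjson_php: `value_at_or_default()` is a hypothetical name, and the code assumes that a missing key makes `simdjson_key_exists()` return `false`, as the doc comment above states. It returns the default when the extension is not loaded.

```php
<?php
/**
 * Hypothetical helper (not part of simdjson_php): returns the value at $path,
 * or $default when the extension is missing, the JSON is invalid,
 * the depth is invalid, or the key is absent.
 */
function value_at_or_default(string $json, string $path, $default = null)
{
    if (!function_exists('simdjson_key_exists')) {
        return $default; // simdjson extension not loaded
    }
    try {
        // Documented above: false for invalid JSON or a missing key,
        // null for an invalid depth. Only a strict true means "found".
        if (simdjson_key_exists($json, $path) !== true) {
            return $default;
        }
        // "/" separates path segments, e.g. "Image/Width".
        return simdjson_key_value($json, $path, true);
    } catch (RuntimeException $e) {
        // simdjson rejected the document (e.g. an out-of-range number).
        return $default;
    }
}

$json = '{"Image": {"Width": 800, "IDs": [116, 943, 234]}}';
var_dump(value_at_or_default($json, 'Image/Width'));
var_dump(value_at_or_default($json, 'Image/Missing', 'n/a'));
```

Checking for a strict `true` keeps the `null` (invalid depth) and `false` (not found or invalid JSON) results from being confused with a successful lookup.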

## Edge cases

There are some differences from `json_decode()` due to the implementation of the underlying simdjson library. `simdjson_decode()` throws a `RuntimeException` whenever simdjson rejects the JSON.

1) `simdjson_decode()` handles out-of-range 64-bit integers and floats differently: it rejects the document instead of decoding it.

See https://github.com/simdjson/simdjson/blob/master/doc/basics.md#standard-compliance

> - The specification allows implementations to set limits on the range and precision of numbers accepted. We support 64-bit floating-point numbers as well as integer values.
> - We parse integers and floating-point numbers as separate types which allows us to support all signed (two's complement) 64-bit integers, like a Java `long` or a C/C++ `long long` and all 64-bit unsigned integers. When we cannot represent exactly an integer as a signed or unsigned 64-bit value, we reject the JSON document.
> - We support the full range of 64-bit floating-point numbers (binary64). The values range from `std::numeric_limits<double>::lowest()` to `std::numeric_limits<double>::max()`, so from -1.7976e308 all the way to 1.7975e308. Extreme values (less or equal to -1e308, greater or equal to 1e308) are rejected: we refuse to parse the input document. Numbers are parsed with a perfect accuracy (ULP 0): the nearest floating-point value is chosen, rounding to even when needed. If you serialized your floating-point numbers with 17 significant digits in a standard compliant manner, the simdjson library is guaranteed to recover the same numbers, exactly.

2) The maximum string length that can be passed to `simdjson_decode()` is 4GiB (4294967295 bytes).
`json_decode()` can decode longer strings.

3) Max depth is counted slightly differently for empty vs. non-empty objects/arrays.
In `json_decode`, an array with a scalar has the same depth as an array with no elements.
In `simdjson_decode`, an array with a scalar is one level deeper than an array with no elements.
For typical use cases, this shouldn't matter.
(e.g. `simdjson_decode('[[]]', true, 2)` will succeed but `json_decode('[[]]', true, 2)` and `simdjson_decode('[[1]]', true, 2)` will fail.)
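
If these differences matter for an application, one option is to try `simdjson_decode()` first and fall back to `json_decode()` when simdjson rejects the document. The following is a minimal sketch under that assumption: `decode_with_fallback()` is a hypothetical helper, not part of simdjson_php, and it uses `json_decode()` directly when the extension is not loaded.

```php
<?php
/**
 * Hypothetical helper (not part of simdjson_php): tries simdjson_decode()
 * and falls back to json_decode() if simdjson rejects the document.
 *
 * @throws RuntimeException if json_decode() also fails
 */
function decode_with_fallback(string $json, bool $assoc = true, int $depth = 512)
{
    if (function_exists('simdjson_decode')) {
        try {
            return simdjson_decode($json, $assoc, $depth);
        } catch (RuntimeException $e) {
            // Rejected by simdjson (invalid JSON, out-of-range number,
            // depth counted differently, ...): retry with json_decode().
        }
    }
    $result = json_decode($json, $assoc, $depth);
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException(json_last_error_msg());
    }
    return $result;
}

// 2^64 is outside the unsigned 64-bit range, so simdjson rejects it,
// while json_decode() converts it to a float by default.
var_dump(decode_with_fallback('18446744073709551616'));
```

The fallback costs a second parse for every rejected document, so it only makes sense when such inputs are rare.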

## Benchmarks
See the [benchmark](./benchmark) folder for more benchmarks.
1 change: 1 addition & 0 deletions package.xml
@@ -52,6 +52,7 @@
<file name="64bit_support.phpt" role="test"/>
<file name="decode_args.phpt" role="test"/>
<file name="decode_exception.phpt" role="test"/>
<file name="decode_integer_overflow.phpt" role="test"/>
<file name="decode_invalid_property.phpt" role="test"/>
<file name="decode_max_depth.phpt" role="test"/>
<file name="decode_result.phpt" role="test"/>
54 changes: 54 additions & 0 deletions tests/decode_integer_overflow.phpt
@@ -0,0 +1,54 @@
--TEST--
simdjson_decode throws for integer syntax out of signed/unsigned 64-bit range due to the C++ simdjson library
--SKIPIF--
<?php if (PHP_INT_SIZE < 8) echo "skip 64-bit test only\n"; ?>
--INI--
; in php 8.0 var_dump started using serialize_precision instead of precision
serialize_precision=20
precision=20
--FILE--
<?php
// https://github.com/simdjson/simdjson/blob/master/doc/basics.md#standard-compliance
// > - The specification allows implementations to set limits on the range and precision of numbers accepted. We support 64-bit floating-point numbers as well as integer values.
// > - We parse integers and floating-point numbers as separate types which allows us to support all signed (two's complement) 64-bit integers, like a Java `long` or a C/C++ `long long` and all 64-bit unsigned integers. When we cannot represent exactly an integer as a signed or unsigned 64-bit value, we reject the JSON document.
// > - We support the full range of 64-bit floating-point numbers (binary64). The values range from `std::numeric_limits<double>::lowest()` to `std::numeric_limits<double>::max()`, so from -1.7976e308 all the way to 1.7975e308. Extreme values (less or equal to -1e308, greater or equal to 1e308) are rejected: we refuse to parse the input document. Numbers are parsed with a perfect accuracy (ULP 0): the nearest floating-point value is chosen, rounding to even when needed. If you serialized your floating-point numbers with 17 significant digits in a standard compliant manner, the simdjson library is guaranteed to recover the same numbers, exactly.
function dump_result(string $x) {
echo "Testing " . var_export($x, true) . "\n";
try {
var_dump(simdjson_decode($x));
} catch (Exception $e) {
printf("Caught %s: %s\n", get_class($e), $e->getMessage());
}
}
dump_result('18446744073709551615');
dump_result('18446744073709551615.0');
dump_result('18446744073709551615E0');
dump_result('18446744073709551616'); // simdjson_decode throws but json_decode doesn't.
dump_result('18446744073709551616.0');
dump_result('-9223372036854775808');
dump_result('-9223372036854775809');
dump_result('-9223372036854775809.0');
dump_result('1e307');
dump_result('1e309');
?>
--EXPECT--
Testing '18446744073709551615'
float(18446744073709551616)
Testing '18446744073709551615.0'
float(18446744073709551616)
Testing '18446744073709551615E0'
float(18446744073709551616)
Testing '18446744073709551616'
Caught RuntimeException: Problem while parsing a number
Testing '18446744073709551616.0'
float(18446744073709551616)
Testing '-9223372036854775808'
int(-9223372036854775808)
Testing '-9223372036854775809'
Caught RuntimeException: Problem while parsing a number
Testing '-9223372036854775809.0'
float(-9223372036854775808)
Testing '1e307'
float(9.9999999999999998603E+306)
Testing '1e309'
Caught RuntimeException: Problem while parsing a number
13 changes: 9 additions & 4 deletions tests/depth.phpt
@@ -17,16 +17,16 @@ try {
}
var_dump(simdjson_decode('[1]', true, 2));
// XXX there's a difference between simdjson_decode and json_decode.
// In json_decode, an array with no elements has the same depth as an array of scalars.
// In simdjson_decode, an array with no elements is deeper than an array with no elements.
// For typical use cases this shouldn't matter.
// In json_decode, an array with a scalar has the same depth as an array with no elements.
// In simdjson_decode, an array with a scalar is deeper than an array with no elements.
// For typical use cases, this shouldn't matter.
try {
var_dump(simdjson_decode('[[1]]', true, 2));
} catch (RuntimeException $e) {
echo "Caught for [[1]]: {$e->getMessage()}\n";
}
var_dump(simdjson_decode('[[]]', true, 2));
var_dump(simdjson_decode('[[1]]', true, 3));

?>
--EXPECTF--
Warning: simdjson_decode(): Depth must be greater than zero in %s on line 2
@@ -43,6 +43,11 @@ array(1) {
int(1)
}
Caught for [[1]]: The JSON document was too deep (too many nested objects and arrays)
array(1) {
[0]=>
array(0) {
}
}
array(1) {
[0]=>
array(1) {