Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Internal] JSON Binary Encoding: Adds support for encoding uniform arrays #4866

Conversation

sboshra
Copy link
Contributor

@sboshra sboshra commented Nov 1, 2024

Description

Added full end-to-end support for writing and reading binary-encoded uniform number arrays, as well as nested arrays of uniform number arrays.

Uniform Number Arrays
A uniform number array is a JSON array where all items share the same numeric type. The encoding supports the following numeric types:

  • Int8: Signed 1-byte integer (-128 to 127)
  • UInt8: Unsigned 1-byte integer (0 to 255)
  • Int16: Signed 2-byte integer
  • Int32: Signed 4-byte integer
  • Int64: Signed 8-byte integer
  • Float16: 2-byte floating-point value (currently unsupported)
  • Float32: 4-byte floating-point value
  • Float64: 8-byte floating-point value

Uniform number arrays are represented by these new type markers:

  • ArrNumC1: Uniform number array with a 1-byte item count
  • ArrNumC2: Uniform number array with a 2-byte item count

Both type markers are encoded as follows:
| Type marker | Item type marker | Item count |

To maintain backward compatibility, writing uniform number arrays is controlled via the EnableNumberArrays write option. When enabled, at the end of writing an array, the writer checks if all values are numeric. It identifies the smallest numeric type that fits all values and compares the length of the uniform number array to the regular array. If the new length is less than or equal to the old one, the array is converted to a uniform number array.

Arrays of Uniform Number Arrays
This encoding enhancement allows for encoding multiple uniform number arrays with the same underlying numeric type and item count into a single contiguous array of numbers. The items in all arrays are preceded by a prefix indicating the common array encoding and the number of encoded arrays.

Arrays of uniform number arrays are supported by these two new type-markers:

  • ArrArrNumC1C1: Array of 1-byte item count of common uniform number arrays with 1-byte item count.
  • ArrArrNumC2C2: Array of 2-byte item count of common uniform number arrays with 2-byte item count.

Both new values are encoded as follows:
| Type-marker | Array type-marker | Number type-marker | Number item count | Array item count |

Similar to uniform number arrays, the writing of arrays of uniform number arrays is conditional on the EnableNumberArrays write option being specified. This ensures backward compatibility with readers and navigators that do not yet support this encoding.

JSON Serialization Testing

  • Introduced a new set of tests for both uniform number arrays and nested arrays of uniform number arrays.
  • Enhanced the JsonToken class to support representation of uniform number array tokens.
  • Updated JsonWriterTest to include additional validation. This now not only checks the expected output but also verifies round-trip consistency across different formats and write options for all three rewrite scenarios: JSON Navigator, JSON Reader - Write All, and JSON Reader - Write Current Token.

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

Closing issues

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good!

@sboshra sboshra changed the title [Internal] JSON Binary Encoding: Implement encoding of uniform arrays [Internal] JSON Binary Encoding: Added support for encoding uniform arrays Nov 1, 2024
@sboshra sboshra changed the title [Internal] JSON Binary Encoding: Added support for encoding uniform arrays [Internal] JSON Binary Encoding: Adds support for encoding uniform arrays Nov 1, 2024
neildsh
neildsh previously approved these changes Nov 3, 2024
@neildsh
Copy link
Contributor

neildsh commented Nov 3, 2024

In a future change, it may be good to consider adding some emulator equivalence tests that utilize these code paths. We can use supported serialization formats to get both binary and text responses and then transcribe the binary to text using the sdk and compare with the backend text response as a baseline. #Resolved

- Moved the round-trip execution and validation logic from `JsonWriterTests` to the shared utility class `JsonTestUtils`.
- Updated `JsonRoundTripTests` to utilize the common utility for execution and verification.
- Enhanced round-trip verification with the addition of a new rewrite scenario and improved error reporting.
- Added new set of tests to `JsonRoundTripTests` to cover uniform number arrays as well as arrays of uniform number arrays.

**JsonNavigator Fixes**
- Resolved an issue with writing a navigator node to an `IJsonWriter` instance when the node represents a uniform array or an item within such an array.
- Previously, when writing a root node and both the navigator and writer used the Binary serialization format, direct copying of the node was allowed. This is no longer valid as the write options between them may differ. This logic has been replaced with recursive node writing.

**Addressed PR Feedback**
@sboshra
Copy link
Contributor Author

sboshra commented Nov 4, 2024

I'm not too sure if we need to rely on the emulator for that. All the new tests I’ve added are direct ports from the backend, which include the expected binary output.

In addition, each implementation should be self-sufficient when it comes to verification as long as it's able to verify round-tripping between different formats. This is true for both SDK and backend tests.


In reply to: 2453253955

neildsh
neildsh previously approved these changes Nov 4, 2024
@adityasa
Copy link
Contributor

adityasa commented Nov 4, 2024

Does it make sense to put these changes behind a flag to minimize issues (since binary is on by default)? #Resolved

@sboshra
Copy link
Contributor Author

sboshra commented Nov 5, 2024

For the JsonBinaryWriter, the EnableUniformNumberArray option must be specified to enable writing uniform number arrays; without this, the behavior remains unchanged. For JsonReader and JsonNavigator, there is no option to disable the reading of uniform number arrays, and I don't believe such a feature is needed. I have significantly expanded the test coverage, including comprehensive round-trip testing, to ensure confidence in implementing these changes.


In reply to: 2455776367

adityasa
adityasa previously approved these changes Nov 5, 2024
Copy link
Contributor

@adityasa adityasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@sboshra sboshra dismissed stale reviews from adityasa and neildsh via a253020 November 5, 2024 16:11
@adityasa adityasa added the auto-merge Enables automation to merge PRs label Nov 5, 2024
adityasa
adityasa previously approved these changes Nov 5, 2024
Copy link
Contributor

@adityasa adityasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@neildsh neildsh added the QUERY label Nov 5, 2024
@microsoft-github-policy-service microsoft-github-policy-service bot merged commit 0018845 into master Nov 5, 2024
23 checks passed
@microsoft-github-policy-service microsoft-github-policy-service bot deleted the users/sboshra/BinaryEncodingUniformArrays branch November 5, 2024 19:19
kundadebdatta pushed a commit that referenced this pull request Nov 13, 2024
…rays (#4866)

## Description

Added full end-to-end support for writing and reading binary-encoded
uniform number arrays, as well as nested arrays of uniform number
arrays.

**Uniform Number Arrays**
A uniform number array is a JSON array where all items share the same
numeric type. The encoding supports the following numeric types:
- **Int8**: Signed 1-byte integer (-128 to 127)
- **UInt8**: Unsigned 1-byte integer (0 to 255)
- **Int16**: Signed 2-byte integer
- **Int32**: Signed 4-byte integer
- **Int64**: Signed 8-byte integer
- **Float16**: 2-byte floating-point value (currently unsupported)
- **Float32**: 4-byte floating-point value
- **Float64**: 8-byte floating-point value

Uniform number arrays are represented by these new type markers:
- **ArrNumC1**: Uniform number array with a 1-byte item count
- **ArrNumC2**: Uniform number array with a 2-byte item count

Both type markers are encoded as follows:
`| Type marker | Item type marker | Item count |`

To maintain backward compatibility, writing uniform number arrays is
controlled via the `EnableNumberArrays `write option. When enabled, at
the end of writing an array, the writer checks if all values are
numeric. It identifies the smallest numeric type that fits all values
and compares the length of the uniform number array to the regular
array. If the new length is less than or equal to the old one, the array
is converted to a uniform number array.

**Arrays of Uniform Number Arrays**
This encoding enhancement allows for encoding multiple uniform number
arrays with the same underlying numeric type and item count into a
single contiguous array of numbers. The items in all arrays are preceded
by a prefix indicating the common array encoding and the number of
encoded arrays.

Arrays of uniform number arrays are supported by these two new
type-markers:

- **ArrArrNumC1C1**: Array of 1-byte item count of common uniform number
arrays with 1-byte item count.
- **ArrArrNumC2C2**: Array of 2-byte item count of common uniform number
arrays with 2-byte item count.

Both new values are encoded as follows:
`| Type-marker | Array type-marker | Number type-marker | Number item
count | Array item count |`

Similar to uniform number arrays, the writing of arrays of uniform
number arrays is conditional on the `EnableNumberArrays` write option
being specified. This ensures backward compatibility with readers and
navigators that do not yet support this encoding.

**JSON Serialization Testing**

- Introduced a new set of tests for both uniform number arrays and
nested arrays of uniform number arrays.
- Enhanced the `JsonToken` class to support representation of uniform
number array tokens.
- Updated `JsonWriterTest` to include additional validation. This now
not only checks the expected output but also verifies round-trip
consistency across different formats and write options for all three
rewrite scenarios: JSON Navigator, JSON Reader - Write All, and JSON
Reader - Write Current Token.


## Type of change

Please delete options that are not relevant.

- [ ] New feature (non-breaking change which adds functionality)

## Closing issues
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge Enables automation to merge PRs QUERY
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants