
JSON continuation tests #42393

Merged (25 commits into dotnet:master from devsko:streamtest, Nov 2, 2020)

Conversation

@devsko (Contributor) commented on Sep 17, 2020

See #42158 and the linked comments there.

Test only.

Tests several scenarios in System.Text.Json where deserialization has to continue after the next chunk of data becomes available (a minimal sketch of the idea follows the list):

- Continuation at every position inside the tested object
- Many members with primitive and nullable types
- One more level of nested objects
- All combinations of class/struct for the tested and nested objects
- Tested and nested objects with parameterized ctors for some properties
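A minimal sketch of the chunking idea (the Poco, Nested, and ContinuationSketch types are hypothetical stand-ins, not the PR's actual test code). Setting JsonSerializerOptions.DefaultBufferSize to 1 keeps the serializer's read buffer as small as the implementation allows, so any payload larger than that buffer is consumed in several reads and deserialization has to suspend and resume in between:

using System.IO;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public class Nested
{
    public int Value { get; set; }
}

public class Poco
{
    public int Id { get; set; }
    public string Name { get; set; }
    public bool? Flag { get; set; }     // nullable member
    public Nested Inner { get; set; }   // one more level of nesting
}

public static class ContinuationSketch
{
    public static async Task RunAsync()
    {
        string json = "{\"Id\":42,\"Name\":\"abc\",\"Flag\":null,\"Inner\":{\"Value\":7}}";
        using var stream = new MemoryStream(Encoding.UTF8.GetBytes(json));

        // DefaultBufferSize = 1 keeps the read buffer as small as possible,
        // so the stream is drained in several small reads and the
        // deserializer must save and restore its partial state between
        // chunks instead of seeing the whole payload at once.
        var options = new JsonSerializerOptions { DefaultBufferSize = 1 };
        Poco result = await JsonSerializer.DeserializeAsync<Poco>(stream, options);
    }
}

The PR's tests go further and split the payload at every single byte position; the sketch only shows the mechanism that makes such splits observable.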
Commit: Tweak the payload and expect `JsonException`
@devsko (Contributor, Author) commented on Sep 17, 2020

@layomia This is not yet ready for review. I hope to finish what I have in mind tomorrow.

@devsko marked this pull request as ready for review on September 17, 2020, 21:39
@devsko (Contributor, Author) commented on Sep 17, 2020

Is there anything else to cover about continuation / chunked buffers?

@ahsonkhan (Member) commented on Sep 17, 2020

> Is there anything else to cover about continuation / chunked buffers?

Since you asked, how about continuation where the token being split isn't null but another type, like whitespace (\r\n), a true/false boolean, some large string token, or a number?

In the Utf8JsonReader-specific tests, a lot of those scenarios are covered by building a ReadOnlySequence with segment splits across a whole variety of locations within the JSON, and with partial data/state to test re-entrancy. But most of those are for relatively small payloads (due to test time), and the async deserializer API for streams could benefit from that type of extensive coverage too :)

// TestCaseType is only used to give the json strings a descriptive name.
[Theory]
// Skipping large JSON since slicing them (O(n^2)) is too slow.
[MemberData(nameof(SmallTestCases))]
public static void TestJsonReaderUtf8SegmentSizeOne(bool compactData, TestCaseType type, string jsonString)
{
    ReadPartialSegmentSizeOne(compactData, type, jsonString);
}

// TestCaseType is only used to give the json strings a descriptive name.
[Theory]
[MemberData(nameof(LargeTestCases))]
public static void TestJsonReaderLargeUtf8SegmentSizeOne(bool compactData, TestCaseType type, string jsonString)
{
    // Skipping really large JSON on Browser to prevent OOM
    if (PlatformDetection.IsBrowser && (type == TestCaseType.Json40KB || type == TestCaseType.Json400KB || type == TestCaseType.ProjectLockJson))
    {
        return;
    }

    ReadFullySegmentSizeOne(compactData, type, jsonString);
}

// TestCaseType is only used to give the json strings a descriptive name.
[Theory]
[OuterLoop]
[MemberData(nameof(LargeTestCases))]
public static void TestJsonReaderLargestUtf8SegmentSizeOne(bool compactData, TestCaseType type, string jsonString)
{
    // Skipping really large JSON since slicing them (O(n^2)) is too slow.
    if (type == TestCaseType.Json40KB || type == TestCaseType.Json400KB || type == TestCaseType.ProjectLockJson)
    {
        return;
    }

    ReadPartialSegmentSizeOne(compactData, type, jsonString);
}

private static void ReadPartialSegmentSizeOne(bool compactData, TestCaseType type, string jsonString)
{
    // Remove all formatting/indentation
    if (compactData)
    {
        jsonString = JsonTestHelper.GetCompactString(jsonString);
    }

    byte[] dataUtf8 = Encoding.UTF8.GetBytes(jsonString);
    Stream stream = new MemoryStream(dataUtf8);
    TextReader reader = new StreamReader(stream, Encoding.UTF8, false, 1024, true);
    string expectedStr = JsonTestHelper.NewtonsoftReturnStringHelper(reader);

    ReadOnlySequence<byte> sequence = JsonTestHelper.GetSequence(dataUtf8, 1);

    for (int j = 0; j < dataUtf8.Length; j++)
    {
        var utf8JsonReader = new Utf8JsonReader(sequence.Slice(0, j), isFinalBlock: false, default);
        byte[] resultSequence = JsonTestHelper.ReaderLoop(dataUtf8.Length, out int length, ref utf8JsonReader);
        string actualStrSequence = Encoding.UTF8.GetString(resultSequence, 0, length);

        long consumed = utf8JsonReader.BytesConsumed;
        utf8JsonReader = new Utf8JsonReader(sequence.Slice(consumed), isFinalBlock: true, utf8JsonReader.CurrentState);
        resultSequence = JsonTestHelper.ReaderLoop(dataUtf8.Length, out length, ref utf8JsonReader);
        actualStrSequence += Encoding.UTF8.GetString(resultSequence, 0, length);

        string message = $"Expected consumed: {dataUtf8.Length - consumed}, Actual consumed: {utf8JsonReader.BytesConsumed}, Index: {j}";
        Assert.True(dataUtf8.Length - consumed == utf8JsonReader.BytesConsumed, message);
        Assert.Equal(expectedStr, actualStrSequence);
    }
}

private static void ReadFullySegmentSizeOne(bool compactData, TestCaseType type, string jsonString)
{
    // Remove all formatting/indentation
    if (compactData)
    {
        jsonString = JsonTestHelper.GetCompactString(jsonString);
    }

    byte[] dataUtf8 = Encoding.UTF8.GetBytes(jsonString);
    Stream stream = new MemoryStream(dataUtf8);
    TextReader reader = new StreamReader(stream, Encoding.UTF8, false, 1024, true);
    string expectedStr = JsonTestHelper.NewtonsoftReturnStringHelper(reader);

    ReadOnlySequence<byte> sequence = JsonTestHelper.GetSequence(dataUtf8, 1);

    var utf8JsonReader = new Utf8JsonReader(sequence, isFinalBlock: true, default);
    byte[] resultSequence = JsonTestHelper.ReaderLoop(dataUtf8.Length, out int length, ref utf8JsonReader);
    string actualStrSequence = Encoding.UTF8.GetString(resultSequence, 0, length);
    Assert.Equal(expectedStr, actualStrSequence);
}

[Theory]
[MemberData(nameof(SmallTestCases))]
public static void TestPartialJsonReaderMultiSegment(bool compactData, TestCaseType type, string jsonString)
{
    _ = type;

    // Remove all formatting/indentation
    if (compactData)
    {
        jsonString = JsonTestHelper.GetCompactString(jsonString);
    }

    byte[] dataUtf8 = Encoding.UTF8.GetBytes(jsonString);
    ReadOnlyMemory<byte> dataMemory = dataUtf8;

    List<ReadOnlySequence<byte>> sequences = JsonTestHelper.GetSequences(dataMemory);

    for (int i = 0; i < sequences.Count; i++)
    {
        ReadOnlySequence<byte> sequence = sequences[i];
        var json = new Utf8JsonReader(sequence, isFinalBlock: true, default);
        while (json.Read())
            ;
        Assert.Equal(sequence.Length, json.BytesConsumed);
        Assert.True(sequence.Slice(json.Position).IsEmpty);
    }

    for (int i = 0; i < sequences.Count; i++)
    {
        ReadOnlySequence<byte> sequence = sequences[i];
        var json = new Utf8JsonReader(sequence);
        while (json.Read())
            ;
        Assert.Equal(sequence.Length, json.BytesConsumed);
        Assert.True(sequence.Slice(json.Position).IsEmpty);
    }
}

[Theory]
[OuterLoop]
[MemberData(nameof(SmallTestCases))]
public static void TestPartialJsonReaderSlicesMultiSegment(bool compactData, TestCaseType type, string jsonString)
{
    _ = type;

    // Remove all formatting/indentation
    if (compactData)
    {
        jsonString = JsonTestHelper.GetCompactString(jsonString);
    }

    byte[] dataUtf8 = Encoding.UTF8.GetBytes(jsonString);
    ReadOnlyMemory<byte> dataMemory = dataUtf8;

    List<ReadOnlySequence<byte>> sequences = JsonTestHelper.GetSequences(dataMemory);

    for (int i = 0; i < sequences.Count; i++)
    {
        ReadOnlySequence<byte> sequence = sequences[i];
        for (int j = 0; j < dataUtf8.Length; j++)
        {
            var json = new Utf8JsonReader(sequence.Slice(0, j), isFinalBlock: false, default);
            while (json.Read())
                ;

            long consumed = json.BytesConsumed;
            JsonReaderState jsonState = json.CurrentState;
            byte[] consumedArray = sequence.Slice(0, consumed).ToArray();
            Assert.Equal(consumedArray, sequence.Slice(0, json.Position).ToArray());

            json = new Utf8JsonReader(sequence.Slice(consumed), isFinalBlock: true, jsonState);
            while (json.Read())
                ;
            Assert.Equal(dataUtf8.Length - consumed, json.BytesConsumed);
        }
    }
}
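The JsonTestHelper.GetSequence helper used above is internal to the test project. As a rough approximation (an assumption, not the actual helper), such a method can chain fixed-size segments into one multi-segment ReadOnlySequence<byte> via ReadOnlySequenceSegment<byte>:

using System;
using System.Buffers;

// Hypothetical re-creation of a GetSequence-style test helper.
internal sealed class BufferSegment : ReadOnlySequenceSegment<byte>
{
    public BufferSegment(ReadOnlyMemory<byte> memory) => Memory = memory;

    public BufferSegment Append(ReadOnlyMemory<byte> memory)
    {
        // Link a new segment after this one and track its running offset.
        var next = new BufferSegment(memory) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}

internal static class SequenceFactory
{
    // Splits 'data' into segments of 'segmentSize' bytes (1 in the tests above).
    public static ReadOnlySequence<byte> GetSequence(byte[] data, int segmentSize)
    {
        var first = new BufferSegment(data.AsMemory(0, Math.Min(segmentSize, data.Length)));
        BufferSegment last = first;
        for (int i = segmentSize; i < data.Length; i += segmentSize)
        {
            last = last.Append(data.AsMemory(i, Math.Min(segmentSize, data.Length - i)));
        }
        return new ReadOnlySequence<byte>(first, 0, last, last.Memory.Length);
    }
}

With segmentSize: 1, every byte lands in its own segment, so the reader crosses a segment boundary inside every token.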

@devsko (Contributor, Author) commented on Sep 18, 2020

> how about continuation where the token being split isn't null but another type, like whitespace (\r\n), a true/false boolean, some large string token, or a number?

All tests here split the payload once at every single character position, so every token is exercised when split into two chunks, covering all the mentioned examples except whitespace. I will add whitespace by enabling WriteIndented (see the sketch below).
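For illustration, a hedged sketch of that plan (the values are hypothetical; WriteIndented and Serialize are the real System.Text.Json APIs):

using System.Text.Json;

// WriteIndented adds whitespace between tokens, so payloads produced this
// way also exercise continuation inside whitespace runs.
var options = new JsonSerializerOptions { WriteIndented = true };
string indented = JsonSerializer.Serialize(new { Id = 1, Name = "abc" }, options);
// 'indented' now contains newlines (\r\n or \n, depending on platform) and
// indentation between tokens; feeding it through the chunk-splitting tests
// covers splits that land inside whitespace.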

Commit: Added dictionary test
@devsko (Contributor, Author) commented on Sep 18, 2020

I'd say that's it. Thanks for your suggestions, help, and reviews; really appreciated. Feel free to change whatever you want, or wait two weeks. See you, peace.

@stephentoub (Member) commented

@devsko, thanks for your efforts here. Are you still working on this?

@layomia (Contributor) left a review comment

LGTM. @devsko, we can merge this once the conflicts and #42393 (comment) are resolved.

@layomia added the NO-MERGE label (the PR is not ready for merge yet; see discussion for detailed reasons) on Oct 29, 2020
@layomia self-assigned this on Nov 2, 2020
@layomia removed the NO-MERGE label on Nov 2, 2020
@layomia (Contributor) commented on Nov 2, 2020

I pushed a commit to finish this PR.

@layomia merged commit e691753 into dotnet:master on Nov 2, 2020
@ghost locked as resolved and limited conversation to collaborators on Dec 7, 2020
@devsko deleted the streamtest branch on March 5, 2021