
Create class for reading Json files in chunks #5530

Merged
30 commits (the diff below shows changes from 17 commits)
d1e4d8a Moved files over and addressed some PR comments (jgonz120, Dec 5, 2023)
9ec0869 added comment (jgonz120, Dec 5, 2023)
cfa2169 switched to true and false strings (jgonz120, Dec 5, 2023)
c1753f9 Added ctr to specify buffer for testing purposes. (jgonz120, Dec 5, 2023)
cc2ad30 remove commented code (jgonz120, Dec 6, 2023)
32ec713 switch to use Utf8 preamble for BOM (jgonz120, Dec 6, 2023)
a9940e9 Create method for checking complete (jgonz120, Dec 6, 2023)
6f87583 combined code for ReadStringArray (jgonz120, Dec 7, 2023)
0f75860 Updated buffer size to match STJ's default buffer size (jgonz120, Dec 7, 2023)
5c4269a Switch Utf8JsonStreamReader to be disposable. (jgonz120, Dec 7, 2023)
c469899 Switch to read the value for numbers into a string directly (jgonz120, Dec 7, 2023)
d0f9f5e revert back to using private var for utf8Bom (jgonz120, Dec 7, 2023)
d3e6ab8 Remove ReadStringArrayAsList (jgonz120, Dec 7, 2023)
2d7cba8 Avoid referencing buffer after returning (jgonz120, Dec 7, 2023)
4192d9b Actually avoid referencing _buffer after returning (jgonz120, Dec 7, 2023)
f67239b Update how buffers are fed into Utf8JsonReader to avoid feeding extra… (jgonz120, Dec 8, 2023)
ca6e1d7 remove extra line (jgonz120, Dec 8, 2023)
fa9639d Reverted back to using try get int for ReadTokenAsString (jgonz120, Dec 11, 2023)
997f199 Update src/NuGet.Core/NuGet.ProjectModel/Utf8JsonStreamReader.cs (jgonz120, Dec 11, 2023)
3e4146c Remove ValueTextEquals taking in string (jgonz120, Dec 11, 2023)
b403ed8 Switched to Skip instead of TrySkip (jgonz120, Dec 11, 2023)
a1c4844 Update src/NuGet.Core/NuGet.ProjectModel/Utf8JsonStreamReader.cs (jgonz120, Dec 11, 2023)
4ff0f7e Added some unit tests (jgonz120, Dec 11, 2023)
a9884ec merge (jgonz120, Dec 11, 2023)
7a467d5 fix Bom (jgonz120, Dec 11, 2023)
86d3524 Switched to using Moq (jgonz120, Dec 11, 2023)
c559e69 Update src/NuGet.Core/NuGet.ProjectModel/Utf8JsonStreamReader.cs (jgonz120, Dec 11, 2023)
74b2e54 loop through stream when reading to ensure reading full bytes or to t… (jgonz120, Dec 11, 2023)
0c05eb8 update signature comment (jgonz120, Dec 12, 2023)
a233b40 Switched stream back to field and supress warning (jgonz120, Dec 12, 2023)
40 changes: 40 additions & 0 deletions src/NuGet.Core/NuGet.ProjectModel/Utf8JsonReaderExtensions.cs
@@ -0,0 +1,40 @@
// Copyright (c) .NET Foundation. All rights reserved.
// Licensed under the Apache License, Version 2.0. See License.txt in the project root for license information.

using System;
using System.Buffers;
using System.Text;
using System.Text.Json;

namespace NuGet.ProjectModel
{
    internal static class Utf8JsonReaderExtensions
    {
        private static readonly UTF8Encoding Utf8Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

        internal static string ReadTokenAsString(this ref Utf8JsonReader reader)
        {
            switch (reader.TokenType)
            {
                case JsonTokenType.True:
                    return bool.TrueString;
                case JsonTokenType.False:
                    return bool.FalseString;
                case JsonTokenType.Number:
                    var span = reader.HasValueSequence ? reader.ValueSequence.ToArray() : reader.ValueSpan;
#if NETCOREAPP
                    return Utf8Encoding.GetString(span);
#else
                    return Utf8Encoding.GetString(span.ToArray());
#endif
                case JsonTokenType.String:
                    return reader.GetString();
                case JsonTokenType.None:
                case JsonTokenType.Null:
                    return null;
                default:
                    throw new InvalidCastException();
            }
        }
    }
}
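The Number branch above copies the token's raw UTF-8 bytes instead of parsing them, so a number comes back exactly as written (`1.0` stays `1.0`, whereas formatting the result of `GetDouble()` would print `1`), and it flattens `ValueSequence` when a token spans several buffer segments. The existing test only feeds one contiguous buffer, so here is a small sketch that exercises both paths with the public `Utf8JsonReader`. `ByteSegment` and `NumberTokenDemo` are hypothetical helpers, and the branch is copied rather than calling the internal extension.

```csharp
using System;
using System.Buffers;
using System.Text;
using System.Text.Json;

// Hypothetical segment type, used only to build a two-segment ReadOnlySequence<byte>.
sealed class ByteSegment : ReadOnlySequenceSegment<byte>
{
    public ByteSegment(ReadOnlyMemory<byte> memory) => Memory = memory;

    public ByteSegment Append(ReadOnlyMemory<byte> memory)
    {
        var next = new ByteSegment(memory) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}

static class NumberTokenDemo
{
    // Same idea as the Number branch of ReadTokenAsString: decode the raw token bytes.
    public static string RawNumberText(ref Utf8JsonReader reader)
    {
        ReadOnlySpan<byte> span = reader.HasValueSequence ? reader.ValueSequence.ToArray() : reader.ValueSpan;
        return Encoding.UTF8.GetString(span.ToArray());
    }

    // Reads the first number from a single contiguous buffer.
    public static string FromSpan(string json)
    {
        var reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(json));
        while (reader.Read() && reader.TokenType != JsonTokenType.Number) { }
        return RawNumberText(ref reader);
    }

    // Reads the first number from a sequence split into two segments at splitAt.
    public static string FromTwoSegments(string json, int splitAt, out bool hadValueSequence)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(json);
        var first = new ByteSegment(bytes.AsMemory(0, splitAt));
        var last = first.Append(bytes.AsMemory(splitAt));
        var reader = new Utf8JsonReader(new ReadOnlySequence<byte>(first, 0, last, last.Memory.Length));
        while (reader.Read() && reader.TokenType != JsonTokenType.Number) { }
        hadValueSequence = reader.HasValueSequence;
        return RawNumberText(ref reader);
    }
}
```

Splitting `[1234]` after three bytes puts `12` and `34` in different segments, which is the case the `HasValueSequence` check exists for.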
274 changes: 274 additions & 0 deletions src/NuGet.Core/NuGet.ProjectModel/Utf8JsonStreamReader.cs
@@ -0,0 +1,274 @@
// Copyright (c) .NET Foundation. All rights reserved.
// Licensed under the Apache License, Version 2.0. See License.txt in the project root for license information.

using System;
using System.Buffers;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;

namespace NuGet.ProjectModel
{
    /// <summary>
    /// This struct is used to read over a stream in parts, in order to avoid reading the entire stream into memory.
    /// It functions as a wrapper around <see cref="Utf8JsonReader"/>, while maintaining a stream and a buffer to read from.
    /// </summary>
    internal ref struct Utf8JsonStreamReader
kartheekp-ms (Contributor), Dec 21, 2023:

The Utf8JsonStreamReader struct appears to have two responsibilities, which violates the single responsibility principle.

Stream wrapper: the struct wraps a stream in order to read it in chunks rather than loading the entire stream into memory.

Utf8JsonReader wrapper: the struct also wraps the functionality of Utf8JsonReader, including reading various data types (strings, integers, booleans) from JSON, handling different JSON token types, and managing the state of the JSON reader.

How about splitting it into two structs: one that handles the stream buffer (reading, resizing the buffer, etc.) and another that wraps Utf8JsonReader methods such as GetString(), GetBoolean(), etc.?

Contributor Author replied:

This would be a large change. If we want to do this, I would like to do it in a new PR after the last one is complete.

    {
        private static readonly char[] DelimitedStringDelimiters = new char[] { ' ', ',' };
        private const int BufferSizeDefault = 16 * 1024;
        private ReadOnlySpan<byte> _utf8Bom = new byte[] { 0xEF, 0xBB, 0xBF };
        private Utf8JsonReader _reader;
        // The buffer is used to read from the stream in chunks.
        private byte[] _buffer;
        private bool _disposed;

        internal Utf8JsonStreamReader(Stream stream) : this(stream, ArrayPool<byte>.Shared.Rent(BufferSizeDefault))
        {
        }

        internal Utf8JsonStreamReader(Stream stream, byte[] buffer)
        {
            if (stream is null)
            {
                throw new ArgumentNullException(nameof(stream));
            }

            _disposed = false;
            Stream = stream;
            _buffer = buffer;
            // Read the first three bytes; keep them (offset 3) unless they are the UTF-8 BOM, which is overwritten.
            Stream.Read(_buffer, 0, 3);
            var offset = 0;
            if (!_utf8Bom.SequenceEqual(_buffer.AsSpan(0, 3)))
            {
                offset = 3;
            }
            var blocksRead = Stream.Read(_buffer, offset, _buffer.Length - offset);

            _reader = new Utf8JsonReader(_buffer.AsSpan(0, blocksRead + offset), isFinalBlock: blocksRead + offset < _buffer.Length, state: new JsonReaderState(new JsonReaderOptions
            {
                AllowTrailingCommas = true,
                CommentHandling = JsonCommentHandling.Skip,
            }));
            _reader.Read();
        }
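`Utf8JsonReader` does not skip a UTF-8 byte order mark on its own, which is why the constructor strips it before creating the reader. A minimal sketch of the same check using `MemoryExtensions.StartsWith`; `BomDemo` is a hypothetical helper. Testing only the bytes actually read also covers a stream shorter than three bytes. The constructor above compares a fixed three-byte window, so after a short first `Stream.Read` it appears to treat stale buffer contents as data.

```csharp
using System;
using System.Text.Json;

static class BomDemo
{
    // The UTF-8 byte order mark, the same bytes the struct stores in _utf8Bom.
    private static ReadOnlySpan<byte> Utf8Bom => new byte[] { 0xEF, 0xBB, 0xBF };

    // Returns how many leading bytes to skip: 3 for a complete BOM, otherwise 0.
    // StartsWith returns false when data is shorter than the BOM, so no stale bytes are compared.
    public static int BomLength(ReadOnlySpan<byte> data) => data.StartsWith(Utf8Bom) ? Utf8Bom.Length : 0;

    // Skips an optional BOM, then reads the first JSON token.
    public static JsonTokenType FirstToken(byte[] data)
    {
        var reader = new Utf8JsonReader(data.AsSpan(BomLength(data)));
        reader.Read();
        return reader.TokenType;
    }
}
```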

        private Stream Stream { get; set; }

        internal bool IsFinalBlock => _reader.IsFinalBlock;

        internal JsonTokenType TokenType => _reader.TokenType;

        internal int BufferSize()
        {
            ThrowExceptionIfDisposed();

            return _buffer.Length;
        }

        internal bool ValueTextEquals(ReadOnlySpan<byte> utf8Text) => _reader.ValueTextEquals(utf8Text);

        internal bool ValueTextEquals(string text) => _reader.ValueTextEquals(text);

        internal bool TryGetInt32(out int value) => _reader.TryGetInt32(out value);

        internal string GetString() => _reader.GetString();

        internal bool GetBoolean() => _reader.GetBoolean();

        internal int GetInt32() => _reader.GetInt32();

        internal bool Read()
        {
            ThrowExceptionIfDisposed();

            bool wasRead;
            while (!(wasRead = _reader.Read()) && !_reader.IsFinalBlock)
            {
                GetMoreBytesFromStream();
            }
Comment on lines +85 to +88 (Contributor):

In some cases, which I currently don't understand, the STJ implementation raises an exception when IsFinalBlock is set to true. Therefore, it would be better to check this value first before invoking the underlying Read method. Another advantage is that reading IsFinalBlock is an O(1) operation.

Suggested change
-            while (!(wasRead = _reader.Read()) && !_reader.IsFinalBlock)
-            {
-                GetMoreBytesFromStream();
-            }
+            while (!_reader.IsFinalBlock && !(wasRead = _reader.Read()))
+            {
+                GetMoreBytesFromStream();
+            }

https://github.com/dotnet/runtime/blob/3a5bea5d60ea04b897ac968a358ca99a1189d368/src/libraries/System.Text.Json/src/System/Text/Json/Reader/Utf8JsonReader.cs#L269-L289

Contributor Author replied:

If we check for the final block first, then we won't call Read at all on the final block. The exception you're seeing there means that the Utf8JsonReader was told it has all the data but the current token type is None, meaning that there is no JSON data in the reader. If we want to check for that scenario we can, but I think it should be in the constructor, not here.
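The reply's explanation can be checked in isolation. In this sketch (`FinalBlockDemo` is a hypothetical helper), an empty buffer marked as the final block makes `Read()` throw a `JsonException` because no token was ever seen, while the same empty buffer not marked final simply returns `false`, which is the signal this struct uses to pull more bytes from the stream. Separately, the suggested loop would not compile as written: `wasRead` is not definitely assigned when `_reader.IsFinalBlock` short-circuits the condition, so it would need an initial value.

```csharp
using System;
using System.Text.Json;

static class FinalBlockDemo
{
    // Calls Read() once on an empty buffer and reports what happened.
    public static string ReadEmpty(bool isFinalBlock)
    {
        var reader = new Utf8JsonReader(ReadOnlySpan<byte>.Empty, isFinalBlock, default(JsonReaderState));
        try
        {
            return reader.Read() ? "token" : "needs more data";
        }
        catch (JsonException)
        {
            // Told this is all the data, yet the token type is still None.
            return "threw";
        }
    }
}
```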

            return wasRead;
        }

        internal bool TrySkip()
        {
            ThrowExceptionIfDisposed();

            bool wasSkipped;
            while (!(wasSkipped = _reader.TrySkip()) && !_reader.IsFinalBlock)
            {
                GetMoreBytesFromStream();
            }
            return wasSkipped;
        }

        internal string ReadNextTokenAsString()
        {
            ThrowExceptionIfDisposed();

            if (Read())
            {
                return _reader.ReadTokenAsString();
            }

            return null;
        }

        internal string GetCurrentBufferAsString()
        {
            ThrowExceptionIfDisposed();

            return Encoding.UTF8.GetString(_buffer);
        }

        internal IList<string> ReadStringArrayAsIList(IList<string> strings = null)
        {
            if (TokenType == JsonTokenType.StartArray)
            {
                while (Read() && TokenType != JsonTokenType.EndArray)
                {
                    string value = _reader.ReadTokenAsString();

                    strings = strings ?? new List<string>();

                    strings.Add(value);
                }
            }
            return strings;
        }

        internal IReadOnlyList<string> ReadDelimitedString()
        {
            ThrowExceptionIfDisposed();

            if (Read())
            {
                switch (TokenType)
                {
                    case JsonTokenType.String:
                        var value = GetString();

                        return value.Split(DelimitedStringDelimiters, StringSplitOptions.RemoveEmptyEntries);

                    default:
                        var invalidCastException = new InvalidCastException();
                        throw new JsonException(invalidCastException.Message, invalidCastException);
                }
            }

            return null;
        }
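`ReadDelimitedString` splits on spaces and commas with `StringSplitOptions.RemoveEmptyEntries`, so runs of separators such as `", "` collapse, while other whitespace such as a tab is not treated as a delimiter. A sketch of that behavior in isolation; `DelimitedDemo` is a hypothetical helper.

```csharp
using System;

static class DelimitedDemo
{
    // Same delimiters as DelimitedStringDelimiters above: a space and a comma.
    private static readonly char[] Delimiters = new char[] { ' ', ',' };

    // Splits the way ReadDelimitedString does once it has the string value.
    public static string[] Split(string value) => value.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
}
```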

        internal bool ReadNextTokenAsBoolOrFalse()
        {
            ThrowExceptionIfDisposed();

            if (Read() && (TokenType == JsonTokenType.False || TokenType == JsonTokenType.True))
            {
                return GetBoolean();
            }
            return false;
        }

        internal IReadOnlyList<string> ReadNextStringOrArrayOfStringsAsReadOnlyList()
        {
            ThrowExceptionIfDisposed();

            if (Read())
            {
                switch (_reader.TokenType)
                {
                    case JsonTokenType.String:
                        return new[] { (string)_reader.GetString() };

                    case JsonTokenType.StartArray:
                        return ReadStringArrayAsReadOnlyListFromArrayStart();

                    case JsonTokenType.StartObject:
                        return null;
                }
            }

            return null;
        }

        internal IReadOnlyList<string> ReadStringArrayAsReadOnlyListFromArrayStart()
        {
            ThrowExceptionIfDisposed();

            List<string> strings = null;

            while (Read() && _reader.TokenType != JsonTokenType.EndArray)
            {
                string value = _reader.ReadTokenAsString();

                strings = strings ?? new List<string>();

                strings.Add(value);
            }

            return (IReadOnlyList<string>)strings ?? Array.Empty<string>();
        }

        // This function is called when Read() returns false
        private void GetMoreBytesFromStream()
        {
            int leftoverBytes = 0;
            int bytesReadFromStream;

            if (_reader.BytesConsumed < _buffer.Length)
            {
                // If the number of bytes consumed by the reader is less than the buffer size then we have leftover bytes that need to be shifted
                var oldBuffer = _buffer;
                ReadOnlySpan<byte> leftover = oldBuffer.AsSpan((int)_reader.BytesConsumed);

                var returnOldBuffer = false;

                // If the leftover bytes are the same as the buffer size then we are at capacity and need to double the buffer size
                if (leftover.Length == _buffer.Length)
                {
                    returnOldBuffer = true;
                    _buffer = ArrayPool<byte>.Shared.Rent(_buffer.Length * 2);
                }

                // Copy the leftover bytes to the beginning of the buffer (the newly rented one, if it was resized)
                leftover.CopyTo(_buffer);

                // Read the rest of the bytes from the stream, keeping track of the number of bytes that need to be processed in the new buffer
                leftoverBytes = leftover.Length;
                bytesReadFromStream = Stream.Read(_buffer, leftover.Length, _buffer.Length - leftover.Length);
                if (returnOldBuffer)
                {
                    ArrayPool<byte>.Shared.Return(oldBuffer);
                }
            }
            else
            {
                bytesReadFromStream = Stream.Read(_buffer, 0, _buffer.Length);
            }
            _reader = new Utf8JsonReader(_buffer.AsSpan(0, leftoverBytes + bytesReadFromStream), isFinalBlock: leftoverBytes + bytesReadFromStream < _buffer.Length, _reader.CurrentState);
        }

        public void Dispose()
        {
            if (!_disposed)
            {
                _disposed = true;
                byte[] toReturn = _buffer;
                _buffer = null!;
                ArrayPool<byte>.Shared.Return(toReturn);
            }
        }

        private void ThrowExceptionIfDisposed()
        {
            if (_disposed)
            {
                throw new ObjectDisposedException(nameof(Utf8JsonStreamReader));
            }
        }
    }
}
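The first review thread on this file suggests splitting the struct into a stream-buffering part and a `Utf8JsonReader` wrapper. A minimal sketch of that split, under stated assumptions: `StreamChunkSource` and `ChunkedJsonReader` are hypothetical names, not NuGet types; the sketch allocates with `new byte[]` and `Array.Resize` instead of renting from `ArrayPool<byte>.Shared` to stay short; and the source only reports `IsFinal` after looping `Stream.Read` until the buffer is full or the stream returns 0. The JSON side needs nothing from the buffer logic except `BytesConsumed` and `CurrentState` to hand work back and resume.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical: owns the stream and a growable buffer; knows nothing about JSON.
sealed class StreamChunkSource
{
    private readonly Stream _stream;
    private byte[] _buffer;
    private int _length;

    public StreamChunkSource(Stream stream, int bufferSize)
    {
        _stream = stream ?? throw new ArgumentNullException(nameof(stream));
        _buffer = new byte[bufferSize];
        Fill();
    }

    // The bytes currently available to a JSON reader.
    public ReadOnlySpan<byte> Current => _buffer.AsSpan(0, _length);

    // True once the stream is exhausted and Current holds its last bytes.
    public bool IsFinal { get; private set; }

    // Keeps the unconsumed tail (growing the buffer if the tail already fills it), then tops up from the stream.
    public void Advance(int consumed)
    {
        int leftover = _length - consumed;
        if (leftover == _buffer.Length)
        {
            Array.Resize(ref _buffer, _buffer.Length * 2);
        }
        // Span.CopyTo handles the overlapping source and destination correctly.
        _buffer.AsSpan(consumed, leftover).CopyTo(_buffer);
        _length = leftover;
        Fill();
    }

    // Loops because Stream.Read may return fewer bytes than requested before the stream ends.
    private void Fill()
    {
        int read;
        while (_length < _buffer.Length && (read = _stream.Read(_buffer, _length, _buffer.Length - _length)) > 0)
        {
            _length += read;
        }
        IsFinal = _length < _buffer.Length;
    }
}

// Hypothetical: owns the Utf8JsonReader and asks the source for more bytes when a token is incomplete.
ref struct ChunkedJsonReader
{
    private readonly StreamChunkSource _source;
    private Utf8JsonReader _reader;

    public ChunkedJsonReader(StreamChunkSource source)
    {
        _source = source;
        _reader = new Utf8JsonReader(source.Current, source.IsFinal, default(JsonReaderState));
    }

    public JsonTokenType TokenType => _reader.TokenType;

    public string GetString() => _reader.GetString();

    public bool Read()
    {
        while (!_reader.Read())
        {
            if (_reader.IsFinalBlock)
            {
                return false;
            }
            _source.Advance((int)_reader.BytesConsumed);
            _reader = new Utf8JsonReader(_source.Current, _source.IsFinal, _reader.CurrentState);
        }
        return true;
    }
}
```

Reading `["alpha","beta"]` through a 4-byte source goes through both the shift path and the grow path and still yields the two strings in order; with this split, the buffering half can be unit-tested with a plain `MemoryStream` and no JSON at all.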
@@ -0,0 +1,13 @@
// Copyright (c) .NET Foundation. All rights reserved.
// Licensed under the Apache License, Version 2.0. See License.txt in the project root for license information.
namespace NuGet.ProjectModel
{
    /// <summary>
    /// An abstract class that defines a function for reading a <see cref="Utf8JsonStreamReader"/> into a <typeparamref name="T"/>.
    /// </summary>
    /// <typeparam name="T">The type produced by the converter.</typeparam>
    internal abstract class Utf8JsonStreamReaderConverter<T>
    {
        public abstract T Read(ref Utf8JsonStreamReader reader);
    }
}
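`Utf8JsonStreamReaderConverter<T>` takes the reader by `ref`, so the caller's reader position advances as the converter consumes tokens; passing a struct by value would advance a copy instead. Since `Utf8JsonStreamReader` is internal, here is a sketch of the same pattern written against the public `Utf8JsonReader`. `ReaderConverter<T>`, `StringOrArrayConverter` and `ConverterDemo` are hypothetical names, and the string-or-array handling only loosely mirrors `ReadNextStringOrArrayOfStringsAsReadOnlyList`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical stand-in for Utf8JsonStreamReaderConverter<T>.
abstract class ReaderConverter<T>
{
    public abstract T Read(ref Utf8JsonReader reader);
}

// Hypothetical converter: accepts "a" or ["a", "b"] and always returns a list.
sealed class StringOrArrayConverter : ReaderConverter<IReadOnlyList<string>>
{
    public override IReadOnlyList<string> Read(ref Utf8JsonReader reader)
    {
        if (!reader.Read())
        {
            return null;
        }

        switch (reader.TokenType)
        {
            case JsonTokenType.String:
                return new[] { reader.GetString() };

            case JsonTokenType.StartArray:
                var items = new List<string>();
                while (reader.Read() && reader.TokenType != JsonTokenType.EndArray)
                {
                    // GetString throws for number items; the NuGet code uses ReadTokenAsString here instead.
                    items.Add(reader.GetString());
                }
                return items;

            default:
                return null;
        }
    }
}

static class ConverterDemo
{
    public static IReadOnlyList<string> Convert(string json)
    {
        var reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(json));
        return new StringOrArrayConverter().Read(ref reader);
    }
}
```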
@@ -213,7 +213,6 @@ public void LockFileFormat_ReadsLockFileWithNoTools()

            var target = lockFile.Targets.Single();
            Assert.Equal(NuGetFramework.Parse("dotnet"), target.TargetFramework);

            var runtimeTargetLibrary = target.Libraries.Single();
            Assert.Equal("System.Runtime", runtimeTargetLibrary.Name);
            Assert.Equal(NuGetVersion.Parse("4.0.20-beta-22927"), runtimeTargetLibrary.Version);
@@ -0,0 +1,35 @@
// Copyright (c) .NET Foundation. All rights reserved.
// Licensed under the Apache License, Version 2.0. See License.txt in the project root for license information.

using System.Text;
using System.Text.Json;
using Xunit;

namespace NuGet.ProjectModel.Test
{
    [UseCulture("")] // Fix tests failing on systems with non-English locales
    public class Utf8JsonReaderExtensionsTests
    {
        [Theory]
        [InlineData("null", null)]
        [InlineData("true", "True")]
        [InlineData("false", "False")]
        [InlineData("-2", "-2")]
        [InlineData("9223372036854775807", "9223372036854775807")]
        [InlineData("3.14", "3.14")]
        [InlineData("\"b\"", "b")]
        public void ReadTokenAsString_WhenValueIsConvertibleToString_ReturnsValueAsString(
            string value,
            string expectedResult)
        {
            var json = $"{{\"a\":{value}}}";
            var encodedBytes = Encoding.UTF8.GetBytes(json);
            var reader = new Utf8JsonReader(encodedBytes);
            reader.Read();
            reader.Read();
            reader.Read();
            string actualResult = reader.ReadTokenAsString();
            Assert.Equal(expectedResult, actualResult);
        }
    }
}