
JsonSerializer deserialization using overloads taking Utf8JsonReader are unexpectedly slower than equivalents taking string #99674

Open
cosborne83 opened this issue Mar 13, 2024 · 5 comments
Labels
area-System.Text.Json help wanted [up-for-grabs] Good issue for external contributors tenet-performance Performance related issue

@cosborne83

Description

Whilst testing JSON deserialization performance of moderately-sized (10-15KiB) UTF-8 data with the latest System.Text.Json NuGet package (8.0.3), I found that the JsonSerializer.Deserialize(ref Utf8JsonReader, ...) overloads are unexpectedly much slower than those taking the data as a string. The difference is such that it can be faster to convert the underlying UTF-8 bytes to a string first and then use a JsonSerializer.Deserialize(string, ...) overload instead.

The problem seems most pronounced when the JSON data contains string values with many escape sequences (escaped double-quotes in my testing, but others may also exhibit the issue).
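Since the original payload is not included in this report, a minimal stand-in with the same characteristic (a string value containing many escaped double-quotes) can be generated like this; the shape, property name, and sizes are illustrative and are not the original data:

```csharp
using System.Text;

// Hypothetical stand-in for the omitted benchmark payload: a single JSON
// object whose string value contains many \" escape sequences.
public static class EscapeHeavyJson
{
    public static string Create(int quotedWords = 1000)
    {
        var sb = new StringBuilder("{\"Value\":\"");
        for (int i = 0; i < quotedWords; i++)
        {
            sb.Append("\\\"word\\\" "); // emits \"word\" inside the JSON string value
        }
        sb.Append("\"}");
        return sb.ToString();
    }
}
```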

Configuration

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19044.4046/21H2/November2021Update)
13th Gen Intel Core i9-13900H, 1 CPU, 20 logical and 14 physical cores
.NET SDK 8.0.100
  [Host]     : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2

Regression?

I don't believe so.

Data

The following code was used to perform the benchmark. I have omitted the JSON data itself from the snippet below due to its size, but can provide it separately.

[MemoryDiagnoser]
public partial class JsonDeserializeTests
{
    private static readonly JsonTypeInfo<DataRecord> DataRecordTypeInfo = DataRecordContext.Default.DataRecord;
    private static readonly string JsonString = """...[omitted for brevity]...""";
    private static readonly ReadOnlyMemory<byte> JsonUtf8 = Encoding.UTF8.GetBytes(JsonString).ToArray();

    [Benchmark(Baseline = true)]
    public DataRecord? DeserializeSpan()
    {
        return JsonSerializer.Deserialize(JsonUtf8.Span, DataRecordTypeInfo);
    }

    [Benchmark]
    public DataRecord? DeserializeStringFromUtf8()
    {
        return JsonSerializer.Deserialize(Encoding.UTF8.GetString(JsonUtf8.Span), DataRecordTypeInfo);
    }

    [Benchmark]
    public DataRecord? DeserializeJsonReader()
    {
        var reader = new Utf8JsonReader(JsonUtf8.Span);
        return JsonSerializer.Deserialize(ref reader, DataRecordTypeInfo);
    }
}

[JsonSerializable(typeof(DataRecord))]
public partial class DataRecordContext : JsonSerializerContext
{
}

public class DataRecord(string? u825587097135, string u30475139121, string c876626645106780540461092564)
{
    public string? U825587097135 { get; } = u825587097135;
    public string? U30475139121 { get; } = u30475139121;
    public string? C876626645106780540461092564 { get; } = c876626645106780540461092564;
}
Method                     |     Mean |    Error |   StdDev | Ratio | RatioSD |   Gen0 | Allocated | Alloc Ratio
-------------------------- |---------:|---------:|---------:|------:|--------:|-------:|----------:|------------:
DeserializeSpan            | 16.37 μs | 0.155 μs | 0.137 μs |  1.00 |    0.00 | 0.0305 |     664 B |        1.00
DeserializeStringFromUtf8  | 18.30 μs | 0.152 μs | 0.135 μs |  1.12 |    0.01 | 2.0447 |   25712 B |       38.72
DeserializeJsonReader      | 32.05 μs | 0.384 μs | 0.359 μs |  1.96 |    0.03 |      - |     664 B |        1.00

Analysis

Cursory profiling suggests that the overhead of the Deserialize(ref Utf8JsonReader, ...) overloads stems from the calls to GetReaderScopedToNextValue(), which performs a reader.TrySkip() that would seem to result in essentially parsing the JSON data twice - first while skipping the current array/object in GetReaderScopedToNextValue, then again when actually deserializing the data. This perhaps explains why it's approximately twice as slow as the baseline in the above results.

Unfortunately all Deserialize(ref Utf8JsonReader, ...) overloads appear to call GetReaderScopedToNextValue, so this overhead cannot be avoided, even if the caller knows that the reader is already appropriately "scoped".
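The double-parse effect described above can be observed directly: a single reader.TrySkip() on the root token walks the entire document, so BytesConsumed after the skip equals the full payload length. A minimal sketch (names are illustrative):

```csharp
using System;
using System.Text.Json;

// Demonstrates that TrySkip() on the root token consumes the whole
// document -- the same work the subsequent deserialization then repeats.
public static class TrySkipDemo
{
    public static long BytesConsumedBySkip(byte[] utf8Json)
    {
        var reader = new Utf8JsonReader(utf8Json);
        reader.Read();    // position on the root StartObject/StartArray
        reader.TrySkip(); // parses all the way to the matching end token
        return reader.BytesConsumed;
    }
}
```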

@cosborne83 cosborne83 added the tenet-performance Performance related issue label Mar 13, 2024

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Mar 13, 2024
@steveharter
Member

I believe GetReaderScopedToNextValue(), along with TrySkip(), was intended to make the serializer resilient to an incorrectly positioned reader, or perhaps to a bad custom converter for a JSON object or array. It creates a new reader scoped to just the JSON for the current object/array, to prevent read-after-value issues for example, and ensures the reader has enough data.

By contrast, the serializer's validation of custom converters is done in a performant way:

internal void VerifyRead(JsonTokenType tokenType, int depth, long bytesConsumed, bool isValueConverter, ref Utf8JsonReader reader)

This logic performs only minimal checks, for example ensuring that when an object has finished being read, the reader is positioned on JsonTokenType.EndObject and the current depth is correct.

Perhaps the TrySkip() can be replaced with logic similar to the custom converter validation above, plus any existing logic that resets the reader if there is an exception or the reader runs out of data.

@eiriktsarpalis
Member

The current behavior is very much intentional, but IIRC this is mostly forced because of assumptions that the serializer is making about input data (must be a self-contained JSON value without trailing data). In principle I think it might be possible to avoid some of that double parsing by reworking the core serialization routines (allowing trailing data, adding extra checks ensuring deserialization does not escape its current scope) but that would require benchmarking to ensure that it does result in net perf improvements.

@eiriktsarpalis eiriktsarpalis added help wanted [up-for-grabs] Good issue for external contributors and removed untriaged New issue has not been triaged by the area owner labels Mar 19, 2024
@eiriktsarpalis eiriktsarpalis added this to the Future milestone Mar 19, 2024
@cosborne83
Author

@eiriktsarpalis - The scenario that led me to encounter this unexpected performance difference was the need to deserialize from a ReadOnlySequence<byte>, for which there are no dedicated Deserialize() overloads (as there are for, e.g., ReadOnlySpan<byte>). This necessitates first constructing a Utf8JsonReader from the sequence and then using the slower overload.

Given the apparent difficulty in changing the behaviour of the Deserialize(ref Utf8JsonReader, ...) overload to reduce the performance impact, perhaps adding Deserialize(ReadOnlySequence<byte>, ...) overloads to JsonSerializer analogous to the existing ReadOnlySpan<byte> versions would be possible as a simpler/lower-risk alternative?

Unless perhaps there's already a way to deserialize from a ReadOnlySequence<byte> that I've missed that doesn't hit this slower path?
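For reference, the workaround in question (the only route available today) looks something like the sketch below; Payload is a hypothetical type standing in for the real data model, and a dedicated ReadOnlySequence<byte> overload would make the explicit reader construction unnecessary:

```csharp
using System.Buffers;
using System.Text.Json;

// Wrapping a ReadOnlySequence<byte> in a Utf8JsonReader and calling the
// (slower) ref-reader overload. "Payload" is a hypothetical type.
public record Payload(string? Name);

public static class SequenceDeserializer
{
    public static Payload? Deserialize(ReadOnlySequence<byte> utf8Json)
    {
        var reader = new Utf8JsonReader(utf8Json);
        return JsonSerializer.Deserialize<Payload>(ref reader);
    }
}
```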

@eiriktsarpalis
Member

That is one possibility. FWIW, ReadOnlySequence was never a first-class citizen in the serializer layer (in many cases converters assume the underlying data is a span); however, #68586 is going to change this.

No branches or pull requests

3 participants