
JsonSerializer deserialization using overloads taking Utf8JsonReader are unexpectedly slower than equivalents taking string #99674

Open
cosborne83 opened this issue Mar 13, 2024 · 5 comments
Labels
area-System.Text.Json help wanted [up-for-grabs] Good issue for external contributors tenet-performance Performance related issue

@cosborne83

Description

Whilst testing JSON deserialization performance of moderately-sized (10-15KiB) UTF-8 data with the latest System.Text.Json NuGet package (8.0.3), I found that the JsonSerializer.Deserialize(ref Utf8JsonReader, ...) overloads are unexpectedly much slower than those taking the data as a string. The difference is such that it can be faster to convert the underlying UTF-8 bytes to a string first and then use a JsonSerializer.Deserialize(string, ...) overload instead.

The problem seems most pronounced when the JSON data contains string values with many escape sequences (escaped double-quotes in my testing, but others may also exhibit the issue).
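Since the original payload is not included in this report, a minimal stand-in with the same characteristic (a string value containing many escaped double-quotes) can be generated like this; the shape, property name, and sizes are illustrative and are not the original data:

```csharp
using System.Text;

// Hypothetical stand-in for the omitted benchmark payload: a single JSON
// object whose string value contains many \" escape sequences.
public static class EscapeHeavyJson
{
    public static string Create(int quotedWords = 1000)
    {
        var sb = new StringBuilder("{\"Value\":\"");
        for (int i = 0; i < quotedWords; i++)
        {
            sb.Append("\\\"word\\\" "); // emits \"word\" inside the JSON string value
        }
        sb.Append("\"}");
        return sb.ToString();
    }
}
```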

Configuration

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19044.4046/21H2/November2021Update)
13th Gen Intel Core i9-13900H, 1 CPU, 20 logical and 14 physical cores
.NET SDK 8.0.100
  [Host]     : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2

Regression?

I don't believe so.

Data

The following code was used to perform the benchmark. I have omitted the JSON data itself from the snippet below due to its size, but can provide it separately.

[MemoryDiagnoser]
public partial class JsonDeserializeTests
{
    private static readonly JsonTypeInfo<DataRecord> DataRecordTypeInfo = DataRecordContext.Default.DataRecord;
    private static readonly string JsonString = """...[omitted for brevity]...""";
    private static readonly ReadOnlyMemory<byte> JsonUtf8 = Encoding.UTF8.GetBytes(JsonString).ToArray();

    [Benchmark(Baseline = true)]
    public DataRecord? DeserializeSpan()
    {
        return JsonSerializer.Deserialize(JsonUtf8.Span, DataRecordTypeInfo);
    }

    [Benchmark]
    public DataRecord? DeserializeStringFromUtf8()
    {
        return JsonSerializer.Deserialize(Encoding.UTF8.GetString(JsonUtf8.Span), DataRecordTypeInfo);
    }

    [Benchmark]
    public DataRecord? DeserializeJsonReader()
    {
        var reader = new Utf8JsonReader(JsonUtf8.Span);
        return JsonSerializer.Deserialize(ref reader, DataRecordTypeInfo);
    }
}

[JsonSerializable(typeof(DataRecord))]
public partial class DataRecordContext : JsonSerializerContext
{
}

public class DataRecord(string? u825587097135, string u30475139121, string c876626645106780540461092564)
{
    public string? U825587097135 { get; } = u825587097135;
    public string? U30475139121 { get; } = u30475139121;
    public string? C876626645106780540461092564 { get; } = c876626645106780540461092564;
}
Method                     |     Mean |    Error |   StdDev | Ratio | RatioSD |   Gen0 | Allocated | Alloc Ratio
-------------------------- |---------:|---------:|---------:|------:|--------:|-------:|----------:|------------:
DeserializeSpan            | 16.37 μs | 0.155 μs | 0.137 μs |  1.00 |    0.00 | 0.0305 |     664 B |        1.00
DeserializeStringFromUtf8  | 18.30 μs | 0.152 μs | 0.135 μs |  1.12 |    0.01 | 2.0447 |   25712 B |       38.72
DeserializeJsonReader      | 32.05 μs | 0.384 μs | 0.359 μs |  1.96 |    0.03 |      - |     664 B |        1.00

Analysis

Cursory profiling suggests that the overhead of the Deserialize(ref Utf8JsonReader, ...) overloads stems from the calls to GetReaderScopedToNextValue(), which performs a reader.TrySkip() that would seem to result in essentially parsing the JSON data twice - first while skipping the current array/object in GetReaderScopedToNextValue, then again when actually deserializing the data. This perhaps explains why it's approximately twice as slow as the baseline in the above results.

Unfortunately all Deserialize(ref Utf8JsonReader, ...) overloads appear to call GetReaderScopedToNextValue, so this overhead cannot be avoided, even if the caller knows that the reader is already appropriately "scoped".
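The double-parse effect described above can be observed directly: a single reader.TrySkip() on the root token walks the entire document, so BytesConsumed after the skip equals the full payload length. A minimal sketch (names are illustrative):

```csharp
using System;
using System.Text.Json;

// Demonstrates that TrySkip() on the root token consumes the whole
// document -- the same work the subsequent deserialization then repeats.
public static class TrySkipDemo
{
    public static long BytesConsumedBySkip(byte[] utf8Json)
    {
        var reader = new Utf8JsonReader(utf8Json);
        reader.Read();    // position on the root StartObject/StartArray
        reader.TrySkip(); // parses all the way to the matching end token
        return reader.BytesConsumed;
    }
}
```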

@cosborne83 cosborne83 added the tenet-performance Performance related issue label Mar 13, 2024

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Mar 13, 2024
@steveharter
Member

I believe GetReaderScopedToNextValue(), along with TrySkip(), was intended to make the serializer resilient to an incorrectly positioned reader, or perhaps to a bad custom converter for a JSON object or array. It creates a new reader scoped to just the JSON for the current object/array, to prevent read-after-value issues for example, and ensures the reader has enough data.

By contrast, the serializer's validation of custom converters is done in a performant way:

internal void VerifyRead(JsonTokenType tokenType, int depth, long bytesConsumed, bool isValueConverter, ref Utf8JsonReader reader)

This logic performs only minimal checks, for example ensuring that when an object has finished being read, the reader is positioned on JsonTokenType.EndObject and the current depth is correct.

Perhaps the TrySkip() can be replaced with logic similar to the custom converter validation above, plus any existing logic that resets the reader if there is an exception or the reader runs out of data.

@eiriktsarpalis
Member

The current behavior is very much intentional, but IIRC this is mostly forced because of assumptions that the serializer is making about input data (must be a self-contained JSON value without trailing data). In principle I think it might be possible to avoid some of that double parsing by reworking the core serialization routines (allowing trailing data, adding extra checks ensuring deserialization does not escape its current scope) but that would require benchmarking to ensure that it does result in net perf improvements.

@eiriktsarpalis eiriktsarpalis added help wanted [up-for-grabs] Good issue for external contributors and removed untriaged New issue has not been triaged by the area owner labels Mar 19, 2024
@eiriktsarpalis eiriktsarpalis added this to the Future milestone Mar 19, 2024
@cosborne83
Author

@eiriktsarpalis - The scenario that led me to encounter this unexpected performance difference was the need to deserialize from a ReadOnlySequence<byte>, for which there are no dedicated Deserialize() overloads (as there are for, e.g., ReadOnlySpan<byte>). This necessitates first constructing a Utf8JsonReader from the sequence and then using the slower overload.

Given the apparent difficulty in changing the behaviour of the Deserialize(ref Utf8JsonReader, ...) overload to reduce the performance impact, perhaps adding Deserialize(ReadOnlySequence<byte>, ...) overloads to JsonSerializer analogous to the existing ReadOnlySpan<byte> versions would be possible as a simpler/lower-risk alternative?

Unless perhaps there's already a way to deserialize from a ReadOnlySequence<byte> that I've missed that doesn't hit this slower path?
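For reference, the workaround in question (the only route available today) looks something like the sketch below; Payload is a hypothetical type standing in for the real data model, and a dedicated ReadOnlySequence<byte> overload would make the explicit reader construction unnecessary:

```csharp
using System.Buffers;
using System.Text.Json;

// Wrapping a ReadOnlySequence<byte> in a Utf8JsonReader and calling the
// (slower) ref-reader overload. "Payload" is a hypothetical type.
public record Payload(string? Name);

public static class SequenceDeserializer
{
    public static Payload? Deserialize(ReadOnlySequence<byte> utf8Json)
    {
        var reader = new Utf8JsonReader(utf8Json);
        return JsonSerializer.Deserialize<Payload>(ref reader);
    }
}
```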

@eiriktsarpalis
Member

That is one possibility. FWIW, ReadOnlySequence was never a first-class citizen in the serializer layer (in many cases converters assume the underlying data is a span); however, #68586 is going to change this.

No branches or pull requests

3 participants