
Add ability to stream large strings in Utf8JsonWriter/Utf8JsonReader #67337


Open
anhadi2 opened this issue Mar 30, 2022 · 45 comments · Fixed by #111041
Labels: api-approved (API was approved in API review, it can be implemented), area-System.Text.Json, in-pr (There is an active PR which will close this issue when it is merged), tenet-performance (Performance related issue)

@anhadi2

anhadi2 commented Mar 30, 2022

EDIT: See #67337 (comment) for an API proposal.

Background and motivation

I have a requirement to write large binary content in JSON.
In order to do this, I need to encode it as Base64 before writing.
The resulting JSON looks like this:

{
  "data": "large_base64_encoded_string"
}

I have a PipeReader from which I read bytes in a loop, appending them to a list.
I then convert the list to a byte array, encode it as a Base64 string, and write it with WriteStringValue.

public void WriteBinaryContent(PipeReader reader, Utf8JsonWriter writer)
{
    writer.WriteStartObject();
    writer.WritePropertyName("data");
    byte[] byteArray = ReadBinaryData(reader);
    string base64data = Convert.ToBase64String(byteArray);
    writer.WriteStringValue(base64data);
    writer.WriteEndObject();
}

public byte[] ReadBinaryData(PipeReader reader)
{
    List<byte> bytes = new List<byte>();
    while (reader.TryRead(out ReadResult result))
    {
        ReadOnlySequence<byte> buffer = result.Buffer;
        bytes.AddRange(buffer.ToArray());

        // Tell the PipeReader how much of the buffer has been consumed.
        reader.AdvanceTo(buffer.End);

        // Stop reading if there's no more data coming.
        if (result.IsCompleted)
        {
            break;
        }
    }

    // Mark the PipeReader as complete.
    reader.Complete();
    byte[] byteArray = bytes.ToArray();
    return byteArray;
}

The problem with this approach is excessive memory consumption: the whole binary content must be kept in memory, converted to Base64, and only then written.

Memory consumption is especially critical when using Utf8JsonWriter in an override of the JsonConverter.Write() method in a web application.
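For context, here is a minimal sketch of that converter scenario (the Document payload type is hypothetical); the Write override has to materialize the entire payload before anything can be emitted:

using System;
using System.Text.Json;
using System.Text.Json.Serialization;

public sealed record Document(byte[] Bytes); // hypothetical payload type

public sealed class DocumentConverter : JsonConverter<Document>
{
    public override Document Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        => throw new NotSupportedException();

    public override void Write(Utf8JsonWriter writer, Document value, JsonSerializerOptions options)
    {
        writer.WriteStartObject();
        // The entire byte[] must already be in memory before this call.
        writer.WriteBase64String("data", value.Bytes);
        writer.WriteEndObject();
    }
}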

Instead, I am proposing a way to stream large binary content.

API Proposal

namespace System.Text.Json
{
    public class Utf8JsonWriter : IAsyncDisposable, IDisposable
    {
        // This returns a stream on which binary data can be written.
        // It will encode it to base64 and write to output stream.
        public Stream CreateBinaryWriteStream();
    }
}

API Usage

public void WriteBinaryContent(PipeReader reader, Utf8JsonWriter writer)
{
    writer.WriteStartObject();
    writer.WritePropertyName("data");
    using (Stream binaryStream = writer.CreateBinaryWriteStream())
    {
        StreamBinaryData(reader, binaryStream);
        binaryStream.Flush();
    }
    writer.WriteEndObject();
}

public void StreamBinaryData(PipeReader reader, Stream stream)
{
    while (reader.TryRead(out ReadResult result))
    {
        ReadOnlySequence<byte> buffer = result.Buffer;
        byte[] byteArray = buffer.ToArray();
        stream.Write(byteArray, 0, byteArray.Length);
        stream.Flush();
        // Tell the PipeReader how much of the buffer has been consumed.
        reader.AdvanceTo(buffer.End);

        // Stop reading if there's no more data coming.
        if (result.IsCompleted)
        {
            break;
        }
    }

    // Mark the PipeReader as complete.
    reader.Complete();
}

Alternative Designs

No response

Risks

No response

@anhadi2 anhadi2 added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Mar 30, 2022
@ghost ghost added area-System.IO untriaged New issue has not been triaged by the area owner labels Mar 30, 2022
@ghost

ghost commented Mar 30, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

@ghost

ghost commented Mar 30, 2022

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

@teo-tsirpanis
Contributor

I think it should be an IBufferWriter<byte> instead of a Stream. Even better, a struct implementing IBufferWriter<byte>, with a Complete method that tells the Utf8JsonWriter that writing the big string has finished.
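A rough sketch of what that shape could look like, following the proposal style above (all names here are hypothetical, not a reviewed API):

namespace System.Text.Json
{
    public class Utf8JsonWriter : IAsyncDisposable, IDisposable
    {
        // Hypothetical: starts a JSON string value and returns a buffer writer
        // whose bytes are encoded/escaped and appended to that string.
        public JsonStringBufferWriter CreateStringValueWriter();
    }

    // Hypothetical struct implementing IBufferWriter<byte>; Complete() writes
    // the closing quote and hands control back to the Utf8JsonWriter.
    public struct JsonStringBufferWriter : IBufferWriter<byte>
    {
        public Memory<byte> GetMemory(int sizeHint = 0);
        public Span<byte> GetSpan(int sizeHint = 0);
        public void Advance(int count);
        public void Complete();
    }
}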

@FiniteReality

with a Complete method that tells the Utf8JsonWriter that writing the big string has finished.

Personally, I prefer the idea of using IDisposable here, with the Dispose method acting as the Complete method you're talking about. This is similar to how other types, such as log scopes, work.

@teo-tsirpanis
Contributor

That's also an option, since this is the type's only purpose.

@ericstj
Member

ericstj commented Mar 30, 2022

I have a requirement to write large binary content in JSON.

What's the reason for this constraint? Wouldn't it be more efficient to serve it up directly instead of embedding in a JSON payload?

@anhadi2
Author

anhadi2 commented Mar 31, 2022

I have a requirement to write large binary content in JSON.

What's the reason for this constraint? Wouldn't it be more efficient to serve it up directly instead of embedding in a JSON payload?

We want to serve multiple binary payloads in the JSON, along with some additional fields.
The actual JSON will contain some more fields, and the response looks like:

{
  "value": [
    {
      "field1": "some_small_string",
      "data": "large_base64_encoded_string"
    },
    {
      "field1": "some_small_string",
      "data": "large_base64_encoded_string"
    }
  ]
}

@krwq
Member

krwq commented Mar 31, 2022

@anhadi2 wouldn't it make more sense to append that data after the JSON is complete? I.e.:

{
  "value": [
    {
      "field1": "some_small_string",
      "data": "$binary_payload"
      //or "binary_payload": true or some other marker to tell "I'll be writing rest later"
    },
    {
      "field2": "some_small_string",
      "data": "$binary_payload"
    }
  ]
}
<base64 of payload for field 1, this could even be 4 bytes of length + data directly>
<base64 of payload for field 2>

@anhadi2
Author

anhadi2 commented Apr 1, 2022

@anhadi2 wouldn't it make more sense to append that data after the JSON is complete? I.e.:

{
  "value": [
    {
      "field1": "some_small_string",
      "data": "$binary_payload"
      //or "binary_payload": true or some other marker to tell "I'll be writing rest later"
    },
    {
      "field2": "some_small_string",
      "data": "$binary_payload"
    }
  ]
}
<base64 of payload for field 1, this could even be 4 bytes of length + data directly>
<base64 of payload for field 2>

This will not work for us. We want the response to be valid JSON.

@teo-tsirpanis
Contributor

Something you could do is put a URL to the data in your JSON instead of the data itself. The data would then be transmitted in binary, much more efficiently than Base64.

Unless you want to persist this JSON file.

@anhadi2
Author

anhadi2 commented Apr 1, 2022

Something you could do is put a URL to the data in your JSON instead of the data themselves. The data will be transmitted in binary and much more efficiently than Base64.

Unless you want to persist this JSON file.

Thanks for the suggestion. Yes, there might be other ways to do this.
However, the response format is already decided and we cannot change it at this stage.
Hence we are looking for an efficient way to transfer the Base64-encoded binary payload as part of the JSON response body.

@krwq
Member

krwq commented Jun 7, 2022

@anhadi2 would #68223 help for your use case?

@krwq krwq added needs-author-action An issue or pull request that requires more info or actions from the author. and removed untriaged New issue has not been triaged by the area owner labels Jun 7, 2022
@krwq krwq added this to the Future milestone Jun 7, 2022
@ghost

ghost commented Jun 7, 2022

This issue has been marked needs-author-action and may be missing some important information.

@ghost ghost added the no-recent-activity label Jun 21, 2022
@ghost

ghost commented Jun 21, 2022

This issue has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

@ghost

ghost commented Jul 5, 2022

This issue will now be closed since it had been marked no-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the issue, but please note that the issue will be locked if it remains inactive for another 30 days.

@ghost ghost closed this as completed Jul 5, 2022
@teo-tsirpanis teo-tsirpanis closed this as not planned Jul 5, 2022
@oocx

oocx commented Jul 21, 2022

Such an API would be useful for us as well. We are sending data to a third-party API that we cannot change, and that API expects files to be sent Base64-encoded as part of a JSON object. The documents can be up to 100 MB in size, and we'd like to avoid having to load the complete document into memory.

@ghost ghost removed the no-recent-activity label Jul 21, 2022
@mriehm

mriehm commented Jul 30, 2022

We are in the same position of sending large Base64 data in JSON to a third party and being unable to change the contract.

@krwq You added the needs-author-action tag, which seems to have led to this issue being closed. Since a couple of other people have now responded requesting this feature, can it be reopened?

To answer your question about #68223, it would not solve the issue that all the data needs to be present in memory at once.

In the meantime, I'll say to anyone else searching for a solution: if you have access to the underlying output Stream, this ugly workaround should be viable:

static void WriteStreamValue(Stream outputStream, Utf8JsonWriter writer, Stream inputStream)
{
    // Flush any pending JSON (e.g. the property name) before bypassing the writer.
    writer.Flush();

    // Emit the opening quote of the string value directly to the output stream.
    outputStream.Write(Encoding.UTF8.GetBytes("\""));

    // Base64-encode the input as it is copied, so it never has to fit in memory at once.
    using (CryptoStream base64Stream = new CryptoStream(outputStream, new ToBase64Transform(), CryptoStreamMode.Write, leaveOpen: true))
    {
        inputStream.CopyTo(base64Stream);
    }

    // Write the closing quote through the writer so its internal state records that a value was written.
    writer.WriteRawValue("\"", skipInputValidation: true);
}
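Usage would look roughly like this (illustrative only; outputStream must be the same Stream the Utf8JsonWriter wraps, and payload.bin is a stand-in input):

using FileStream input = File.OpenRead("payload.bin"); // stand-in for the real source
using FileStream output = File.Create("result.json");
using var writer = new Utf8JsonWriter(output);

writer.WriteStartObject();
writer.WritePropertyName("data");
WriteStreamValue(output, writer, input); // streams the Base64-encoded value
writer.WriteEndObject();
writer.Flush();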

@teo-tsirpanis
Contributor

Reopening.

@eiriktsarpalis eiriktsarpalis changed the title Add ability to stream large binary content to Utf8JsonWriter Add ability to stream large strings to Utf8JsonWriter Feb 15, 2024
@eiriktsarpalis eiriktsarpalis changed the title Add ability to stream large strings to Utf8JsonWriter Add ability to stream large strings in Utf8JsonWriter/Utf8JsonReader Apr 4, 2024
@Tragetaschen
Contributor

Back in the day, #68223 (comment) removed the

public void WriteRawValue(ReadOnlySequence<char> json, bool skipInputValidation = false);

overload due to the surrogate-pair handling it would require. The new API proposed here would need that handling anyway, and it would also be the building block for that overload's implementation. It might be a good time to reintroduce it.

It would also tie in nicely with #97570, for example.

@SteveSandersonMS
Member

SteveSandersonMS commented Jun 28, 2024

AI-related use case:

When receiving a response from an LLM in JSON format (e.g., with response format = json_object for OpenAI), it might represent something you want to process in a streaming way, e.g., if it's a chatbot answering a question and you want the answer to show up in real time in the UI. For example, it might return a JSON object like:

{ "confidence": 1, "citations": [...], "answer_text": "A really long string goes here, so long that you want to see it arrive incrementally in the UI" }

Currently that's not really viable to do with System.Text.Json.

While we don't have an API design in mind, here are some possibilities:

  • We could support this at the JsonReader level only. It would have to be able to read from a Stream and have some methods like ReadStringChunkAsync that give you the next block from a string you're currently reading (see the hypothetical sketch after this list).
  • Or we could support it in the deserializer by mapping to properties of type Stream. However, it's unclear how the developer could know which of the output object properties have been populated by the time the stream starts arriving, since the response object properties could appear in any order.
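A purely hypothetical sketch of the first option (StreamingJsonReader, ReadStringChunkAsync, and AppendToUi are all made up for illustration; no such API exists):

// Hypothetical API shape -- nothing here is designed or approved.
var reader = new StreamingJsonReader(httpResponseStream);
while (await reader.ReadAsync())
{
    if (reader.TokenType == JsonTokenType.String &&
        reader.CurrentPropertyName == "answer_text")
    {
        char[] buffer = new char[1024];
        int charsRead;
        // Pull the string value chunk by chunk as bytes arrive from the network.
        while ((charsRead = await reader.ReadStringChunkAsync(buffer)) > 0)
        {
            AppendToUi(buffer.AsSpan(0, charsRead)); // render incrementally
        }
    }
}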

Evidence required

TBH we're still short on evidence that people really need to do this when working with an LLM, because:

  1. As far as we know, the main reason to want to do streaming at all when working with an LLM is to present the output in realtime in a chat-style UI. But in most cases a chat response will be done as plain text, not JSON. Even if the developer is thinking of using JSON because they want structured info (e.g., an array of citations, or a flag to indicate something about the response type), there are usually better alternatives. Examples:
    • If you want flags indicating the nature of the response, either have that determined by a JSON-returning planning phase that executes before you ask the LLM for its final response, or tell the LLM to embed the flag in its response in some format you decide (e.g., <is_low_certainty>).
    • If you want structured citations, tell the LLM to return them inline in the text in a known format you can find with a regex, e.g., <cite id="123">text</cite>.
  2. In non-chat scenarios, non-UI code will simply wait until the response completes before it takes action on it, so it doesn't need to parse in a streaming way.
  3. Even if you could do this via Utf8JsonReader, it would result in extremely painful code since you couldn't use a regular deserialization API and would instead have to write some kind of state machine that interprets a single specific JSON format.

Here's one bit of evidence that people will want to parse large strings in streaming JSON: https://www.boundaryml.com/blog/nextjs-rag-streaming-example. Again, it's not the only way to do this, but it suggests some people will want to.

If anyone reading this has clear examples of LLM-related cases where they find it desirable to process a JSON response in a streaming way, please let us know!

@habbes
Contributor

habbes commented Oct 16, 2024

@eiriktsarpalis I'm interested in contributing to this, our codebase still relies on less-than-ideal workarounds. I see that there was another PR that attempted to address this and was closed. I'd like to get some context on why the PR was closed to I'm well aligned with expectations. Also wanted to confirm that the scope of approved APIs only covers Utf8JsonWriter and not JsonSerializer or JsonConverters or Utf8JsonReader, is that correct? Also, the different proposed methods for Utf8JsonWriter can be implemented in separate PRs (for ease of review and workload management), is that correct?

@eiriktsarpalis
Member

@habbes can you point out the PR you're referring to? I couldn't immediately find it skimming through the issue history.

@habbes
Contributor

habbes commented Oct 16, 2024

@eiriktsarpalis this one: #101356

@habbes
Contributor

habbes commented Oct 16, 2024

Seems like it was auto-closed due to lack of activity after receiving some comments and being marked as draft.

@jeffhandley jeffhandley modified the milestones: Future, 10.0.0 Dec 4, 2024
@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Jan 2, 2025
@davidfowl
Member

@PranavSenthilnathan are you planning to do #67337 (comment) as well?

@habbes
Contributor

habbes commented Jan 4, 2025

Hey @davidfowl is this issue being worked on by the team? I had it on my radar to try and take a stab at it from Monday as I have some bandwidth for the next 2 weeks. But if it’s already prioritized by the .NET team I can sit this one out.

@PranavSenthilnathan
Member

Hey @davidfowl is this issue being worked on by the team? I had it on my radar to try and take a stab at it from Monday as I have some bandwidth for the next 2 weeks. But if it’s already prioritized by the .NET team I can sit this one out.

Thanks for the interest! The APIs in this issue are implemented in these PRs already:
#111041
#101356

@PranavSenthilnathan are you planning to do #67337 (comment) as well?

Yes, I have a local implementation of a byte[] converter with this new API, but the perf isn't quite there yet. Once I fix that, I'll tackle string.
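For reference, the original motivating example could look roughly like this with the segment-style APIs merged via #111041. This is a sketch assuming a WriteBase64StringSegment(ReadOnlySpan<byte>, bool isFinalSegment) shape; verify the exact signatures against the merged PR before relying on it:

using System;
using System.Buffers;
using System.IO.Pipelines;
using System.Text.Json;
using System.Threading.Tasks;

public static async Task WriteBinaryContentAsync(PipeReader reader, Utf8JsonWriter writer)
{
    writer.WritePropertyName("data");
    while (true)
    {
        ReadResult result = await reader.ReadAsync();
        foreach (ReadOnlyMemory<byte> segment in result.Buffer)
        {
            // Each chunk is Base64-encoded and appended to the same JSON string value.
            writer.WriteBase64StringSegment(segment.Span, isFinalSegment: false);
        }
        reader.AdvanceTo(result.Buffer.End);
        if (result.IsCompleted)
            break;
    }
    // Close the string value without ever buffering the whole payload.
    writer.WriteBase64StringSegment(ReadOnlySpan<byte>.Empty, isFinalSegment: true);
    await reader.CompleteAsync();
}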

@eiriktsarpalis
Member

Reopening this issue so we can track incorporation of streaming in the built-in converters as well.

@PranavSenthilnathan
Member

PranavSenthilnathan commented Apr 15, 2025

I've been incorporating the new APIs into our built-in byte[] and string converters, and unfortunately, I'm not seeing any performance improvements - in many cases, performance is actually worse. There are two components to consider: serialization (writing) and deserialization (reading).

Serialization (Writing)

On the writing side, I tested an implementation that writes in chunks when the payload is large enough. During async serialization, it would serialize and asynchronously flush chunks of the string one at a time instead of all at once. This approach didn’t show any improvement in performance benchmarks.

There are a couple of reasons for this:

  • Fast baseline: Serialization of strings using Utf8JsonWriter is already very fast — especially for longer payloads — due to our vectorized implementation.

  • Reduced vectorization benefits: Dividing serialization into chunks diminishes the benefits of vectorization. The performance gain per chunk drops, and the sum ends up being slower than the monolithic approach.

  • Minimal GC benefit: While chunking might reduce allocations on the Large Object Heap (LOH), we already mitigate GC pressure through pooling. So there’s no substantial memory advantage.

I ran the implementation against the TechEmpower benchmarks (and additional benchmarks in Crank), and even under high concurrency, CPU usage remained in the single digits. Basically, segmented writing didn’t improve performance because the primary bottleneck is network I/O, not serialization.

Deserialization (Reading)

On the reading side, the prototype reads in chunks (partially deserializing UTF-8 as it arrives) until the full JSON string is received, at which point the final C# string is constructed. This approach consistently performed worse in benchmarks.

Here’s why:

  • Final materialization cost remains: Even if we process the incoming JSON as chunks, we still have to build a complete C# string at the end. Doing this all at once is just as fast as doing it in parts.

  • Loss of vectorization/pooling benefits: As with writing, our current implementation benefits heavily from vectorization and pooling, which chunking disrupts.

  • Unpredictable string length: Unlike writing, where string.Length is known in advance, reading doesn’t know a string is “long” until it’s already being parsed. This adds book-keeping overhead.

  • Performance regressions: Because string deserialization is such a hot path, even small overheads cause significant regressions — up to ~10%, and in some cases, as much as 40%. Additionally, TryRead is a performance-critical code path that relies on the JIT inlining everything just right; any deviation impacts performance noticeably.

Recommendation

Based on these results, I recommend that we do not use the segmented APIs in our built-in converters. We should also reconsider whether the Utf8JsonWriter APIs in this proposal should still be added if we’re not going to use them ourselves. Additionally, since resumable converters aren’t public, users can’t easily create custom converters to leverage these APIs.

That said, these APIs may still offer value outside of converters (like the usage example in the original API proposal). Also, in AI-related streaming scenarios (see: #67337 (comment)), they could be more applicable.

Prototype: #112129

Benchmark diffs
Slower diff/base Base Median (ns) Diff Median (ns) Modality
System.Text.Json.Tests.Perf_Depth.ReadSpanEmptyLoop(Depth: 1) 1.43 22.61 32.32
System.Text.Json.Serialization.Tests.ReadJson<HashSet>.DeserializeFromSt 1.41 3799.42 5346.02
System.Text.Json.Serialization.Tests.ReadJson<HashSet>.DeserializeFromUt 1.40 3688.06 5172.93
System.Text.Json.Serialization.Tests.ReadJson<HashSet>.DeserializeFromSt 1.40 3816.98 5342.58
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf8Bytes 1.39 405.07 562.91
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromString(Mo 1.38 417.80 576.53
System.Text.Json.Serialization.Tests.ReadJson<HashSet>.DeserializeFromUt 1.38 3766.13 5181.19
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf 1.38 142.15 195.48
System.Text.Json.Serialization.Tests.ReadMissingAndCaseInsensitive.Cas 1.38 424.07 583.12
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf8Bytes 1.37 401.33 550.19
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf 1.37 137.42 188.24
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr 1.36 146.98 200.56
System.Text.Json.Serialization.Tests.ReadMissingAndCaseInsensitive.Cas 1.36 433.05 587.35
System.Text.Json.Serialization.Tests.ReadJson.Deseri 1.36 102.23 138.61
System.Text.Json.Serialization.Tests.ReadJson<HashSet>.DeserializeFromRe 1.35 4582.81 6186.89
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStream(Mo 1.35 514.35 693.74
System.Text.Json.Serialization.Tests.ReadJson.Deseri 1.34 95.08 127.83
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStream(Mo 1.34 528.15 709.80
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromString(Mo 1.33 432.76 577.54
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr 1.33 151.54 201.23
System.Text.Json.Serialization.Tests.ReadMissingAndCaseInsensitive.Bas 1.33 430.56 571.71
System.Text.Json.Serialization.Tests.ReadJson<HashSet>.DeserializeFromRe 1.32 4587.25 6054.85
System.Text.Json.Serialization.Tests.ReadJson.Deseri 1.32 106.02 139.61
System.Text.Json.Serialization.Tests.ReadJson.Deseri 1.31 98.39 129.32
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf8Byt 1.31 144.26 189.43
System.Text.Json.Serialization.Tests.ReadJson.Deseria 1.31 339.96 444.52
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf8Byt 1.27 141.30 180.01
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromString( 1.27 179.71 228.33
System.Text.Json.Serialization.Tests.ReadJson<HashSet>.DeserializeFromSt 1.27 4247.56 5375.78
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromRea 1.27 205.60 260.09
System.Text.Json.Serialization.Tests.ReadJson.Deseria 1.26 376.06 475.24
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromRea 1.26 211.47 267.00
System.Text.Json.Serialization.Tests.ReadJson.Deseria 1.26 353.84 445.96
System.Text.Json.Serialization.Tests.ReadJson<HashSet>.DeserializeFromSt 1.26 4304.56 5411.67
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromReader(Mo 1.25 580.61 727.48
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr 1.25 235.20 293.71
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromString 1.25 313.58 391.39
System.Text.Json.Serialization.Tests.ReadJson<Dictionary<String, String>>.Deseri 1.25 6324.13 7886.38
System.Text.Json.Serialization.Tests.ColdStartSerialization<SimpleStructWithProp 1.25 175.33 218.58
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf8B 1.25 212.17 264.41
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStrin 1.24 229.27 284.97
System.Text.Json.Serialization.Tests.ReadJson.Deseria 1.24 371.63 461.65
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf8By 1.24 288.25 357.62
System.Text.Json.Serialization.Tests.ColdStartSerialization<SimpleStructWithProp 1.24 198.76 246.19
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStrea 1.23 320.55 395.81
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf8By 1.23 302.94 371.59
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf 1.22 8847.95 10828.99
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr 1.22 244.93 299.29
System.Text.Json.Serialization.Tests.ReadJson.Deseria 1.22 472.44 577.05
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromString 1.22 315.10 383.90
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromString( 1.22 180.37 219.71
System.Text.Json.Serialization.Tests.ReadJson<Dictionary<String, String>>.Deseri 1.22 6489.84 7903.03
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr 1.22 9112.33 11094.42
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf 1.22 8793.06 10700.99
System.Text.Json.Serialization.Tests.ReadJson.Deseri 1.22 152.42 185.49
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf8B 1.22 210.79 256.39
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr 1.21 9095.14 11035.61
System.Text.Json.Serialization.Tests.ReadJson<Dictionary<String, String>>.Deseri 1.21 6791.57 8198.81
System.Text.Json.Serialization.Tests.ReadJson<Dictionary<String, String>>.Deseri 1.21 6480.05 7811.52
System.Text.Json.Serialization.Tests.ReadJson<Dictionary<String, String>>.Deseri 1.20 6531.87 7870.37
System.Text.Json.Serialization.Tests.ReadJson.Deseria 1.20 521.51 625.33
System.Text.Json.Serialization.Tests.ReadJson<Dictionary<String, String>>.Deseri 1.20 6800.95 8128.08
System.Text.Json.Serialization.Tests.ColdStartSerialization<SimpleStructWithProp 1.19 500.58 597.60
System.Text.Json.Serialization.Tests.ReadJson<Dictionary<String, String>>.Deseri 1.19 7987.94 9525.55
System.Text.Json.Serialization.Tests.ReadJson.Deseri 1.19 195.85 233.06
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr 1.19 9376.51 11155.14
System.Text.Json.Serialization.Tests.ReadJson.Deseri 1.19 157.34 187.18
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr 1.19 9424.81 11195.09
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStrea 1.19 316.72 376.10
System.Text.Json.Serialization.Tests.ReadJson<Dictionary<String, String>>.Deseri 1.18 8063.43 9479.86
System.Text.Json.Serialization.Tests.ReadJson.Deseria 1.17 527.92 619.45
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStream 1.16 395.29 459.83
System.Text.Json.Serialization.Tests.ReadJson.Deseria 1.16 491.47 569.92
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStrin 1.16 247.87 287.36
System.Text.Json.Serialization.Tests.ReadJson.Deseriali 1.16 95016.48 110126.41
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromRea 1.16 11740.27 13601.01
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStream 1.16 406.36 470.63
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromReader 1.15 418.34 482.90
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStream( 1.15 268.27 309.58
System.Text.Json.Serialization.Tests.ReadJson<ImmutableDictionary<String, String 1.15 12635.61 14578.94
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromReader( 1.15 203.53 234.82
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromReade 1.15 328.27 378.51
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromReader 1.15 419.29 482.93
System.Text.Json.Serialization.Tests.ColdStartSerialization<SimpleStructWithProp 1.15 470.25 540.87
System.Text.Json.Serialization.Tests.WriteJson.SerializeToString(Mode 1.15 2411.53 2773.56
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromRea 1.15 11892.28 13677.06
System.Text.Json.Serialization.Tests.WriteJson<ImmutableDictionary<String, Strin 1.14 5836.89 6658.24
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromReader( 1.14 207.65 236.24
System.Text.Json.Serialization.Tests.ColdStartSerialization<SimpleStructWithProp 1.13 459.09 520.83
System.Text.Json.Serialization.Tests.ReadJson.Deseriali 1.13 87802.37 99509.33
System.Text.Json.Serialization.Tests.ReadJson<ImmutableDictionary<String, String 1.13 12761.47 14401.64
System.Text.Json.Serialization.Tests.WriteJson.SerializeObjectPropert 1.13 2512.42 2835.26
System.Text.Json.Node.Tests.Perf_ParseThenWrite.ParseThenWrite(IsDataIndented: T 1.13 2755178.57 3106541.67
System.Text.Json.Serialization.Tests.WriteJson.SerializeToWriter(Mode 1.13 2244.09 2527.38
System.Text.Json.Serialization.Tests.ReadJson.Deseriali 1.13 91362.21 102819.12
System.Text.Json.Tests.Perf_Reader.ReadMultiSpanSequenceEmptyLoop(IsDataCompact: 1.12 1752.74 1970.21
System.Text.Json.Serialization.Tests.ReadJson<ImmutableDictionary<String, String 1.12 13195.10 14823.47
System.Text.Json.Serialization.Tests.WriteJson.SerializeToString(Mode 1.12 2444.02 2740.55
System.Text.Json.Serialization.Tests.ReadJson<ImmutableDictionary<String, String 1.12 14351.38 16083.18
System.Text.Json.Serialization.Tests.ColdStartSerialization<SimpleStructWithProp 1.12 553.82 619.64
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromReade 1.12 338.52 378.71
System.Text.Json.Serialization.Tests.ReadJson.Deseriali 1.12 88071.80 98371.93
System.Text.Json.Serialization.Tests.ReadJson.Deseriali 1.12 88381.30 98559.39
System.Text.Json.Serialization.Tests.WriteJson.SerializeToStream(Mode 1.11 2323.47 2588.92
System.Text.Json.Serialization.Tests.WriteJson.SerializeToWriter(Mode 1.11 2237.25 2492.61
System.Text.Json.Serialization.Tests.ReadJson.Deseri 1.11 204.32 227.61
System.Text.Json.Serialization.Tests.WriteJson.SerializeObjectPropert 1.11 2512.02 2795.73
System.Text.Json.Serialization.Tests.WriteJson.SerializeToUtf8Bytes(M 1.11 2357.86 2623.02
System.Text.Json.Serialization.Tests.ReadJson.Deseriali 1.11 86537.71 96233.26
System.Text.Json.Serialization.Tests.WriteJson.SerializeToStream(Mode 1.11 2353.41 2615.38
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStream( 1.11 267.09 295.78
System.Text.Json.Serialization.Tests.ReadJson<ImmutableDictionary<String, String 1.10 14316.59 15784.02
System.Text.Json.Serialization.Tests.ReadJson<ImmutableDictionary<String, String 1.10 12981.90 14304.26
System.Text.Json.Serialization.Tests.WriteJson<HashSet>.SerializeObjectP 1.10 2134.19 2347.97
System.Text.Json.Tests.Perf_Reader.ReadMultiSpanSequenceEmptyLoop(IsDataCompact: 1.10 4597.28 5053.14
System.Text.Json.Serialization.Tests.ReadJson.Deseriali 1.10 119528.01 131366.61
System.Text.Json.Serialization.Tests.ReadJson<ImmutableDictionary<String, String 1.10 13322.30 14619.57
System.Text.Json.Serialization.Tests.WriteJson.SerializeObjectPr 1.09 168.21 184.06
System.Text.Json.Serialization.Tests.WriteJson.SerializeToString(Mode 1.09 4285.31 4676.82
System.Text.Json.Tests.Utf8JsonReaderCommentsTests.Utf8JsonReaderCommentParsing( 1.09 35.28 38.32
System.Text.Json.Serialization.Tests.WriteJson.SerializeToWriter(Mode 1.08 3980.49 4315.21
System.Text.Json.Tests.Perf_Reader.ReadSpanEmptyLoop(IsDataCompact: True, TestCa 1.07 24.33 26.13
System.Text.Json.Serialization.Tests.WriteJson<Dictionary<String, String>>.Seria 1.07 2818.49 3027.18
System.Text.Json.Serialization.Tests.ReadJson<ImmutableSortedDictionary<String, 1.07 26720.95 28695.75
System.Text.Json.Serialization.Tests.WriteJson.SerializeObjectPropert 1.07 4419.30 4744.99
System.Text.Json.Serialization.Tests.WriteJson<ImmutableSortedDictionary<String, 1.07 3302.70 3543.95
System.Text.Json.Serialization.Tests.WriteJson<Dictionary<String, String>>.Seria 1.07 3203.78 3434.26
System.Text.Json.Serialization.Tests.WriteJson.SerializeToUtf8Bytes(M 1.07 4176.64 4472.69
System.Text.Json.Serialization.Tests.ReadJson<ImmutableSortedDictionary<String, 1.07 26486.68 28345.35
System.Text.Json.Serialization.Tests.WriteJson.Serial 1.07 166.49 178.13
System.Text.Json.Tests.Utf8JsonReaderCommentsTests.Utf8JsonReaderCommentParsing( 1.07 27.46 29.36
System.Text.Json.Tests.Utf8JsonReaderCommentsTests.Utf8JsonReaderCommentParsing( 1.07 33.75 36.08
System.Text.Json.Serialization.Tests.WriteJson.SerializeToUtf8Bytes(Mo 1.07 236.37 252.47
System.Text.Json.Serialization.Tests.WriteJson.SerializeObjectPropert 1.07 4393.66 4689.76
System.Text.Json.Serialization.Tests.ReadJson<Nullable>.Deserial 1.07 70.76 75.47
System.Text.Json.Serialization.Tests.WriteJson.SerializeToString(Mode 1.07 4325.79 4613.39
System.Text.Json.Serialization.Tests.WriteJson.SerializeToUtf8Bytes(M 1.07 2466.98 2630.99
System.Text.Json.Serialization.Tests.WriteJson<HashSet>.SerializeToStrin 1.06 1957.61 2084.29
System.Text.Json.Serialization.Tests.WriteJson.SerializeToStream(Mo 1.06 167.46 178.25
System.Text.Json.Serialization.Tests.WriteJson.SerializeToString(Mo 1.06 144.96 154.04
System.Text.Json.Serialization.Tests.ReadJson<ImmutableDictionary<String, String 1.06 13586.04 14434.66
System.Text.Json.Serialization.Tests.ReadJson<ImmutableSortedDictionary<String, 1.06 28404.79 30156.48
System.Text.Json.Serialization.Tests.WriteJson.SerializeToWriter(Mode 1.06 3988.90 4234.15
System.Text.Json.Tests.Utf8JsonReaderCommentsTests.Utf8JsonReaderCommentParsing( 1.06 35.11 37.24
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf8Byte 1.06 5459.38 5785.78
System.Text.Json.Serialization.Tests.WriteJson.Seria 1.06 51.14 54.19
System.Text.Json.Serialization.Tests.ReadJson<ImmutableSortedDictionary<String, 1.06 26951.72 28536.83
System.Text.Json.Serialization.Tests.ReadJson<ImmutableSortedDictionary<String, 1.06 26680.62 28246.70
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromReader(Mode: 1.06 44.18 46.65
System.Text.Json.Tests.Perf_Reader.ReadMultiSpanSequenceEmptyLoop(IsDataCompact: 1.06 2936.65 3099.40
System.Text.Json.Tests.Utf8JsonReaderCommentsTests.Utf8JsonReaderCommentParsing( 1.05 25.98 27.41
System.Text.Json.Serialization.Tests.WriteJson.SerializeToWriter(Mo 1.05 98.02 103.32
Faster base/diff Base Median (ns) Diff Median (ns) Modality
System.Text.Json.Node.Tests.Perf_ParseThenWrite.ParseThenWrite(IsDataIndented: F 1.43 2690406.06 1886629.03 bimodal
System.Text.Json.Tests.Perf_Reader.ReadSpanEmptyLoop(IsDataCompact: False, TestC 1.30 2774.20 2132.67
System.Text.Json.Serialization.Tests.WriteJson.Seria 1.27 83.36 65.51
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromReader(Mo 1.19 886.35 746.50
System.Text.Json.Tests.Perf_Base64.WriteByteArrayAsBase64_NoEscaping(NumberOfByt 1.18 44.65 37.68
System.Text.Json.Tests.Perf_Segment.ReadMultiSegmentSequenceUsingSpan(segmentSiz 1.13 2217.39 1960.20
System.Text.Json.Tests.Perf_Segment.ReadSingleSegmentSequenceByN(numberOfBytes: 1.12 2341.61 2084.21
System.Text.Json.Tests.Perf_Reader.ReadSingleSpanSequenceEmptyLoop(IsDataCompact 1.11 1113.63 998.88
System.Text.Json.Tests.Perf_Segment.ReadSingleSegmentSequenceByN(numberOfBytes: 1.11 2365.85 2126.25
System.Text.Json.Tests.Perf_Reader.ReadSpanEmptyLoop(IsDataCompact: True, TestCa 1.10 1115.53 1009.73
System.Text.Json.Tests.Perf_Base64.WriteByteArrayAsBase64_HeavyEscaping(NumberOf 1.10 39.07 35.38
System.Text.Json.Tests.Perf_Segment.ReadMultiSegmentSequenceUsingSpan(segmentSiz 1.10 2248.02 2040.69
System.Text.Json.Tests.Utf8JsonReaderCommentsTests.Utf8JsonReaderCommentParsing( 1.08 31.10 28.76
System.Text.Json.Tests.Utf8JsonReaderCommentsTests.Utf8JsonReaderCommentParsing( 1.08 160.22 148.29
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStream(Mode: 1.08 104.03 96.48
System.Text.Json.Serialization.Tests.WriteJson.Serializ 1.08 109619.19 101674.39
System.Text.Json.Tests.Perf_Reader.ReadSpanEmptyLoop(IsDataCompact: True, TestCa 1.08 2339.98 2171.22
System.Text.Json.Tests.Perf_Get.GetBoolean 1.07 40.28 37.79
System.Text.Json.Tests.Perf_Reader.ReadSingleSpanSequenceEmptyLoop(IsDataCompact 1.06 2342.91 2205.74
System.Text.Json.Document.Tests.Perf_ParseThenWrite.ParseThenWrite(IsDataIndente 1.06 8915.15 8407.96
System.Text.Json.Serialization.Tests.WriteJson<Nullable>.Seriali 1.06 95.20 89.80

@davidfowl
Member

davidfowl commented Apr 16, 2025

Minimal GC benefit: While chunking might reduce allocations on the Large Object Heap (LOH), we already mitigate GC pressure through pooling. So there’s no substantial memory advantage.

Where is the memory use in the benchmarks? Do you have a GC trace we can look at? I wouldn't expect more throughput; I'd expect better scalability and memory usage if you were doing LOTS of these concurrently.

It might also be worth running an end-to-end ASP.NET Core scenario with a large JSON payload.

@PranavSenthilnathan
Member

Because of pooling, the memory numbers in the microbenchmarks are not useful (the warmup iterations populate the pool and the test iterations just use memory from the populated pool). However, I did run the Crank 400K JSON benchmark for a more representative workload. Here is a diff:

application main PR
Max Process CPU Usage (%) 24 24 0.00%
Max Cores usage (%) 1,332 1,371 +2.93%
Max Working Set (MB) 256 221 -13.67%
Max Private Memory (MB) 1,010 859 -14.95%
Build Time (ms) 6,902 6,848 -0.78%
Start Time (ms) 257 272 +5.84%
Published Size (KB) 102,474 102,474 0.00%
Symbols Size (KB) 28 28 0.00%
.NET Core SDK Version 10.0.100-preview.4.25214.32 10.0.100-preview.4.25214.32
Max Global CPU Usage (%) 24 27 +12.50%
Max CPU Usage (%) 21 25 +18.27%
Max Working Set (MB) 268 232 -13.41%
Max GC Heap Size (MB) 246 101 -59.05%
Size of committed memory by the GC (MB) 287 155 -45.95%
Max Number of Gen 0 GCs / sec 12.00 4.00 -66.67%
Max Number of Gen 1 GCs / sec 10.00 1.00 -90.00%
Max Number of Gen 2 GCs / sec 9.00 1.00 -88.89%
Max Gen 0 GC Budget (MB) 49 73 +48.98%
Max Time in GC (%) 65.00 21.00 -67.69%
Max Gen 0 Size (B) 8,589,304 642,032 -92.53%
Max Gen 1 Size (B) 6,158,856 16,142,536 +162.10%
Max Gen 2 Size (B) 13,066,280 33,844,440 +159.02%
Max LOH Size (B) 124,535,328 800,056 -99.36%
Max POH Size (B) 92,184,808 3,592,448 -96.10%
Max Allocation Rate (B/sec) 207,902,704 49,875,776 -76.01%
Max GC Heap Fragmentation (%) 540% 2,345% +334.09%
# of Assemblies Loaded 85 85 0.00%
Max Exceptions (#/s) 510 510 0.00%
Max Lock Contention (#/s) 59 1,183 +1,905.08%
Max ThreadPool Threads Count 59 56 -5.08%
Max ThreadPool Queue Length 7 5 -28.57%
Max ThreadPool Items (#/s) 66,036 187,966 +184.64%
Max Active Timers 1 1 0.00%
IL Jitted (B) 350,761 378,728 +7.97%
Methods Jitted 3,771 4,402 +16.73%
Load Working Set - P90 (MB) 255 221 -13.33%
Load CPU Usage - P90 (%) 15 15 0.00%
load main PR
Max Process CPU Usage (%) 12 12 0.00%
Max Cores usage (%) 673 679 +0.89%
Max Working Set (MB) 43 44 +2.33%
Max Private Memory (MB) 358 358 0.00%
Start Time (ms) 1 1 0.00%
Max Global CPU Usage (%) 16 16 0.00%
First Request (ms) 170 193 +13.53%
Requests/sec 11,780 11,778 -0.02%
Requests 177,749 177,631 -0.07%
Mean latency (ms) 24.65 26.27 +6.57%
Max latency (ms) 379.82 416.80 +9.74%
Bad responses 0 0
Socket errors 0 0
Read throughput (MB/s) 4,495.36 4,495.36 0.00%
Latency 50th (ms) 19.97 19.71 -1.30%
Latency 75th (ms) 34.83 35.30 +1.35%
Latency 90th (ms) 52.56 55.91 +6.37%
Latency 99th (ms) 91.87 115.79 +26.04%

Note that this does show GC improvements in some cases (LOH/POH size) and regressions in others (Gen 1/Gen 2 size), but overall the CPU usage is the same. If you would like to see different ASP.NET-specific benchmarks, feel free to point me to them.
