Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.Text.Json: Utf8JsonWriter support for application/x-json-stream #82314

Open
CodeBlanch opened this issue Feb 17, 2023 · 9 comments
Open
Labels
Milestone

Comments

@CodeBlanch
Copy link
Contributor

Greetings once again STJ team! 😄

I ran into a kind of interesting situation trying to send an application/x-json-stream POST using Utf8JsonWriter.

The format I want is...

{"item1key1": "value1"}\n
{"item2key1": "value1"}\n
{"item3key1": "value1"}\n

Essentially newline delimited JSON.

I made it work by doing this (more or less)...

        var stream = new MemoryStream();
        var writer = new Utf8JsonWriter(stream, new JsonWriterOptions { SkipValidation = true });

        foreach (var item in batch)
        {
            // This is the interesting line here.
            writer.Reset(stream);

            this.SerializeItemToJson(item, writer);

            writer.Flush();

            stream.Write(NewLine, 0, 1);
        }

The interesting line is the writer.Reset. That is needed because of depth tracking inside Utf8JsonWriter which causes a , to be written out after the first item.

The challenge I have is the Reset method does a whole lot of housekeeping which slows things down.

What I would like is either a way to just reset the depth so the comma isn't emitted or a way to tell Utf8JsonWriter to emit \n where it would have done the comma (but only for these top-level objects).

Thoughts?

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Feb 17, 2023
@ghost
Copy link

ghost commented Feb 17, 2023

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

Issue Details

Greetings once again STJ team! 😄

I ran into a kind of interesting situation trying to send an application/x-json-stream POST using Utf8JsonWriter.

The format I want is...

{"item1key1": "value1"}\n
{"item2key1": "value1"}\n
{"item3key1": "value1"}\n

Essentially newline delimited JSON.

I made it work by doing this (more or less)...

        var stream = new MemoryStream();
        var writer = new Utf8JsonWriter(stream, new JsonWriterOptions { SkipValidation = true });

        foreach (var item in batch)
        {
            // This is the interesting line here.
            writer.Reset(stream);

            this.SerializeItemToJson(item, writer);

            writer.Flush();

            stream.Write(NewLine, 0, 1);
        }

The interesting line is the writer.Reset. That is needed because of depth tracking inside Utf8JsonWriter which causes a , to be written out after the first item.

The challenge I have is the Reset method does a whole lot of housekeeping which slows things down.

What I would like is either a way to just reset the depth so the comma isn't emitted or a way to tell Utf8JsonWriter to emit \n where it would have done the comma (but only for these top-level objects).

Thoughts?

Author: CodeBlanch
Assignees: -
Labels:

area-System.Text.Json

Milestone: -

@gregsdennis
Copy link
Contributor

gregsdennis commented Feb 17, 2023

I'll let the dotnet team answer officially, but my guess is that the writer wasn't designed to write multiple JSON instances (as you do with JSON streams). It's probably expected that you create a new writer for a new instance (you can likely keep the same stream, though).

I like your solution, though.

@svick
Copy link
Contributor

svick commented Feb 19, 2023

Related issue for the reading side: #33030.

@eiriktsarpalis
Copy link
Member

The challenge I have is the Reset method does a whole lot of housekeeping which slows things down.

What does the Reset method do specifically that slows things down? Looking at its documentation it seems to have been designed specifically to address your use case and I couldn't detect anything in the implementation that might drastically impact performance.

@eiriktsarpalis eiriktsarpalis added the needs-author-action An issue or pull request that requires more info or actions from the author. label Feb 21, 2023
@ghost
Copy link

ghost commented Feb 21, 2023

This issue has been marked needs-author-action and may be missing some important information.

@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Feb 21, 2023
@CodeBlanch
Copy link
Contributor Author

@eiriktsarpalis The bummer with the current design is I have to call it in my inner loop. So I wouldn't say it is a slow operation, but it is slower than needed. Here's a benchmark:

Method NumberOfItems Mean Error StdDev
ResetBenchmark 1 56.98 ns 0.862 ns 0.720 ns
ResetDepthBenchmark 1 58.13 ns 1.074 ns 1.433 ns
ResetBenchmark 100 5,528.35 ns 90.470 ns 84.625 ns
ResetDepthBenchmark 100 5,062.41 ns 65.433 ns 61.206 ns
ResetBenchmark 1000 56,238.61 ns 434.657 ns 406.579 ns
ResetDepthBenchmark 1000 49,154.42 ns 682.534 ns 638.442 ns

My particular use case is I have a batch of items in memory I need to send off at some interval or when it gets full.

Benchmark code
public class Utf8JsonWriterBenchmarks
{
    private static readonly byte[] NewLine = "\n"u8.ToArray();
    private static readonly JsonEncodedText IndexPropertyName = JsonEncodedText.Encode("Index");
    private static readonly JsonEncodedText NamePropertyName = JsonEncodedText.Encode("Name");
    private static readonly Action<Utf8JsonWriter> ResetDepth = BuildResetDepthAction();
    private readonly MemoryStream stream = new();
    private readonly Utf8JsonWriter writer;

    public Utf8JsonWriterBenchmarks()
    {
        this.writer = new(this.stream, new JsonWriterOptions { SkipValidation = true });
    }

    [Params(1, 100, 1000)]
    public int NumberOfItems { get; set; }

    [Benchmark]
    public void ResetBenchmark()
    {
        this.stream.Position = 0;

        for (int i = 0; i < this.NumberOfItems; i++)
        {
            this.writer.Reset(this.stream);

            SerializeItem(i, this.writer);

            this.writer.Flush();

            this.stream.Write(NewLine, 0, 1);
        }
    }

    [Benchmark]
    public void ResetDepthBenchmark()
    {
        this.stream.Position = 0;
        this.writer.Reset(this.stream);

        for (int i = 0; i < this.NumberOfItems; i++)
        {
            ResetDepth(this.writer);

            SerializeItem(i, this.writer);

            this.writer.Flush();

            this.stream.Write(NewLine, 0, 1);
        }
    }

    private static void SerializeItem(int index, Utf8JsonWriter writer)
    {
        writer.WriteStartObject();
        writer.WriteNumber(IndexPropertyName, index);
        writer.WriteString(NamePropertyName, "name");
        writer.WriteEndObject();
    }

    private static Action<Utf8JsonWriter> BuildResetDepthAction()
    {
        var currentDepthField = typeof(Utf8JsonWriter).GetField("_currentDepth", BindingFlags.Instance | BindingFlags.NonPublic);

        var dynamicMethod = new DynamicMethod(
            "ResetDepth",
            returnType: null,
            new Type[] { typeof(Utf8JsonWriter) },
            typeof(Utf8JsonWriterBenchmarks).Module,
            skipVisibility: true);

        var generator = dynamicMethod.GetILGenerator();

        generator.Emit(OpCodes.Ldarg_0);
        generator.Emit(OpCodes.Ldc_I4_0);
        generator.Emit(OpCodes.Stfld, currentDepthField);
        generator.Emit(OpCodes.Ret);

        return (Action<Utf8JsonWriter>)dynamicMethod.CreateDelegate(typeof(Action<Utf8JsonWriter>));
    }
}

@ghost ghost added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Feb 21, 2023
@eiriktsarpalis
Copy link
Member

You might be able to avoid a number of checks if you just use the parameterless Reset() method:

[Benchmark(Baseline = true)]
public void ResetBenchmark()
{
    this.stream.Position = 0;

    for (int i = 0; i < this.NumberOfItems; i++)
    {
        SerializeItem(i, this.writer);

        this.writer.Flush();
        this.stream.Write(NewLine, 0, 1);
        this.writer.Reset();
    }
}

Which produces slightly better numbers percentage-wise:

Method NumberOfItems Mean Error StdDev Ratio RatioSD
ResetBenchmark 1 79.85 ns 0.881 ns 0.824 ns 1.00 0.00
ResetDepthBenchmark 1 83.67 ns 1.060 ns 0.992 ns 1.05 0.02
ResetBenchmark 100 7,790.08 ns 83.742 ns 74.235 ns 1.00 0.00
ResetDepthBenchmark 100 7,570.72 ns 56.274 ns 52.639 ns 0.97 0.01
ResetBenchmark 1000 77,599.60 ns 485.364 ns 454.010 ns 1.00 0.00
ResetDepthBenchmark 1000 76,339.86 ns 607.348 ns 568.114 ns 0.98 0.01

Improvement is marginal, at the expense of not fully resetting the state of the writer. I would expect it to be even less pronounced if each loop called into JsonSerializer methods or if a less trivial JSON object was serialized on each iteration.

So I do think Utf8JsonWriter.Reset() is the way to go here.

@CodeBlanch
Copy link
Contributor Author

@eiriktsarpalis

In my actual code I do have to reset to a different stream but...

writer.Reset(inputStream);
for (...)
{
    // do stuff
    writer.Reset();
}

...works for me. Didn't realize the parameter-less reset preserves the current stream. Nice!

I will say trying to do this application/x-json-stream using Utf8JsonWriter is still a bit awkward.

  • SkipValidation must be set to true
  • You have to write your own \n somehow

Probably something could be done to make it better but I'm not blocked so I'll just leave this out there in case anyone else wants to push for it 😄

@eiriktsarpalis eiriktsarpalis added this to the Future milestone Feb 21, 2023
@eiriktsarpalis eiriktsarpalis added tenet-performance Performance related issue and removed needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration labels Feb 21, 2023
@cmbankester
Copy link

cmbankester commented Mar 16, 2023

Another downside to needing to reset the writer is that it also requires flushing the writer and by extension, the underlying stream. This causes problems with, e.g., using a BufferedStream as the buffer gets flushed each time the json writer is flushed, eliminating the performance gains of using a BufferedStream altogether. This applies to other buffering streams as well, such as Azure Blob write streams. Of course a custom duplex stream could be created that doesn't flush to the destination each time the writer flushes, but that seems abnormal and like sidestepping the actual problem.

As @CodeBlanch said, using Utf8JsonWriter in this situation is pretty awkward, which is a shame considering this is precisely the type of situation where I'd want to use it (i.e. customization of JSON stream serialization/formatting).

I also just want to note that if we had the ability to disable the automatic comma insertion (e.g. a property like SkipAutomaticListSeparatorInsertionAtDepthZero on JsonWriterOptions [or something better, just spitballing]), this would be a non issue, as we can write newlines to the Utf8JsonWriter via WriteRawValue(..., skipInputValidation: true), meaning we could implement as:

var stream = new MemoryStream();
var writer = new Utf8JsonWriter(stream, new JsonWriterOptions { SkipValidation = true, SkipAutomaticListSeparatorInsertionAtDepthZero = true });
foreach (var item in batch)
{
    this.SerializeItemToJson(item, writer);
    writer.WriteRawValue("\n", skipInputValidation: true);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants