Fix bugs causing separator to be omitted after ODataUtf8JsonWriter.WriteRawValue #2527

habbes · 2022-10-14T13:37:16Z

Issues

This pull request fixes #2525.

Description

This PR fixes a bug in ODataUtf8JsonWriter.WriteRawValue that leads to invalid JSON being written. One common use case for WriteRawValue is writing the value of ODataUntypedValue objects, whose contents can be arbitrary JSON.

ODataUtf8JsonWriter.WriteRawValue was implemented by writing the input directly to the output stream, bypassing the internal Utf8JsonWriter. This is because the Utf8JsonWriter.WriteRawValue method does not exist in .NET Core 3.1 (it was introduced in .NET 6). We intend to use Utf8JsonWriter.WriteRawValue in ODL v8, when we drop support for older frameworks.

Utf8JsonWriter keeps track of whether or not it is writing the first item in an object or an array. It knows that it needs to place a separator (comma) between the end of one key-value pair and the beginning of the next, and it also knows that it needs to place separators between array elements.

Here's an example of a normal flow:

ODataUtf8JsonWriter.WriteName("SomeKey") -> calls Utf8JsonWriter.WritePropertyName("SomeKey")
ODataUtf8JsonWriter.WriteValue("SomeVal") -> calls Utf8JsonWriter.WriteStringValue("SomeVal"). Utf8JsonWriter sets a flag signaling that the next property write should place a comma before the property name
ODataUtf8JsonWriter.WriteName("AnotherKey") -> calls Utf8JsonWriter.WritePropertyName("AnotherKey"). Utf8JsonWriter writes a comma before the property name
output excerpt: "SomeKey":"SomeVal","AnotherKey":
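A minimal sketch of the same flow against a plain System.Text.Json Utf8JsonWriter. The NormalFlowDemo class is a hypothetical name; only the Utf8JsonWriter calls are real API. The comma before "AnotherKey" is emitted by the writer itself, because the preceding WriteStringValue call recorded that a separator is needed:

```csharp
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical demo class: shows that Utf8JsonWriter inserts the ',' on its own.
public static class NormalFlowDemo
{
    public static string Write()
    {
        using var stream = new MemoryStream();
        using (var writer = new Utf8JsonWriter(stream))
        {
            writer.WriteStartObject();
            writer.WritePropertyName("SomeKey");
            writer.WriteStringValue("SomeVal");     // writer now remembers a separator is needed
            writer.WritePropertyName("AnotherKey"); // so it emits ',' before this name
            writer.WriteStringValue("AnotherVal");
            writer.WriteEndObject();
        } // disposing the writer flushes its pending bytes to the stream
        return Encoding.UTF8.GetString(stream.ToArray());
    }
}
```

Calling NormalFlowDemo.Write() should return {"SomeKey":"SomeVal","AnotherKey":"AnotherVal"}.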

When you bypass the Utf8JsonWriter and write directly to the output stream, the Utf8JsonWriter is not aware of the write, so it behaves as if the raw value had never been written. This can lead to missing or repeated separators, both of which produce invalid JSON in the output.

Here's an example of an invalid JSON from using the raw value:

ODataUtf8JsonWriter.WriteName("SomeKey") -> calls Utf8JsonWriter.WritePropertyName("SomeKey")
ODataUtf8JsonWriter.WriteRawValue("\"SomeRawVal\"") -> calls writeStream.Write("\"SomeRawVal\""). Utf8JsonWriter is not aware of this and does not set a flag to write a separator before the next property.
ODataUtf8JsonWriter.WriteName("AnotherKey") -> calls Utf8JsonWriter.WritePropertyName("AnotherKey").
output excerpt: "SomeKey":"SomeRawVal""AnotherKey": (the comma between "SomeRawVal" and "AnotherKey" is missing)
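The failure can be reproduced with a bare Utf8JsonWriter, assuming the raw bytes are pushed straight to the stream after a Flush(), roughly as the first version of WriteRawValue did. RawObjectBypassDemo is a hypothetical name. In this sketch SkipValidation is enabled because, with validation on, the writer would throw when it sees a property name directly after another property name (it never saw the value in between):

```csharp
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical repro: a raw value written behind the Utf8JsonWriter's back.
public static class RawObjectBypassDemo
{
    public static string Write()
    {
        using var stream = new MemoryStream();
        var options = new JsonWriterOptions { SkipValidation = true };
        using (var writer = new Utf8JsonWriter(stream, options))
        {
            writer.WriteStartObject();
            writer.WritePropertyName("SomeKey");
            writer.Flush(); // push pending bytes so the raw write lands in the right place

            byte[] raw = Encoding.UTF8.GetBytes("\"SomeRawVal\"");
            stream.Write(raw, 0, raw.Length); // the writer's separator state is not updated

            writer.WritePropertyName("AnotherKey"); // no ',' is emitted here
            writer.WriteStringValue("AnotherVal");
            writer.WriteEndObject();
        }
        return Encoding.UTF8.GetString(stream.ToArray());
    }
}
```

The result should be {"SomeKey":"SomeRawVal""AnotherKey":"AnotherVal"}, which is not valid JSON.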

Similarly, Utf8JsonWriter automatically places a separator before each element of an array except for the first element. However, when raw values are involved, the following bugs surface:

No separator is placed before a raw value, because Utf8JsonWriter is not aware of it.
When a raw value is the first item of an array and is followed by a non-raw value (written by Utf8JsonWriter), no comma is placed before the non-raw value, because Utf8JsonWriter mistakenly thinks it is the first item of the array.
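Both array bugs show up in a similar hypothetical repro (ArrayBypassDemo and its WriteRaw helper are illustrative names, not ODL code): a raw first element leaves the writer believing the next element is the first one, and a raw element after a regular one gets no separator at all:

```csharp
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical repro of both array bugs caused by bypassing Utf8JsonWriter.
public static class ArrayBypassDemo
{
    public static string Write()
    {
        using var stream = new MemoryStream();
        using (var writer = new Utf8JsonWriter(stream))
        {
            writer.WriteStartArray();
            writer.Flush();
            WriteRaw(stream, "\"raw1\"");     // raw first element: the writer never sees it
            writer.WriteStringValue("value"); // writer thinks this is the first element: no ','
            writer.Flush();
            WriteRaw(stream, "\"raw2\"");     // raw element after a value: nobody writes the ','
            writer.WriteEndArray();
        }
        return Encoding.UTF8.GetString(stream.ToArray());
    }

    private static void WriteRaw(Stream stream, string rawJson)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(rawJson);
        stream.Write(bytes, 0, bytes.Length);
    }
}
```

The output should be ["raw1""value""raw2"], with both separators missing.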

To fix these issues, this PR writes a separator manually in the following scenarios:

Inside an object, before writing a property name that follows a raw value
Inside an array, before writing a raw value, unless the raw value is the first element in the array
Inside an array, before writing a value that follows a sequence of one or more consecutive raw values at the start of the array, e.g. ["raw1", "raw2", "raw3", "raw4", "non-raw value"] (from the perspective of Utf8JsonWriter, "non-raw value" is the first item in the array, so it won't automatically place a comma before it).
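The three rules can be sketched as a small wrapper around Utf8JsonWriter. This is a hypothetical illustration, not the ODataUtf8JsonWriter code: the RawAwareJsonWriter class, its Scope type and the flag names are assumptions, a Stack<Scope> stands in for a BitStack, and raw bytes are appended to the same ArrayBufferWriter<byte> the writer targets, after a Flush() that commits the writer's pending output so the order of writes is preserved:

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical sketch of the three manual-separator rules; not ODataUtf8JsonWriter's code.
public sealed class RawAwareJsonWriter
{
    private sealed class Scope
    {
        public bool IsArray;
        public bool HasItems;      // array: at least one element (raw or not) written
        public bool OnlyRawItems;  // array: every element so far was raw, so Utf8JsonWriter saw none
        public bool AfterRawValue; // object: the last property value was raw
    }

    private readonly ArrayBufferWriter<byte> buffer = new ArrayBufferWriter<byte>();
    private readonly Stack<Scope> scopes = new Stack<Scope>();
    private readonly Utf8JsonWriter writer;

    public RawAwareJsonWriter()
    {
        // Validation is skipped: the writer would otherwise reject a property name that,
        // from its point of view, directly follows another property name.
        writer = new Utf8JsonWriter(buffer, new JsonWriterOptions { SkipValidation = true });
    }

    public void WriteStartObject() { BeforeNonRawValue(); writer.WriteStartObject(); scopes.Push(new Scope()); }
    public void WriteStartArray() { BeforeNonRawValue(); writer.WriteStartArray(); scopes.Push(new Scope { IsArray = true, OnlyRawItems = true }); }
    public void WriteEndObject() { scopes.Pop(); writer.WriteEndObject(); }
    public void WriteEndArray() { scopes.Pop(); writer.WriteEndArray(); }
    public void WriteStringValue(string value) { BeforeNonRawValue(); writer.WriteStringValue(value); }

    public void WriteName(string name)
    {
        Scope scope = scopes.Peek();
        if (scope.AfterRawValue) // rule 1: property name that follows a raw value
        {
            WriteSeparatorManually();
            scope.AfterRawValue = false;
        }
        writer.WritePropertyName(name);
    }

    public void WriteRawValue(string rawJson)
    {
        if (scopes.Count > 0)
        {
            Scope scope = scopes.Peek();
            if (!scope.IsArray)
            {
                scope.AfterRawValue = true;
            }
            else
            {
                if (scope.HasItems) WriteSeparatorManually(); // rule 2: raw value that is not the first element
                scope.HasItems = true;
            }
        }
        AppendToBuffer(Encoding.UTF8.GetBytes(rawJson));
    }

    public string GetJson()
    {
        writer.Flush();
        return Encoding.UTF8.GetString(buffer.WrittenSpan);
    }

    private void BeforeNonRawValue()
    {
        if (scopes.Count == 0 || !scopes.Peek().IsArray) return;
        Scope scope = scopes.Peek();
        if (scope.HasItems && scope.OnlyRawItems) WriteSeparatorManually(); // rule 3
        scope.HasItems = true;
        scope.OnlyRawItems = false;
    }

    private void WriteSeparatorManually() => AppendToBuffer(new[] { (byte)',' });

    private void AppendToBuffer(byte[] bytes)
    {
        writer.Flush(); // commit the writer's pending bytes first so the output order is preserved
        bytes.AsSpan().CopyTo(buffer.GetSpan(bytes.Length));
        buffer.Advance(bytes.Length);
    }
}
```

Writing an object with a raw "SomeKey" value, a regular "AnotherKey" value and an "Arr" array of raw 1, raw 2, "three" and raw 4 through this sketch should produce {"SomeKey":"SomeRawVal","AnotherKey":"x","Arr":[1,2,"three",4]}: rule 1 supplies the comma before "AnotherKey", rule 2 the commas before 2 and 4, and rule 3 the comma before "three".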

This PR achieves this by adding a couple of fields that track when we have written raw values, as well as a stack that tracks whether we are inside an array or an object. The stack is based on the BitStack implementation I "borrowed" from the .NET runtime (where it is used by Utf8JsonWriter). I adapted it slightly, because it used some syntax not supported by the C# version used by ODL, and because its Pop() method does not return the most recently pushed item on the stack. I'm not sure whether the latter is a bug. I also pulled in its tests and adjusted them accordingly. BitStack is an efficient stack implementation optimized for stacks that hold only two kinds of values (represented as true and false): it stores the values in a sequence of integers, with each stack entry occupying a single bit.
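A simplified bit stack along these lines might look like the following. It is an illustration, not the adapted ODL type: SimpleBitStack is a hypothetical name, it keeps the first 64 entries in a ulong and spills deeper entries into an int[] (the same general layout as the runtime's BitStack), and its Pop() returns the value it removes, which is the behavior this PR needed. It also includes a Peek(), which the PR's version does not have:

```csharp
using System;

// Hypothetical simplified bit stack: one bit per entry, true/false values only.
public sealed class SimpleBitStack
{
    private const int InlineCapacity = 64; // entries stored in the ulong field
    private ulong inlineBits;              // entries 0..63, entry i at bit i
    private int[] overflow = new int[2];   // entries 64 and up, 32 per int
    private int depth;

    public int Depth => depth;

    public void Push(bool value)
    {
        if (depth < InlineCapacity)
        {
            ulong mask = 1UL << depth;
            inlineBits = value ? inlineBits | mask : inlineBits & ~mask;
        }
        else
        {
            int index = depth - InlineCapacity;
            int element = index >> 5;     // which int holds the entry
            int mask = 1 << (index & 31); // which bit inside that int
            if (element >= overflow.Length)
            {
                Array.Resize(ref overflow, overflow.Length * 2);
            }
            overflow[element] = value ? overflow[element] | mask : overflow[element] & ~mask;
        }
        depth++;
    }

    public bool Peek()
    {
        if (depth == 0) throw new InvalidOperationException("The stack is empty.");
        int top = depth - 1;
        if (top < InlineCapacity) return (inlineBits & (1UL << top)) != 0;
        int index = top - InlineCapacity;
        return (overflow[index >> 5] & (1 << (index & 31))) != 0;
    }

    // Removes the top entry and returns it (the adapted behavior described above).
    public bool Pop()
    {
        bool value = Peek();
        depth--;
        return value;
    }
}
```

Storing each entry at a fixed bit position keeps Peek() to a single mask test; pushing true for "object" and false for "array" is enough to answer "which kind of scope am I in?".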

The first iteration of this PR caused a performance regression whenever ODataUtf8JsonWriter.WriteRawValue() was called. This was mainly due to frequent flushing: I flushed the writer before writing each raw value to the stream, to ensure that writes happened in the correct order. I fixed the performance issue by writing raw values to the same ArrayBufferWriter<byte> that is passed to Utf8JsonWriter, which means there is no need to flush before the buffer is full. I also updated the benchmarks so that they can optionally include raw values.

Checklist (Uncheck if it is not completed)

Test cases added
Build and test with one-click build and test script passed

Additional work necessary

If documentation update is needed, please add "Docs Needed" label to the issue and provide details about the required document change in the issue.

habbes · 2022-10-24T06:41:09Z

I've added raw values to the benchmarks and noticed a considerable performance degradation. After I added raw values to the written payload, ODataMessageWriter-Utf8JsonWriter became slower than ODataMessageWriter. Memory allocations have also increased. The increase in memory allocation in NoOpWriter-Direct (from 8 KB to 5,003 KB) is very baffling. I do expect things to be slightly slower and to allocate slightly more memory because we're writing more data, but not to the observed extent. And I'd still expect the ODataUtf8JsonWriter-based writers to be faster than their JsonWriter counterparts.

Benchmarks with PR changes, but before writing raw values:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.20348
Intel Xeon E-2336 CPU 2.90GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=5.0.404 [C:\Program Files\dotnet\sdk]
[Host] : .NET 6.0.10 (6.0.1022.47605), X64 RyuJIT

Toolchain=InProcessEmitToolchain InvocationCount=1 UnrollFactor=1

Method	WriterName	Mean	Error	StdDev	Median	Gen 0	Gen 1	Allocated
WriteToFileAsync	JsonSerializer	33.444 ms	0.6622 ms	0.8132 ms	32.950 ms	-	-	12 KB
WriteToFileAsync	NoOpWriter-Direct	1.317 ms	0.0133 ms	0.0118 ms	1.313 ms	-	-	8 KB
WriteToFileAsync	ODataJsonWriter-Direct	35.745 ms	0.3827 ms	0.3579 ms	35.850 ms	1000.0000	-	6,675 KB
WriteToFileAsync	ODataJsonWriter-Direct-Async	179.549 ms	0.5134 ms	0.4551 ms	179.462 ms	8000.0000	-	48,550 KB
WriteToFileAsync	ODataMessageWriter	280.770 ms	0.7588 ms	0.7098 ms	280.512 ms	36000.0000	-	224,024 KB
WriteToFileAsync	ODataMessageWriter-Async	834.186 ms	2.3173 ms	2.0542 ms	833.958 ms	54000.0000	1000.0000	334,997 KB
WriteToFileAsync	ODataMessageWriter-NoOp	226.985 ms	1.1005 ms	1.0294 ms	227.011 ms	35000.0000	-	217,365 KB
WriteToFileAsync	ODataMessageWriter-NoOp-Async	335.748 ms	0.7169 ms	0.6705 ms	335.673 ms	35000.0000	-	217,368 KB
WriteToFileAsync	ODataMessageWriter-NoValidation	247.049 ms	0.8430 ms	0.7885 ms	246.715 ms	32000.0000	-	200,583 KB
WriteToFileAsync	ODataMessageWriter-NoValidation-Async	794.218 ms	1.2409 ms	1.1607 ms	794.490 ms	51000.0000	1000.0000	311,358 KB
WriteToFileAsync	ODataMessageWriter-Utf16	292.286 ms	0.6426 ms	0.6011 ms	292.312 ms	36000.0000	-	224,018 KB
WriteToFileAsync	ODataMessageWriter-Utf8JsonWriter	268.767 ms	0.5205 ms	0.4614 ms	268.758 ms	35000.0000	1000.0000	217,599 KB
WriteToFileAsync	ODataMessageWriter-Utf8JsonWriter-Async	438.272 ms	0.4929 ms	0.4611 ms	438.149 ms	35000.0000	1000.0000	219,011 KB
WriteToFileAsync	ODataMessageWriter-Utf8JsonWriter-NoValidation	232.884 ms	0.3154 ms	0.2950 ms	232.763 ms	31000.0000	1000.0000	194,160 KB
WriteToFileAsync	ODataMessageWriter-Utf8JsonWriter-NoValidation-Async	392.491 ms	0.5989 ms	0.5309 ms	392.433 ms	31000.0000	1000.0000	195,572 KB
WriteToFileAsync	ODataMessageWriter-Utf8JsonWriter-Utf16	272.817 ms	0.5228 ms	0.4890 ms	272.629 ms	35000.0000	1000.0000	217,600 KB
WriteToFileAsync	ODataUtf8JsonWriter-Direct	21.425 ms	0.0402 ms	0.0376 ms	21.438 ms	-	-	241 KB
WriteToFileAsync	ODataUtf8JsonWriter-Direct-Async	43.708 ms	0.3825 ms	0.3391 ms	43.687 ms	-	-	520 KB
WriteToFileAsync	Utf8JsonWriter-Direct-ArrayPool-NoValidation	18.465 ms	0.0574 ms	0.0537 ms	18.482 ms	-	-	29 KB

Benchmarks results when including raw values in payload:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.20348
Intel Xeon E-2336 CPU 2.90GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=5.0.404 [C:\Program Files\dotnet\sdk]
[Host] : .NET 6.0.10 (6.0.1022.47605), X64 RyuJIT

Toolchain=InProcessEmitToolchain InvocationCount=1 UnrollFactor=1

Method	WriterName	Mean	Error	StdDev	Gen 0	Gen 1	Allocated
WriteToFileAsync	JsonSerializer	39.725 ms	0.2850 ms	0.2526 ms	-	-	16 KB
WriteToFileAsync	NoOpWriter-Direct	3.808 ms	0.0745 ms	0.0697 ms	-	-	5,003 KB
WriteToFileAsync	ODataJsonWriter-Direct	43.317 ms	0.8210 ms	0.7680 ms	1000.0000	-	11,675 KB
WriteToFileAsync	ODataJsonWriter-Direct-Async	205.117 ms	0.5807 ms	0.4849 ms	8000.0000	-	53,798 KB
WriteToFileAsync	ODataMessageWriter	298.201 ms	0.3155 ms	0.2797 ms	39000.0000	-	239,573 KB
WriteToFileAsync	ODataMessageWriter-Async	866.389 ms	1.6864 ms	1.4082 ms	57000.0000	1000.0000	350,650 KB
WriteToFileAsync	ODataMessageWriter-NoOp	231.295 ms	0.4075 ms	0.3812 ms	38000.0000	-	232,914 KB
WriteToFileAsync	ODataMessageWriter-NoOp-Async	354.659 ms	0.9602 ms	0.8982 ms	38000.0000	-	232,917 KB
WriteToFileAsync	ODataMessageWriter-NoValidation	264.587 ms	0.6677 ms	0.6246 ms	35000.0000	-	216,132 KB
WriteToFileAsync	ODataMessageWriter-NoValidation-Async	812.586 ms	1.9369 ms	1.8118 ms	53000.0000	1000.0000	327,209 KB
WriteToFileAsync	ODataMessageWriter-Utf16	310.894 ms	0.5446 ms	0.4828 ms	39000.0000	-	239,567 KB
WriteToFileAsync	ODataMessageWriter-Utf8JsonWriter	422.922 ms	0.8506 ms	0.7957 ms	38000.0000	-	236,455 KB
WriteToFileAsync	ODataMessageWriter-Utf8JsonWriter-Async	1,295.575 ms	2.9140 ms	2.7258 ms	62000.0000	1000.0000	378,681 KB
WriteToFileAsync	ODataMessageWriter-Utf8JsonWriter-NoValidation	383.425 ms	0.8009 ms	0.7492 ms	34000.0000	-	213,016 KB
WriteToFileAsync	ODataMessageWriter-Utf8JsonWriter-NoValidation-Async	1,202.503 ms	5.7807 ms	5.4073 ms	58000.0000	1000.0000	355,241 KB
WriteToFileAsync	ODataMessageWriter-Utf8JsonWriter-Utf16	470.336 ms	0.5858 ms	0.4892 ms	38000.0000	-	238,800 KB
WriteToFileAsync	ODataUtf8JsonWriter-Direct	117.280 ms	0.4300 ms	0.3812 ms	1000.0000	-	8,547 KB
WriteToFileAsync	ODataUtf8JsonWriter-Direct-Async	353.166 ms	3.5165 ms	3.2893 ms	8000.0000	-	51,325 KB
WriteToFileAsync	Utf8JsonWriter-Direct-ArrayPool-NoValidation	32.796 ms	0.0554 ms	0.0491 ms	-	-	5,034 KB

habbes · 2022-10-24T06:54:05Z

Based on some high-level profiler analysis, one possible explanation for the performance regression is that when we write raw values, we have to flush the contents of the Utf8JsonWriter buffer to the stream to ensure the correct order of writes. However, this doesn't seem to explain the memory regression in NoOpWriter-Direct.

ElizabethOkerio · 2022-10-24T07:03:20Z

src/Microsoft.OData.Core/Json/BitStack.cs

+    /// <remarks>
+    /// This has been adapted from the .NET runtime's internal BitStack which is used by Utf8JsonWriter
+    /// https://github.com/dotnet/runtime/blob/main/src/libraries/System.Text.Json/src/System/Text/Json/BitStack.cs
+    /// This has been slightly modified because the original Pop() method did not return the last value pushed.


Just curious. What does the original Pop() method return?

If you use Utf8JsonWriter's WriteRawValue method in .NET 6, do you still need to modify the Pop method? That is, assuming Utf8JsonWriter uses the BitStack internally to track when and how to write separators.

It seems to return the value of the item before the most recent item, or something like that.
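To make the difference concrete: as far as I can tell from the runtime source, its BitStack.Pop() shifts the top entry out and then returns the entry that is now on top (for Utf8JsonWriter, "am I back inside an object?"), which matches the observation above. The two contracts can be modeled with a Stack<bool>, where true means object and false means array; PopSemanticsDemo and its method names are hypothetical:

```csharp
using System.Collections.Generic;

// Hypothetical model of two Pop() contracts; true = object scope, false = array scope.
public static class PopSemanticsDemo
{
    // Runtime-style contract (as read from the source): remove the top, report the new top.
    public static bool PopReturningNewTop(Stack<bool> stack)
    {
        stack.Pop();
        return stack.Count > 0 && stack.Peek();
    }

    // Contract needed by this PR: remove the top entry and return it.
    public static bool PopReturningRemoved(Stack<bool> stack) => stack.Pop();
}
```

For an array nested inside an object (push true, then false), the first method returns true (the enclosing object) while the second returns false (the array that was just closed).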

src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs

habbes · 2022-11-23T16:10:43Z

I made some changes to the PR to address the performance issues. Instead of passing the writeStream to Utf8JsonWriter, I pass it an ArrayBufferWriter<byte>. Now, when we need to write raw values directly, we write them to the buffer writer, which means we don't need to flush to ensure the correct order of writes. We only write to the stream when we flush.
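A sketch of that buffering design, under stated assumptions: BufferedRawJsonWriter, WriteRaw and the flushThreshold parameter are hypothetical (the threshold ODL actually uses is not shown here), and separator handling is deliberately left out so that only the buffering is visible. Utf8JsonWriter.Flush() against an IBufferWriter<byte> only commits pending bytes to that buffer, so it involves no stream I/O; the stream is written only when the buffered byte count crosses the threshold, along the lines of the FlushIfBufferThresholdReached method discussed in the review:

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical sketch: Utf8JsonWriter and raw writes share one buffer; the stream is written in batches.
public sealed class BufferedRawJsonWriter : IDisposable
{
    private readonly Stream stream;
    private readonly int flushThreshold; // assumed value supplied by the caller
    private readonly ArrayBufferWriter<byte> buffer = new ArrayBufferWriter<byte>();
    private readonly Utf8JsonWriter writer;

    public BufferedRawJsonWriter(Stream stream, int flushThreshold)
    {
        this.stream = stream;
        this.flushThreshold = flushThreshold;
        writer = new Utf8JsonWriter(buffer, new JsonWriterOptions { SkipValidation = true });
    }

    public Utf8JsonWriter Json => writer;

    // Appends raw UTF-8 bytes after whatever the Utf8JsonWriter has produced so far.
    public void WriteRaw(string rawJson)
    {
        writer.Flush(); // commits pending bytes to the buffer; no stream I/O happens here
        byte[] bytes = Encoding.UTF8.GetBytes(rawJson);
        bytes.AsSpan().CopyTo(buffer.GetSpan(bytes.Length));
        buffer.Advance(bytes.Length);
        FlushIfThresholdReached();
    }

    public void FlushIfThresholdReached()
    {
        if (writer.BytesPending + buffer.WrittenCount >= flushThreshold) Flush();
    }

    // Drains everything written so far to the stream and reuses the buffer.
    public void Flush()
    {
        writer.Flush();
        stream.Write(buffer.WrittenSpan);
        buffer.Clear();
        stream.Flush();
    }

    public void Dispose()
    {
        Flush();
        writer.Dispose();
    }
}
```

With a deliberately tiny threshold (4 bytes), writing {"a":1,"b":[1,2]} with the raw value [1,2] should push the first part to the stream as soon as the raw value is appended, and the closing brace on Dispose().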

Here are the new perf figures when the payload includes raw values:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.20348
Intel Xeon E-2336 CPU 2.90GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=5.0.404 [C:\Program Files\dotnet\sdk]
[Host] : .NET 6.0.11 (6.0.1122.52304), X64 RyuJIT

Toolchain=InProcessEmitToolchain InvocationCount=1 UnrollFactor=1

ODataUtf8JsonWriter is now more efficient than the default JsonWriter, even when raw values are involved. These benchmarks report more memory allocations than the baseline because the benchmark performs string interpolation to add quotes around the "raw" string values, which results in additional string allocations.

Method	WriterName	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
WriteWithRawValues	NoOpWriter-Direct	3.918 ms	0.0755 ms	0.0776 ms	-	-	-	5,003 KB
WriteWithRawValues	ODataJsonWriter-Direct	44.062 ms	0.8742 ms	0.8978 ms	1000.0000	-	-	11,675 KB
WriteWithRawValues	ODataJsonWriter-Direct-Async	206.520 ms	1.0503 ms	0.9825 ms	8000.0000	-	-	53,797 KB
WriteWithRawValues	ODataMessageWriter	299.875 ms	0.6858 ms	0.6415 ms	39000.0000	-	-	239,573 KB
WriteWithRawValues	ODataMessageWriter-Async	894.690 ms	1.8934 ms	1.6784 ms	57000.0000	1000.0000	-	350,650 KB
WriteWithRawValues	ODataMessageWriter-Utf8JsonWriter	288.920 ms	0.6315 ms	0.5907 ms	38000.0000	1000.0000	-	236,511 KB
WriteWithRawValues	ODataMessageWriter-Utf8JsonWriter-Async	487.874 ms	0.4348 ms	0.4067 ms	40000.0000	3000.0000	2000.0000	256,846 KB
WriteWithRawValues	ODataUtf8JsonWriter-Direct	32.389 ms	0.0979 ms	0.0916 ms	1000.0000	-	-	8,604 KB
WriteWithRawValues	ODataUtf8JsonWriter-Direct-Async	60.699 ms	0.2026 ms	0.1896 ms	3000.0000	2000.0000	2000.0000	28,935 KB

New benchmark figures without raw values

ODataUtf8JsonWriter still performs better than JsonWriter. There has also been a slight performance improvement in ODataUtf8JsonWriter's async API, and a slight memory improvement in ODataJsonWriter (sync and async).

Method	WriterName	Mean	Error	StdDev	Gen 0	Gen 1	Allocated
WriteToFileAsync	NoOpWriter-Direct	1.383 ms	0.0237 ms	0.0210 ms	-	-	4 KB
WriteToFileAsync	ODataJsonWriter-Direct	35.790 ms	0.1317 ms	0.1100 ms	1000.0000	-	6,675 KB
WriteToFileAsync	ODataJsonWriter-Direct-Async	178.943 ms	0.6467 ms	0.5733 ms	8000.0000	-	48,537 KB
WriteToFileAsync	ODataMessageWriter	271.745 ms	0.3669 ms	0.3432 ms	36000.0000	-	224,024 KB
WriteToFileAsync	ODataMessageWriter-Async	811.026 ms	2.0030 ms	1.7756 ms	54000.0000	1000.0000	335,004 KB
WriteToFileAsync	ODataMessageWriter-Utf8JsonWriter	260.598 ms	0.4387 ms	0.4104 ms	35000.0000	1000.0000	217,443 KB
WriteToFileAsync	ODataUtf8JsonWriter-Direct	21.663 ms	0.0379 ms	0.0317 ms	-	-	84 KB
WriteToFileAsync	ODataUtf8JsonWriter-Direct-Async	43.895 ms	0.2531 ms	0.2114 ms	-	-	294 KB

Baseline results (from master branch)

Method	WriterName	Mean	Error	StdDev	Gen 0	Gen 1	Allocated
WriteToFileAsync	NoOpWriter-Direct	1.277 ms	0.0071 ms	0.0063 ms	-	-	3 KB
WriteToFileAsync	ODataJsonWriter-Direct	34.563 ms	0.1445 ms	0.1281 ms	1000.0000	-	6,676 KB
WriteToFileAsync	ODataJsonWriter-Direct-Async	180.394 ms	0.4863 ms	0.4061 ms	8000.0000	-	48,538 KB
WriteToFileAsync	ODataMessageWriter	282.512 ms	0.4670 ms	0.4368 ms	36000.0000	-	224,024 KB
WriteToFileAsync	ODataMessageWriter-Async	839.509 ms	1.8395 ms	1.6306 ms	54000.0000	1000.0000	334,978 KB
WriteToFileAsync	ODataMessageWriter-Utf8JsonWriter	261.859 ms	0.3300 ms	0.2925 ms	35000.0000	1000.0000	217,599 KB
WriteToFileAsync	ODataMessageWriter-Utf8JsonWriter-Async	447.947 ms	0.4497 ms	0.4206 ms	35000.0000	1000.0000	219,011 KB
WriteToFileAsync	ODataUtf8JsonWriter-Direct	21.609 ms	0.0564 ms	0.0471 ms	-	-	241 KB
WriteToFileAsync	ODataUtf8JsonWriter-Direct-Async	44.011 ms	0.4400 ms	0.4115 ms	-	-	520 KB

habbes · 2022-11-23T17:54:36Z

I've added tests for the buffer-flushing logic because there was a gap in the tests.

src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs

gathogojr · 2022-11-28T12:14:10Z

src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs

+            this.CommitWriterContentsToBuffer();
+            this.writeStream.Write(this.bufferWriter.WrittenMemory.Span);
+            this.bufferWriter.Clear();
+            this.writeStream.Flush();
        }

        private void FlushIfBufferThresholdReached()


Because of how frequent FlushIfBufferThresholdReached method is called, do you think adding MethodImpl(MethodImplOptions.AggressiveInlining) attribute to this method could improve performance? Could you perhaps try and get some perf stats around such a change?

Let me try it out. I usually leave that to the JIT to decide, but the BCL uses explicit MethodImpl(MethodImplOptions.AggressiveInlining) attributes in a lot of places. I'm also curious to see the result, and whether the JIT is already inlining this.

I've added the inlining attribute, but I can't tell from the benchmarks whether it leads to an improvement. The mean time of the ODataUtf8JsonWriter-Direct benchmark has stayed the same (about 22 ms). The mean duration of ODataMessageWriter-Utf8JsonWriter seems to have gone down to under 254 ms, but this happens sometimes, so it's not necessarily due to this change. The relative gap between it and the baseline ODataMessageWriter has remained the same (about 6-7%), though it went up to 9% in one of the runs.

gathogojr · 2022-11-28T12:15:11Z

src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs

+        /// bufferWriter. This should be called before writing directly
+        /// to the bufferWriter.
+        /// </summary>
+        private void CommitWriterContentsToBuffer()


Because of how frequent CommitWriterContentsToBuffer method is called, do you think adding MethodImpl(MethodImplOptions.AggressiveInlining) attribute to this method could improve performance? Could you perhaps try and get some perf stats around such a change?

Let me try that out and share the results.

See: https://github.com/OData/odata.net/pull/2527/files#r1035532741

src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs

test/FunctionalTests/Microsoft.OData.Core.Tests/Json/JsonWriterBaseTests.cs

test/FunctionalTests/Microsoft.OData.Core.Tests/Json/ODataUtf8JsonWriterAsyncTests.cs

gathogojr · 2022-11-28T13:37:00Z

test/PerformanceTests/SerializationComparisonsTests/Lib/ODataMessageWriterAsyncPayloadWriter.cs

@@ -92,14 +94,30 @@ public async Task WritePayloadAsync(IEnumerable<Customer> payload, Stream stream
                // start write homeAddress
                await writer.WriteStartAsync(homeAddressInfo);

-                var homeAddressResource = new ODataResource
+                ODataResource homeAddressResource;
+                if (includeRawValues)


I don't know if there was a specific reason why, in test/PerformanceTests/SerializationComparisonsTests/Lib/DataModel.cs, you added the Misc property between City and Street, but if you added it at the end, I believe you could avoid the duplicated code in this file and in other files with similar code:

    var homeAddressProperties = new List<ODataProperty>
    {
        new ODataProperty { Name = "City", Value = customer.HomeAddress.City },
        new ODataProperty { Name = "Street", Value = customer.HomeAddress.Street }
    };

    if (includeRawValues)
    {
        homeAddressProperties.Add(new ODataProperty
        {
            Name = "Misc",
            Value = new ODataUntypedValue() { RawValue = $"\"{customer.HomeAddress.Misc}\"" }
        });
    }

    var homeAddressResource = new ODataResource { Properties = homeAddressProperties };

Even with Misc sandwiched between City and Street, you should be able to do this:

    var homeAddressProperties = new List<ODataProperty>
    {
        new ODataProperty { Name = "City", Value = customer.HomeAddress.City }
    };

    if (includeRawValues)
    {
        homeAddressProperties.Add(new ODataProperty
        {
            Name = "Misc",
            Value = new ODataUntypedValue() { RawValue = $"\"{customer.HomeAddress.Misc}\"" }
        });
    }

    homeAddressProperties.Add(new ODataProperty { Name = "Street", Value = customer.HomeAddress.Street });

    var homeAddressResource = new ODataResource { Properties = homeAddressProperties };

This comment applies to both sync and async sections of similar code in the following files:

test/PerformanceTests/SerializationComparisonsTests/Lib/ODataMessageWriterPayloadWriter.cs

test/PerformanceTests/SerializationComparisonsTests/Lib/ODataMessageWriterAsyncPayloadWriter.cs

Thanks. There's no reason to have it in between. I added the property first, then added the conditional statement later so that the tests could be run without raw values.

By the way, the reason I had the if statement outside of the loop was to avoid calling the if statement in each iteration since the value of includeRawValues doesn't change inside the loop. I thought having the if statement inside the loop means the benchmark test would pay the penalty of the if statement even when the test run doesn't include raw values. Maybe this is something that the system can optimize on its own via branch prediction. I didn't compare the benchmarks between having the if statement in the loop and outside to validate whether it makes a difference.

I have modified the foreach (address in customer.Addresses) code to move the if statement inside the loop and noticed the performance slow down by 10s on the ODataMessageWriter sync and async tests. However, this could be a fluke. But I've just realized your question was based on customer.HomeAddress and not customer.Addresses.

I think the perf degradation was noise; I ran it again and the results were closer to the original run.

@habbes Curious, why does includeRawValues have to be an option? Why can't you just update the existing benchmark to include raw values?

The reason I made it an option is that raw values made the benchmarks slower (partly because we write more data as a result of the additional property, and partly because we have to manually write a comma and keep track of the related flags), which makes it harder to compare against older benchmarks that did not include raw values. I also wanted to ensure that the changes made to handle raw values did not affect performance when raw values are not used. My assumption is that raw values are an edge case and so people who do not use raw values should not incur a noticeable performance penalty because we added code that handles raw values. So I wanted to be able to see the performance both with and without raw values.

Another reason the raw value benchmarks are slower is because I'm doing the string interpolation inside the benchmark and not when the benchmark data is generated.

#2527 (comment)
"people who do not use raw values should not incur a noticeable performance penalty because we added code that handles raw values" - By this statement I assume you're only referring to performance penalty in respect to running the benchmarks, not penalty in using the actual library..?

Performance penalty with respect to the benchmarks. But for me to know that performance for scenarios without raw values was not affected after the change, I needed a way to run benchmarks without writing raw values.

Co-authored-by: John Gathogo <john.gathogo@gmail.com>

KenitoInc · 2022-11-30T08:42:39Z

src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs

+        /// The Utf8JsonWriter internally keeps track of when to write the item separtor ','
+        /// between key-value pairs in an object and between items in an array
+        /// However, we bypass the Utf8JsonWriter in our implementation of WriteRawValue
+        /// and write directly to the destination stream. This means that we have to manually


You should update this comment based on the perf fix

src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs

KenitoInc · 2022-11-30T11:54:27Z

src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs

+                return false;
+            }
+
+            // BitStack doesn't implement a Peek()


Is this more efficient than having a Peek() method?

It would probably be more efficient to have a specific Peek() but it would increase the complexity of the code, especially if the depth > 64 (this is rare in practice, but we'd still need to implement the logic to handle it). Since Peek() and Pop() are relatively cheap and are aggressively inlined, I don't think we'll save that much in implementing a Peek() method. That's why I felt like it wasn't worth the effort. That said, I haven't actually measured the cost savings of implementing a Peek() (because doing so would require me to implement Peek() in the first place).

src/Microsoft.OData.Core/Json/BitStack.cs

gathogojr · 2022-12-02T08:11:51Z

src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs

+                this.shouldWriteSeparator = true;
+            }
+
+            if (this.isWritingFirstElementInArray || this.isWritingConsecutiveRawValuesAtStartOfArray)


Shouldn't this be just:

Suggested change

if (this.isWritingFirstElementInArray || this.isWritingConsecutiveRawValuesAtStartOfArray)

if (this.isWritingFirstElementInArray)

Because otherwise we're setting this.isWritingConsecutiveRawValuesAtStartOfArray to true when it already has a value of true...

gathogojr · 2022-12-02T08:14:51Z

src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs

+                this.shouldWriteSeparator = true;
+            }
+
+            if (this.isWritingAtStartOfArray || this.isWritingConsecutiveRawValuesAtStartOfArray)


Suggested change

if (this.isWritingAtStartOfArray || this.isWritingConsecutiveRawValuesAtStartOfArray)

if (this.isWritingAtStartOfArray)

Otherwise we're setting this.isWritingConsecutiveRawValuesAtStartOfArray to true when it already has a value of true...

gathogojr · 2022-12-02T08:19:05Z

src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs

+        /// This method should not be called by <see cref="WriteRawValue(string)"/>.
+        /// </summary>


Suggested change

/// This method should not be called by <see cref="WriteRawValue(string)"/>.

/// </summary>

/// </summary>

/// <remarks>

/// This method should not be called by <see cref="WriteRawValue(string)"/>.

/// </remarks>

gathogojr · 2022-12-02T08:19:56Z

src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs

+        {
+            this.CommitUtf8JsonWriterContentsToBuffer();
+            Span<byte> buf = this.bufferWriter.GetSpan(1);
+            buf[0] = (byte)',';


Another place where you could use parantheses field?

gathogojr · 2022-12-02T08:21:19Z

src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs

+        /// </summary>
+        private void ExitScope()
+        {
+            this.isWritingAtStartOfArray = false;


Suggested change

this.isWritingAtStartOfArray = false;

this.isWritingAtStartOfArray = false;

this.isWritingConsecutiveRawValuesAtStartOfArray = false;

For our peace of mind...

Or maybe that could introduce a bug... Please double-check

gathogojr

Great work!

pull-request-quantifier-deprecated · 2022-12-03T07:45:59Z

This PR has 1116 quantified lines of changes. In general, a change size of up to 200 lines is ideal for the best PR experience!

Quantification details

Label      : Extra Large
Size       : +988 -128
Percentile : 100%

Total files changed: 25

Change summary by file extension:
.yml : +2 -1
.cs : +978 -127
.txt : +0 -0
.md : +8 -0

Change counts above are quantified counts, based on the PullRequestQuantifier customizations.


gathogojr

habbes added 3 commits October 13, 2022 11:24

Add tests to verify raw values

48535b6

Add BitStack implementation

9bcd90f

Fix ODataUtf8JsonWriter WriteRawValue bugs

ba37735

pull-request-quantifier-deprecated bot added the Extra Large label Oct 14, 2022

Fix raw value handling in async ODataUtf8JsonWriter

d1ae04c

habbes requested review from gathogojr, corranrogue9, ElizabethOkerio, KenitoInc, lisicase and mikepizzo October 17, 2022 15:11

Add raw values to benchmarks

2822b80

ElizabethOkerio reviewed Oct 24, 2022

src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs (outdated)

ElizabethOkerio reviewed Oct 24, 2022

src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs

ElizabethOkerio reviewed Oct 24, 2022

src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs

habbes added 5 commits November 23, 2022 08:59

Write to buffer writer directly instead of writing to stream

8d96c34

Transcode raw value directly to bufferwriter

7a7178f

Use PooledByteBufferWriter to reduce allocs

c1595eb

Flush if buffer written count approaches threshold

a7cc8cd

Refactor async writer to use buffer writer and update benchmarks

857c744

habbes requested a review from ElizabethOkerio November 23, 2022 16:25

habbes added 2 commits November 23, 2022 19:30

Remove obsolete PooledByteBufferWriter

360501f

Test buffer flushing logic

9817e0a

habbes added 2 commits November 24, 2022 07:54

Remove obsolete comments

45c84e3

Add flushing edge case tests

0d60b3b

gathogojr reviewed Nov 28, 2022

src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs (outdated)

gathogojr reviewed Nov 28, 2022

habbes and others added 6 commits November 29, 2022 12:09

Update src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs

e09398a

Co-authored-by: John Gathogo <john.gathogo@gmail.com>

Apply suggestions from code review

7cccade

Co-authored-by: John Gathogo <john.gathogo@gmail.com>

Apply suggestions from code review

65e98aa

Co-authored-by: John Gathogo <john.gathogo@gmail.com>

Update license file

c530efa

Refactor WriterRawValueCore and make minor fixes in tests

54a56d8

Inline frequently called helper methods

fe5427e

KenitoInc reviewed Nov 30, 2022

gathogojr reviewed Dec 2, 2022

src/Microsoft.OData.Core/Json/BitStack.cs (outdated)

habbes added 4 commits December 2, 2022 10:23

Address review comments

5bb5203

Rename writer to utf8JsonWriter

63443bf

Rename method for clarity

64a8bcc

Rename fields

85068cc

gathogojr reviewed Dec 2, 2022

Minor code fixes

13286b5

gathogojr reviewed Dec 2, 2022

gathogojr previously approved these changes Dec 2, 2022

Minor fixes

283f125

habbes dismissed gathogojr’s stale review via 283f125 December 3, 2022 07:45

habbes requested a review from gathogojr December 5, 2022 08:02

gathogojr approved these changes Dec 5, 2022

xuzhg approved these changes Dec 7, 2022

habbes merged commit a9760d6 into OData:master Dec 8, 2022

habbes mentioned this pull request Jan 24, 2024

Using ODataUtf8JsonWriter leads to increased memory usage and latency when writing properties with large string or byte[] values #2845 (closed)

	if (this.isWritingFirstElementInArray \|\| this.isWritingConsecutiveRawValuesAtStartOfArray)
	if (this.isWritingFirstElementInArray)

	if (this.isWritingAtStartOfArray \|\| this.isWritingConsecutiveRawValuesAtStartOfArray)
	if (this.isWritingAtStartOfArray)

		/// This method should not be called by <see cref="WriteRawValue(string)"/>.
		/// </summary>

	this.isWritingAtStartOfArray = false;
	this.isWritingAtStartOfArray = false;
	this.isWritingConsecutiveRawValuesAtStartOfArray = false;
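The fragments above refer to flags such as isWritingFirstElementInArray and isWritingAtStartOfArray, which record whether a separator is due before the next write. The sketch below is a hypothetical, simplified model of that bookkeeping, not the actual ODataUtf8JsonWriter or Utf8JsonWriter code: the class SeparatorTrackingWriter, its methods, and the use of Stack<bool> in place of the PR's BitStack are all assumptions for illustration. What it shows is that a raw value must pass through the same separator logic as any other value, so the per-scope state stays in sync with what was actually written.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, simplified model of separator tracking.
// NOT the actual ODataUtf8JsonWriter/Utf8JsonWriter implementation.
// Strings are not escaped, to keep the sketch short.
public sealed class SeparatorTrackingWriter
{
    private readonly StringBuilder output = new StringBuilder();

    // One entry per open object/array scope: true while nothing has been written in it yet.
    // (Assumption: Stack<bool> stands in for the BitStack added in the PR.)
    private readonly Stack<bool> isFirstInScope = new Stack<bool>();

    // True right after a property name, so the following value needs no comma.
    private bool afterPropertyName;

    public override string ToString() => output.ToString();

    public void StartObject() { WriteSeparatorIfNeeded(); output.Append('{'); isFirstInScope.Push(true); }
    public void EndObject() { isFirstInScope.Pop(); output.Append('}'); }
    public void StartArray() { WriteSeparatorIfNeeded(); output.Append('['); isFirstInScope.Push(true); }
    public void EndArray() { isFirstInScope.Pop(); output.Append(']'); }

    public void WriteName(string name)
    {
        WriteSeparatorIfNeeded();
        output.Append('"').Append(name).Append("\":");
        afterPropertyName = true;
    }

    public void WriteValue(string value)
    {
        WriteSeparatorIfNeeded();
        output.Append('"').Append(value).Append('"');
    }

    // Raw JSON goes through the same separator logic as every other value,
    // instead of bypassing the writer's state.
    public void WriteRawValue(string rawJson)
    {
        WriteSeparatorIfNeeded();
        output.Append(rawJson);
    }

    private void WriteSeparatorIfNeeded()
    {
        if (afterPropertyName)
        {
            // A value directly follows its property name: no comma.
            afterPropertyName = false;
            return;
        }

        if (isFirstInScope.Count == 0)
        {
            return; // Top-level value: nothing to separate from.
        }

        if (isFirstInScope.Peek())
        {
            // First item in this scope: mark the scope as non-empty, write no comma.
            isFirstInScope.Pop();
            isFirstInScope.Push(false);
        }
        else
        {
            output.Append(',');
        }
    }
}
```

For example, starting an object, writing the name SomeKey with the raw value {"a":1}, the name AnotherKey with the string value x, and an Items array holding the raw values 1 and 2 followed by the string three produces {"SomeKey":{"a":1},"AnotherKey":"x","Items":[1,2,"three"]}, with every comma in place, including the ones after raw values.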
