Fix bugs causing separator to be omitted after ODataUtf8JsonWriter.WriteRawValue #2527

Merged

Conversation

Contributor

@habbes habbes commented Oct 14, 2022

Issues

This pull request fixes #2525

Description

This PR fixes a bug in ODataUtf8JsonWriter.WriteRawValue that leads to invalid JSON being written. One common use case for WriteRawValue is to write the value of ODataUntypedValue objects, which could be anything.

ODataUtf8JsonWriter.WriteRawValue was implemented by writing the input directly to the output stream, bypassing the internal Utf8JsonWriter. This is because the Utf8JsonWriter.WriteRawValue method does not exist in .NET Core 3.1. We intend to use Utf8JsonWriter.WriteRawValue in ODL v8 when we drop support for older frameworks.

Utf8JsonWriter keeps track of whether or not it's writing the first item in an object or an array. It knows that it needs to place a separator (comma) between the end of one key-value pair and the beginning of another. It also knows that it needs to place separators between array elements.

Here's an example of a normal flow:

  • ODataUtf8JsonWriter.WriteName("SomeKey") -> calls Utf8JsonWriter.WritePropertyName("SomeKey")
  • ODataUtf8JsonWriter.WriteValue("SomeVal") -> calls Utf8JsonWriter.WriteStringValue("SomeVal"). Utf8JsonWriter sets a flag signaling that the next property write should place a comma before the property name
  • ODataUtf8JsonWriter.WriteName("AnotherKey") -> calls Utf8JsonWriter.WritePropertyName("AnotherKey"). Utf8JsonWriter writes a comma before the property name
    output excerpt: "SomeKey":"SomeVal","AnotherKey":

When you bypass the Utf8JsonWriter and write directly to the output stream, the Utf8JsonWriter is not aware of this write. So, the Utf8JsonWriter will behave as if that raw value was never written. This may lead to missing or repeated separators, both of which lead to invalid JSON being written to the output.

Here's an example of invalid JSON produced by writing a raw value:

  • ODataUtf8JsonWriter.WriteName("SomeKey") -> calls Utf8JsonWriter.WritePropertyName("SomeKey")
  • ODataUtf8JsonWriter.WriteRawValue("\"SomeRawVal\"") -> calls writeStream.Write("\"SomeRawVal\""). Utf8JsonWriter is not aware of this and does not set a flag to write a separator before the next property.
  • ODataUtf8JsonWriter.WriteName("AnotherKey") -> calls Utf8JsonWriter.WritePropertyName("AnotherKey").
    output excerpt: "SomeKey":"SomeRawVal""AnotherKey": (comma missing between "SomeRawVal" and "AnotherKey")
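
The failure described above can be reproduced outside of ODL with a few lines. This is a minimal sketch, not the library's code: `BypassDemo` and `WriteWithBypass` are hypothetical names, and `SkipValidation = true` is an assumption made so that Utf8JsonWriter emits the second property name instead of rejecting it.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

public static class BypassDemo
{
    // Writes an object whose first value is sent straight to the stream,
    // so Utf8JsonWriter never learns that a value was written.
    public static string WriteWithBypass()
    {
        using var stream = new MemoryStream();
        // Assumption for this sketch: validation is skipped. With validation on,
        // Utf8JsonWriter refuses to write two property names in a row.
        var options = new JsonWriterOptions { SkipValidation = true };
        using (var writer = new Utf8JsonWriter(stream, options))
        {
            writer.WriteStartObject();
            writer.WritePropertyName("SomeKey");
            writer.Flush(); // push pending bytes so the raw bytes land after them

            byte[] raw = Encoding.UTF8.GetBytes("\"SomeRawVal\"");
            stream.Write(raw, 0, raw.Length); // bypasses Utf8JsonWriter entirely

            writer.WritePropertyName("AnotherKey"); // writer sees no preceding value
            writer.WriteStringValue("v");
            writer.WriteEndObject();
        }
        return Encoding.UTF8.GetString(stream.ToArray());
    }
}
```

The returned text should contain `"SomeRawVal""AnotherKey"` with no comma in between, which a JSON parser will reject.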

Similarly, Utf8JsonWriter automatically places a separator before each element of an array except for the first element. However, when raw values are involved, the following bugs surface:

  • The separator is not placed before raw values, because Utf8JsonWriter is not aware of them
  • When a raw value is the first item of an array and is followed by a non-raw value (written by Utf8JsonWriter), no comma will be placed before this non-raw value because Utf8JsonWriter mistakenly thinks it's the first item of the array.

To fix these issues, this PR writes a separator manually in the following scenarios:

  • Inside an object, before writing a property name that follows a raw value
  • Inside an array, before writing a raw value, unless the raw value is the first element in the array
  • Inside an array, before writing a value that follows a sequence of one or more consecutive raw values that start the array e.g. ["raw1", "raw2", "raw3", "raw3", "non-raw value"] (from the perspective of Utf8JsonWriter, "non-raw value" is the first item in the array, and so it won't automatically place a comma before it).
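
The three scenarios above can be condensed into a small decision function. This is only a sketch for illustration: `ScopeKind`, `SeparatorRules`, and `NeedsManualSeparator` are hypothetical names that do not come from the PR, and the PR tracks this state with fields and a stack rather than passing it in as arguments.

```csharp
public enum ScopeKind { Object, Array }

public static class SeparatorRules
{
    // Returns true when the wrapper has to write ',' itself because
    // Utf8JsonWriter will not do so for the next token.
    //   scope          - the container currently being written
    //   nextIsRaw      - (arrays) the next element is a raw value
    //   itemsWritten   - (arrays) elements already written in this array
    //   onlyRawSoFar   - (arrays) every element written so far was raw
    //   previousWasRaw - (objects) the previous property value was raw
    public static bool NeedsManualSeparator(
        ScopeKind scope, bool nextIsRaw, int itemsWritten, bool onlyRawSoFar, bool previousWasRaw)
    {
        if (scope == ScopeKind.Object)
        {
            // Rule 1: a property name that follows a raw value.
            return previousWasRaw;
        }

        if (itemsWritten == 0)
        {
            // The first element of an array never needs a separator.
            return false;
        }

        if (nextIsRaw)
        {
            // Rule 2: a raw value that is not the first element.
            return true;
        }

        // Rule 3: a non-raw value that follows a run of raw values starting the array.
        return onlyRawSoFar;
    }
}
```

For example, in `["raw1", "raw2", 5, 6]` the function returns true before `"raw2"` and before `5`, and false before `6`, where Utf8JsonWriter writes the comma on its own.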

This PR achieves this by adding a couple of variables to keep track of when we've written raw values, as well as a stack to keep track of whether we're in an array or an object. The stack is based on the BitStack implementation I "borrowed" from the .NET runtime (which is used in the Utf8JsonWriter implementation). I slightly adapted it because it used some syntax not supported in the C# version used by ODL and also because its Pop() method does not return the most recently pushed item on the stack. Not sure whether this is a bug or not. But I also pulled the tests and adjusted them accordingly. The BitStack is an efficient stack implementation optimized for stacks where there are only two types of values (represented as true and false). It stores the values in a sequence of integers, where a value in the stack is stored in a bit in one of the ints.
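
To illustrate the idea of storing boolean stack entries as bits, here is a deliberately simplified sketch. `SmallBitStack` is a hypothetical name. Unlike the BitStack described above, it keeps everything in a single 64-bit integer and refuses to grow beyond 64 entries, and its Pop() returns the most recently pushed value.

```csharp
using System;

public sealed class SmallBitStack
{
    private ulong bits;  // bit 0 always holds the top of the stack
    private int count;   // number of values currently stored

    public int Count => count;

    public void Push(bool value)
    {
        if (count == 64)
        {
            // Simplification: a full implementation would spill into an array of integers.
            throw new InvalidOperationException("This sketch supports at most 64 entries.");
        }

        bits = (bits << 1) | (value ? 1UL : 0UL);
        count++;
    }

    // Removes and returns the most recently pushed value.
    public bool Pop()
    {
        if (count == 0)
        {
            throw new InvalidOperationException("The stack is empty.");
        }

        bool top = (bits & 1UL) != 0;
        bits >>= 1;
        count--;
        return top;
    }
}
```

A writer could push true when entering an array and false when entering an object, then pop when the scope ends; that mapping is an assumption for illustration.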

The first iteration of this PR caused a performance regression when calls to ODataUtf8JsonWriter.WriteRawValue() are made. This was mainly due to frequent flushing (I flushed the writer before writing raw values to the stream, to ensure correct order of writes). I fixed the perf issue by writing to the same ArrayBufferWriter<byte> that is passed to Utf8JsonWriter, which means there's no need to flush to the stream before the buffer is full. I also updated the benchmarks to optionally include raw values.

Checklist (Uncheck if it is not completed)

  • Test cases added
  • Build and test with one-click build and test script passed

Additional work necessary

If documentation update is needed, please add "Docs Needed" label to the issue and provide details about the required document change in the issue.

Contributor Author

habbes commented Oct 24, 2022

I've added raw values to the benchmarks and noticed a considerable perf degradation. After I added raw values to the written payload, ODataMessageWriter-Utf8JsonWriter became slower than ODataMessageWriter. Memory allocations have also increased. The increased memory allocation in NoOpWriter-Direct (from 8 KB to 5,003 KB) is very baffling. I do expect things to be slightly slower and to allocate slightly more memory because we're writing more data, but not to the observed extent. And I'd still expect the ODataUtf8JsonWriter-based writers to be faster than their JsonWriter counterparts.

Benchmarks with PR changes, but before writing raw values:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.20348
Intel Xeon E-2336 CPU 2.90GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK= 5.0.404 [C:\Program Files\dotnet\sdk]
[Host] : .NET 6.0.10 (6.0.1022.47605), X64 RyuJIT

Toolchain=InProcessEmitToolchain InvocationCount=1 UnrollFactor=1

| Method | WriterName | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Allocated |
|---|---|---|---|---|---|---|---|---|
| WriteToFileAsync | JsonSerializer | 33.444 ms | 0.6622 ms | 0.8132 ms | 32.950 ms | - | - | 12 KB |
| WriteToFileAsync | NoOpWriter-Direct | 1.317 ms | 0.0133 ms | 0.0118 ms | 1.313 ms | - | - | 8 KB |
| WriteToFileAsync | ODataJsonWriter-Direct | 35.745 ms | 0.3827 ms | 0.3579 ms | 35.850 ms | 1000.0000 | - | 6,675 KB |
| WriteToFileAsync | ODataJsonWriter-Direct-Async | 179.549 ms | 0.5134 ms | 0.4551 ms | 179.462 ms | 8000.0000 | - | 48,550 KB |
| WriteToFileAsync | ODataMessageWriter | 280.770 ms | 0.7588 ms | 0.7098 ms | 280.512 ms | 36000.0000 | - | 224,024 KB |
| WriteToFileAsync | ODataMessageWriter-Async | 834.186 ms | 2.3173 ms | 2.0542 ms | 833.958 ms | 54000.0000 | 1000.0000 | 334,997 KB |
| WriteToFileAsync | ODataMessageWriter-NoOp | 226.985 ms | 1.1005 ms | 1.0294 ms | 227.011 ms | 35000.0000 | - | 217,365 KB |
| WriteToFileAsync | ODataMessageWriter-NoOp-Async | 335.748 ms | 0.7169 ms | 0.6705 ms | 335.673 ms | 35000.0000 | - | 217,368 KB |
| WriteToFileAsync | ODataMessageWriter-NoValidation | 247.049 ms | 0.8430 ms | 0.7885 ms | 246.715 ms | 32000.0000 | - | 200,583 KB |
| WriteToFileAsync | ODataMessageWriter-NoValidation-Async | 794.218 ms | 1.2409 ms | 1.1607 ms | 794.490 ms | 51000.0000 | 1000.0000 | 311,358 KB |
| WriteToFileAsync | ODataMessageWriter-Utf16 | 292.286 ms | 0.6426 ms | 0.6011 ms | 292.312 ms | 36000.0000 | - | 224,018 KB |
| WriteToFileAsync | ODataMessageWriter-Utf8JsonWriter | 268.767 ms | 0.5205 ms | 0.4614 ms | 268.758 ms | 35000.0000 | 1000.0000 | 217,599 KB |
| WriteToFileAsync | ODataMessageWriter-Utf8JsonWriter-Async | 438.272 ms | 0.4929 ms | 0.4611 ms | 438.149 ms | 35000.0000 | 1000.0000 | 219,011 KB |
| WriteToFileAsync | ODataMessageWriter-Utf8JsonWriter-NoValidation | 232.884 ms | 0.3154 ms | 0.2950 ms | 232.763 ms | 31000.0000 | 1000.0000 | 194,160 KB |
| WriteToFileAsync | ODataMessageWriter-Utf8JsonWriter-NoValidation-Async | 392.491 ms | 0.5989 ms | 0.5309 ms | 392.433 ms | 31000.0000 | 1000.0000 | 195,572 KB |
| WriteToFileAsync | ODataMessageWriter-Utf8JsonWriter-Utf16 | 272.817 ms | 0.5228 ms | 0.4890 ms | 272.629 ms | 35000.0000 | 1000.0000 | 217,600 KB |
| WriteToFileAsync | ODataUtf8JsonWriter-Direct | 21.425 ms | 0.0402 ms | 0.0376 ms | 21.438 ms | - | - | 241 KB |
| WriteToFileAsync | ODataUtf8JsonWriter-Direct-Async | 43.708 ms | 0.3825 ms | 0.3391 ms | 43.687 ms | - | - | 520 KB |
| WriteToFileAsync | Utf8JsonWriter-Direct-ArrayPool-NoValidation | 18.465 ms | 0.0574 ms | 0.0537 ms | 18.482 ms | - | - | 29 KB |

Benchmarks results when including raw values in payload:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.20348
Intel Xeon E-2336 CPU 2.90GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK= 5.0.404 [C:\Program Files\dotnet\sdk]
[Host] : .NET 6.0.10 (6.0.1022.47605), X64 RyuJIT

Toolchain=InProcessEmitToolchain InvocationCount=1 UnrollFactor=1

| Method | WriterName | Mean | Error | StdDev | Gen 0 | Gen 1 | Allocated |
|---|---|---|---|---|---|---|---|
| WriteToFileAsync | JsonSerializer | 39.725 ms | 0.2850 ms | 0.2526 ms | - | - | 16 KB |
| WriteToFileAsync | NoOpWriter-Direct | 3.808 ms | 0.0745 ms | 0.0697 ms | - | - | 5,003 KB |
| WriteToFileAsync | ODataJsonWriter-Direct | 43.317 ms | 0.8210 ms | 0.7680 ms | 1000.0000 | - | 11,675 KB |
| WriteToFileAsync | ODataJsonWriter-Direct-Async | 205.117 ms | 0.5807 ms | 0.4849 ms | 8000.0000 | - | 53,798 KB |
| WriteToFileAsync | ODataMessageWriter | 298.201 ms | 0.3155 ms | 0.2797 ms | 39000.0000 | - | 239,573 KB |
| WriteToFileAsync | ODataMessageWriter-Async | 866.389 ms | 1.6864 ms | 1.4082 ms | 57000.0000 | 1000.0000 | 350,650 KB |
| WriteToFileAsync | ODataMessageWriter-NoOp | 231.295 ms | 0.4075 ms | 0.3812 ms | 38000.0000 | - | 232,914 KB |
| WriteToFileAsync | ODataMessageWriter-NoOp-Async | 354.659 ms | 0.9602 ms | 0.8982 ms | 38000.0000 | - | 232,917 KB |
| WriteToFileAsync | ODataMessageWriter-NoValidation | 264.587 ms | 0.6677 ms | 0.6246 ms | 35000.0000 | - | 216,132 KB |
| WriteToFileAsync | ODataMessageWriter-NoValidation-Async | 812.586 ms | 1.9369 ms | 1.8118 ms | 53000.0000 | 1000.0000 | 327,209 KB |
| WriteToFileAsync | ODataMessageWriter-Utf16 | 310.894 ms | 0.5446 ms | 0.4828 ms | 39000.0000 | - | 239,567 KB |
| WriteToFileAsync | ODataMessageWriter-Utf8JsonWriter | 422.922 ms | 0.8506 ms | 0.7957 ms | 38000.0000 | - | 236,455 KB |
| WriteToFileAsync | ODataMessageWriter-Utf8JsonWriter-Async | 1,295.575 ms | 2.9140 ms | 2.7258 ms | 62000.0000 | 1000.0000 | 378,681 KB |
| WriteToFileAsync | ODataMessageWriter-Utf8JsonWriter-NoValidation | 383.425 ms | 0.8009 ms | 0.7492 ms | 34000.0000 | - | 213,016 KB |
| WriteToFileAsync | ODataMessageWriter-Utf8JsonWriter-NoValidation-Async | 1,202.503 ms | 5.7807 ms | 5.4073 ms | 58000.0000 | 1000.0000 | 355,241 KB |
| WriteToFileAsync | ODataMessageWriter-Utf8JsonWriter-Utf16 | 470.336 ms | 0.5858 ms | 0.4892 ms | 38000.0000 | - | 238,800 KB |
| WriteToFileAsync | ODataUtf8JsonWriter-Direct | 117.280 ms | 0.4300 ms | 0.3812 ms | 1000.0000 | - | 8,547 KB |
| WriteToFileAsync | ODataUtf8JsonWriter-Direct-Async | 353.166 ms | 3.5165 ms | 3.2893 ms | 8000.0000 | - | 51,325 KB |
| WriteToFileAsync | Utf8JsonWriter-Direct-ArrayPool-NoValidation | 32.796 ms | 0.0554 ms | 0.0491 ms | - | - | 5,034 KB |

Contributor Author

habbes commented Oct 24, 2022

Based on some high-level profiler analysis, one possible explanation for the performance regression is that when we're writing raw values, we have to flush the contents of the Utf8JsonWriter buffer to the stream to ensure correct order of writes. However, this doesn't sound like a good explanation for the memory regression on NoOpWriter-Direct.

/// <remarks>
/// This has been adapted from the .NET runtime's internal BitStack which is used by Utf8JsonWriter
/// https://github.com/dotnet/runtime/blob/main/src/libraries/System.Text.Json/src/System/Text/Json/BitStack.cs
/// This has been slightly modified because the original Pop() method did not return the last value pushed.
Contributor

Just curious. What does the original Pop() method return?

Contributor

@ElizabethOkerio ElizabethOkerio Oct 24, 2022

If you use Utf8JsonWriter's WriteRawValue method in .NET 6.0, do you still need this modification to the Pop method? That is, assuming Utf8JsonWriter uses the BitStack internally to track when or how to write the separators.

Contributor Author

It seems to return the value of the item before the most recent item, or something like that.

Contributor Author

habbes commented Nov 23, 2022

I made some changes to the PR to address the performance issues. Instead of passing the writeStream to Utf8JsonWriter, I pass an ArrayBufferWriter<byte> buffer. Now when we need to write raw values directly, we write them to the buffer writer, which means we don't need to flush to the stream to ensure correct order of writes. We only write to the stream when we flush.
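
The shared-buffer approach can be sketched as follows. This is an illustrative sketch under stated assumptions, not the PR's code: `SharedBufferDemo`, `WriteDirect`, and `WriteObjectWithRawValue` are hypothetical names, `SkipValidation = true` is assumed, and calling `Flush()` on a Utf8JsonWriter that wraps an `IBufferWriter<byte>` is used here only to commit its pending bytes to the buffer (no stream I/O takes place).

```csharp
using System;
using System.Buffers;
using System.Text;
using System.Text.Json;

public static class SharedBufferDemo
{
    // Appends text to the same buffer that Utf8JsonWriter writes into.
    private static void WriteDirect(ArrayBufferWriter<byte> buffer, string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        Span<byte> destination = buffer.GetSpan(bytes.Length);
        bytes.AsSpan().CopyTo(destination);
        buffer.Advance(bytes.Length);
    }

    public static string WriteObjectWithRawValue()
    {
        var buffer = new ArrayBufferWriter<byte>();
        var options = new JsonWriterOptions { SkipValidation = true }; // assumption for this sketch
        using (var writer = new Utf8JsonWriter(buffer, options))
        {
            writer.WriteStartObject();
            writer.WritePropertyName("SomeKey");

            // Commit the writer's pending bytes so the raw value lands after them.
            writer.Flush();
            WriteDirect(buffer, "\"SomeRawVal\"");

            // Utf8JsonWriter does not know about the raw value, so the
            // separator before the next property name is written manually.
            WriteDirect(buffer, ",");

            writer.WritePropertyName("AnotherKey");
            writer.WriteStringValue("v");
            writer.WriteEndObject();
        }

        // In a real writer, this is the point where the buffer would be copied to the stream.
        return Encoding.UTF8.GetString(buffer.WrittenSpan);
    }
}
```

Because both the writer and the raw writes append to one in-memory buffer, ordering is preserved without touching the destination stream until the buffer is actually flushed.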

Here are the new perf figures when the payload includes raw values

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.20348
Intel Xeon E-2336 CPU 2.90GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=5.0.404 [C:\Program Files\dotnet\sdk]
[Host] : .NET 6.0.11 (6.0.1122.52304), X64 RyuJIT

Toolchain=InProcessEmitToolchain InvocationCount=1 UnrollFactor=1

Now ODataUtf8JsonWriter is more efficient than the default JsonWriter even when raw values are involved. These benchmarks report more memory allocations than the baseline because they perform string interpolation to add quotes to the "raw" string values, which results in additional string allocations.

| Method | WriterName | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| WriteWithRawValues | NoOpWriter-Direct | 3.918 ms | 0.0755 ms | 0.0776 ms | - | - | - | 5,003 KB |
| WriteWithRawValues | ODataJsonWriter-Direct | 44.062 ms | 0.8742 ms | 0.8978 ms | 1000.0000 | - | - | 11,675 KB |
| WriteWithRawValues | ODataJsonWriter-Direct-Async | 206.520 ms | 1.0503 ms | 0.9825 ms | 8000.0000 | - | - | 53,797 KB |
| WriteWithRawValues | ODataMessageWriter | 299.875 ms | 0.6858 ms | 0.6415 ms | 39000.0000 | - | - | 239,573 KB |
| WriteWithRawValues | ODataMessageWriter-Async | 894.690 ms | 1.8934 ms | 1.6784 ms | 57000.0000 | 1000.0000 | - | 350,650 KB |
| WriteWithRawValues | ODataMessageWriter-Utf8JsonWriter | 288.920 ms | 0.6315 ms | 0.5907 ms | 38000.0000 | 1000.0000 | - | 236,511 KB |
| WriteWithRawValues | ODataMessageWriter-Utf8JsonWriter-Async | 487.874 ms | 0.4348 ms | 0.4067 ms | 40000.0000 | 3000.0000 | 2000.0000 | 256,846 KB |
| WriteWithRawValues | ODataUtf8JsonWriter-Direct | 32.389 ms | 0.0979 ms | 0.0916 ms | 1000.0000 | - | - | 8,604 KB |
| WriteWithRawValues | ODataUtf8JsonWriter-Direct-Async | 60.699 ms | 0.2026 ms | 0.1896 ms | 3000.0000 | 2000.0000 | 2000.0000 | 28,935 KB |

New benchmark figures without raw values

ODataUtf8JsonWriter is still performing better than JsonWriter. There's also been a slight performance improvement in ODataUtf8JsonWriter's async API, and slight memory improvement in ODataJsonWriter (sync and async).

| Method | WriterName | Mean | Error | StdDev | Gen 0 | Gen 1 | Allocated |
|---|---|---|---|---|---|---|---|
| WriteToFileAsync | NoOpWriter-Direct | 1.383 ms | 0.0237 ms | 0.0210 ms | - | - | 4 KB |
| WriteToFileAsync | ODataJsonWriter-Direct | 35.790 ms | 0.1317 ms | 0.1100 ms | 1000.0000 | - | 6,675 KB |
| WriteToFileAsync | ODataJsonWriter-Direct-Async | 178.943 ms | 0.6467 ms | 0.5733 ms | 8000.0000 | - | 48,537 KB |
| WriteToFileAsync | ODataMessageWriter | 271.745 ms | 0.3669 ms | 0.3432 ms | 36000.0000 | - | 224,024 KB |
| WriteToFileAsync | ODataMessageWriter-Async | 811.026 ms | 2.0030 ms | 1.7756 ms | 54000.0000 | 1000.0000 | 335,004 KB |
| WriteToFileAsync | ODataMessageWriter-Utf8JsonWriter | 260.598 ms | 0.4387 ms | 0.4104 ms | 35000.0000 | 1000.0000 | 217,443 KB |
| WriteToFileAsync | ODataUtf8JsonWriter-Direct | 21.663 ms | 0.0379 ms | 0.0317 ms | - | - | 84 KB |
| WriteToFileAsync | ODataUtf8JsonWriter-Direct-Async | 43.895 ms | 0.2531 ms | 0.2114 ms | - | - | 294 KB |

Baseline results (from master branch)

| Method | WriterName | Mean | Error | StdDev | Gen 0 | Gen 1 | Allocated |
|---|---|---|---|---|---|---|---|
| WriteToFileAsync | NoOpWriter-Direct | 1.277 ms | 0.0071 ms | 0.0063 ms | - | - | 3 KB |
| WriteToFileAsync | ODataJsonWriter-Direct | 34.563 ms | 0.1445 ms | 0.1281 ms | 1000.0000 | - | 6,676 KB |
| WriteToFileAsync | ODataJsonWriter-Direct-Async | 180.394 ms | 0.4863 ms | 0.4061 ms | 8000.0000 | - | 48,538 KB |
| WriteToFileAsync | ODataMessageWriter | 282.512 ms | 0.4670 ms | 0.4368 ms | 36000.0000 | - | 224,024 KB |
| WriteToFileAsync | ODataMessageWriter-Async | 839.509 ms | 1.8395 ms | 1.6306 ms | 54000.0000 | 1000.0000 | 334,978 KB |
| WriteToFileAsync | ODataMessageWriter-Utf8JsonWriter | 261.859 ms | 0.3300 ms | 0.2925 ms | 35000.0000 | 1000.0000 | 217,599 KB |
| WriteToFileAsync | ODataMessageWriter-Utf8JsonWriter-Async | 447.947 ms | 0.4497 ms | 0.4206 ms | 35000.0000 | 1000.0000 | 219,011 KB |
| WriteToFileAsync | ODataUtf8JsonWriter-Direct | 21.609 ms | 0.0564 ms | 0.0471 ms | - | - | 241 KB |
| WriteToFileAsync | ODataUtf8JsonWriter-Direct-Async | 44.011 ms | 0.4400 ms | 0.4115 ms | - | - | 520 KB |

Contributor Author

habbes commented Nov 23, 2022

I've added tests for the buffer-flushing logic because there was a gap in the tests.

src/Microsoft.OData.Core/Json/ODataUtf8JsonWriter.cs
this.CommitWriterContentsToBuffer();
this.writeStream.Write(this.bufferWriter.WrittenMemory.Span);
this.bufferWriter.Clear();
this.writeStream.Flush();
}

private void FlushIfBufferThresholdReached()
Contributor

Because of how frequently the FlushIfBufferThresholdReached method is called, do you think adding the MethodImpl(MethodImplOptions.AggressiveInlining) attribute to this method could improve performance? Could you perhaps try and get some perf stats around such a change?

Contributor Author

Let me try it out. I usually leave that to the compiler to decide, but in the BCL there's a lot of explicit MethodImpl(MethodImplOptions.AggressiveInlining). I'm also curious to see the result, and also whether the compiler is already inlining this.

Contributor Author

I've added the inlining attribute, but I can't tell from the benchmarks whether it leads to an improvement. The absolute mean time of the ODataUtf8JsonWriter-Direct benchmark has stayed the same (about 22ms). The ODataMessageWriter-Utf8JsonWriter mean duration seems to have gone down to <254ms. But this happens sometimes, so it's not necessarily due to this change. The relative gap between that and the baseline ODataMessageWriter has remained the same (about 6-7% difference), though it went up as high as 9% in one of the runs.
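
For reference, this is roughly what applying the attribute to a small, frequently called threshold check looks like. It is a sketch only: `ThresholdBuffer`, its fields, and `FlushCount` are hypothetical and stand in for the real writer, and the attribute is a hint that the JIT is free to ignore.

```csharp
using System.Runtime.CompilerServices;

public sealed class ThresholdBuffer
{
    private readonly int threshold;
    private int pendingBytes;

    public ThresholdBuffer(int threshold)
    {
        this.threshold = threshold;
    }

    // Counts how many times the (simulated) flush ran.
    public int FlushCount { get; private set; }

    public void Append(int byteCount)
    {
        this.pendingBytes += byteCount;
        this.FlushIfBufferThresholdReached();
    }

    // Called after every write, so the call overhead is paid very often.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private void FlushIfBufferThresholdReached()
    {
        if (this.pendingBytes >= this.threshold)
        {
            this.pendingBytes = 0; // stand-in for copying the buffer to the stream
            this.FlushCount++;
        }
    }
}
```

Whether the hint helps has to be measured, as the benchmark discussion above shows; small methods like this are often inlined without it.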

/// bufferWriter. This should be called before writing directly
/// to the bufferWriter.
/// </summary>
private void CommitWriterContentsToBuffer()
Contributor

Because of how frequently the CommitWriterContentsToBuffer method is called, do you think adding the MethodImpl(MethodImplOptions.AggressiveInlining) attribute to this method could improve performance? Could you perhaps try and get some perf stats around such a change?

Contributor Author

Let me try that out and share the results.

@@ -92,14 +94,30 @@ public async Task WritePayloadAsync(IEnumerable<Customer> payload, Stream stream
// start write homeAddress
await writer.WriteStartAsync(homeAddressInfo);

-var homeAddressResource = new ODataResource
+ODataResource homeAddressResource;
+if (includeRawValues)
Contributor

I don't know if there was a specific reason why in test/PerformanceTests/SerializationComparisonsTests/Lib/DataModel.cs you added Misc property in between City and Street but if you were to add it to the bottom, I believe you'd be able to avoid duplicate code in this file plus other files with similar duplicated code:

var homeAddressProperties = new List<ODataProperty>
{
    new ODataProperty { Name = "City", Value = customer.HomeAddress.City },
    new ODataProperty { Name = "Street", Value = customer.HomeAddress.Street }
};

if (includeRawValues)
{
    homeAddressProperties.Add(new ODataProperty { Name = "Misc", Value = new ODataUntypedValue() { RawValue = $"\"{customer.HomeAddress.Misc}\"" } });
}

var homeAddressResource = new ODataResource
{
    Properties = homeAddressProperties
};

Even with Misc sandwiched between City and Street, you should be able to do this:

var homeAddressProperties = new List<ODataProperty>
{
    new ODataProperty { Name = "City", Value = customer.HomeAddress.City }
};

if (includeRawValues)
{
    homeAddressProperties.Add(new ODataProperty { Name = "Misc", Value = new ODataUntypedValue() { RawValue = $"\"{customer.HomeAddress.Misc}\"" } });
}

homeAddressProperties.Add(
    new ODataProperty { Name = "Street", Value = customer.HomeAddress.Street });

var homeAddressResource = new ODataResource
{
    Properties = homeAddressProperties
};

This comment applies to both sync and async sections of similar code in the following files:

  • test/PerformanceTests/SerializationComparisonsTests/Lib/ODataMessageWriterPayloadWriter.cs
  • test/PerformanceTests/SerializationComparisonsTests/Lib/ODataMessageWriterAsyncPayloadWriter.cs

Contributor Author

Thanks. There's no reason to have it in between. I added the property first, then added the conditional statement later in order to be able to run tests without raw values.

Contributor Author

By the way, the reason I had the if statement outside of the loop was to avoid calling the if statement in each iteration since the value of includeRawValues doesn't change inside the loop. I thought having the if statement inside the loop means the benchmark test would pay the penalty of the if statement even when the test run doesn't include raw values. Maybe this is something that the system can optimize on its own via branch prediction. I didn't compare the benchmarks between having the if statement in the loop and outside to validate whether it makes a difference.

Contributor Author

I have modified the foreach (address in customer.Addresses) code to move the if statement inside the loop and noticed the performance slow down by 10s on the ODataMessageWriter sync and async tests. However, this could be a fluke. But I've just realized your question was based on customer.HomeAddress and not customer.Addresses.

Contributor Author

I think the perf degradation was noise. I've run it again and the results were closer to the original run.

Contributor

@habbes Curious, why does includeRawValues have to be an option? Why can't you just update the existing benchmark to include raw values?

Contributor Author

The reason I made it an option is that it made the benchmarks slower (partly because we write more data as a result of the additional property and partly because we have to manually write a comma and keep track of related flags), which makes it harder to compare to older benchmarks that did not include raw values. I also wanted to ensure that changes made to address raw values did not affect performance when not using raw values. My assumption is that raw values are an edge case and so people who do not use raw values should not incur a noticeable performance penalty because we added code that handles raw values. So, I wanted to be able to see the perf with and without raw values.

Contributor Author

Another reason the raw value benchmarks are slower is because I'm doing the string interpolation inside the benchmark and not when the benchmark data is generated.

Contributor

#2527 (comment)
"people who do not use raw values should not incur a noticeable performance penalty because we added code that handles raw values" - By this statement I assume you're only referring to performance penalty in respect to running the benchmarks, not penalty in using the actual library..?

Contributor Author

Performance penalty with respect to the benchmarks. But for me to know that performance for scenarios without raw values was not affected after the change, I needed a way to run benchmarks without writing raw values.

/// The Utf8JsonWriter internally keeps track of when to write the item separator ','
/// between key-value pairs in an object and between items in an array.
/// However, we bypass the Utf8JsonWriter in our implementation of WriteRawValue
/// and write directly to the destination stream. This means that we have to manually
Contributor

You should update this comment based on the perf fix

return false;
}

// BitStack doesn't implement a Peek()
Contributor

Is this more efficient than having a Peek() method?

Contributor Author

It would probably be more efficient to have a specific Peek() but it would increase the complexity of the code, especially if the depth > 64 (this is rare in practice, but we'd still need to implement the logic to handle it). Since Peek() and Pop() are relatively cheap and are aggressively inlined, I don't think we'll save that much in implementing a Peek() method. That's why I felt like it wasn't worth the effort. That said, I haven't actually measured the cost savings of implementing a Peek() (because doing so would require me to implement Peek() in the first place).

this.shouldWriteSeparator = true;
}

if (this.isWritingFirstElementInArray || this.isWritingConsecutiveRawValuesAtStartOfArray)
Contributor

Shouldn't this be just:

Suggested change
-if (this.isWritingFirstElementInArray || this.isWritingConsecutiveRawValuesAtStartOfArray)
+if (this.isWritingFirstElementInArray)

Because otherwise we're setting this.isWritingConsecutiveRawValuesAtStartOfArray to true when it already has a value of true...

this.shouldWriteSeparator = true;
}

if (this.isWritingAtStartOfArray || this.isWritingConsecutiveRawValuesAtStartOfArray)
Contributor

Suggested change
-if (this.isWritingAtStartOfArray || this.isWritingConsecutiveRawValuesAtStartOfArray)
+if (this.isWritingAtStartOfArray)

Otherwise we're setting this.isWritingConsecutiveRawValuesAtStartOfArray to true when it already has a value of true...

Comment on lines +417 to +418
/// This method should not be called by <see cref="WriteRawValue(string)"/>.
/// </summary>
Contributor

Suggested change
-/// This method should not be called by <see cref="WriteRawValue(string)"/>.
-/// </summary>
+/// </summary>
+/// <remarks>
+/// This method should not be called by <see cref="WriteRawValue(string)"/>.
+/// </remarks>

{
this.CommitUtf8JsonWriterContentsToBuffer();
Span<byte> buf = this.bufferWriter.GetSpan(1);
buf[0] = (byte)',';
Contributor

Another place where you could use parentheses field?

/// </summary>
private void ExitScope()
{
this.isWritingAtStartOfArray = false;
Contributor

Suggested change
-this.isWritingAtStartOfArray = false;
+this.isWritingAtStartOfArray = false;
+this.isWritingConsecutiveRawValuesAtStartOfArray = false;

For our peace of mind...

Contributor

Or maybe that could introduce a bug... Please double-check

gathogojr previously approved these changes Dec 2, 2022
Contributor

@gathogojr gathogojr left a comment

:shipit:
Great work!

@pull-request-quantifier-deprecated

This PR has 1116 quantified lines of changes. In general, a change size of up to 200 lines is ideal for the best PR experience!


Quantification details

Label      : Extra Large
Size       : +988 -128
Percentile : 100%

Total files changed: 25

Change summary by file extension:
.yml : +2 -1
.cs : +978 -127
.txt : +0 -0
.md : +8 -0

Change counts above are quantified counts, based on the PullRequestQuantifier customizations.


@habbes habbes requested a review from gathogojr December 5, 2022 08:02
Contributor

@gathogojr gathogojr left a comment

:shipit:

Development

Successfully merging this pull request may close these issues.

ODataUtf8JsonWriter does not emit item separator after untyped values
5 participants