-
Notifications
You must be signed in to change notification settings - Fork 350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bugs causing separator to be omitted after ODataUtf8JsonWriter.WriteRawValue #2527
Fix bugs causing separator to be omitted after ODataUtf8JsonWriter.WriteRawValue #2527
Conversation
I've added to raw values to the benchmarks and noticed a considerable perf degradation. After I added raw values to the written payload, Benchmarks with PR changes, but before writing raw values: BenchmarkDotNet=v0.13.1, OS=Windows 10.0.20348 Toolchain=InProcessEmitToolchain InvocationCount=1 UnrollFactor=1
Benchmarks results when including raw values in payload: BenchmarkDotNet=v0.13.1, OS=Windows 10.0.20348 Toolchain=InProcessEmitToolchain InvocationCount=1 UnrollFactor=1
|
Based on some high-level profiler analysis, one of the possible explanations of the performance regression could be the fact that when we're writing raw values value, we have to flush the contents of the |
/// <remarks> | ||
/// This has been adapted from the .NET runtime's internal BitStack which is used by Utf8JsonWriter | ||
/// https://github.com/dotnet/runtime/blob/main/src/libraries/System.Text.Json/src/System/Text/Json/BitStack.cs | ||
/// This has been slightly modified because the original Pop() method did not return the last value pushed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious. What does the original Pop() method return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use the Utf8JsonWriter's WriteRawValue
method in .net 6.0, do you still need to do this modification to the Pop
method? That's if Utf8JsonWriter uses the BitStack internally to track when or how to use the separators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to return the value of the item before the most recent item, or something like that.
I made some changes to the PR to address the performance issues. Instead of a passing the Here are the new perf figures when the payload includes raw values BenchmarkDotNet=v0.13.1, OS=Windows 10.0.20348 Toolchain=InProcessEmitToolchain InvocationCount=1 UnrollFactor=1 Now
New benchmark figures without raw values
Baseline results (from master branch)
|
I've added tests for the buffer-flushing logic because there was a gap in the tests. |
this.CommitWriterContentsToBuffer(); | ||
this.writeStream.Write(this.bufferWriter.WrittenMemory.Span); | ||
this.bufferWriter.Clear(); | ||
this.writeStream.Flush(); | ||
} | ||
|
||
private void FlushIfBufferThresholdReached() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of how frequent FlushIfBufferThresholdReached
method is called, do you think adding MethodImpl(MethodImplOptions.AggressiveInlining)
attribute to this method could improve performance? Could you perhaps try and get some perf stats around such a change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me try it out. I usually leave that to the compiler to decide, but in the BCL there's a lot of explicit MethodIml(MethodImplOptions.AggressiveInlining)
. I'm also curious to see the result, and also whether the compiler is already inlining this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the inlining attribute, but I can't tell from the benchmarks whether it leads to an improvement. The absolute mean time of ODataUtf8JsonWriter-Direct
benchmark has stayed the same (about 22ms). The ODataMessageWriter-Utf8JsonWriter
mean duration seems to have gone down to <254ms. But this happens sometimes, so it's not necessarily due to this change. The relative gap between that and the baseline ODataMessageWriter
has remain the same (about 6-7% difference). Though it went up as high as 9% difference in one of the runs.
/// bufferWriter. This should be called before writing directly | ||
/// to the bufferWriter. | ||
/// </summary> | ||
private void CommitWriterContentsToBuffer() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of how frequent CommitWriterContentsToBuffer
method is called, do you think adding MethodImpl(MethodImplOptions.AggressiveInlining)
attribute to this method could improve performance? Could you perhaps try and get some perf stats around such a change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me try that out and share the results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test/FunctionalTests/Microsoft.OData.Core.Tests/Json/JsonWriterBaseTests.cs
Outdated
Show resolved
Hide resolved
test/FunctionalTests/Microsoft.OData.Core.Tests/Json/ODataUtf8JsonWriterAsyncTests.cs
Outdated
Show resolved
Hide resolved
test/FunctionalTests/Microsoft.OData.Core.Tests/Json/ODataUtf8JsonWriterAsyncTests.cs
Outdated
Show resolved
Hide resolved
test/FunctionalTests/Microsoft.OData.Core.Tests/Json/ODataUtf8JsonWriterAsyncTests.cs
Outdated
Show resolved
Hide resolved
@@ -92,14 +94,30 @@ public async Task WritePayloadAsync(IEnumerable<Customer> payload, Stream stream | |||
// start write homeAddress | |||
await writer.WriteStartAsync(homeAddressInfo); | |||
|
|||
var homeAddressResource = new ODataResource | |||
ODataResource homeAddressResource; | |||
if (includeRawValues) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if there was a specific reason why in test/PerformanceTests/SerializationComparisonsTests/Lib/DataModel.cs you added Misc
property in between City
and Street
but if you were to add it to the bottom, I believe you'd be able to avoid duplicate code in this file plus other files with similar duplicated code:
var homeAddressPropertes = new List<ODataProperty>
{
new ODataProperty { Name = "City", Value = customer.HomeAddress.City },
new ODataProperty { Name = "Street", Value = customer.HomeAddress.Street }
};
if (includeRawValues)
{
homeAddressProperties.Add(new ODataProperty { Name = "Misc", Value = new ODataUntypedValue() { RawValue = $"\"{customer.HomeAddress.Misc}\"" } })
}
var homeAddressResource = new ODataResource
{
Properties = homeAddressProperties
}
Even with Misc
sandwiched between City
and Street
, you should be able to do this:
var homeAddressPropertes = new List<ODataProperty>
{
new ODataProperty { Name = "City", Value = customer.HomeAddress.City }
};
if (includeRawValues)
{
homeAddressProperties.Add(new ODataProperty { Name = "Misc", Value = new ODataUntypedValue() { RawValue = $"\"{customer.HomeAddress.Misc}\"" } })
}
homeAddressProperties.Add(
new ODataProperty { Name = "Street", Value = customer.HomeAddress.Street });
var homeAddressResource = new ODataResource
{
Properties = homeAddressProperties
}
This comment applies to both sync and async sections of similar code in the following files:
- test/PerformanceTests/SerializationComparisonsTests/Lib/ODataMessageWriterPayloadWriter.cs
- test/PerformanceTests/SerializationComparisonsTests/Lib/ODataMessageWriterAsyncPayloadWriter.cs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. There's no reason to have it in between. Add the property first, then added the conditional statement later in order to be able to run tests without raw values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, the reason I had the if statement outside of the loop was to avoid calling the if statement in each iteration since the value of includeRawValues
doesn't change inside the loop. I thought having the if statement inside the loop means the benchmark test would pay the penalty of the if statement even when the test run doesn't include raw values. Maybe this is something that the system can optimize on its own via branch prediction. I didn't compare the benchmarks between having the if statement in the loop and outside to validate whether it makes a difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have modified the foreach (address in customer.Addresses)
code to move the if statement inside the loop and noticed the performance slow down by 10s on the ODataMessageWriter
sync and async tests. However, this could be a fluke. But I've just realized your question was based on customer.HomeAddress
and not customer.Addresses
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the perf degradation was noise, I've ran it again and the results were closer to the original run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@habbes Curious, why does includeRawValues
have to be an option? Why can't you just update the existing benchmark to include raw values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I made it an option is that it made the benchmarks slower (partly because we write more data as result of additional property and partly because we have to manually write a semicolon keep track of related flags), which makes it harder to compare to older benchmarks that did not include raw values. I also wanted to ensure that changes made to address raw values did not affect performance when not using raw values. My assumption is that raw values are an edge case and so people who do not use raw values should not incur a noticeable performance penalty because we added code that handles raw values. So, I wanted to be able to see the perf with and without raw values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another reason the raw value benchmarks are slower is because I'm doing the string interpolation inside the benchmark and not when the benchmark data is generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#2527 (comment)
"people who do not use raw values should not incur a noticeable performance penalty because we added code that handles raw values" - By this statement I assume you're only referring to performance penalty in respect to running the benchmarks, not penalty in using the actual library..?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance penalty with respect to the benchmarks. But for me to know that performance for scenarios without raw values was not affected after the change, I needed a way to run benchmarks without writing raw values.
Co-authored-by: John Gathogo <john.gathogo@gmail.com>
Co-authored-by: John Gathogo <john.gathogo@gmail.com>
Co-authored-by: John Gathogo <john.gathogo@gmail.com>
/// The Utf8JsonWriter internally keeps track of when to write the item separtor ',' | ||
/// between key-value pairs in an object and between items in an array | ||
/// However, we bypass the Utf8JsonWriter in our implementation of WriteRawValue | ||
/// and write directly to the destination stream. This means that we have to manually |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should update this comment based on the perf fix
return false; | ||
} | ||
|
||
// BitStack doesn't implement a Peek() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this more efficient than having a Peek()
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be more efficient to have a specific Peek()
but it would increase the complexity of the code, especially if the depth > 64 (this is rare in practice, but we'd still need to implement the logic to handle it). Since Peek()
and Pop()
are relatively cheap and are aggressively inlined, I don't think we'll save that much in implementing a Peek()
method. That's why I felt like it wasn't worth the effort. That said, I haven't actually measured the cost savings of implementing a Peek()
(because doing so would require me to implement Peek()
in the first place).
this.shouldWriteSeparator = true; | ||
} | ||
|
||
if (this.isWritingFirstElementInArray || this.isWritingConsecutiveRawValuesAtStartOfArray) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be just:
if (this.isWritingFirstElementInArray || this.isWritingConsecutiveRawValuesAtStartOfArray) | |
if (this.isWritingFirstElementInArray) |
Because otherwise we're setting this.isWritingConsecutiveRawValuesAtStartOfArray
to true
when it already has a value of true
...
this.shouldWriteSeparator = true; | ||
} | ||
|
||
if (this.isWritingAtStartOfArray || this.isWritingConsecutiveRawValuesAtStartOfArray) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (this.isWritingAtStartOfArray || this.isWritingConsecutiveRawValuesAtStartOfArray) | |
if (this.isWritingAtStartOfArray) |
Otherwise we're setting this.isWritingConsecutiveRawValuesAtStartOfArray
to true
when it already has a value of true
...
/// This method should not be called by <see cref="WriteRawValue(string)"/>. | ||
/// </summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// This method should not be called by <see cref="WriteRawValue(string)"/>. | |
/// </summary> | |
/// </summary> | |
/// <remarks> | |
/// This method should not be called by <see cref="WriteRawValue(string)"/>. | |
/// </remarks> |
{ | ||
this.CommitUtf8JsonWriterContentsToBuffer(); | ||
Span<byte> buf = this.bufferWriter.GetSpan(1); | ||
buf[0] = (byte)','; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another place where you could use parantheses
field?
/// </summary> | ||
private void ExitScope() | ||
{ | ||
this.isWritingAtStartOfArray = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.isWritingAtStartOfArray = false; | |
this.isWritingAtStartOfArray = false; | |
this.isWritingConsecutiveRawValuesAtStartOfArray = false; |
For our piece of mind...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe that could introduce a bug... Please double-check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
This PR has Quantification details
Why proper sizing of changes matters
Optimal pull request sizes drive a better predictable PR flow as they strike a
What can I do to optimize my changes
How to interpret the change counts in git diff output
Was this comment helpful? 👍 :ok_hand: :thumbsdown: (Email) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issues
*This pull request fixes #2525
Description
This PR fixes a bug in
ODataUtf8JsonWriter.WriteRawValue
that leads to invalid JSON being written. One common use case forWriteRawValue
is to write the value ofODataUntypedValue
objects, which could be anything.ODataUtf8JsonWriter.WriteRawValue
was implemented by writing the input directly to the output stream, bypassing the internalUtf8JsonWriter
. This is becauseUtf8JsonWriter.WriteRawValue
method does not exist in .NET Core 3.1. We intend to useUtf8JsonWriter
in ODL v8 when we drop support for older frameworks.Utf8JsonWriter
keeps track of whether or not it's writing the first item in an object or an array. It knows that it needs to place a separator (comma) between the end of one key-value pair and the beginning of another, it also knows that it needs to place separators between array elements.Here's an example of a normal flow:
ODataUtf8JsonWriter.WriteName("SomeKey")
-> callsUtf8JsonWriter.WritePropertyName("SomeKey")
ODataUtf8JsonWriter.WriteValue("SomeVal")
-> callsUtf8JsonWriter.WriteStringValue("SomeVal")
.Utf8JsonWriter
sets a flag signaling that the next property write should place a comma before the property nameODataUtf8JsonWriter.WriteName("AnotherKey")
-> callsUtf8JsonWriter.WritePropertyName("AnotherKey")
.Utf8JsonWriter
writes a comma before the property nameoutput excerpt:
"SomeKey":"SomeVal","AnotherKey":
When you bypass the
Utf8JsonWriter
and write directly to the output stream, theUtf8JsonWriter
is not aware of this write. So, theUtf8JsonWriter
will behave as if that raw value was never written. This may lead to missing or repeated separators, both of which lead to invalid JSON being written to the output.Here's an example of an invalid JSON from using the raw value:
ODataUtf8JsonWriter.WriteName("SomeKey")
-> callsUtf8JsonWriter.WritePropertyName("SomeKey")
ODataUtf8JsonWriter.WriteRawValue("\"SomeRawVal\"")
-> callswriteStream.Write("\"SomeRawVal\"")
.Utf8JsonWriter is not aware of this and does not set a flag to write a separator before the next property
.ODataUtf8JsonWriter.WriteName("AnotherKey") -> calls
Utf8JsonWriter.WritePropertyName("AnotherKey").output excerpt:
"SomeKey":"SomeVal""AnotherKey":
(comma missing between"SomeRawVal"
and"AnotherKey"
)Similarly,
Utf8JsonWriter
automatically places a separator before each element of an array except for the first element. However, when raw values are involved, the following bugs surface:Ut8fJsonWriter
is not aware of themUtf8JsonWriter
), not comma will be placed before this non-raw value becauseUtf8JsonWriter
mistakenly think it's the first item of the array.To fix these issues, this PR writes a separator manually in the following scenarios:
["raw1", "raw2", "raw3", "raw3", "non-raw value"]
(from the perspective ofUtf8JsonWriter
,"non-raw value"
is the first item in the array, and so it won't automatically place a comma before it).This PR achieves this by adding a couple of variables to keep track of when we've written raw values as well as a stack to keep track of whether we're in an array or object. The stack is based on
BitStack
implemented I "borrowed" from .NET runtime (which is used in theUtf8JsonWriter
implementation). I slightly adapted because it used some syntax not supported in the C# version used by ODL and also because itsPop()
method does not return the most recently pushed item on the stack. Not sure whether this is a bug or not. But I also pulled the tests and adjusted them accordingly. TheBitStack
is an efficient stack implementation optimized for stacks where there are only two types of values (represented as true and false). It stores the values in a sequence of integers, where a value in the stack is stored in a bit in one of the ints.The first iteration of this PR caused a performance regression when the calls to
ODataUtf8JsonWriter.WriteRawValue()
are made. This was mainly due to frequent flushing (I flushed the writer before writing raw values to the stream, to ensure correct order of writes). I fixed the perf issue by writing to the sameArrayBufferWriter<byte>
that is passed toUtf8JsonWriter
, this means there's no need to flush before the buffer is full. This improved performance. I also updated the benchmarks to include to optionally include raw values.Checklist (Uncheck if it is not completed)
Additional work necessary
If documentation update is needed, please add "Docs Needed" label to the issue and provide details about the required document change in the issue.