
[wasm] Major performance regression serializing and deserializing json #50260

Closed · lewing opened this issue Mar 25, 2021 · 45 comments

Labels: arch-wasm (WebAssembly architecture), area-System.Text.Encoding, tenet-performance (Performance related issue)
Milestone: 6.0.0

Comments
@lewing (Member) commented Mar 25, 2021

From 0f64b26...5b77db9 there was a large regression in the aspnetcore blazor json tests. That is a large span of commits because aspnetcore wasn't able to ingest the runtime for an extended period.

[benchmark chart]

cc @marek-safar @danmoseley

@lewing added the tenet-performance (Performance related issue) label Mar 25, 2021
@dotnet-issue-labeler bot added the area-System.Text.Json and untriaged (New issue has not been triaged by the area owner) labels Mar 25, 2021
@ghost commented Mar 25, 2021

Tagging subscribers to this area: @eiriktsarpalis, @layomia
See info in area-owners.md if you want to be subscribed.


@lewing added the arch-wasm (WebAssembly architecture) label Mar 25, 2021
@ghost commented Mar 25, 2021

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.


@lewing (Member, Author) commented Mar 25, 2021

There seem to be several changes around text processing in that set. Any thoughts, @GrabYourPitchforks?

@lewing changed the title from "[wasm] major performance regression serializing and deserializing json" to "[wasm] Major performance regression serializing and deserializing json" Mar 25, 2021
@lewing added this to the 6.0.0 milestone Mar 25, 2021
@GrabYourPitchforks (Member)

@lewing I'm not quite sure what tests you're referring to, but check out the benchmark file below.

https://github.com/GrabYourPitchforks/ConsoleApplicationBenchmark/blob/db476e4c64449ade3b9077c755d5aa513d8b7bb5/ConsoleAppBenchmark/TextEncoderRunner.cs

This benchmark specifically exercises the System.Text.Encodings.Web package, separate from System.Text.Json. If you see regressions in this benchmark file, that would be strong evidence that PR #49373 really is responsible for this.

Is it possible to run these benchmarks against the wasm runtime to validate this theory? (You can make the benchmarks run a bit faster by removing the "HTML" and "URL" benchmarks on line 22 of that file.)

The logic inside the S.T.E.W project - even for the non-intrinsicified paths - is tuned so that it emits optimized codegen. It's possible that these optimizations benefit the typical codegen scenario but harm wasm. If necessary, we could investigate creating special -wasm.cs versions of these files which contain code patterns better suited for wasm scenarios. Let me know what you think of that idea.
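For context, a minimal sketch of what an encoder-only micro-benchmark along these lines could look like; this is not the actual TextEncoderRunner.cs, and the payload and iteration count are invented:

```csharp
using System;
using System.Diagnostics;
using System.Text.Encodings.Web;

// Minimal sketch: time the System.Text.Encodings.Web escaping path on its own,
// independent of System.Text.Json. Payload contents and loop count are illustrative.
class EncoderBenchSketch
{
    static void Main()
    {
        string payload = new string('a', 16 * 1024); // ASCII-only input, nothing to escape
        JavaScriptEncoder encoder = JavaScriptEncoder.Default;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 1_000; i++)
        {
            encoder.Encode(payload);
        }
        sw.Stop();
        Console.WriteLine($"JavaScriptEncoder.Encode x1000: {sw.ElapsedMilliseconds} ms");
    }
}
```

If a loop like this regresses between runtime builds on wasm while JSON-free code stays flat, that points at the encoder package rather than the serializer itself.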

@GrabYourPitchforks (Member)

I should also mention: System.Text.Encodings.Web is not involved with deserialization, only with serialization. If you're seeing regressions in deserialization-only scenarios, that would require pulling in the JSON crew.

@lewing (Member, Author) commented Mar 26, 2021

The regression appears to be symmetric, so that is a good indication that it is not your change. Someone from @dotnet/aspnet-blazor-eng can probably point to the specific code under test better than I can.

@steveharter (Member) commented Mar 29, 2021

@lewing do you know what caused the smaller regression around March 21st that went away? If it was a fluke, I wonder if the current regression could be one as well.

@pranavkm (Contributor)

@steveharter we ran the benchmarks a few times and the regression was consistent, so it seems unlikely to be transient.

@lewing (Member, Author) commented Mar 29, 2021

Yeah, that was one of my first questions too, so we verified. On top of that, several more normal benchmark runs have completed and the regression has persisted.

@steveharter (Member) commented Mar 29, 2021

> Yeah, that was one of my first questions too, so we verified. On top of that, several more normal benchmark runs have completed and the regression has persisted.

OK.

@GrabYourPitchforks:

> I should also mention: System.Text.Encodings.Web is not involved with deserialization, only with serialization. If you're seeing regressions in deserialization-only scenarios, that would require pulling in the JSON crew.

The property caching mechanism in the serializer attempts to pre-generate some JSON, which does call JavaScriptEncoder.FindFirstCharacterToEncodeUtf8(); that method was previously heavily optimized for SSE2 and had short-circuits for smaller lengths. This is done even when deserializing (one can argue it should be deferred, but that's a separate topic).

So it is possible that #49373 is responsible, although since the results are cached by the serializer, that would seem unlikely unless (see the sketch after this list):

  • the benchmarks measure start-up time (first deserialize);
  • the new SSSE3 startup cost is very high; or
  • the Blazor interpreter intrinsics were changed (perhaps they supported SSE2 but not SSSE3?).
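As a rough illustration of the call in question: the serializer caches each property's pre-escaped name, so FindFirstCharacterToEncodeUtf8 runs over a handful of short, usually ASCII strings. The property names below are invented, and this is not the serializer's actual code.

```csharp
using System;
using System.Text;
using System.Text.Encodings.Web;

// Sketch: the kind of per-property-name scan the serializer's metadata cache performs once.
// A return value of -1 means nothing needs escaping, so the pre-encoded name can be cached and reused.
class PropertyNameEscapeSketch
{
    static void Main()
    {
        string[] propertyNames = { "id", "firstName", "lastName", "emailAddress" }; // hypothetical POCO properties

        foreach (string name in propertyNames)
        {
            byte[] utf8Name = Encoding.UTF8.GetBytes(name);
            int firstToEscape = JavaScriptEncoder.Default.FindFirstCharacterToEncodeUtf8(utf8Name);
            Console.WriteLine($"{name}: first index to escape = {firstToEscape}");
        }
    }
}
```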

@GrabYourPitchforks (Member)

@lewing can we get a pointer to the benchmark code? That would clear up any confusion and also allow us to determine whether it's following System.Text.Json documented best practice.

@GrabYourPitchforks (Member)

Also re-upping the ping to @dotnet/aspnet-blazor-eng so that we can get a pointer to the code.

@captainsafia (Member)

Some quick benchmarking on this reveals that the biggest regressions are in deserialization, with essentially no change in serialization.

Deserialization

| iteration | 6.0.100-preview.2.21158.2 (ms) | 6.0.100-preview.3.21202.5 (ms) |
| --- | --- | --- |
| Deserialize 0 | 35.755 | 43.25 |
| Deserialize 1 | 1 | 3.25 |
| Deserialize 2 | 0.8749 | 3.375 |
| Deserialize 3 | 0.9599 | 3.04 |
| Deserialize 4 | 0.83 | 3 |
| Deserialize 5 | 0.92 | 3 |
| Deserialize 6 | 0.88 | 3.165 |
| Deserialize 7 | 0.79 | 3.04 |
| Deserialize 8 | 0.835 | 3 |
| Deserialize 9 | 0.915 | 3 |
| Deserialize 10 | 0.79 | 3.085 |

Serialization

| iteration | 6.0.100-preview.2.21158.2 (ms) | 6.0.100-preview.3.21202.5 (ms) |
| --- | --- | --- |
| Serialize 0 | 11.455 | 11.8299 |
| Serialize 1 | 1.71 | 2.085 |
| Serialize 2 | 1.545 | 1.875 |
| Serialize 3 | 1.545 | 1.875 |
| Serialize 4 | 1.7499 | 2.085 |
| Serialize 5 | 1.795 | 1.96 |
| Serialize 6 | 1.875 | 2.04 |
| Serialize 7 | 1.755 | 1.9199 |
| Serialize 8 | 1.585 | 2.125 |
| Serialize 9 | 1.625 | 1.8749 |
| Serialize 10 | 1.545 | 1.8299 |

Test repo is https://github.com/captainsafia/TestJsonRegression. The benchmarking code was implemented in Program.Main of a WASM app to reduce the impact of any external factors in Blazor.
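The linked repo contains the real harness; as a rough sketch of the shape of such a Program.Main measurement loop (the Person shape, payload size, and iteration count here are placeholders, not the actual test code):

```csharp
using System;
using System.Diagnostics;
using System.Text.Json;

// Rough sketch of a Program.Main-style measurement loop for a wasm console/Blazor app.
// The Person type, payload size, and iteration count are invented for illustration.
public class Person
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var people = new Person[1000];
        for (int i = 0; i < people.Length; i++)
            people[i] = new Person { Name = "Name" + i, Age = i % 100 };

        for (int iteration = 0; iteration <= 10; iteration++)
        {
            var sw = Stopwatch.StartNew();
            string json = JsonSerializer.Serialize(people);
            sw.Stop();
            Console.WriteLine($"Serialize {iteration}: {sw.Elapsed.TotalMilliseconds} ms");

            sw.Restart();
            JsonSerializer.Deserialize<Person[]>(json);
            sw.Stop();
            Console.WriteLine($"Deserialize {iteration}: {sw.Elapsed.TotalMilliseconds} ms");
        }
    }
}
```

Iteration 0 includes first-use costs such as building the serializer's metadata cache, which is why the "Serialize 0"/"Deserialize 0" rows above are an order of magnitude slower than the rest.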

@GrabYourPitchforks (Member)

Thanks @captainsafia for these numbers. I'll leave this pathed in System.Text.Json for now.

@vargaz (Contributor) commented Apr 8, 2021

This is caused by 197a28d.

@danmoseley (Member)

#49372

@GrabYourPitchforks (Member)

@pranavkm @captainsafia @lewing If I provided a patch file for dotnet/runtime with a proposed workaround, would you be able to apply the patch, rebuild the wasm-specific System.Private.CoreLib.dll, drop it into your benchmark app, and see if it resolves your perf issue? Keep in mind you'll also want to compare the before/after of the size on disk of a trimmed Blazor app.

@vargaz (Contributor) commented Apr 9, 2021

I tested with a console app and reverting that commit does fix the issue, at least for deserialization.

@GrabYourPitchforks (Member)

Try cherry-picking commit GrabYourPitchforks@d7eae3b (patch file here) and see if it resolves the issue. This re-introduces the fast path in the SizeOpt files before entering the main workhorse loop. I'm hoping it won't increase size on disk significantly, but I don't have a good way of testing that right now.
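For readers following along, the general shape of such a change (a cheap scalar check that handles the common all-ASCII case before falling into the general transcoding loop) is sketched below; this is a simplification, not the actual CoreLib patch.

```csharp
using System;

// Simplified illustration of an "ASCII fast path before the workhorse loop".
// The real Utf8Utility code in dotnet/runtime is considerably more involved.
static class Utf8FastPathSketch
{
    public static int CountAsciiPrefix(ReadOnlySpan<byte> utf8)
    {
        int i = 0;

        // Fast path: walk bytes while they are plain ASCII (< 0x80). On the wasm
        // interpreter this simple loop is far cheaper than the general decoder,
        // and most JSON payloads (property names, numbers, plain strings) are ASCII.
        while (i < utf8.Length && utf8[i] < 0x80)
            i++;

        // Whatever remains (i < utf8.Length) would be handed off to the slower,
        // fully general multi-byte decoding loop.
        return i;
    }
}
```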

@lewing (Member, Author) commented Apr 14, 2021

@radekdoulik can you follow up on #50260 (comment) and get us some numbers?

@radekdoulik (Member) commented Apr 15, 2021

I measured the current HEAD and the patched version of a modified browser sample in Chrome. Later I added Preview 2 (no AOT, as it was crashing) and measurements with 197a28d reverted.

|                   | HEAD    | HEAD/AOT | patched | patched/AOT | Preview 2 | Reverted | Reverted/AOT |
| ----------------- | ------- | -------- | ------- | ----------- | --------- | -------- | ------------ |
| serialize small   | 0.82ms  | 0.537ms  | 0.257ms | 0.198ms     | 0.276ms   | 0.266ms  | 0.192ms      |
| serialize large   | 282.4ms | 170.5ms  | 74.3ms  | 51ms        | 70ms      | 76.8ms   | 49.7ms       |
| deserialize small | 1.09ms  | 0.568ms  | 0.428ms | 0.265ms     | 0.454ms   | 0.424ms  | 0.245ms      |
| deserialize large | 337.2ms | 199.1ms  | 115.7ms | 67.6ms      | 121.9ms   | 112.5ms  | 61.3ms       |
| dotnet.wasm size  | 8831403 |          | 8833528 |             |           | 8839630  |              |

It uses the Blazor benchmark code from Json.razor and Person.cs. The small (de)serialization is measured over a batch of 1000 calls and the large over 10 calls. I ran it multiple times and used the fastest measurement.

P2 was built on Mac and then run on the same machine/Chrome as the other sample builds.

Observation: the patched version is close to the reverted version and still reduces the size of the dotnet.wasm file. Both are close to the Preview 2 times.

@radekdoulik (Member) commented Apr 15, 2021

I have created draft PR #51310 to check the GrabYourPitchforks@d7eae3b change on CI.

@radekdoulik (Member)

@pavelsavara how do the GrabYourPitchforks@d7eae3b change and the times look to you?

@jkotas (Member) commented Apr 15, 2021

@pavelsavara Is the performance for payloads that contain non-ASCII characters (e.g. Czech strings with diacritics) important for you? (#51310 (comment))

@pavelsavara (Member) commented Apr 15, 2021

@jkotas I'm a newbie on the Mono team, so I don't really know. I did the original PR as part of my learning, based on GrabYourPitchforks' guidance and motivated by the size reduction. I don't have a specific performance goal in mind.
My opinion is that a 6Kb smaller uncompressed download is worth the slower non-ASCII perf on wasm.

Also, I think @marek-safar commented that we could make it a (link-time?) feature flag that could be exposed to developers, so they can make the choice:
https://github.com/dotnet/runtime/blob/main/docs/workflow/trimming/feature-switches.md
Is that worth the effort?

@radekdoulik LGTM, unless we'd rather do the all-or-nothing feature flag.
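For readers unfamiliar with the mechanism, a feature switch of the kind referenced above is usually a static bool property that library code branches on and that the trimmer can hard-code via an ILLink substitution. A rough, hypothetical sketch follows; the switch name is invented, not an actual runtime switch:

```csharp
using System;

// Hypothetical feature switch following the pattern described in feature-switches.md.
// The switch name "System.Text.Unicode.UseSizeOptimizedUtf8" is invented for illustration.
internal static class Utf8TranscodingFeature
{
    internal static bool UseSizeOptimizedUtf8 { get; } =
        AppContext.TryGetSwitch("System.Text.Unicode.UseSizeOptimizedUtf8", out bool enabled) && enabled;
}

internal static class Utf8TranscoderSketch
{
    internal static int Transcode(ReadOnlySpan<byte> utf8)
    {
        // At publish time the trimmer can substitute the property with a constant,
        // so the untaken branch (and the code it references) is removed entirely.
        return Utf8TranscodingFeature.UseSizeOptimizedUtf8
            ? TranscodeCompact(utf8)      // smaller, slower loop
            : TranscodeVectorized(utf8);  // larger, faster loop
    }

    private static int TranscodeCompact(ReadOnlySpan<byte> utf8) => utf8.Length;    // placeholder body
    private static int TranscodeVectorized(ReadOnlySpan<byte> utf8) => utf8.Length; // placeholder body
}
```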

@jkotas (Member) commented Apr 15, 2021

> My opinion is that a 6Kb smaller uncompressed download is worth the slower non-ASCII perf on wasm.

We should get a better idea about the performance impact for non-ASCII payloads.

The majority of the world uses non-ASCII characters. Native strings that use the user's local alphabet are likely to appear in JSON payloads.

> we could make it a (link-time?) feature flag that could be exposed to developers, so they can make the choice

I do not think the linker switch is worth it for this case. It may be worth it for other cases where the difference is more significant.

@radekdoulik (Member)

I think the non-ASCII will remain slow. I will try to confirm it with measurements.

> I do not think the linker switch is worth it for this case. It may be worth it for other cases where the difference is more significant.

Maybe we can make this size optimization part of the InvariantCulture feature switch?

@pavelsavara (Member) commented Apr 15, 2021

There is a growing list of *.SizeOpt.cs files. Perhaps the linker switch could be common to all of them?

src\libraries\Common\src\System\Collections\Generic\LargeArrayBuilder.SizeOpt.cs
src\libraries\System.Linq\src\System\Linq\Enumerable.SizeOpt.cs
src\libraries\System.Linq\src\System\Linq\Skip.SizeOpt.cs
src\libraries\System.Linq\src\System\Linq\Take.SizeOpt.cs
src\libraries\System.Private.CoreLib\src\System\Text\Unicode\Utf8Utility.SizeOpt.cs

@radekdoulik (Member)

Non-ASCII measurements: I added some non-ASCII characters to the strings in the Person class.

|                   | patched | Preview 2 | HEAD     |
| ----------------- | ------- | --------- | -------- |
| serialize small   | 0.629ms | 0.554ms   | 2.054ms  |
| serialize large   | 272.2ms | 222.8ms   | 1107.2ms |
| deserialize small | 1.241ms | 1.022ms   | 2.145ms  |
| deserialize large | 613.3ms | 457ms     | 1145.2ms |

I also wanted to measure just text, so I added a measurement of [de]serialization of a string containing 56777 chars (repeating pangrams in the Czech language).

|                  | patched | Preview 2 | HEAD    |
| ---------------- | ------- | --------- | ------- |
| serialize text   | 9.17ms  | 7.03ms    | 45.83ms |
| deserialize text | 27.23ms | 12.58ms   | 51.34ms |
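As a rough idea of how such a text-only measurement can be constructed (the pangram, target length, and timing loop below are illustrative, not the exact test code):

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.Json;

// Sketch: build a large non-ASCII string from a repeated Czech pangram,
// then time serializing and deserializing just that string.
class NonAsciiTextSketch
{
    static void Main()
    {
        const string pangram = "Příliš žluťoučký kůň úpěl ďábelské ódy. ";
        var sb = new StringBuilder();
        while (sb.Length < 56_000)            // target length is illustrative
            sb.Append(pangram);
        string text = sb.ToString();

        var sw = Stopwatch.StartNew();
        string json = JsonSerializer.Serialize(text);
        sw.Stop();
        Console.WriteLine($"serialize text: {sw.Elapsed.TotalMilliseconds} ms");

        sw.Restart();
        JsonSerializer.Deserialize<string>(json);
        sw.Stop();
        Console.WriteLine($"deserialize text: {sw.Elapsed.TotalMilliseconds} ms");
    }
}
```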

@jkotas (Member) commented Apr 16, 2021

This data tells me that we should revert the size optimization change for the UTF8 encoding and take the size hit.

radekdoulik added a commit to radekdoulik/runtime that referenced this issue Apr 16, 2021
…nterpreted and low-footprint scenarios (dotnet#49372)"

This reverts commit 197a28d.

Context: dotnet#50260
@radekdoulik (Member)

> This data tells me that we should revert the size optimization change for the UTF8 encoding and take the size hit.

Yes, the perf hit is quite high. I have opened a PR to revert it.

@GrabYourPitchforks (Member) commented Apr 16, 2021

Before we revert the change, should we try creating a patch which inlines the decode methods, as mentioned in another comment? Still trying to figure out if we can get a good balance of size savings & acceptable perf. But if this is bordering on contorting the code, so be it.

Context: #51310 (comment)

jkotas pushed a commit that referenced this issue Apr 16, 2021
…nterpreted and low-footprint scenarios (#49372)" (#51379)

This reverts commit 197a28d.

Context: #50260
@lewing (Member, Author) commented Apr 16, 2021

> Before we revert the change, should we try creating a patch which inlines the decode methods, as mentioned in another comment? Still trying to figure out if we can get a good balance of size savings & acceptable perf. But if this is bordering on contorting the code, so be it.
>
> Context: #51310 (comment)

It's reverted for now so we don't get a ton of reports about a known regression. We can still explore the options.

@GrabYourPitchforks (Member)

> It's reverted for now so we don't get a ton of reports about a known regression. We can still explore the options.

Is anybody on point for that exploration? I assume Radek, since the issue is assigned to him?

@jkotas (Member) commented Apr 17, 2021

I think investigating this encoding size optimization is in the area of diminishing returns. It would be more useful to investigate whether it is possible to make the Linq size vs. speed optimization configurable via trimming feature flag.

@lewing (Member, Author) commented Apr 17, 2021

> I think investigating this encoding size optimization is in the area of diminishing returns. It would be more useful to investigate whether it is possible to make the Linq size vs. speed optimization configurable via trimming feature flag.

Agreed; we've had customer reports of the Linq performance regression too, and the size difference is larger.

@lewing (Member, Author) commented Apr 17, 2021

> > It's reverted for now so we don't get a ton of reports about a known regression. We can still explore the options.
>
> Is anybody on point for that exploration? I assume Radek, since the issue is assigned to him?

On point in what sense?

@GrabYourPitchforks (Member)

> On point in what sense?

"We should explore..."

Since this issue wasn't closed, I interpreted this as a request for somebody to perform this work. But it wasn't apparent to me who this work was being requested of.

@pranavkm (Contributor)

Has a fix for this landed in main? Our most recent benchmark runs indicate that the regression has been mitigated.

[benchmark chart]

@benaadams (Member)

It was reverted in #51379.

@steveharter (Member)

Can this be closed?

@pranavkm (Contributor)

Yup. It's been behaving itself:

[benchmark chart]

@ghost locked as resolved and limited conversation to collaborators May 28, 2021