[wasm] Boost hit count for outer back branch targets; improve conditional execution detection #83630

kg · 2023-03-18T00:39:39Z

When we jit compile a trace and it tries to back branch outside of itself, this will cause a bailout. In cases with nested loops the outer target's hit count may take a long time to reach the compile threshold, but ideally we would compile a trace starting at that outer target so that we can spend more time inside of traces. This could happen for a less-frequently-called method that has millions of internal loop iterations, for example - like Benchstone.BenchF.FFT. In local test runs this appears to improve that benchmark's performance a bit and the bailout counts indicate that it goes down from millions of back-branch bailouts to tens of thousands, so it seems to work there.

This may increase startup time and memory usage a bit because we compile traces sooner, but it should balance out since the new traces will usually perform better. Adding the monitoring phase will help cull any cases where the new traces are slower.

Also, the jiterpreter's original logic only cared whether we were in a branch block when deciding whether a ret or call would execute, but any instruction after a branch instruction is conditionally executed. Improving this heuristic should fix some spurious compile aborts in certain functions (this affected a couple of the FFT benchmarks, for example)

EDIT: No merge because I want to delay this until after P3.

ghost · 2023-03-18T00:39:48Z

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

When we jit compile a trace and it tries to back branch outside of itself, this will cause a bailout. In cases with nested loops the outer target's hit count may take a long time to reach the compile threshold, but ideally we would compile a trace starting at that outer target so that we can spend more time inside of traces. This could happen for a less-frequently-called method that has millions of internal loop iterations, for example - like Benchstone.BenchF.FFT. In local test runs this appears to improve that benchmark's performance a bit and the bailout counts indicate that it goes down from millions of back-branch bailouts to tens of thousands, so it seems to work there.

This may increase startup time and memory usage a bit because we compile traces sooner, but it should balance out since the new traces will usually perform better. Adding the monitoring phase will help cull any cases where the new traces are slower.

Also, the jiterpreter's original logic only cared whether we were in a branch block when deciding whether a ret or call would execute, but any instruction after a branch instruction is conditionally executed. Improving this heuristic should fix some spurious compile aborts in certain functions (this affected a couple of the FFT benchmarks, for example)

Author:	kg
Assignees:	-
Labels:	`arch-wasm`, `area-Codegen-Jiterpreter-mono`
Milestone:	-

…tionally executed

…e and it is a trace prepare point, boost its hit count This will cause it to get jitted sooner

kg · 2023-03-25T03:04:08Z

Timings for browser-bench:

scenario	PR	main	% of base
Json, non-ASCII text serialize	2.0845ms	2.1645ms	96%
Json, non-ASCII text deserialize	3.6811ms	3.6340ms	101%
Json, small serialize	0.0960ms	0.0905ms	106%
Json, small deserialize	0.1768ms	0.1767ms	100%
Json, large serialize	25.8281ms	24.3029ms	106%
Json, large deserialize	47.7798ms	48.1759ms	99%
Span, Reverse bytes	0.0660ms	0.0660ms	100%
Span, Reverse chars	0.1050ms	0.1052ms	100%
Span, IndexOf bytes	0.5279us	0.5397us	98%
Span, IndexOf chars	0.0423ms	0.0423ms	100%
Span, SequenceEqual bytes	0.0294ms	0.0294ms	100%
Span, SequenceEqual chars	0.0587ms	0.0590ms	99%
String, Normalize	0.6263ms	0.6293ms	100%
String, Normalize ASCII	0.1854ms	0.0448ms	414%
Vector, Create Vector128	0.0798us	0.0852us	94%
Vector, Add 2 Vector128's	0.0954us	0.1017us	94%
Vector, Multiply 2 Vector128's	0.0965us	0.1026us	94%
Vector, Dot product int	0.0895us	0.0947us	95%
Vector, Dot product ulong	0.0758us	0.0831us	91%
Vector, Dot product float	0.1006us	0.1059us	95%
Vector, Dot product double	0.0814us	0.0870us	94%
Vector, Sum sbyte	0.1322us	0.1383us	96%
Vector, Sum short	0.1016us	0.1069us	95%
Vector, Sum uint	0.0786us	0.0852us	92%
Vector, Sum double	0.0805us	0.0851us	95%
Vector, Min float	0.1284us	0.1333us	96%
Vector, Max float	0.1299us	0.1349us	96%
Vector, Min double	0.1019us	0.1082us	94%
Vector, Max double	0.1038us	0.1095us	95%
Vector, Normalize float	0.4104us	0.4117us	100%

Looking into Normalize ASCII

kg added arch-wasm WebAssembly architecture area-Codegen-Jiterpreter-mono labels Mar 18, 2023

kg requested review from lewing, pavelsavara, vargaz and lambdageek as code owners March 18, 2023 00:39

ghost assigned kg Mar 18, 2023

kg added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Mar 18, 2023

vargaz approved these changes Mar 18, 2023

View reviewed changes

kg added 2 commits March 24, 2023 18:45

More intelligently identify whether a given instruction will be condi…

975affb

…tionally executed

If a back branch target can't be reached from inside the current trac…

aa55a35

…e and it is a trace prepare point, boost its hit count This will cause it to get jitted sooner

kg removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Mar 25, 2023

kg force-pushed the wasm-jiterpreter-conditional-exec-flag branch from 0c491f0 to aa55a35 Compare March 25, 2023 02:15

Fix build

0b8fdd5

build-analysis bot mentioned this pull request Mar 25, 2023

WasmTestOnBrowser-System.* test failures in CI #83655

Closed

kg added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Mar 25, 2023

Fix Ascii normalization not optimizing properly anymore

949d1b1

kg removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Mar 25, 2023

build-analysis bot mentioned this pull request Mar 25, 2023

Wasm debugger test timing out #83847

Closed

kg merged commit 043d3e7 into dotnet:main Mar 29, 2023

ghost locked as resolved and limited conversation to collaborators Apr 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[wasm] Boost hit count for outer back branch targets; improve conditional execution detection #83630

[wasm] Boost hit count for outer back branch targets; improve conditional execution detection #83630

kg commented Mar 18, 2023 •

edited

Loading

ghost commented Mar 18, 2023

kg commented Mar 25, 2023 •

edited

Loading

[wasm] Boost hit count for outer back branch targets; improve conditional execution detection #83630

[wasm] Boost hit count for outer back branch targets; improve conditional execution detection #83630

Conversation

kg commented Mar 18, 2023 • edited Loading

ghost commented Mar 18, 2023

kg commented Mar 25, 2023 • edited Loading

kg commented Mar 18, 2023 •

edited

Loading

kg commented Mar 25, 2023 •

edited

Loading