[wasm] Boost hit count for outer back branch targets; improve conditional execution detection #83630
Tagging subscribers to 'arch-wasm': @lewing
…tionally executed
…e and it is a trace prepare point, boost its hit count This will cause it to get jitted sooner
0c491f0 to aa55a35
Timings for browser-bench:
Looking into Normalize ASCII
When we JIT-compile a trace and it tries to back-branch to a target outside of itself, this causes a bailout. In cases with nested loops the outer target's hit count may take a long time to reach the compile threshold, but ideally we would compile a trace starting at that outer target so that we can spend more time inside of traces. This could happen for a less-frequently-called method that has millions of internal loop iterations, for example Benchstone.BenchF.FFT. To get there sooner, this PR boosts the hit count of the outer back-branch target when it is a trace prepare point. In local test runs this appears to improve that benchmark's performance a bit, and the bailout counts drop from millions of back-branch bailouts to tens of thousands, so it seems to work there.
This may increase startup time and memory usage a bit because we compile traces sooner, but it should balance out since the new traces will usually perform better. Adding the monitoring phase will help cull any cases where the new traces are slower.
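As a rough illustration of the hit-count boost, here is a minimal sketch. It is not the jiterpreter's actual code: the names (`preparePoints`, `onInterpreterHit`, `onBackBranchBailout`) and both constants are hypothetical, chosen only to show how a bailout could move an outer target toward the compile threshold faster than ordinary hits do.

```typescript
// Hypothetical model of hit-count boosting. None of these names or numbers
// are taken from the real jiterpreter source.
const COMPILE_THRESHOLD = 1000; // assumed threshold for compiling a trace
const BACK_BRANCH_BOOST = 100;  // assumed extra hits credited per bailout

interface PreparePoint {
    hitCount: number;
    compiled: boolean;
}

// Keyed by the bytecode offset of each trace prepare point.
const preparePoints = new Map<number, PreparePoint>();

function registerPreparePoint(offset: number): void {
    preparePoints.set(offset, { hitCount: 0, compiled: false });
}

// Credits `amount` hits to a prepare point and reports whether a trace
// starting there is (now) compiled. Offsets that are not prepare points
// are ignored and report false.
function addHits(offset: number, amount: number): boolean {
    const point = preparePoints.get(offset);
    if (!point) return false;
    if (point.compiled) return true;
    point.hitCount += amount;
    if (point.hitCount >= COMPILE_THRESHOLD) point.compiled = true;
    return point.compiled;
}

// Ordinary case: the interpreter reaches the prepare point once.
function onInterpreterHit(offset: number): boolean {
    return addHits(offset, 1);
}

// A compiled inner trace bailed out because it tried to back-branch to
// `targetOffset`, which lies outside the trace: boost that target.
function onBackBranchBailout(targetOffset: number): boolean {
    return addHits(targetOffset, BACK_BRANCH_BOOST);
}
```

With these assumed values, ten bailouts to the same outer target are enough to reach the threshold, where the same target would otherwise need a thousand ordinary hits.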
Also, the jiterpreter's original logic only checked whether we were inside a branch block when deciding whether a ret or call would execute, but any instruction after a branch instruction is conditionally executed. Improving this heuristic should fix some spurious compile aborts in certain functions (this affected a couple of the FFT benchmarks, for example).
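The improved rule can be sketched over a flat list of instructions. This is a simplified model with a hypothetical opcode set, not the jiterpreter's real analysis: it marks an instruction as conditionally executed once any earlier instruction in the trace is a branch, regardless of whether the instruction sits inside a branch block.

```typescript
// Hypothetical, simplified opcode set for illustration only.
type Op = "load" | "add" | "branch" | "call" | "ret";

// For each instruction, reports whether it is conditionally executed under
// the rule "anything after a branch may not run". The branch itself is only
// conditional if an earlier branch precedes it.
function markConditionallyExecuted(ops: Op[]): boolean[] {
    let seenBranch = false;
    return ops.map((op) => {
        const conditional = seenBranch;
        if (op === "branch") seenBranch = true;
        return conditional;
    });
}
```

For a trace such as `["load", "branch", "add", "ret"]`, the trailing `ret` is marked conditional even though it is not nested in a branch block, which is the case the original heuristic missed.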
EDIT: No merge because I want to delay this until after P3.