
Performance regression using new interpreter API #1550

Open
gumb0 opened this issue Sep 23, 2020 · 4 comments

@gumb0
Contributor

gumb0 commented Sep 23, 2020

We have been using WABT 1.0.12 interpreter as a baseline for our benchmarking while developing Fizzy.

Recently we tried comparing WABT 1.0.12 against 1.0.19 on our benchmarks (mostly arithmetic-heavy code, crypto algorithms), and the latter showed a significant slowdown: execution is up to 2-3 times slower in many benchmarks, and instantiation is also slower, though only by up to ~15%.

We don't rule out the possibility that we are using the new interpreter API in some suboptimal way, so we would appreciate it if any obvious issues with our usage were pointed out.

Our code is in https://github.com/wasmx/fizzy/blob/ce3c446e62acce5db245a1287b7a524956277b87/test/utils/wabt_engine.cpp (note: we create one Thread object that is reused across all execute iterations; with the old API we similarly created a single Executor).

Detailed benchmarking results are in wasmx/fizzy#443.

Thank you.

@aardappel
Contributor

Note that the interpreter has been written for correctness and clarity, not performance. It is likely that as new features get added, it will keep getting slower. The only way to avoid that slowdown would be to emit specialized opcodes for all the conditionals being introduced, which would very much go against those primary goals.

I know, for example, that my recent addition of Memory64 support added extra conditionals to every memory access, plus lookups to see which memory type we're dealing with. I certainly hope that alone wasn't a 2x slowdown :)

Maybe you could run your benchmark with various recent commits to isolate which one added the most slowdown?

@chfast
Contributor

chfast commented Sep 25, 2020

I profiled wasm-interp and the issue is in "New interpreter (#1330)" introduced in 1.0.14.

I could not use our benchmarking tool and a bigger set of inputs because the API changed a lot between 1.0.12 and 1.0.14 due to the implementation of the Wasm C API.

I ended up using a single input, memset.wasm (memset.wat).

Time before: 0.003623
Time after: 0.009478 (2.61x)

I did not look into the reasons for the slowdown, but I noticed that the BinOp function is not inlined in the new interpreter.

Perf stats:

```
> perf stat -e cycles,instructions ./wasm-interp memset.wasm --run-all-exports

memset_bench() => i32:5100000

 Performance counter stats for './wasm-interp memset.wasm --run-all-exports':

        15 041 811      cycles
        39 211 115      instructions              #    2,61  insn per cycle

       0,003593214 seconds time elapsed

       0,003623000 seconds user
       0,000000000 seconds sys


> perf stat -e cycles,instructions ./wasm-interp-new memset.wasm --run-all-exports

memset_bench() => i32:5100000

 Performance counter stats for './wasm-interp-new memset.wasm --run-all-exports':

        39 301 171      cycles
       105 862 465      instructions              #    2,69  insn per cycle

       0,009431269 seconds time elapsed

       0,009478000 seconds user
       0,000000000 seconds sys
```
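For context on why a non-inlined binary-op helper hurts an interpreter loop (a hypothetical sketch; `DoBinOp` and `AddOpcode` are invented names, not WABT's API): when the operation is passed as a functor visible at the call site, the compiler can fold the arithmetic straight into the dispatch switch, whereas an opaque out-of-line call adds call/return overhead to every executed instruction.

```cpp
#include <cstdint>

// Hypothetical sketch of an inlinable binop helper. With F a lambda known
// at the call site, the compiler typically inlines op() and DoBinOp()
// entirely, leaving just the add instruction in the interpreter loop.
template <typename F>
inline uint32_t DoBinOp(uint32_t lhs, uint32_t rhs, F op) {
  return op(lhs, rhs);
}

// One specialized opcode handler built from the helper.
uint32_t AddOpcode(uint32_t a, uint32_t b) {
  return DoBinOp(a, b, [](uint32_t x, uint32_t y) { return x + y; });
}
```

Whether the real BinOp fails to inline for this reason or another (size heuristics, taking state by reference through the Thread object) would need checking in the disassembly.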

@aardappel
Contributor

@binji any idea what part of that could be causing the biggest slowdown?

It would be good to run it under a profiler to see if there's any low-hanging fruit.

@binji
Member

binji commented Oct 21, 2020

IIRC there were some hacks in the old interpreter to try to keep more values in local variables while running the interpreter loop. The new interpreter is refactored to have a "Step" function instead, perhaps that's not being inlined/optimized. I'd guess there are a lot of opportunities to speed it up, but I'd worry a little about adding complexity.
