Naive fibonacci benchmark over 2x slower than native #972
Comments
It would be good to have a benchmark that doesn't rely on function calls, just loops, for comparison. This is because function calls can have their own overhead.
Perhaps, yes, but that's sort of a different benchmark. I wouldn't expect anything used in the wasm here to be 2x slower than native, and if the function call is the most expensive part, then that's what we should fix in this benchmark.
Maybe you can try wasm2obj for now and disassemble the output to see if the code looks ok. This is not an apples-to-apples comparison, but it may give some hints.
I think any comparison between wasm and native code needs to be closely examined, because yes, it's very easy to compare apples to oranges. I do not, however, believe that is the case here, since this is an extremely simple function and it's just one function we're looking at, not entire programs or anything like that. While wasm has overhead, almost everything about this benchmark is known statically, so it should have extremely little overhead, not 2x overhead.
@alexcrichton @rene-fonseca did you guys manage to work out what the cause of the overhead is?
No, there hasn't been any discussion about the generated code itself. Using https://gist.github.com/alexcrichton/6d80d25ee6f64857c3388e130dad22ed
There seems to be more spilling and more register-register moves in the Cranelift version. The work on a new regalloc may help with this.
Not sure how useful this will be, but I've done some more benchmarks, including re-running fibonacci stripped of all recursion (sources and some numbers are available at github.com/kubkon/wasmtime-bench). In that particular case, there is a massive speedup. Nonetheless, Cranelift still incurs 2x overhead, as seen in a more complicated mandelbrot example.
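For readers following along, a loop-only Fibonacci of the kind mentioned above could look roughly like the sketch below. This is an illustration written in Rust under my own assumptions, not the code from the linked repository; the name `fib_iter` and the choice of `u64` with wrapping addition are hypothetical.

```rust
/// Iterative Fibonacci: no recursive calls, only a loop.
/// Hypothetical sketch; not taken from the benchmark repository.
pub fn fib_iter(n: u32) -> u64 {
    // `a` holds F(i) and `b` holds F(i + 1) at the start of iteration i.
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        // Wrapping add avoids an overflow panic in debug builds for large n.
        let next = a.wrapping_add(b);
        a = b;
        b = next;
    }
    a
}
```

Because this version makes no calls inside the hot path, comparing it against the recursive version helps separate call overhead from the cost of the arithmetic and the loop.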
How is the performance with the new x64 backend?
This is somewhat outdated; closing in favor of our upcoming more general perf-tracking efforts (bytecodealliance/rfcs#3 and bytecodealliance/rfcs#4).
While this is a pretty awful benchmark "in the large", I was surprised, when playing around locally, by how slow the naive fibonacci program was relative to native performance. Because fibonacci benchmarks don't touch linear memory much, this may at least be a decent benchmark of cranelift and/or the code generator in use.
Given an input file like:
native execution looks like (for me at least)
Whereas the wasm execution looks like:
Here the wasm is over 2x slower than native, which was a bit surprising to me!
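As a rough illustration of the kind of program being discussed, a naive recursive Fibonacci with a simple timing helper might look like the sketch below. This is an assumption about the shape of such a benchmark, not the original input file from this issue; the names `fib` and `time_fib` are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Naive doubly recursive Fibonacci; the cost is dominated by function calls.
/// Hypothetical sketch; not the input file from this issue.
pub fn fib(n: u32) -> u32 {
    if n <= 1 {
        n
    } else {
        fib(n - 1) + fib(n - 2)
    }
}

/// Runs `fib(n)` once and returns the result with the elapsed wall-clock time.
pub fn time_fib(n: u32) -> (u32, Duration) {
    let start = Instant::now();
    let result = fib(n);
    (result, start.elapsed())
}
```

Calling `time_fib` with the same `n` in a native build and in a build compiled to wasm gives a crude comparison of the two; a single run is noisy, so repeated runs would be needed for anything more than a ballpark figure.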
Some other reference information below is...
Native assembly for the `fib` function
WebAssembly of the `fib` function
At this time I don't know of a great way to get out the assembly generated by cranelift unfortunately, but I'm hoping others may know of an easy way to do so!