Emitted memset and memcpy are really slow on WASM
#92436
Labels
C-bug (Category: This is a bug.)
C-optimization (Category: An issue highlighting optimization opportunities or PRs implementing such)
O-wasm (Target: WASM (WebAssembly), http://webassembly.org/)
Both functions follow the "usual" double-loop pattern: a main loop that processes 4 or 8 bytes per iteration, followed by a tail loop that handles the remaining few bytes. Sadly, the main loop also uses only 1-byte-wide loads and stores; it just issues more of them in series per iteration.
This does not look right: the main loop could very well operate on a single 32-bit or 64-bit wide primitive per iteration.
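To make the difference concrete, here is a minimal Rust sketch of the two loop shapes. It is only an illustration, not the code rustc or compiler-builtins actually emits: the function names are hypothetical, the unroll factor of 4 is picked arbitrarily from the "4 or 8" mentioned above, and it assumes unaligned 64-bit accesses are acceptable (WebAssembly permits them, though they may be slower on some engines).

```rust
use core::ptr;

/// Hypothetical paraphrase of the reported shape: the main loop is
/// unrolled 4x, but every store is still one byte wide.
unsafe fn memset_bytewise_unrolled(dst: *mut u8, byte: u8, n: usize) {
    unsafe {
        let mut i = 0;
        while n - i >= 4 {
            *dst.add(i) = byte;
            *dst.add(i + 1) = byte;
            *dst.add(i + 2) = byte;
            *dst.add(i + 3) = byte;
            i += 4;
        }
        // Tail loop: the remaining 0..=3 bytes.
        while i < n {
            *dst.add(i) = byte;
            i += 1;
        }
    }
}

/// Hypothetical word-wide variant: one 64-bit store per main-loop iteration.
unsafe fn memset_wordwise(dst: *mut u8, byte: u8, n: usize) {
    unsafe {
        // Broadcast the byte into all 8 lanes of a u64.
        let word = u64::from_ne_bytes([byte; 8]);
        let mut i = 0;
        while n - i >= 8 {
            ptr::write_unaligned(dst.add(i) as *mut u64, word);
            i += 8;
        }
        // Tail loop: the remaining 0..=7 bytes.
        while i < n {
            *dst.add(i) = byte;
            i += 1;
        }
    }
}

/// Hypothetical word-wide copy: one 64-bit load and one 64-bit store per
/// main-loop iteration. Like memcpy, it assumes the regions do not overlap.
unsafe fn memcpy_wordwise(dst: *mut u8, src: *const u8, n: usize) {
    unsafe {
        let mut i = 0;
        while n - i >= 8 {
            let word = ptr::read_unaligned(src.add(i) as *const u64);
            ptr::write_unaligned(dst.add(i) as *mut u64, word);
            i += 8;
        }
        // Tail loop: the remaining 0..=7 bytes.
        while i < n {
            *dst.add(i) = *src.add(i);
            i += 1;
        }
    }
}
```

All three functions produce the same memory contents as their byte-at-a-time equivalents; the word-wide variants just need roughly one eighth as many memory instructions in the main loop. Whether an optimizer keeps these loops as written (rather than recognizing them and calling back into memset/memcpy) depends on build settings, so treat this as a description of the intended shape only.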
This is what rustc emits in --release mode: memset, memcpy
Running them through wasm-opt -O doesn't do much; it only reorders some locals and changes an "add -1" to a "sub 1" for some reason: memset after wasm-opt, memcpy after wasm-opt
(I have also extended these sources to be complete modules for experimentation purposes.)
These functions constitute a double-digit percentage of total runtime in some cases, see: WebAssembly/binaryen#4403 (comment)