Improve process spawning performance #37
Comments
This has improved a bit with the re-write, but there should be a lot of room for further improvement:
OS and hardware will make the numbers vary, but isn't 100 µs much longer than it normally takes to spawn a full OS thread? I haven't benchmarked this myself, but others have reported times under 20 µs. According to the numbers here, that would make even bare wasmtime/wasmer instance creation, without lunatic involved, slower than spawning a native thread. I don't have a mental model of what the Wasm runtimes need to do, but I find this surprising.
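For anyone who wants a local baseline for the thread numbers mentioned above, here is a minimal sketch that measures the average time to spawn and join a no-op OS thread using only the Rust standard library. The function name `average_spawn_time` is made up for illustration, and because the join is included in the timing, the result is an upper bound on the spawn cost rather than a precise figure.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Average wall-clock time to spawn and join `n` no-op OS threads,
/// one after another. The join is included, so this is an upper
/// bound on the pure spawn cost.
fn average_spawn_time(n: u32) -> Duration {
    assert!(n > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..n {
        // Spawn a thread that does nothing and wait for it to finish.
        thread::spawn(|| {}).join().expect("thread panicked");
    }
    start.elapsed() / n
}
```

Calling `average_spawn_time(10_000)` in a release build gives a rough per-thread figure for a given machine. Results will differ a lot between operating systems and hardware.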
At the moment it's significantly slower than spawning an OS thread, but the amount of work done is also significantly higher. A Wasm instance gets a completely fresh heap memory, which means that all the static strings compiled into the binary need to be copied into the newly created heap. memcpy is fast, but I believe this is currently the biggest performance hit. There are also some mmap allocations (which are super slow on macOS) used to eliminate bounds checks, and they slow things down further. Each instance also holds onto file descriptors, TCP connections and other resources. Threads don't have individual resources that need to be set up; all threads inside a process share one table. That's also why threads start up much faster than operating system processes: the heavy lifting is already done. The amount of work done when spawning a Wasm instance is much more comparable to spawning an operating system process. The good news is that we might be able to reduce the amount of memory copied by sharing more of the static memory. And I don't see a real blocker that would prevent us from getting close to, or even beating, thread spawning speed. One big advantage we have is that once the Wasm instances are spawned, scheduling them in user space is much cheaper than having the OS schedule threads.
Lunatic encourages program architectures where it's common to spawn many short-lived processes (e.g. a process per HTTP request). For this to work, the process spawning overhead needs to stay low. I ran some benchmarks on my MacBook:
With the Wasmtime backend:
With the Wasmer backend:
Ideally we want the instance creation time to be in the single-digit microsecond range, matching Erlang. There are many improvements we can make to get there.
As we keep adding features, the instance creation time has been getting worse, mostly because each new host function increases the Linker creation time. The good news is that a recent addition to Wasmtime makes it possible to define "global" host functions, so we can completely skip the step of adding all host functions to the instance linker each time we spawn a process.
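The idea of registering host functions once instead of once per spawn can be sketched without any Wasm runtime. The snippet below is not the Wasmtime API: `HostFunctions`, `Instance`, `double` and `negate` are hypothetical stand-ins, assuming only that the function table is immutable after setup and can therefore be shared behind an `Arc`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Signature of a (hypothetical) host function.
type HostFn = fn(i64) -> i64;

fn double(x: i64) -> i64 { x * 2 }
fn negate(x: i64) -> i64 { -x }

/// Hypothetical stand-in for a linker: a name -> host function table.
struct HostFunctions {
    table: HashMap<&'static str, HostFn>,
}

impl HostFunctions {
    /// Built once at startup; the cost grows with the number of functions.
    fn new() -> Self {
        let mut table: HashMap<&'static str, HostFn> = HashMap::new();
        table.insert("double", double as HostFn);
        table.insert("negate", negate as HostFn);
        Self { table }
    }
}

/// Hypothetical stand-in for a spawned process.
struct Instance {
    host: Arc<HostFunctions>,
}

impl Instance {
    /// Spawning only clones the `Arc` (a reference count bump),
    /// so its cost does not depend on how many host functions exist.
    fn spawn(host: &Arc<HostFunctions>) -> Self {
        Instance { host: Arc::clone(host) }
    }

    /// Look up a host function by name and call it.
    fn call(&self, name: &str, arg: i64) -> Option<i64> {
        self.host.table.get(name).map(|f| f(arg))
    }
}
```

With this shape, adding more host functions makes `HostFunctions::new` slower, but `Instance::spawn` stays constant, which is the property the "global" host functions are meant to provide.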
Another recent Wasmtime addition allows us to reuse and pool resources. We could create a pool of other resources too, like AsyncWormhole stacks. As the Wasm code can't observe the "real" stack, it would even be safe to reuse it between instances without clearing it first.
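A pool of reusable stacks could look roughly like the sketch below. This is not AsyncWormhole's or Wasmtime's API: `StackPool` and its methods are hypothetical, and a plain heap buffer stands in for a real stack allocation. The point it illustrates is that a released buffer is handed out again as-is, without being zeroed.

```rust
/// Hypothetical pool of fixed-size stack buffers.
struct StackPool {
    stack_size: usize,
    free: Vec<Box<[u8]>>,
}

impl StackPool {
    fn new(stack_size: usize) -> Self {
        Self { stack_size, free: Vec::new() }
    }

    /// Hand out a previously released stack if one is available,
    /// otherwise allocate a new one. Reused stacks are NOT cleared.
    fn acquire(&mut self) -> Box<[u8]> {
        let size = self.stack_size;
        self.free
            .pop()
            .unwrap_or_else(|| vec![0u8; size].into_boxed_slice())
    }

    /// Return a stack to the pool so a later spawn can reuse it.
    fn release(&mut self, stack: Box<[u8]>) {
        debug_assert_eq!(stack.len(), self.stack_size);
        self.free.push(stack);
    }

    /// Number of stacks currently waiting to be reused.
    fn idle(&self) -> usize {
        self.free.len()
    }
}
```

Skipping the clear is only acceptable under the assumption stated above, namely that guest code has no way to read the host stack. A pool for memory that the guest can observe would need to reset it on release.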
Even though both of these optimisations are Wasmtime-specific, I believe that Wasmer is going to add similar functionality in the future. I will open separate issues for both of these approaches and keep this as a tracking issue for further ideas and discussions around spawning performance.