
Improve process spawning performance #37

Open
bkolobara opened this issue Apr 10, 2021 · 3 comments

Comments

@bkolobara
Contributor

Lunatic encourages program architectures where it's common to spawn many short-lived processes (e.g. a process per HTTP request). For this to work, the process spawning overhead needs to stay low. I ran some benchmarks on my MacBook:

With the Wasmtime backend:

wasmtime instance creation                                                                             
                        time:   [26.164 us 26.284 us 26.424 us]
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

lunatic instance creation                                                                            
                        time:   [321.65 us 323.69 us 326.78 us]
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) high mild
  8 (8.00%) high severe

With the Wasmer backend:

wasmer instance creation                                                                             
                        time:   [23.603 us 23.727 us 23.863 us]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

lunatic instance creation                                                                            
                        time:   [216.54 us 217.95 us 219.62 us]
                        change: [-32.116% -30.953% -29.620%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Ideally we want the instance creation time to be in the single-digit microsecond range, matching Erlang. There are many improvements we can make to get there.
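To put that target in perspective, the Criterion midpoint estimates above can be split into the bare runtime's instantiation cost and the extra time Lunatic adds on top. The small Rust program below only does that arithmetic; `lunatic_overhead` is a hypothetical helper, and the inputs are the midpoints copied from the output above.

```rust
/// Splits a Lunatic spawn time into the extra work Lunatic does on top of
/// bare runtime instantiation. Inputs are in microseconds.
fn lunatic_overhead(bare_us: f64, lunatic_us: f64) -> (f64, f64) {
    let extra_us = lunatic_us - bare_us;
    // Fraction of the total spawn time spent outside bare instantiation.
    let extra_share = extra_us / lunatic_us;
    (extra_us, extra_share)
}

fn main() {
    // Midpoint estimates from the Criterion output above: (backend, bare, lunatic).
    let runs = [("wasmtime", 26.284, 323.69), ("wasmer", 23.727, 217.95)];
    for (backend, bare, lunatic) in runs {
        let (extra, share) = lunatic_overhead(bare, lunatic);
        println!(
            "{backend}: {extra:.1} us added by Lunatic ({:.0}% of the total)",
            share * 100.0
        );
    }
}
```

With these inputs it reports roughly 297 us of extra work on Wasmtime (about 92% of the total) and roughly 194 us on Wasmer (about 89%), so most of the spawn time is spent outside bare instantiation.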

As we keep adding features, the instance creation time has been getting worse, mostly because every new host function increases the Linker creation time. The good news is that a recent addition to Wasmtime makes it possible to define "global" host functions, so we can completely skip adding all host functions to the instance linker each time we spawn a process.
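The difference between rebuilding the linker on every spawn and defining host functions once can be sketched in plain Rust. This is a minimal model, not the Wasmtime API: `HostRegistry`, `Process`, `spawn_rebuilding` and `spawn_shared` are hypothetical names, and a `HashMap` of function pointers stands in for the linker.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical host function signature: takes and returns an i32.
type HostFn = fn(i32) -> i32;

// Hypothetical stand-in for a linker. NOT the Wasmtime API.
struct HostRegistry {
    functions: HashMap<&'static str, HostFn>,
}

impl HostRegistry {
    // Building the registry costs time proportional to the number of host functions.
    fn build() -> Self {
        let mut functions: HashMap<&'static str, HostFn> = HashMap::new();
        functions.insert("lunatic::double", |x: i32| x * 2);
        functions.insert("lunatic::increment", |x: i32| x + 1);
        HostRegistry { functions }
    }
}

// Hypothetical process: holds only a cheap handle to a registry.
struct Process {
    host: Arc<HostRegistry>,
}

impl Process {
    fn call(&self, name: &str, arg: i32) -> Option<i32> {
        self.host.functions.get(name).map(|f| f(arg))
    }
}

// Per-spawn approach: every spawn rebuilds the whole registry.
fn spawn_rebuilding() -> Process {
    Process { host: Arc::new(HostRegistry::build()) }
}

// "Global" approach: the registry is built once; each spawn clones an Arc.
fn spawn_shared(global: &Arc<HostRegistry>) -> Process {
    Process { host: Arc::clone(global) }
}

fn main() {
    let global = Arc::new(HostRegistry::build());
    let processes: Vec<Process> = (0..1000).map(|_| spawn_shared(&global)).collect();
    // 1000 processes, one registry: all of them share the same allocation.
    println!("registry handles: {}", Arc::strong_count(&global));
    println!("{:?}", processes[0].call("lunatic::double", 21));
    println!("{:?}", spawn_rebuilding().call("lunatic::increment", 1));
}
```

In the shared variant the per-spawn cost is a reference-count increment, independent of how many host functions exist, while the rebuilding variant pays for every `insert` on every spawn.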

Another recent Wasmtime addition allows us to reuse and pool resources. We could create a pool of other resources too, like AsyncWormhole stacks. As the Wasm code can't observe the "real" stack, it would even be safe to reuse a stack between instances without clearing it first.
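A resource pool of the kind described above might look roughly like this. It is a sketch under stated assumptions: `StackPool` is a hypothetical type, a `Vec<u8>` stands in for a real native stack (which AsyncWormhole would allocate differently), and skipping the zeroing of reused stacks relies on the claim above that Wasm code cannot observe the native stack.

```rust
// Hypothetical pool of fixed-size stack buffers. NOT the AsyncWormhole API.
struct StackPool {
    stack_size: usize,
    free: Vec<Vec<u8>>,
    allocations: usize, // how many stacks were freshly allocated
}

impl StackPool {
    fn new(stack_size: usize) -> Self {
        StackPool { stack_size, free: Vec::new(), allocations: 0 }
    }

    // Reuse a returned stack if one is available, otherwise allocate a new one.
    // Reused stacks are NOT cleared: this assumes guest Wasm code cannot
    // observe the old contents of the native stack.
    fn acquire(&mut self) -> Vec<u8> {
        match self.free.pop() {
            Some(stack) => stack,
            None => {
                self.allocations += 1;
                vec![0u8; self.stack_size]
            }
        }
    }

    // Hand a stack back so the next spawn can reuse it.
    fn release(&mut self, stack: Vec<u8>) {
        self.free.push(stack);
    }
}

fn main() {
    let mut pool = StackPool::new(64 * 1024);
    for _ in 0..1000 {
        // Simulate a short-lived process: take a stack, use it, hand it back.
        let mut stack = pool.acquire();
        stack[0] = 1;
        pool.release(stack);
    }
    println!("fresh allocations for 1000 spawns: {}", pool.allocations);
}
```

Because each simulated process returns its stack before the next one starts, only the first spawn pays for an allocation; with many concurrent processes the pool simply grows to the peak number in flight.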

Even though both of these optimisations are Wasmtime-specific, I believe that Wasmer is going to add similar functionality in the future. I will open separate issues for both of these approaches and keep this as a tracking issue for further ideas and discussions around spawning performance.

@bkolobara
Contributor Author

This has improved a bit with the re-write, but there should be a lot of room for further improvement:

spawn process           time:   [117.11 us 117.39 us 117.72 us]                          
                        change: [-44.351% -40.109% -36.021%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

@jgarvin

jgarvin commented Dec 14, 2022

OS and hardware are going to make things vary, but isn't 100us much greater than the normal amount of time to spawn a full OS thread? I haven't benchmarked this myself, but others have reported times under 20us. According to the numbers here, wouldn't that make even bare wasmtime/wasmer instance creation, without lunatic involved, slower than spawning a native thread? I don't have a mental model for what the wasm runtimes need to do, but I find this surprising.
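For a rough local comparison, thread spawn time can be measured with nothing but the standard library. This is a hedged sketch, not a Criterion benchmark: `average_thread_spawn` is a hypothetical helper that times `std::thread::spawn` plus `join` for an empty closure, so it includes join overhead, and results will vary with OS and hardware.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Average wall-clock time to spawn and join one OS thread that does no work.
/// `iterations` must be non-zero (the total is divided by it).
fn average_thread_spawn(iterations: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        thread::spawn(|| {}).join().expect("thread panicked");
    }
    start.elapsed() / iterations
}

fn main() {
    // Warm up once so one-time runtime initialisation is not counted.
    average_thread_spawn(10);
    let avg = average_thread_spawn(1_000);
    println!("average spawn + join: {:?}", avg);
}
```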

@bkolobara
Copy link
Contributor Author

At the moment it's significantly slower than spawning an OS thread, but the amount of work done is also significantly higher.

A Wasm instance gets completely fresh heap memory, which means that all the static strings compiled into the binary need to be copied into the newly created heap. memcpy is fast, but I believe this is currently the biggest performance hit. There are also some mmap allocations (which are super slow on macOS), used to eliminate bounds checks, that slow things down further. Each instance also holds onto its own file descriptors, TCP connections and other resources. Threads don't have individual resources that need to be set up; all threads inside a process share the same resource tables (such as the file descriptor table). That's also why threads start up much faster than operating system processes: the heavy lifting has already been done. The amount of work done when spawning a Wasm instance is much more comparable to spawning an operating system process.
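The per-instance copy of static data can be modelled as follows. This is a simplified, hypothetical model, not how Wasmtime or Wasmer are implemented internally: `Module`, `Instance` and `instantiate` are illustrative names, a `Vec<u8>` stands in for linear memory, and a single data segment stands in for the static strings compiled into the binary.

```rust
// Hypothetical compiled module: a data segment placed at a fixed offset.
struct Module {
    data_offset: usize,
    data_segment: Vec<u8>, // e.g. string literals compiled into the binary
    memory_size: usize,
}

// Hypothetical instance owning its own linear memory.
struct Instance {
    memory: Vec<u8>,
}

fn instantiate(module: &Module) -> Instance {
    // Fresh heap: nothing is shared with earlier instances.
    let mut memory = vec![0u8; module.memory_size];
    // The per-spawn copy: its cost grows with the size of the static data.
    let end = module.data_offset + module.data_segment.len();
    memory[module.data_offset..end].copy_from_slice(&module.data_segment);
    Instance { memory }
}

fn main() {
    let module = Module {
        data_offset: 1024,
        data_segment: b"Hello from the data segment".to_vec(),
        memory_size: 64 * 1024, // one Wasm page
    };
    let instances: Vec<Instance> = (0..100).map(|_| instantiate(&module)).collect();
    // Every spawn copied the whole segment into its own fresh memory.
    println!("bytes copied: {}", instances.len() * module.data_segment.len());
}
```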

The good news is that we might be able to reduce the amount of memory copied by sharing more of the static memory between instances. I don't see a real blocker that would prevent us from getting close to, or even beating, thread spawning speed. One big advantage we have is that once the Wasm instances are spawned, scheduling them in user space is much cheaper than having the OS schedule threads.
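One way to share static memory is copy-on-write: every instance starts out pointing at the same bytes and only gets a private copy when it first writes. The sketch below shows the idea in user space with `Arc::make_mut`; the `Instance` type is hypothetical, and a real runtime would more likely do this at the page level with memory mappings rather than per-instance `Vec`s.

```rust
use std::sync::Arc;

// Hypothetical instance whose static data starts out shared with every other
// instance of the same module and is only copied when first written.
struct Instance {
    static_data: Arc<Vec<u8>>,
}

impl Instance {
    // Spawning only bumps a reference count; no bytes are copied.
    fn spawn(shared: &Arc<Vec<u8>>) -> Self {
        Instance { static_data: Arc::clone(shared) }
    }

    fn read(&self, index: usize) -> u8 {
        self.static_data[index]
    }

    // Copy-on-write: Arc::make_mut clones the bytes only if other handles
    // still share them, so only writing instances pay for the copy.
    fn write(&mut self, index: usize, value: u8) {
        Arc::make_mut(&mut self.static_data)[index] = value;
    }
}

fn main() {
    let shared = Arc::new(b"static strings".to_vec());
    let mut a = Instance::spawn(&shared);
    let b = Instance::spawn(&shared);
    a.write(0, b'S');
    println!("{} {}", a.read(0) as char, b.read(0) as char);
    println!("handles still sharing the original: {}", Arc::strong_count(&shared));
}
```

Instances that never write to their static data never copy it, which is the case this approach is meant to make cheap.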
