Motivation
I'd like to invoke user-defined functionality in a stream data processing setting. To ensure that there are no odd influences between subsequent (but usually unrelated) stream events, I'd like to create a new Instance for every event I process.
Also, I'd like to process loads of messages, possibly millions per second (in a cluster).
So I need lots of instances per second.
Problem
Wasmer instance creation doesn't benefit from parallelization: 1 thread gives 111k instances per second, but 2 threads only reach 80k/s.
More threads make the situation even worse.
(The numbers are from a Ryzen 7 3700X 8-Core, Linux 5.13.13-arch1-1. My production machines will be larger…)
After a bit of benchmarking and experimenting, I arrived at the conclusion that the instance memory must be at fault.
If an instance has accessible memory, creating it makes a call to mmap/mprotect, with the corresponding munmap call on drop (look for the __GI_m* calls in the flamegraph, right below the stacks of [unknown]). The following experiment shows that these calls are at fault.
Proposed solution?
I have experimented with avoiding the mmap calls by reusing wasmer_vm::Mmaps: instead of being dropped, they are returned to a thread-local pool. This can be done without modifying wasmer, at the cost of some code duplication.
This has the desired effect of letting instance creation scale near-linearly to the number of cores. (e.g. 1 thread: 115 k/s, 2 threads: 233 k/s).
The problem is that, to prevent one instance from seeing memory left behind by another, reused Mmaps need to be zeroed out up to the accessible size. In the single-threaded case, this zeroing is only cheaper than a fresh mmap for memories up to roughly 10 pages.
So, I'm looking for alternatives.
Alternatives
At a single thread, even with one mprotect/mmap/munmap per instance memory allocation, I can get about 100k Instances per second. That's not awesome, but probably enough for most of my use cases. I can just have a single thread create all the instances and shove them through an MPMC channel to the workers. (If in doubt, I can use more, smaller machines. Cloud and all.)
I'm also considering munmapping chunks at the beginning of a Mmap that an instance has already used. That would free the memory and save the zeroing or fresh mmap, but I'm not sure it's much cheaper, since munmap still invalidates TLB entries(?).
jcaesar changed the title from "Performance bottleneck when creating Instances / growing memory" to "Performance bottleneck when creating Instances / growing memory in parallel" on Sep 10, 2021.