Poor performance of wasmtime file I/O maybe because tokio #7973
Comments
Thanks for this detailed report! The short answer is, we haven't benchmarked performance of file I/O yet, and right off the top of my head we have a couple of optimization ideas we haven't explored because we were just trying to get things out the door. I will dig in deeper and see if we can come up with some improvements here.
Thank you for your prompt response. Is there any configuration in wasmtime that can improve tokio's I/O performance a bit?
Those are great questions. This is a long answer, but it's a pretty involved topic, so please excuse this wall of text, and let me know if there is anything about this I should explain better.

We chose to use tokio as the foundation of the wasmtime-wasi implementation. Another significant factor is that tokio requires blocking operations such as file I/O to be moved off the async executor's threads, which is what `spawn_blocking` does. In order to fit in with the same asynchronous interface, the synchronous embedding is layered on top of that async implementation. I don't actually know where the extra syscalls reported above are coming from.

Finally, why does wasmtime-cli, which is a totally synchronous Rust program, use tokio at all? There is no configuration available to change whether blocking file I/O is moved to a separate thread or not; it's very fundamental to how tokio and the wasmtime-wasi implementation work. The only way to change that behavior would be to rewrite wasmtime-wasi with a completely different internal architecture, to solely use synchronous Rust. A rewrite of that scope would be a significant undertaking.

To give historical context: I designed wasmtime-wasi's async-first architecture to serve embedders who run Wasmtime at scale inside async Rust servers.

We may be able to find a way, for synchronous embeddings, to break some of tokio's rules about performing blocking syscalls on the "main thread", because that operation should only affect wasmtime-wasi and other crates that build on top of it. If we could break those rules, we could provide a faster path to perform some blocking file I/O operations.
Thank you very much for explaining the reasons for choosing tokio in such detail. Currently, I am using FlashDB as my database. FlashDB has two types of file APIs: the LIBC file API (fopen/fread/fwrite/fclose) and the POSIX file API (open/read/write/close). After being compiled by the wasi-sdk, both kinds of file APIs end up invoking the same interface. For the FlashDB scenario, I expect the file I/O performance of wasmtime to be close to native, so I plan to try modifying wasmtime's file I/O implementation to use Rust's file I/O directly instead of tokio. This is just an experiment, and I very much look forward to your suggestions.
@pchickey: Just to add to the users of async wasmtime at scale: Microsoft also uses this, and has built it into the containerd/runwasi project for generalized Kubernetes usage as well. Great description of how this came about; thank you for the time and effort.
@liutao-liu Thanks for your response. That use case makes sense, and it's one that many users might encounter, whether using FlashDB, sqlite, or something else. One detail I glossed over: we are using tokio to dispatch blocking file operations even in synchronous embeddings.
Hi @pchickey, thank you for your modification tips. After I made the changes you suggested, wasmtime's performance improved greatly: in the above test case, the time dropped from 23 seconds to 6 seconds. Is it necessary to submit a PR for my changes? I added a run option to control whether or not to block in tokio; I think it is still worth providing an option for users to choose whether to use tokio.
Thanks for testing that out @liutao-liu! I think it'd be reasonable to land something along these lines into the CLI itself, although I'd personally prefer to avoid a flag here since it'd be best to have the behavior turned on by default. What I might propose is something like:

- move the flag into the wasictx itself rather than exposing it as a CLI option, with the faster behavior on by default; and
- have the CLI run the guest inside a tokio runtime context, so blocking file I/O doesn't have to hop to another thread.
Would that work for you?
I understand your first proposal; I can move the flag to the wasictx. That's a good idea, and it would be simpler. But I don't understand your second proposal. Do you mean running the whole of wasmtime in tokio? Can you explain that in more detail?
For the second point, sorry; I think this is actually the right function. Can you try wrapping this invocation of the guest in it?
Hello @alexcrichton, I tested the solution you proposed, and it took an average of 14 seconds. Compared with the original (23 seconds), this improvement is not ideal. This is because each I/O operation is performed by invoking `spawn_blocking`, which still causes a large number of asynchronous waits.
Sorry, but to confirm: did you keep the changes mentioned above as well? I sketched out what those changes might look like in this commit, but I think we'll both want to skip the `spawn_blocking` calls entirely. Can you confirm whether that commit has the performance that you're looking for?
With these two changes, it takes 6 seconds, so hardly any further improvement. Also, you mentioned the new `in_tokio` flag added in the wasictx, but it can't be read here.
Oops, sorry, I forgot to actually turn the option on. I do realize that the goal is to avoid `spawn_blocking`.
Using your latest commit, it took 7 seconds. You haven't changed the code here; after I modified it based on yours, it took 6 seconds.
Ok, thanks for confirming! You're right that I didn't change that part.
I've opened #8190 with the changes I made above cleaned up a bit. That's probably at least a good new baseline to start from in terms of optimizing. |
Hi @pchickey, Wasmtime's `BorrowChecker` also shows up as a hotspot here. Issue #734 has some explanation of the background of this design.
Wasm memory sandboxing is not really related to the borrow checks, while it might be possible to defer these checks in some cases (e.g. passthrough to a host file). IIRC the proposal in #734 is basically what became wiggle's `BorrowChecker`, re. simultaneously holding shared and mutable borrows into guest memory.
This commit is a refactoring and modernization of wiggle's `BorrowChecker` implementation. This type is quite old and predates everything related to the component model, for example. It additionally predates the implementation of WASI threads for Wasmtime. In general, this type is old and has not been updated in a long time.

Originally a `BorrowChecker` was intended to be a somewhat cheap method of enabling the host to have active safe shared and mutable borrows to guest memory. Over time, though, this hasn't really panned out. The WASI threads proposal, for example, doesn't allow safe shared or mutable borrows at all; instead everything must be modeled as a copy in or copy out of data. This means that all of `wasmtime-wasi` and `wasi-common` have largely already been rewritten to minimize borrows into linear memory. Nowadays the only types that represent safe borrows are the `GuestSlice` type and its equivalents (e.g. `GuestSliceMut`, `GuestStr`, etc.). These are minimally used throughout `wasi-common` and `wasmtime-wasi`, and when they are used they're typically isolated to a small region of memory.

This is all coupled with the fact that `BorrowChecker` never ended up being optimized. It's effectively a `Mutex<HashMap<..>>`, and a pretty expensive one at that. The `Mutex` is required because `&BorrowChecker` must both allow mutations and be `Sync`. The `HashMap` is used to implement precise byte-level region checking to fulfill the original design requirements of what `wiggle` was envisioned to be.

Given all that, this commit guts `BorrowChecker`'s implementation and functionality. The type is now effectively a glorified `RefCell` for the entire span of linear memory. Regions are no longer considered when borrows are made; instead a shared borrow is considered as borrowing the entirety of shared memory. This means that it's not possible to simultaneously have a safe shared and mutable borrow, even if they're disjoint.
The goal of this commit is to address performance issues seen in bytecodealliance#7973 which I've seen locally as well. The heavyweight implementation of `BorrowChecker` isn't really buying us much nowadays, especially with much development having since moved on to the component model. The hope is that this much coarser way of implementing borrow checking, which should be much more easily optimizable, is sufficient for the needs of WASI and not a whole lot else.
IIRC this was fixed in #8303, so I'm going to close this |
Test Case
test.c
Steps to Reproduce
Expected Results
wasmtime takes about the same time as native and wamr.
Actual Results
Wasmtime takes about 23 seconds.
The same test.c, native or wamr only takes about 2 seconds.
Versions and Environment
Wasmtime version: 16.0.0
Operating system: ubuntu 20.04
Architecture: aarch64 (same as x86 for this case)
Extra Info
Profile
As shown in the following figure, most performance hotspots are in tokio. This is because wasmtime uses tokio to implement the file I/O interface.
System Call Times Statistics
As shown in the following figure, wasmtime makes about three times as many system calls as native.
Is the poor performance because wasmtime uses tokio to implement file I/O, tripling the number of file I/O operations compared to native?

Why does wasmtime use tokio to implement file I/O? Was performance considered?