-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Save WASM state and resume #3017
Comments
The on a different host machine will require pessimizing optimizations such that between every instruction all locals and the full stack is known as different backends may optimize instructions in different ways and thus require different information to at a given point continue running. This will probably a significant perf hit. Alternatively interrupting could be limited to certain safepoints, allowing a much smaller perf hit.
It is much easier to implement this in an interpreter than a compiler as an interpreter needs to keep all state, while a compiler will "forget" about state as soon as possible to reduce register pressure and will try to fold instructions together (and thus allow forgetting about certain locals) or mangle them in different ways. |
I was worried that would be the problem.
I could easily trick the user into having regular safe points by requiring them to regularly check into something like a watch dog timer. How would you suggest saving the state at these safe points? |
That will still require changes to Cranelift to allow for saving the state and restoring at such a safe point. The changes are just easier and with less of a perf hit than allowing it at arbitrary points. |
@IamTheCarl thanks for starting this conversation -- it's a really interesting one! I'm curious what sort of state-saves are actually necessary. As @bjorn3 says, very general snapshotting at arbitrary points is expensive; if we needed to do something like that, probably the better way would be to just save the registers, heap and stack, ensure we restore into the same generated code (this has limits wrt CPU features too, as mentioned above), and take care to make the stack and register file "relocatable", in the sense that they can be fixed up for other heap-base addresses on restore. (Some thoughts on that: to do so, probably the easiest way would be to tweak codegen so that the heap base always lives in one pinned register, and any address computation uses that as part of its address expression, so we never have intermediate pointers in registers or spilled to stack. Then we just have to fix up frame pointers and return-area pointers on the stack.) But there's a potentially much easier way: could we get away with only snapshotting when no Wasm frame is on the stack? In other words, when a call into the Wasm has returned all the way out? If so, then we don't have to worry about register or stack state at all; we can just snapshot the heap and globals. We could even restore with different generated code, on a different architecture or just with different CPU features. This is sort of like what Wizer does, so that's probably the place to look for more ideas. cc @fitzgen for more thoughts! |
That does sound a bit like my alternative solution. The idea was to regularly call a function that would let the guest run for a short bit and then return shortly after. It would be up to the developer of the guest code to keep track of the state between calls. Of course if we're taking it that far I could then even push off the job of serialization to the guest code. Just send it some kind of event "shutdown now or you'll be force terminated" and that'll give them just a moment to save their state before shutdown. This one is so simple to implement that wasmtime wouldn't even need any modifications, but it is also the most inconvenient for the developer of guest code. |
Yeah, saving state without frames on the stack is pretty easy, and as Wizer shows, you don't even need to build that functionality into the engine itself, you can do it with the standard reflection APIs that engines expose and use Wasm itself as your snapshot format. You could even factor out the code from the data (snapshot) and make it so that you don't need to recompile the code for each snapshot, just reuse the already-compiled module from before and pass in new memory/globals for its imports. Binaryen's asyncify pass can help turn synchronous code into asynchronous code. If my understanding is correct, it lifts frame activations and their local values into linear memory and does some kind of CPS-esque transformation on the code. It might make assumptions about the Web or that you're also using Emscripten; not sure, I haven't actually used it. This might be one way to transform the Wasm so that code can be written synchronously but still make snapshotting with zero activations on the stack viable. One final thought: snapshotting at ~gc safe points could be made possible relatively easily if we spill all registers at safe points. This would be a config option, because we wouldn't want to do this unconditionally. But if we only have to capture and restore stack values, that seems like a much easier problem. Hard part is addressing native pointers. I don't have any good ideas here, but I think you may have some, @cfallin. |
Indeed! This is the bit about "no intermediate pointers in registers or the stack" alluded to above. To add a bit more detail, just to get this written down:
Given those codegen changes, the code is completely "data-relocatable": the only native-address-space pointers in the register file or on the stack at any time are in (Caveat: return addresses; if code is relocated too, probably best to retain frame-pointer chain, rewrite return addresses as relative to module base on snapshot, and re-add the new module base on restore.) Probably a few weeks' work but would be very useful even beyond this use-case! |
Intermediate pointers are fine if they aren't used across a safe point or if it is the vmctx. In the former case they simply wouldn't need to be stored and in the later case the vmctx value can (and should) trivially be replaced with the newly allocated vmctx when loading.
If this parameter is properly annotated it can be fixed up when loading too by looking at the caller.
The stack doesn't need to be copied verbatim. Only the known locations of spilled values, explicit stackslots (unused by wasm) and the return value need to be copied. The later can be stored as function index + callsite index. |
Yeah, might be useful for on stack replacement for eg speculative optimization or going from an interpreter to jitted code. |
I am impressed with both the quick reply and eagerness to find solutions in this team. There's a lot of terminology here I'm unfamiliar with, so I'd like to check and make sure I'm following this conversation correctly. It also sounds like you're having difficulty knowing how to serialize objects on the heap? I can understand why that would be an issue since you can't depend on getting those same addresses back. So what comes next? |
@IamTheCarl it spawned an interesting discussion and one that's timely for other potential uses too, so thank you for starting it!
Yes, this is probably the "easiest" solution (for some definition of easy) -- as @fitzgen suggested above, the Binaryen toolchain has an asyncify transform that might help turn arbitrary Wasm into something that would work with your "call into and execute a bit of Wasm at a time" approach; between those calls, all you need to save is the Wasm heap and globals, not any in-progress execution state such as registers or stack. I don't know much about this tool but it looks like there's a pretty comprehensive intro blog post here: https://kripken.github.io/blog/wasm/2019/07/16/asyncify.html The next step after that is to modify Wasmtime itself so that it generates code that allows for snapshotting and restoring, including the in-progress execution state. There are various tradeoffs here involving whether we want to allow restoring on a different machine architecture (hardest), or just with identical JIT'd code (easier); and whether we want to allow arbitrary interruption and checkpoint at any point (hardest, possibly pessimizes codegen), or checkpoint only at certain points, which we're calling "safepoints" above, in reference to some garbage-collector terminology (a bit easier). Anything along these lines though would not be a quick short-term solution for you, so I personally would suggest looking into the asyncify option, or perhaps just defining an execution model where control always returns from your Wasm module periodically. |
Does a method to do this currently exist? I'm exploring various wasm runtimes and have been unable to find such a feature supported as of yet. If not, how difficult would it be to implement? I may be able to take a look at it as it seems to be a heavily requested feature. |
Isn't that what wizer does? |
Ahh silly me - I did a google search on wizer after seeing it mentioned before and came up with a completely different unrelated result. But yes it looks like it's quite similar to what I'm looking for. While the caveats don't work for my specific use case, (specifically, not being able to call imported functions during the init function), it does provide enough of an idea of how to proceed for now. |
Indeed, Wizer came up in the above discussion as well; the main difference in my mind (at least as far as I can remember now) was whether this would be a more generic facility, with an API to snapshot/resume at arbitrary points. Wizer currently runs a Interestingly there was a recent PR from @koute (#3691) that did snapshotting as well. The specific use case in that PR should I think be mostly covered by our recent instantiation-time improvements. But if snapshotting in a more general sense continues to arise as a need, at the Wasmtime API level, that could merit more discussion, I suppose. @RobDavenport I'm curious about your use-case here, on a few axes: (i) do you need continued snapshot/restore (i.e. multiple roundtrips), or just one initial snapshot and then restores from that? And (ii) does it need to be programmatic at the Wasmtime API level, or is a separate tool (e.g. Wizer) usable for you? |
Thanks for the question @cfallin , in my case I'm writing something like a game engine and using WASM as my game-logic code. As a need to support rollback multiplayer (like GGPO which is often used in emulators), I need to consistently store the entire state of the game for the past few frames, and, if a mis-prediction is detected, rollback to a confirmed synchronized state and re-simulate the rest of the frames (often more than one!) with the new input up until the current time. I was actually able to get this working for the most part by following the suggestions in this thread, along with some things from Wizer. Iterating through the instance's exports, cloning the memories and non-const globals, and then just saving them out as a Loading saved globals is very straightforward and doesn't need any explanation. For memories, doing a I'm not sure how dangerous this might be since the WASM client script may be allocating or deallocating memory, but I'm under the assumption that since it's all lumped together and already allocated inside the WASM host VM then it should just work unless the instance or store are moved somehow. So to answer your questions... (i) Yes, multiple trips will be made, however at specific times determined by the host program without any call frames on the stack. These will always be going back in time, though, and then re-simulating and rewriting old snapshots as we progress. (ii) Actually I have no preference, as I was able to do it for my use case mentioned above. The reason I couldn't use Wizer in my case was I needed to allow the WASM code to make callbacks into the host for things like input handling or draw requests. However, I believe the original requester for this specifically wanted a way to be able to save/resume the WASM state on a fresh application or even potentially another machine. Unfortunately I'm not too familiar with how the internals work so I can't comment on the safety or feasibility of that, as my quick hack is only intended to work with a single module instance, on the same machine, during the same application lifetime. |
I'm curious if much work has been done on this. I saw that there now appears to be an async interface for wasmtime which I imagine would be useful for a feature like this. |
Not much has changed regarding the fundamentals here since the last comments; Wizer is still the state-of-the-art, and unfortunately we don't have anything built into Wasmtime itself for snapshotting. |
How hard it would be to make WASI WebGPU and File system (assuming the FS will never be changed by any other process) to work well with this (properly resume from saved state)? This sounds like it may be useful for workaround for Android's background app killing: An Android app that is made of wasm runtime and the wasm file that has most of the actual app's code. As soon as it goes to background, it would pause execution of the wasm (and its linked wasm-plugins, if any), and snapshot the state as soon as it goes to background, and then restore when user returns to the app. |
Feature
To be able to interrupt and save the state of executing WASM code and then resume it later, possibly on a different host machine.
Benefit
I am writing an add on for the Open Computers mod in Minecraft. The mod adds tiny computers that can be programmed by the player. One of their features is that if you quit the game, when you resume later, the state of the machine is restored just as they left it.
The default Lua interpreter uses a custom implementation of Lua that can save and restore its state like this.
There's a small hand full of emulators that just save their memory and registers and restore them later.
I don't see a clear way to do this with wasmtime.
I do see potential in this being useful for a tool similar to Jupyter. You could pause a computationally intensive task and resume it later, or a cloud could serialize a job, and send it to another machine to resume in the case of a scale up/down.
Implementation
I can see that modules can be serialized already. What really needs to be saved is the state of memory and execution.
Just dumping the memory into a byte array will be enough to save the memory. Saving the state is what looks hard to me.
Dumping the registers to be restored later could work for a non-portable solution but something that can be restored on another machine of possibly a different architecture would be preferred.
Unfortunately I don't know enough about wasmtime's implementation to give a fantastic recommendation here.
Alternatives
This is actually my fallback plan if the feature is rejected.
If you can just serialize the memory of the WASM environment (actually this may already be possible, I just haven't tried yet) you could push the job of tracking state off onto a Rust module's async state machine. You would need a function that is called regularly to resume this state machine and the module would need to frequently interrupt itself so that this function can return to the host.
The advantage here is that you don't need to worry about any kind of register. Saving the state is handled by the async state machine and you can just call that one function to run the state machine to resume the guest application.
The disadvantage is that this is really only practical if the web assembly is written in Rust, and the person writing this Rust code must correctly write an async application.
The text was updated successfully, but these errors were encountered: