macos: "failed to set thread exception port" in forked process #6785
Comments
@casimiro has a good example and complete stacktrace of the assertion in #6788. One thing we have tried to mitigate this is to destroy the engine before forking, in the hope that it would remove all secondary state, so that the forked process could call |
The error code is Wasmtime allocates a Mach port name ( The assertion failure is from the per-thread initialization that occurs to setup the thread exception port from threads that are running wasm code. Without a fork, this works because all threads in a process see the same Mach port namespace. However, Mach port namespaces aren't inherited in a child process. So I suspect we're going to run into an issue whenever an Note that there is a compile-time feature ( |
Thanks @peterhuene; ok so just like we thought, our hypothesis was the same. Although we think that destroying the engine prior to the fork should remedy this, it feels like an issue in Wasmtime itself that it doesn't. I'd say it's also a bummer that this cannot be disabled at runtime (in engine configuration) as now our users will have to compile Wasmtime themselves so they can disable this feature, they cannot just download one of the releases... But at least it is a path forward. Thanks again! |
I am not aware of a reason we couldn't "uninstall" the trap handlers or reset otherwise global state upon the destruction of the last I think it simply hasn't been a priority for us to do so as generally the majority use pattern for Wasmtime thus far has been to create a single |
Note that all Nginx is doing is forking into a background process which isn't so uncommon, I suspect more embeddings of Wasmtime might run into this issue in the long run. |
Is there a call to create the engine from within the initial process when the feature to fork the daemon is on? If not, I can't really explain why that particular feature would be tripping things up if all it is doing is immediately forking. |
Yes there is or else we would have moved it already. We must open an engine and validate the It's ok though, for now we are looking into |
I see, thank you for the clarification. I suspect even in the |
For the
And it can pass without It seems the |
Yeah, it appears to be missing from the
[features]
...
posix-signals-on-macos = ["wasmtime/posix-signals-on-macos"] |
If you don't want to modify the source |
Thanks all for the tips; With macOS x86, traps are properly handled with
This seems to be coming from Wasmtime's interaction with the libc binding... |
Currently the `libc` crate has an incorrect definition of `ucontext_t` for this platform which is causing alignment issues when it's used. This fixes [this issue][1] and the `posix-signals-on-macos` feature on this platform. [1]: bytecodealliance#6785 (comment)
That's a bug! I've posted a temporary fix for that at #6793 |
Oops didn't mean to close this |
I investigated this a bit recently and I'm not actually certain that we'll want to fix this. I think that the best solution here might be to recommend that macOS users who want to leverage Otherwise though the issues on macOS that I found difficult to handle are:
I suppose that can all be mostly summarized as I think the only possibly-feasible thing to do here is to shut everything down when an Otherwise though @thibaultcha I'd ask you if having a custom build is difficult for y'all? If so we can move the signals-vs-ports to a runtime option instead. Additionally one other mitigation we could implement is to use |
Hi @alexcrichton, thanks for having a look at this too! It is ok for us to have a custom build, although a runtime configuration option would be even better since it means users can download any of the upstream Wasmtime releases, which imho is ideal for a frictionless experience. How much work do you foresee that would be? Perhaps it could be something one of us could contribute (a difficulty is that none of us on my team has access to a MacBook, but maybe we can write the patch off a box if it isn't too big). Although, I do hope relying on this feature in the long run won't be a hassle for us (e.g. if Wasmtime on macOS focuses more efforts and fixes on trap handling through ports to the detriment of signal dispositions); in which case having tighter control over the port handlers might be preferable, since we could shut it down before |
This commit adds a `Config::macos_use_mach_ports` configuration option to replace the old `posix-signals-on-macos` compile-time Cargo feature. This'll make Wasmtime a tad larger on macOS but likely negligibly so. Otherwise this is intended to provide a resolution to bytecodealliance#6785 where embedders will be able to use any build of Wasmtime and configure at runtime how trap handling should happen. Functionally this commit additionally registers a `pthread_atfork` handler to cause any usage of Wasmtime in the child to panic. This should help head off a known-invalid state in case anyone runs into it in the future.
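The `pthread_atfork` guard mentioned in this commit message can be illustrated with a small C sketch. This is not Wasmtime's actual implementation (that lives in Rust and panics in the child); it only shows the general technique of registering a child-side fork handler that marks runtime state as unusable. The names `runtime_init`, `runtime_use`, `runtime_poisoned`, and `fork_and_probe` are hypothetical and exist only for this illustration.

```c
#include <pthread.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flag: becomes true only in a process created by fork(). */
static volatile bool runtime_poisoned = false;

/* Child-side fork handler: runs in the child right after fork() returns. */
static void poison_in_child(void) {
    runtime_poisoned = true;
}

/* Hypothetical init: registers the child handler. Returns 0 on success. */
int runtime_init(void) {
    return pthread_atfork(NULL, NULL, poison_in_child);
}

/* Hypothetical entry point: refuses to run in a forked child.
 * Returns 0 when usable, -1 when the process was forked after init. */
int runtime_use(void) {
    return runtime_poisoned ? -1 : 0;
}

/* Forks, lets the child try runtime_use(), and returns the child's exit
 * code: 0 if the child could use the runtime, 1 if it was refused,
 * -1 if fork/wait failed. */
int fork_and_probe(void) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        _exit(runtime_use() == 0 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

After `runtime_init()`, the parent keeps working normally, while any child created by `fork()` sees `runtime_use()` fail immediately instead of hitting an obscure failure later. Failing loudly at the first use is the design choice the commit describes; the sketch returns an error code where Wasmtime panics.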
Ok sounds good! Not too much work but no worries I've done some initial bits in #6807 now. If you'd like though some help in implementing a corresponding option in the C API would be appreciated! As for the long term you should be good. Unix signals are unlikely to be as battle-tested as Mach ports because they're off by default for macOS, but they're turned on by default for Linux, for example, which means they're not completely untested. Y'all aren't the first to want to use signals instead of Mach ports as well, so it's something that I'd at least personally like to see continued support for in Wasmtime. Basically that's to say that, yes, you may run into future issues. I think we'll be interested in fixing them promptly, though, if they arise. |
This commit effectively reverts #2817. Currently `ucontext_t` has both the wrong size and the wrong alignment for aarch64-apple-darwin which causes problems for users referencing the structure [1]. The issue linked from #2817 claimed that it fixed #2812 but that's still an issue where FFI warnings are still emitted for usage of `ucontext_t` due to its transitive usage of `u128`. I'm not sure how to fix #2812 myself, but given that #2817 doesn't appear to solve its original intent and additionally the size/align are currently wrong this commit reverts in the meantime. [1]: bytecodealliance/wasmtime#6785 (comment)
I think everything is now handled here with a combo of what I mentioned above:
I'm going to close this now but @thibaultcha if there's anything else please comment here! |
Hello,
We are avid Wasmtime users in our Nginx embedding, and we are facing an assertion failure when running it with Wasmtime on macOS (x86 and arm64):
This assertion systematically fails with Wasmtime 8.0.1 (and probably earlier) to 11.0.1 when Nginx is configured with `daemon on` (which ends here in Nginx forking into a background process). It seems it occurs in our call to `wasmtime_linker_instantiate` after having initialized an engine and a single store.

Not forking the master process (`daemon off`) seems to be working fine, even in forked worker processes (managed by the master process in foreground); instances are created and behave as expected in the Nginx worker processes.

The assertion failure specifically occurs once the master process has forked itself into a background process with `daemon on`.

In summary:

* `daemon off`: master process (foreground) -> `wasmtime_linker_instantiate` (temporary instance) -> fork() -> worker processes -> `wasmtime_linker_instantiate` (worker instances processing requests).
* `daemon on`: master process (foreground) -> fork() (background daemon) -> `wasmtime_linker_instantiate` (temporary instance) -> assertion failure, no instance created, and no chance to fork() into worker processes.

In the past we used to have a CI/CD pipeline with macos targets and Wasmtime used to work fine; but this CI/CD pipeline was taken down, and this bit us in the last few days. It seems like older macos work on Wasmtime has something to do with it, but I know nothing of the macos system interface...
So far the root cause of this assertion failure has eluded us; probably some state that isn't being carried over to the forked process, or something like that. Could someone shed some light on what may be the cause here?
Thanks!