Emscripten main.c file? #1434

Closed
wants to merge 44 commits into from

kripken
Member

@kripken kripken commented May 19, 2020

This is the main() file I've used in my emscripten testing so far. It's now at the point where it can run the whole benchmark suite, which includes file handling (some tests read a file, and sqlite also does seeks and writes), setjmp/longjmp, and a few other nontrivial things.

I'm a little unsure which repo it should live in, so I just opened a draft PR here. Do people think it would make sense here, or in the emscripten repo?

A related question might be where to test the emscripten+wasm2c mode. Getting it passing the emscripten test suite seems useful. Or maybe we should add some emcc tests here?

@binji
Member

binji commented May 19, 2020

Looks nice! I think this is reasonable to have here, though I worry a little about version skew with emscripten. It would be nice to have some basic tests, but I'm not sure we'd want to run the full emscripten test suite here.

@binji
Member

binji commented May 19, 2020

@sbc100 wdyt?

@sbc100
Member

sbc100 commented May 19, 2020

I'm currently working on the WASI implementation which should allow STANDALONE_WASM binaries to run in wasm-interp.

Eventually it would be nice to unify those two things.

Specifically, in my implementation I use uvwasi, which is a very nice way to expose the WASI API entry points, and is cross-platform and fairly well tested. It does add a small sandbox layer, of course, which wasm2c users might not want.

I think there are a few questions we need to ask:

  1. Can we unify the wasm2c and wasm-interp implementations of syscalls/WASI
    • it would certainly be great if we could but I'm unsure of the complexity involved.
  2. Do we want a completely un-sandboxed mode.
    • If yes, can uvwasi provide that for us rather than us re-implementing it here.
  3. Do we want to include any emscripten specific syscalls or should we limit this to just WASI (and focus on improving STANDALONE_WASM mode in emscripten).

I worry in particular about the duplicate work and maintenance cost of maintaining our own WASI implementation over time as it evolves. It would be nice to combine efforts with uvwasi and be able to benefit from their work.

@sbc100
Member

sbc100 commented May 19, 2020

Personally, I don't think we want to be pulling emscripten-specific stuff into wabt. Testing emscripten under wasm2c would be something we could do in the emscripten repo and report bugs back here.

@sunfishcode
Member

I'm currently working on the WASI implementation which should allow STANDALONE_WASM binaries to run in wasm-interp.

It could also enable wasm-interp to run binaries produced by wasi-sdk.

2. Do we want a completely un-sandboxed mode.

Just as there isn't a way to have a completely un-sandboxed core wasm module, even though some users would find it useful because it'd make it easier to share memory between wasm and host, I encourage wabt to not provide a completely unsandboxed API either, even though some users may find it useful. An API sandbox is sometimes inconvenient, as is the core sandbox, but if we focus on finding solutions to problems that work with the sandbox instead of making it easy to disable the sandbox, we can make the whole ecosystem better for people who do need the sandbox.

@kripken
Member Author

kripken commented May 20, 2020

An API sandbox is sometimes inconvenient, as is the core sandbox, but if we focus on finding solutions to problems that work with the sandbox instead of making it easy to disable the sandbox, we can make the whole ecosystem better for people who do need the sandbox.

I think both routes are important. In emscripten we have NODERAWFS which just enables unsandboxed file access, and it's pretty useful. We may want more such things (my use case of "portable" llvm and binaryen builds using wasm2c may need more). But I agree completely that we should push the sandboxing approach as far as we can take it!

@sbc100
Member

sbc100 commented May 20, 2020

I'm currently working on the WASI implementation which should allow STANDALONE_WASM binaries to run in wasm-interp.

It could also enable wasm-interp to run binaries produced by wasi-sdk.

Yes! Indeed, as of #1430 we can pass all the wasi-sdk micro-tests (not saying much, I know, but a milestone nonetheless).

2. Do we want a completely un-sandboxed mode.

Just as there isn't a way to have a completely un-sandboxed core wasm module, even though some users would find it useful because it'd make it easier to share memory between wasm and host, I encourage wabt to not provide a completely unsandboxed API either, even though some users may find it useful. An API sandbox is sometimes inconvenient, as is the core sandbox, but if we focus on finding solutions to problems that work with the sandbox instead of making it easy to disable the sandbox, we can make the whole ecosystem better for people who do need the sandbox.

I believe wasm2c already has an unsandboxed-memory mode, which is mostly interesting for experimenting and benchmarking, I guess.

I would agree that we don't want to encourage such things in the wild, but I do think it's reasonable to have flags that we can use to turn these things on and off. I mean, even Chrome itself has a --no-sandbox flag.
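
As a rough illustration of what such a flag would turn on and off, here is a minimal sketch in C of a bounds-checked linear-memory access that can be compiled out. It is only a sketch under stated assumptions: `linear_memory`, `memory_new`, `in_bounds`, `load_u32`, `store_u32`, and the `SKETCH_NO_MEMCHECK` define are all made-up names for this example, and none of them is taken from wabt's actual runtime or from #1432. A real runtime would trap rather than return `false`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical linear memory; not wabt's actual runtime type. */
typedef struct {
  uint8_t* data;
  uint64_t size; /* size in bytes */
} linear_memory;

/* Allocate a zero-filled memory of `size` bytes. */
static linear_memory memory_new(uint64_t size) {
  linear_memory mem = {calloc(size, 1), size};
  assert(mem.data != NULL);
  return mem;
}

/* Build with -DSKETCH_NO_MEMCHECK (a made-up flag) to skip the check,
 * which is the "unsandboxed memory" case. */
static bool in_bounds(const linear_memory* mem, uint32_t addr, uint32_t len) {
#ifdef SKETCH_NO_MEMCHECK
  (void)mem;
  (void)addr;
  (void)len;
  return true;
#else
  /* Widen to 64 bits so that addr + len cannot wrap around. */
  return (uint64_t)addr + len <= mem->size;
#endif
}

/* Returns false instead of trapping when the access is out of bounds. */
static bool load_u32(const linear_memory* mem, uint32_t addr, uint32_t* out) {
  if (!in_bounds(mem, addr, sizeof(*out))) return false;
  memcpy(out, mem->data + addr, sizeof(*out));
  return true;
}

static bool store_u32(linear_memory* mem, uint32_t addr, uint32_t value) {
  if (!in_bounds(mem, addr, sizeof(value))) return false;
  memcpy(mem->data + addr, &value, sizeof(value));
  return true;
}
```

With the check enabled (the default here), a load at offset 62 of a 64-byte memory is rejected. With the check compiled out, the same access would read past the allocation, which is why this kind of mode only makes sense for experiments and benchmarks.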

@binji
Member

binji commented May 20, 2020

I believe wasm2c already has an unsandboxed-memory mode, which is mostly interesting for experimenting and benchmarking, I guess.

Not originally, @kripken added this recently here: #1432

@sunfishcode
Member

In emscripten we have NODERAWFS which just enables unsandboxed file access, and it's pretty useful.

Do you have a sense of what features, specifically, are useful, that are unsupported? It would be great to learn more about how current sandboxing mechanisms are insufficient for real-world use cases.

But I agree completely that we should push the sandboxing approach as far as we can take it!

A great way to push the sandboxing approach as far as we can take it is to enable it and find out where it's insufficient. 😄

@binji
Member

binji commented May 20, 2020

Personally, I don't think we want to be pulling emscripten-specific stuff into wabt.

Thinking about this some more, I agree. This is pretty useful for wasm2c users, but perhaps the better solution would be to point people to the emscripten repo here, rather than including the source.

Can we unify the wasm2c and wasm-interp implementations of syscalls/WASI

If we're generating bindings for wasm-interp, we should be able to do it for wasm2c too. Using uvwasi as a base seems like a good way to go.

Do we want a completely un-sandboxed mode

If we do, we definitely shouldn't encourage it. As @sbc100 said, I think it's useful for benchmarking (memory checking overhead, etc.). But I'm not sure if this usage translates to a completely unsandboxed WASI too.

Do we want to include any emscripten specific syscalls or should we limit this to just WASI

emscripten would be convenient, but it does feel a bit odd to include here given that the emscripten API isn't standardized.

@sbc100
Member

sbc100 commented May 20, 2020

@kripken is it possible for this to live in the emscripten repo for now? Seems like that makes more sense (at least for now).

@kripken
Member Author

kripken commented May 20, 2020

@sunfishcode

Do you have a sense of what features, specifically, are useful, that are unsupported? It would be great to learn more about how current sandboxing mechanisms are insufficient for real-world use cases.

Sure, here is my main use case: Right now emscripten ships an emsdk to users with binary builds for 3 platforms: Windows, Mac, and Linux. That includes LLVM, Binaryen, and on some platforms Python and other stuff. Those builds cover most but not all users, since some people's Linux distro can't run our Linux builds, or they are on BSD, or whatever. We suggest those people build from source, but instead, how about if we gave them a build that runs in node.js, or a big C file? Both of those are likely to work for many users and be much simpler than building themselves.

Another benefit of such builds is we can use newer compilers. It would be nice if binaryen could use c++20 or maybe be a mixture of various languages, some of which use very new compilers. It's easier to do that if we have more easy ways for users to get builds from us, so they aren't stuck.

For such builds to be feasible, they must be of similar performance and capability to our normal builds. It looks like node.js is not currently viable as code caching of wasm worked in node 12, but fails in node 14 (due to v8 APIs changing, I am told). Recompiling clang on every invocation would not be fun! That's why I'm more focused on wasm2c atm (but long-term, I think node - or deno 😉 - is better).

For wasm2c to work, I need:

  • A way to disable memory checks. It'll run in an OS process anyhow! And benchmarks show it's a huge difference.
  • Support for setjmp, exceptions, etc. - right now that means supporting a bunch of emscripten APIs like invoke*, which is why you see those in the main.c here.
  • Direct OS access, in particular,
    • File access. Probably WASI can do most of this, but it also includes things like file locking (which we use to ensure a single invocation of emcc accesses the cache, across the entire OS), and I'm not sure what API is needed for that.
    • Process creation. This is necessary for Python, whose process pool support we use. I'm not sure exactly what native APIs it uses.
    • Network access. We fetch files over https using Python.

I think that covers it. Currently we already have direct node.js access to some things like files, and maybe the same flag could cover wasm2c. As I find the specific APIs, I can let you know what's missing from WASI in more detail.

A great way to push the sandboxing approach as far as we can take it is to enable it and find out where it's insufficient. 😄

Definitely! I hope I didn't come across as negative in any way about that. I look forward to a future where practically everything is sandboxed!

@sbc100

Yeah, I'm happy for this to live in emscripten. Whatever people prefer. Sounds like we are agreed there, so I'll close this.
