Support for fast startup by restoring pre-initialized memory #15402

Closed
kg opened this issue Nov 1, 2021 · 7 comments · Fixed by #16011

@kg

kg commented Nov 1, 2021

For WASM applications it's theoretically feasible to generate a full heap offline and then do fast application startup by loading that data directly into memory, instead of running lots of init code. Is this something that emscripten could be made compatible with? Right now things like C++ constructors and the VFS seem to assume that an application is always starting fresh, and there's no obvious way to adjust everything so that this could work.

@sbc100
Collaborator

sbc100 commented Nov 1, 2021

We have a setting called EVAL_CTORS that can do some parts of this at compile time:

emscripten/src/settings.js

Lines 1540 to 1575 in bbc208c

// This tries to evaluate global ctors at compile-time, applying their effects
// into the mem init file. This saves running code during startup, and also
// allows removing the global ctor functions and other code that only they used,
// so this is also good for reducing code size. However, this does make the
// compile step much slower.
//
// This basically runs the ctors during compile time, seeing if they execute
// safely in a sandbox. Any ffi access out of wasm causes failure, as it could
// do something nondeterministic and/or alter some other state we don't see. If
// all the global ctor does is pure computation inside wasm, it should be ok.
// Run with EMCC_DEBUG=1 in the env to see logging, and errors when it fails to
// eval (you'll see a message, or a stack trace; in the latter case, the
// functions on the stack should give you an idea of what ffi was called and
// why, and perhaps you can refactor your code to avoid it, e.g., remove
// mallocs, printfs in global ctors).
//
// This optimization can increase the size of the mem init file, because ctors
// can write to memory that would otherwise be in a zeroinit area. This may not
// be a significant increase after gzip, if there are mostly zeros in there, and
// in any case the mem init increase would be offset by a code size decrease.
// (Unless you have a small ctor that writes 'random' data to memory, which
// would reduce little code but add potentially lots of uncompressible data.)
//
// LLVM's GlobalOpt *almost* does this operation. It does in simple cases, where
// LLVM IR is not too complex for its logic to evaluate, but it isn't powerful
// enough for e.g. libc++ iostream ctors. It is just hard to do at the LLVM IR
// level - LLVM IR is complex and getting more complex, this would require
// GlobalOpt to have a full interpreter, plus a way to write back into LLVM IR
// global objects. At the wasm level, however, everything has been lowered
// into a simple low level, and we also just need to write bytes into an array,
// so this is easy for us to do, but not for LLVM. A further issue for LLVM is
// that it doesn't know that we will not link in further code, so it only tries
// to optimize ctors with lowest priority. We do know that, and can optimize all
// the ctors.
// [link]
var EVAL_CTORS = 0;

However, I believe this setting was disabled under the upstream LLVM backend and needs to be revived and re-enabled.

As for more advanced versions of snapshotting, I think the best hope would be integration with something like https://github.com/bytecodealliance/wizer.
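
(For illustration, a hypothetical example of the distinction described in the settings.js comment above; this is an editorial sketch, not code from the thread. The first ctor does pure computation in wasm memory and should be evaluable at compile time; the second reaches an import, which would make evaluation of it fail.)

#include <cstdio>

// Hypothetical ctor that only computes and writes into linear memory.
// wasm-ctor-eval should be able to run this at compile time, bake the
// resulting table into the data section, and drop the ctor entirely.
static int squares[256];
static struct InitSquares {
  InitSquares() {
    for (int i = 0; i < 256; i++) {
      squares[i] = i * i;  // pure computation, no imports touched
    }
  }
} init_squares;

// Hypothetical ctor that reaches an import (printf ends up calling the
// wasi fd_write import). That is outside the sandbox's view, so this
// ctor would fail to eval and stay as runtime startup code.
static struct Banner {
  Banner() { printf("starting up\n"); }
} banner;

int main() {
  return squares[5];  // 25
}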

@kripken
Member

kripken commented Nov 1, 2021

EVAL_CTORS basically does that, yes. The binaryen wasm-ctor-eval tool does the actual work.

It was disabled at some point due to integration issues with malloc/sbrk (#9527). Those are no longer an issue, so it could be revived if it's useful. And extending it to do work in main and not just in global ctors would be easy.

The reason I haven't focused on it myself is that the benefit usually comes with a tradeoff of larger code size, and I wasn't aware of current use cases. I could try to find time, though, if it seems like it could be useful?

@kg
Author

kg commented Nov 1, 2021

If setting EVAL_CTORS will disable the emscripten logic that interferes, we may be able to do the rest ourselves. I'll let you know if we start testing it out and hit anything; we last attempted this in mid-2020 and hit roadblocks then.

@sbc100
Collaborator

sbc100 commented Nov 1, 2021

I'm not sure what you mean by "disable the emscripten logic that interferes", but I don't think EVAL_CTORS affects the generated code at all. All it does is take a wasm file and produce another wasm file that does potentially less work at startup, and it has fairly limited powers... IIRC it's mostly able to remove static constructors that just set memory locations and not much else.

@kripken
Member

kripken commented Nov 1, 2021

I opened #15403 to test if we can re-enable that option. I believe it should be safe now (as @sbc100 noted in #9527 (comment), we don't dynamically allocate in JS, sbrk is now 100% in wasm, etc.), and the one known failure does pass for me locally.

@kripken kripken self-assigned this Jan 4, 2022
@kripken
Member

kripken commented Jan 4, 2022

Heads up that I'll be looking into this soon. I've identified what I think are the main issues preventing this from working well now, and I intend to work on them in the coming weeks.

@kripken
Member

kripken commented Jan 13, 2022

PR open: #16011

kripken added a commit that referenced this issue Jan 14, 2022
This updates us to use Binaryen's new version of wasm-ctor-eval, which can now
do a lot more things, like eval just part of a function, eval to globals, etc. That,
plus other changes on the emscripten side that move more things like sbrk into pure
wasm, means that we can eval a lot more code.

Previously -Oz would enable EVAL_CTORS. That was pretty dangerous, as it often does
not help code size. You really just need to run with the option and then measure the
code size change vs the startup speed improvement. So this PR makes us no longer do
anything automatically - you must manually build with -s EVAL_CTORS.

A new mode EVAL_CTORS=2 is also added. This enables wasm-ctor-eval's
new --ignore-external-input flag, which ignores the environment, params to
main, etc. This is unsafe, and probably we should have separate options for
these things, but for now this seems useful for experimentation.

Tested by running all of wasm2 with EVAL_CTORS=2 enabled and then ignoring the
failures that are expected (things reading from argv, for example).
I also ran around 200,000 fuzzer iterations on binaryen.

Example results on ./emcc tests/hello_libcxx.cpp -O3:

mode         | wasm size (bytes)
-------------+------------------
normal       |           136625
EVAL_CTORS=1 |           136616
EVAL_CTORS=2 |           133059

The output on the last one is:

trying to eval __wasm_call_ctors
  ...success on __wasm_call_ctors.
trying to eval main
  ...partial evalling successful, but stopping since could not eval: call import: wasi_snapshot_preview1.fd_write
  ...stopping

It completely evals the ctors, and in main it evals some stuff, until it reaches
a call to print to stdout.

Fixes #15402
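
(A hypothetical sketch of the partial-eval pattern the log above describes, not the actual hello_libcxx.cpp: everything in main before the first call that reaches an import can be evaluated away, and evaluation stops at the print.)

#include <cstdio>

int main() {
  // Pure computation up front: under EVAL_CTORS=2 (which assumes argv and
  // the environment are not read) this part can be evaluated at compile
  // time and its effects baked into the wasm's memory.
  int sum = 0;
  for (int i = 0; i < 1000; i++) {
    sum += i;
  }

  // The first call that reaches an import (printf -> wasi fd_write) cannot
  // be evaluated in the sandbox, so wasm-ctor-eval stops here and keeps the
  // rest of main as ordinary runtime code.
  printf("sum = %d\n", sum);
  return 0;
}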