Support for fast startup by restoring pre-initialized memory #15402

Closed
kg opened this issue Nov 1, 2021 · 7 comments · Fixed by #16011

@kg

kg commented Nov 1, 2021

For WASM applications it's theoretically feasible to generate a full heap offline and then do fast application startup by loading that data directly into memory, instead of running lots of init code. Is this something that emscripten could be made compatible with? Right now things like C++ constructors and the VFS seem to assume that an application is always starting fresh, and there's no obvious way to adjust everything so that this could work.

@sbc100
Collaborator

sbc100 commented Nov 1, 2021

We have a setting called EVAL_CTORS that can do some parts of this at compile time:

emscripten/src/settings.js

Lines 1540 to 1575 in bbc208c

// This tries to evaluate global ctors at compile-time, applying their effects
// into the mem init file. This saves running code during startup, and also
// allows removing the global ctor functions and other code that only they used,
// so this is also good for reducing code size. However, this does make the
// compile step much slower.
//
// This basically runs the ctors during compile time, seeing if they execute
// safely in a sandbox. Any ffi access out of wasm causes failure, as it could
// do something nondeterministic and/or alter some other state we don't see. If
// all the global ctor does is pure computation inside wasm, it should be ok.
// Run with EMCC_DEBUG=1 in the env to see logging, and errors when it fails to
// eval (you'll see a message, or a stack trace; in the latter case, the
// functions on the stack should give you an idea of what ffi was called and
// why, and perhaps you can refactor your code to avoid it, e.g., remove
// mallocs, printfs in global ctors).
//
// This optimization can increase the size of the mem init file, because ctors
// can write to memory that would otherwise be in a zeroinit area. This may not
// be a significant increase after gzip, if there are mostly zeros in there, and
// in any case the mem init increase would be offset by a code size decrease.
// (Unless you have a small ctor that writes 'random' data to memory, which
// would reduce little code but add potentially lots of uncompressible data.)
//
// LLVM's GlobalOpt *almost* does this operation. It does in simple cases, where
// LLVM IR is not too complex for its logic to evaluate, but it isn't powerful
// enough for e.g. libc++ iostream ctors. It is just hard to do at the LLVM IR
// level - LLVM IR is complex and getting more complex, this would require
// GlobalOpt to have a full interpreter, plus a way to write back into LLVM IR
// global objects. At the wasm level, however, everything has been lowered
// into a simple low level, and we also just need to write bytes into an array,
// so this is easy for us to do, but not for LLVM. A further issue for LLVM is
// that it doesn't know that we will not link in further code, so it only tries
// to optimize ctors with lowest priority. We do know that, and can optimize all
// the ctors.
// [link]
var EVAL_CTORS = 0;

However, I believe this setting was disabled under the upstream LLVM backend and needs to be revived and re-enabled.

As for more advanced versions of snapshotting, I think the best hope would be integration with something like https://github.com/bytecodealliance/wizer.
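
(For illustration, a hypothetical example of the distinction described in the settings.js comment above; this is an editorial sketch, not code from the thread. The first ctor does pure computation in wasm memory and should be evaluable at compile time; the second reaches an import, which would make evaluation of it fail.)

#include <cstdio>

// Hypothetical ctor that only computes and writes into linear memory.
// wasm-ctor-eval should be able to run this at compile time, bake the
// resulting table into the data section, and drop the ctor entirely.
static int squares[256];
static struct InitSquares {
  InitSquares() {
    for (int i = 0; i < 256; i++) {
      squares[i] = i * i;  // pure computation, no imports touched
    }
  }
} init_squares;

// Hypothetical ctor that reaches an import (printf ends up calling the
// wasi fd_write import). That is outside the sandbox's view, so this
// ctor would fail to eval and stay as runtime startup code.
static struct Banner {
  Banner() { printf("starting up\n"); }
} banner;

int main() {
  return squares[5];  // 25
}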

@kripken
Member

kripken commented Nov 1, 2021

EVAL_CTORS basically does that, yes. The binaryen wasm-ctor-eval tool does the actual work.

It was disabled at some point due to integration issues with malloc/sbrk (#9527). Those are no longer an issue, so it could be revived if it's useful. And extending it to do work in main and not just in global ctors would be easy.

The reason I haven't focused on it myself is that the benefit usually comes with a tradeoff of larger code size, and I wasn't aware of current use cases. I could try to find time, though, if it seems like it could be useful?

@kg
Author

kg commented Nov 1, 2021

If setting EVAL_CTORS will disable the emscripten logic that interferes, we may be able to do the rest ourselves. I'll let you know if we start testing it out and hit anything; we last attempted this in mid-2020 and hit roadblocks then.

@sbc100
Collaborator

sbc100 commented Nov 1, 2021

I'm not sure what you mean by "disable the emscripten logic that interferes", but I don't think EVAL_CTORS affects the generated code at all. All it does is take a wasm file and produce another wasm file that does potentially less work at startup, and it has fairly limited powers... IIRC it's mostly able to remove static constructors that just set memory locations and not much else.

@kripken
Member

kripken commented Nov 1, 2021

I opened #15403 to test if we can re-enable that option. I believe it should be safe now (as @sbc100 noted in #9527 (comment), we don't dynamically allocate in JS, sbrk is now 100% in wasm, etc.), and the one known failure does pass for me locally.

@kripken kripken self-assigned this Jan 4, 2022
@kripken
Member

kripken commented Jan 4, 2022

Heads up that I'll be looking into this soon. I've identified what I think are the main issues preventing this from working well now, and I intend to work on them in the coming weeks.

@kripken
Member

kripken commented Jan 13, 2022

PR open: #16011

kripken added a commit that referenced this issue Jan 14, 2022
This updates us to use Binaryen's new version of wasm-ctor-eval, which can now
do a lot more things, like eval just part of a function, eval to globals, etc. That,
plus other changes on the emscripten side that move more things like sbrk into pure
wasm, means that we can eval a lot more code.

Previously -Oz would enable EVAL_CTORS. That was pretty dangerous, as it often does
not help code size. You really just need to run with the option and then measure the
code size change vs the startup speed improvement. So this PR makes us no longer do
anything automatically - you must manually build with -s EVAL_CTORS.

A new mode EVAL_CTORS=2 is also added. This enables wasm-ctor-eval's
new --ignore-external-input flag, which ignores the environment, params to
main, etc. This is unsafe, and probably we should have separate options for
these things, but for now this seems useful for experimentation.

Tested by running all of wasm2 with EVAL_CTORS=2 enabled and then ignoring the
failures that are expected (things reading from argv, for example).
I also ran around 200,000 fuzzer iterations on binaryen.

Example results on ./emcc tests/hello_libcxx.cpp -O3:

mode         | wasm size (bytes)
-------------+------------------
normal       |           136625
EVAL_CTORS=1 |           136616
EVAL_CTORS=2 |           133059

The output on the last one is:

trying to eval __wasm_call_ctors
  ...success on __wasm_call_ctors.
trying to eval main
  ...partial evalling successful, but stopping since could not eval: call import: wasi_snapshot_preview1.fd_write
  ...stopping

It completely evals the ctors, and in main it evals some stuff, until it reaches
a call to print to stdout.

Fixes #15402
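
(A hypothetical sketch of the partial-eval pattern the log above describes, not the actual hello_libcxx.cpp: everything in main before the first call that reaches an import can be evaluated away, and evaluation stops at the print.)

#include <cstdio>

int main() {
  // Pure computation up front: under EVAL_CTORS=2 (which assumes argv and
  // the environment are not read) this part can be evaluated at compile
  // time and its effects baked into the wasm's memory.
  int sum = 0;
  for (int i = 0; i < 1000; i++) {
    sum += i;
  }

  // The first call that reaches an import (printf -> wasi fd_write) cannot
  // be evaluated in the sandbox, so wasm-ctor-eval stops here and keeps the
  // rest of main as ordinary runtime code.
  printf("sum = %d\n", sum);
  return 0;
}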