Wasmtime: avoid table lazy-init runtime checks on nullable funcref tables with null default #8160

cfallin · 2024-03-17T20:23:18Z

From this comment: it would be great to have an optimization such that

if a table's default funcref is null, don't do lazy-init, and make the 0 bit-pattern mean null rather than non-initialized-pointer-to-anyfunc.

This comes from a circumstance where we know we'll initialize a table slot manually before calling it (not doing so is a program error that should lead to a trap), possibly because we already have a branch for the case where it is null at the program-logic level (e.g. fallback IC). We still must trap on null somehow but #8159 can do that implicitly by detecting a segfault in a load from a NULL anyfunc pointer. The above optimization then means that we do not need the lengthy lazy-init-check sequence on loads from such tables, and we can do so without requiring any initialization at instantiation time either.

cc @alexcrichton @jameysharp

The text was updated successfully, but these errors were encountered:

…e table. This is based on discussion in bytecodealliance#8158: - We can use `call_indirect` rather than `table.get` + `call_ref`, even on typed funcrefs. TIL; updated the test! - As noted in bytecodealliance#8160, if we use a nullable typed funcref table instead (and given that we know we'll initialize a particular slot before use on the application side, so we won't actually call a null ref), and if we have a null-ref default value, we should be able to avoid the lazy table-init mechanism entirely. (Ignore the part where this module doesn't actually have any update logic that would set non-null refs anywhere; it's a compile-test, not a runtest!) Once bytecodealliance#8159 is merged and bytecodealliance#8160 is implemented, we should see zero branches in this test.

This is based on discussion in bytecodealliance#8158: as noted in bytecodealliance#8160, if we use a nullable typed funcref table instead (and given that we know we'll initialize a particular slot before use on the application side, so we won't actually call a null ref), and if we have a null-ref default value, we should be able to avoid the lazy table-init mechanism entirely. (Ignore the part where this module doesn't actually have any update logic that would set non-null refs anywhere; it's a compile-test, not a runtest!) Once bytecodealliance#8159 is merged and bytecodealliance#8160 is implemented, we should see zero branches in this test.

alexcrichton · 2024-03-18T14:54:34Z

I like this idea! Some implementation points I can think of off the top of my head:

This would only be applicable when there are no elem segments with non-null entries for a table.
We'd need to store runtime state for each table to know whether it contains tagged functions or non-tagged functions.
In the Module's type information we'd need to store this in a TablePlan as well so the table access codegen could take it into account as well as runtime instantiation

This is based on discussion in #8158: as noted in #8160, if we use a nullable typed funcref table instead (and given that we know we'll initialize a particular slot before use on the application side, so we won't actually call a null ref), and if we have a null-ref default value, we should be able to avoid the lazy table-init mechanism entirely. (Ignore the part where this module doesn't actually have any update logic that would set non-null refs anywhere; it's a compile-test, not a runtest!) Once #8159 is merged and #8160 is implemented, we should see zero branches in this test.

Currently, every access to a table element does a bounds-check with a conditional branch to a block that explicitly traps. Instead, when SPECTRE mitigations are enabled, let's change the address computation to return a null pointer for out-of-bounds accesses, and then allow the subsequent load or store to trap. This is less code in that case since we can reuse instructions we needed anyway. In addition, when the table has constant size and the element index is a constant and mid-end optimization is enabled, this allows the bounds-check to be constant folded away. Later, bytecodealliance#8139 will let us optimize away the select_spectre_guard instruction in this case too. Once we also implement bytecodealliance#8160, `tests/disas/typed-funcrefs.wat` should be almost as fast as native indirect function calls.

Currently, every access to a table element does a bounds-check with a conditional branch to a block that explicitly traps. Instead, when Spectre mitigations are enabled, let's change the address computation to return a null pointer for out-of-bounds accesses, and then allow the subsequent load or store to trap. This is less code in that case since we can reuse instructions we needed anyway. We return the MemFlags that the memory access should use, in addition to the address it should access. That way we don't record trap metadata on memory access instructions which can't actually trap due to being preceded by a `trapnz`-based bounds check, when Spectre mitigations are disabled. In addition, when the table has constant size and the element index is a constant and mid-end optimization is enabled, this allows the bounds-check to be constant folded away. Later, bytecodealliance#8139 will let us optimize away the select_spectre_guard instruction in this case too. Once we also implement bytecodealliance#8160, `tests/disas/typed-funcrefs.wat` should be almost as fast as native indirect function calls.

Currently, every access to a table element does a bounds-check with a conditional branch to a block that explicitly traps. Instead, when Spectre mitigations are enabled, let's change the address computation to return a null pointer for out-of-bounds accesses, and then allow the subsequent load or store to trap. This is less code in that case since we can reuse instructions we needed anyway. We return the MemFlags that the memory access should use, in addition to the address it should access. That way we don't record trap metadata on memory access instructions which can't actually trap due to being preceded by a `trapnz`-based bounds check, when Spectre mitigations are disabled. In addition, when the table has constant size and the element index is a constant and mid-end optimization is enabled, this allows the bounds-check to be constant folded away. Later, #8139 will let us optimize away the select_spectre_guard instruction in this case too. Once we also implement #8160, `tests/disas/typed-funcrefs.wat` should be almost as fast as native indirect function calls.

fitzgen · 2024-03-19T21:08:49Z

See also #8002 (comment)

jameysharp · 2024-03-20T05:18:42Z

An additional requirement, I think, is that we can only do this for tables which are not exported. If another module imports a table, it has no way of knowing at compile time whether this optimization applies.

I think this technique is still broadly applicable in spite of that though.

Nick pointed out that we could build a static page full of pointers which are null except the low bit is set, and CoW-map that page repeatedly as the initial table contents, if we know that the table is initially all null. That would be compatible with codegen that can't optimize away the lazy-init check, which would make it safe to export such tables.

He also pointed out that we could use the same trick to swap the meaning of our sentinel values: a (void*)1 would then mean "uninitialized", but we could still instantiate quickly without writing a bunch of bytes into a bunch of new pages. Then we'd be able to avoid setting and clearing the lazy-init flag on every table access. When we can prove the initial values are all null, we'd map zero-filled pages instead.

We discussed many more ideas, but I'll leave it at that for now.

alexcrichton · 2024-03-20T21:09:10Z

We talked some more in the Cranelift meeting this morning again about this and I wanted to summarize our thinking here as well.

The main realization we had was that we can address the concern of "now there's two kinds of null pointers" by, well, not having two kinds of null pointers. For example initialization of of an all-null table where the null pointer is (void*) 1 would be a bulk-initialization using that value (a memset-but-with-usize-value-things). Either that or an mmap'd memfd or page or something to use VM tricks. More-or-less we realized that we could implement the optimization in this issue, fast initialization of all-null tables, and then in codegen we can omit all lazy init checks since everything is statically known to be initialized.

cfallin · 2024-03-25T20:51:50Z

As a quick data-point to drop here: in a very hacky (work-in-progress) dev branch of SpiderMonkey + weval with traditional funcptr ICs with signature checks simply removed (unsafe!), I measured the impact of the lazy-init table scheme on runtime directly and came up with ~8% -- fairly significant and makes the optimizations discussed here worthwhile, IMHO. (I'm going to "speculatively live in the glorious future" with this branch of wasmtime for all my experiments, which also removes table bounds-checks and stack limit checks, and hope that we can productionize all of this at some point!)

cfallin mentioned this issue Mar 17, 2024

Wasm ICs / typed-funcrefs test: switch to nullable table. #8161

Merged

jameysharp mentioned this issue Mar 18, 2024

cranelift-wasm: Attach table OOB traps to loads/stores #8171

Merged

fitzgen mentioned this issue Mar 19, 2024

Remove lazy funcref table init checking from codegen for [return_]call_indirect #8002

Open

jameysharp mentioned this issue Mar 20, 2024

Read-only funcref tables #8195

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wasmtime: avoid table lazy-init runtime checks on nullable funcref tables with null default #8160

Wasmtime: avoid table lazy-init runtime checks on nullable funcref tables with null default #8160

cfallin commented Mar 17, 2024

alexcrichton commented Mar 18, 2024

fitzgen commented Mar 19, 2024

jameysharp commented Mar 20, 2024

alexcrichton commented Mar 20, 2024

cfallin commented Mar 25, 2024

Wasmtime: avoid table lazy-init runtime checks on nullable funcref tables with null default #8160

Wasmtime: avoid table lazy-init runtime checks on nullable funcref tables with null default #8160

Comments

cfallin commented Mar 17, 2024

alexcrichton commented Mar 18, 2024

fitzgen commented Mar 19, 2024

jameysharp commented Mar 20, 2024

alexcrichton commented Mar 20, 2024

cfallin commented Mar 25, 2024