Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wasmtime: avoid table lazy-init runtime checks on nullable funcref tables with null default #8160

Open
cfallin opened this issue Mar 17, 2024 · 5 comments

Comments

@cfallin
Copy link
Member

cfallin commented Mar 17, 2024

From this comment: it would be great to have an optimization such that

if a table's default funcref is null, don't do lazy-init, and make the 0 bit-pattern mean null rather than non-initialized-pointer-to-anyfunc.

This comes from a circumstance where we know we'll initialize a table slot manually before calling it (not doing so is a program error that should lead to a trap), possibly because we already have a branch for the case where it is null at the program-logic level (e.g. fallback IC). We still must trap on null somehow but #8159 can do that implicitly by detecting a segfault in a load from a NULL anyfunc pointer. The above optimization then means that we do not need the lengthy lazy-init-check sequence on loads from such tables, and we can do so without requiring any initialization at instantiation time either.

cc @alexcrichton @jameysharp

cfallin added a commit to cfallin/wasmtime that referenced this issue Mar 17, 2024
…e table.

This is based on discussion in bytecodealliance#8158:

- We can use `call_indirect` rather than `table.get` + `call_ref`, even
  on typed funcrefs. TIL; updated the test!

- As noted in bytecodealliance#8160, if we use a nullable typed funcref table instead
  (and given that we know we'll initialize a particular slot before use
  on the application side, so we won't actually call a null ref), and if
  we have a null-ref default value, we should be able to avoid the lazy
  table-init mechanism entirely.

(Ignore the part where this module doesn't actually have any update
logic that would set non-null refs anywhere; it's a compile-test, not a
runtest!)

Once bytecodealliance#8159 is merged and bytecodealliance#8160 is implemented, we should see zero
branches in this test.
cfallin added a commit to cfallin/wasmtime that referenced this issue Mar 17, 2024
This is based on discussion in bytecodealliance#8158: as noted in bytecodealliance#8160, if we use a
nullable typed funcref table instead (and given that we know we'll
initialize a particular slot before use on the application side, so we
won't actually call a null ref), and if we have a null-ref default
value, we should be able to avoid the lazy table-init mechanism
entirely.

(Ignore the part where this module doesn't actually have any update
logic that would set non-null refs anywhere; it's a compile-test, not a
runtest!)

Once bytecodealliance#8159 is merged and bytecodealliance#8160 is implemented, we should see zero
branches in this test.
@alexcrichton
Copy link
Member

I like this idea! Some implementation points I can think of off the top of my head:

  • This would only be applicable when there are no elem segments with non-null entries for a table.
  • We'd need to store runtime state for each table to know whether it contains tagged functions or non-tagged functions.
  • In the Module's type information we'd need to store this in a TablePlan as well so the table access codegen could take it into account as well as runtime instantiation

github-merge-queue bot pushed a commit that referenced this issue Mar 18, 2024
This is based on discussion in #8158: as noted in #8160, if we use a
nullable typed funcref table instead (and given that we know we'll
initialize a particular slot before use on the application side, so we
won't actually call a null ref), and if we have a null-ref default
value, we should be able to avoid the lazy table-init mechanism
entirely.

(Ignore the part where this module doesn't actually have any update
logic that would set non-null refs anywhere; it's a compile-test, not a
runtest!)

Once #8159 is merged and #8160 is implemented, we should see zero
branches in this test.
jameysharp added a commit to jameysharp/wasmtime that referenced this issue Mar 18, 2024
Currently, every access to a table element does a bounds-check with a
conditional branch to a block that explicitly traps.

Instead, when SPECTRE mitigations are enabled, let's change the address
computation to return a null pointer for out-of-bounds accesses, and
then allow the subsequent load or store to trap.

This is less code in that case since we can reuse instructions we needed
anyway.

In addition, when the table has constant size and the element index is a
constant and mid-end optimization is enabled, this allows the
bounds-check to be constant folded away. Later, bytecodealliance#8139 will let us
optimize away the select_spectre_guard instruction in this case too.

Once we also implement bytecodealliance#8160, `tests/disas/typed-funcrefs.wat` should be
almost as fast as native indirect function calls.
jameysharp added a commit to jameysharp/wasmtime that referenced this issue Mar 19, 2024
Currently, every access to a table element does a bounds-check with a
conditional branch to a block that explicitly traps.

Instead, when Spectre mitigations are enabled, let's change the address
computation to return a null pointer for out-of-bounds accesses, and
then allow the subsequent load or store to trap.

This is less code in that case since we can reuse instructions we needed
anyway.

We return the MemFlags that the memory access should use, in addition to
the address it should access. That way we don't record trap metadata on
memory access instructions which can't actually trap due to being
preceded by a `trapnz`-based bounds check, when Spectre mitigations are
disabled.

In addition, when the table has constant size and the element index is a
constant and mid-end optimization is enabled, this allows the
bounds-check to be constant folded away. Later, bytecodealliance#8139 will let us
optimize away the select_spectre_guard instruction in this case too.

Once we also implement bytecodealliance#8160, `tests/disas/typed-funcrefs.wat` should be
almost as fast as native indirect function calls.
github-merge-queue bot pushed a commit that referenced this issue Mar 19, 2024
Currently, every access to a table element does a bounds-check with a
conditional branch to a block that explicitly traps.

Instead, when Spectre mitigations are enabled, let's change the address
computation to return a null pointer for out-of-bounds accesses, and
then allow the subsequent load or store to trap.

This is less code in that case since we can reuse instructions we needed
anyway.

We return the MemFlags that the memory access should use, in addition to
the address it should access. That way we don't record trap metadata on
memory access instructions which can't actually trap due to being
preceded by a `trapnz`-based bounds check, when Spectre mitigations are
disabled.

In addition, when the table has constant size and the element index is a
constant and mid-end optimization is enabled, this allows the
bounds-check to be constant folded away. Later, #8139 will let us
optimize away the select_spectre_guard instruction in this case too.

Once we also implement #8160, `tests/disas/typed-funcrefs.wat` should be
almost as fast as native indirect function calls.
@fitzgen
Copy link
Member

fitzgen commented Mar 19, 2024

See also #8002 (comment)

@jameysharp
Copy link
Contributor

An additional requirement, I think, is that we can only do this for tables which are not exported. If another module imports a table, it has no way of knowing at compile time whether this optimization applies.

I think this technique is still broadly applicable in spite of that though.

Nick pointed out that we could build a static page full of pointers which are null except the low bit is set, and CoW-map that page repeatedly as the initial table contents, if we know that the table is initially all null. That would be compatible with codegen that can't optimize away the lazy-init check, which would make it safe to export such tables.

He also pointed out that we could use the same trick to swap the meaning of our sentinel values: a (void*)1 would then mean "uninitialized", but we could still instantiate quickly without writing a bunch of bytes into a bunch of new pages. Then we'd be able to avoid setting and clearing the lazy-init flag on every table access. When we can prove the initial values are all null, we'd map zero-filled pages instead.

We discussed many more ideas, but I'll leave it at that for now.

@alexcrichton
Copy link
Member

We talked some more in the Cranelift meeting this morning again about this and I wanted to summarize our thinking here as well.

The main realization we had was that we can address the concern of "now there's two kinds of null pointers" by, well, not having two kinds of null pointers. For example initialization of of an all-null table where the null pointer is (void*) 1 would be a bulk-initialization using that value (a memset-but-with-usize-value-things). Either that or an mmap'd memfd or page or something to use VM tricks. More-or-less we realized that we could implement the optimization in this issue, fast initialization of all-null tables, and then in codegen we can omit all lazy init checks since everything is statically known to be initialized.

@cfallin
Copy link
Member Author

cfallin commented Mar 25, 2024

As a quick data-point to drop here: in a very hacky (work-in-progress) dev branch of SpiderMonkey + weval with traditional funcptr ICs with signature checks simply removed (unsafe!), I measured the impact of the lazy-init table scheme on runtime directly and came up with ~8% -- fairly significant and makes the optimizations discussed here worthwhile, IMHO. (I'm going to "speculatively live in the glorious future" with this branch of wasmtime for all my experiments, which also removes table bounds-checks and stack limit checks, and hope that we can productionize all of this at some point!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants