Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't check for null in call_indirect and call_ref #8159

Merged

Conversation

alexcrichton
Copy link
Member

This PR is an implementation of #5291 to slightly optimize the lowering of call_indirect and call_ref in Wasmtime. Explicitly checks for null function pointers are no longer present and instead we let a segfault happen when loading from a null function pointer. This segfault is caught and the relevant instruction is annotated with the appropriate trap code.

This support first starts by refactoring the MemFlags API to no longer purely be flags but instead be a mixture of flags and "flag regions". The vmctx/heap/table alias regions are bundled into two bits now and the various trap-related bits are now bundled into four bits. This enables putting arbitrary trap codes in a MemFlags so long as they aren't TrapCode::User(_).

Closes #5291

@alexcrichton alexcrichton requested review from a team as code owners March 17, 2024 18:28
@alexcrichton alexcrichton requested review from cfallin and fitzgen and removed request for a team March 17, 2024 18:28
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:wasm wasmtime:api Related to the API of the `wasmtime` crate itself labels Mar 17, 2024
Copy link

Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "cranelift", "cranelift:wasm", "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

  • peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

cfallin added a commit to cfallin/wasmtime that referenced this pull request Mar 17, 2024
…e table.

This is based on discussion in bytecodealliance#8158:

- We can use `call_indirect` rather than `table.get` + `call_ref`, even
  on typed funcrefs. TIL; updated the test!

- As noted in bytecodealliance#8160, if we use a nullable typed funcref table instead
  (and given that we know we'll initialize a particular slot before use
  on the application side, so we won't actually call a null ref), and if
  we have a null-ref default value, we should be able to avoid the lazy
  table-init mechanism entirely.

(Ignore the part where this module doesn't actually have any update
logic that would set non-null refs anywhere; it's a compile-test, not a
runtest!)

Once bytecodealliance#8159 is merged and bytecodealliance#8160 is implemented, we should see zero
branches in this test.
cfallin added a commit to cfallin/wasmtime that referenced this pull request Mar 17, 2024
This is based on discussion in bytecodealliance#8158: as noted in bytecodealliance#8160, if we use a
nullable typed funcref table instead (and given that we know we'll
initialize a particular slot before use on the application side, so we
won't actually call a null ref), and if we have a null-ref default
value, we should be able to avoid the lazy table-init mechanism
entirely.

(Ignore the part where this module doesn't actually have any update
logic that would set non-null refs anywhere; it's a compile-test, not a
runtest!)

Once bytecodealliance#8159 is merged and bytecodealliance#8160 is implemented, we should see zero
branches in this test.
Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice!

cranelift/codegen/src/ir/memflags.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would kind of prefer to land the MemFlags changes separately from the "don't explicitly check for null function pointers" changes, if that's not too much trouble to split up. The commit history here is almost right for doing that with a quick rebase but I think only part of the "Trim the MemFlags API" commit applies.

I'm interested in whether you or Chris or anyone else have opinions on the other comments I've left below too.

crates/cranelift/src/func_environ.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/ir/memflags.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/ir/memflags.rs Outdated Show resolved Hide resolved
@alexcrichton
Copy link
Member Author

prefer to land the MemFlags changes separately

Certainly!

github-merge-queue bot pushed a commit that referenced this pull request Mar 18, 2024
This is based on discussion in #8158: as noted in #8160, if we use a
nullable typed funcref table instead (and given that we know we'll
initialize a particular slot before use on the application side, so we
won't actually call a null ref), and if we have a null-ref default
value, we should be able to avoid the lazy table-init mechanism
entirely.

(Ignore the part where this module doesn't actually have any update
logic that would set non-null refs anywhere; it's a compile-test, not a
runtest!)

Once #8159 is merged and #8160 is implemented, we should see zero
branches in this test.
This commit uses the support from bytecodealliance#8162 to skip null function pointer
checks when performing an indirect call. Instead of an explicit check
the segfault from accessing the null function pointer is caught and
annotated with the appropriate trap.

Closes bytecodealliance#5291
@alexcrichton
Copy link
Member Author

Updated and rebased!

Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels very satisfying! Also TIL that you can do static asserts by putting assert! in a const-eval block.

@alexcrichton alexcrichton added this pull request to the merge queue Mar 18, 2024
Merged via the queue into bytecodealliance:main with commit fbbeaf7 Mar 18, 2024
19 checks passed
@alexcrichton alexcrichton deleted the no-null-chekc-functions branch March 18, 2024 19:55
cfallin added a commit to cfallin/wasmtime that referenced this pull request Jun 27, 2024
In the original development of this feature, guided by JS AOT
compilation to Wasm of a microbenchmark heavily focused on IC sites, I
was seeing a ~20% speedup. However, in more recent measurements, on full
programs (e.g., the Octane benchmark suite), the benefit is more like
5%.

Moreover, in bytecodealliance#8870, I attempted to switch over to a direct-mapped cache,
to address a current shortcoming of the design, namely that it has a
hard-capped number of callsites it can apply to (50k) to limit impact on
VMContext struct size. With all of the needed checks for correctness,
though, that change results in a 2.5% slowdown relative to no caching at
all, so it was dropped.

In the process of thinking through that, I discovered the current design
on `main` incorrectly handles null funcrefs: it invokes a null code pointer,
rather than loading a field from a null struct pointer. The latter was
specifically designed to cause the necessary Wasm trap in bytecodealliance#8159, but I
had missed that the call to a null code pointer would not have the same
effect. As a result, we actually can crash the VM (safely at least, but
still no good vs. a proper Wasm trap!) with the feature enabled. (It's
off by default still.) That could be fixed too, but at this point with
the small benefit on real programs, together with the limitation on
module size for full benefit, I think I'd rather opt for simplicity and
remove the cache entirely.

Thus, this PR removes call-indirect caching. It's not a direct revert
because the original PR refactored the call-indirect generation into
smaller helpers and IMHO it's a bit nicer to keep that. But otherwise
all traces of the setting, code pre-scan during compilation and special
conditions tracked on tables, and codegen changes are gone.
cfallin added a commit to cfallin/wasmtime that referenced this pull request Jun 27, 2024
In the original development of this feature, guided by JS AOT
compilation to Wasm of a microbenchmark heavily focused on IC sites, I
was seeing a ~20% speedup. However, in more recent measurements, on full
programs (e.g., the Octane benchmark suite), the benefit is more like
5%.

Moreover, in bytecodealliance#8870, I attempted to switch over to a direct-mapped cache,
to address a current shortcoming of the design, namely that it has a
hard-capped number of callsites it can apply to (50k) to limit impact on
VMContext struct size. With all of the needed checks for correctness,
though, that change results in a 2.5% slowdown relative to no caching at
all, so it was dropped.

In the process of thinking through that, I discovered the current design
on `main` incorrectly handles null funcrefs: it invokes a null code pointer,
rather than loading a field from a null struct pointer. The latter was
specifically designed to cause the necessary Wasm trap in bytecodealliance#8159, but I
had missed that the call to a null code pointer would not have the same
effect. As a result, we actually can crash the VM (safely at least, but
still no good vs. a proper Wasm trap!) with the feature enabled. (It's
off by default still.) That could be fixed too, but at this point with
the small benefit on real programs, together with the limitation on
module size for full benefit, I think I'd rather opt for simplicity and
remove the cache entirely.

Thus, this PR removes call-indirect caching. It's not a direct revert
because the original PR refactored the call-indirect generation into
smaller helpers and IMHO it's a bit nicer to keep that. But otherwise
all traces of the setting, code pre-scan during compilation and special
conditions tracked on tables, and codegen changes are gone.
cfallin added a commit to cfallin/wasmtime that referenced this pull request Jun 27, 2024
In the original development of this feature, guided by JS AOT
compilation to Wasm of a microbenchmark heavily focused on IC sites, I
was seeing a ~20% speedup. However, in more recent measurements, on full
programs (e.g., the Octane benchmark suite), the benefit is more like
5%.

Moreover, in bytecodealliance#8870, I attempted to switch over to a direct-mapped cache,
to address a current shortcoming of the design, namely that it has a
hard-capped number of callsites it can apply to (50k) to limit impact on
VMContext struct size. With all of the needed checks for correctness,
though, that change results in a 2.5% slowdown relative to no caching at
all, so it was dropped.

In the process of thinking through that, I discovered the current design
on `main` incorrectly handles null funcrefs: it invokes a null code pointer,
rather than loading a field from a null struct pointer. The latter was
specifically designed to cause the necessary Wasm trap in bytecodealliance#8159, but I
had missed that the call to a null code pointer would not have the same
effect. As a result, we actually can crash the VM (safely at least, but
still no good vs. a proper Wasm trap!) with the feature enabled. (It's
off by default still.) That could be fixed too, but at this point with
the small benefit on real programs, together with the limitation on
module size for full benefit, I think I'd rather opt for simplicity and
remove the cache entirely.

Thus, this PR removes call-indirect caching. It's not a direct revert
because the original PR refactored the call-indirect generation into
smaller helpers and IMHO it's a bit nicer to keep that. But otherwise
all traces of the setting, code pre-scan during compilation and special
conditions tracked on tables, and codegen changes are gone.
github-merge-queue bot pushed a commit that referenced this pull request Jun 27, 2024
In the original development of this feature, guided by JS AOT
compilation to Wasm of a microbenchmark heavily focused on IC sites, I
was seeing a ~20% speedup. However, in more recent measurements, on full
programs (e.g., the Octane benchmark suite), the benefit is more like
5%.

Moreover, in #8870, I attempted to switch over to a direct-mapped cache,
to address a current shortcoming of the design, namely that it has a
hard-capped number of callsites it can apply to (50k) to limit impact on
VMContext struct size. With all of the needed checks for correctness,
though, that change results in a 2.5% slowdown relative to no caching at
all, so it was dropped.

In the process of thinking through that, I discovered the current design
on `main` incorrectly handles null funcrefs: it invokes a null code pointer,
rather than loading a field from a null struct pointer. The latter was
specifically designed to cause the necessary Wasm trap in #8159, but I
had missed that the call to a null code pointer would not have the same
effect. As a result, we actually can crash the VM (safely at least, but
still no good vs. a proper Wasm trap!) with the feature enabled. (It's
off by default still.) That could be fixed too, but at this point with
the small benefit on real programs, together with the limitation on
module size for full benefit, I think I'd rather opt for simplicity and
remove the cache entirely.

Thus, this PR removes call-indirect caching. It's not a direct revert
because the original PR refactored the call-indirect generation into
smaller helpers and IMHO it's a bit nicer to keep that. But otherwise
all traces of the setting, code pre-scan during compilation and special
conditions tracked on tables, and codegen changes are gone.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:wasm cranelift Issues related to the Cranelift code generator wasmtime:api Related to the API of the `wasmtime` crate itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove null-check in generated code for call_indirect (and call_ref soon)
4 participants