Lazily populate a store's trampoline map #3742
This commit is another installment of "how fast can we make instantiation". Currently, when instantiating a module with many function imports, each function (typically from the host) is inserted into the store. This insertion stores the `VMTrampoline` for the host function in a side table so it can be looked up later if the host function is called through the `Func` interface. It also involves a hash map insertion, which can be relatively expensive at the scale of the rest of the instantiation process.

The optimization implemented in this commit is to avoid inserting trampolines into the store at `Func`-insertion-time (aka instantiation time) and instead only lazily populate the map of trampolines when needed. The theory behind this is that almost all `Func` instances that are called indirectly from the host are actually wasm functions, not host-defined functions. This means that they already don't need to go through the map of host trampolines and can instead be looked up from the module they're defined in. With the assumed rarity of host functions, making `lookup_trampoline` a bit slower seems ok.

The `lookup_trampoline` function will now, on a miss from the wasm modules and the `host_trampolines` map, lazily iterate over the functions within the store and insert trampolines into the `host_trampolines` map. This process will eventually reach something which matches the function provided because it should at least hit the same host function. The relevant `lookup_trampoline` now sports a new documentation block explaining all this as well for future readers.

Concretely, this commit speeds up instantiation of an empty module with 100 imports and ~80 unique signatures from 10.6us to 6.4us, a 40% improvement.
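To make the lazy scheme concrete, here is a minimal sketch of the idea, not the actual wasmtime implementation. The types `SigIndex`, `Trampoline`, `HostFunc`, and `Store`, and the method `insert_host_func`, are hypothetical stand-ins; only the field names `host_trampolines` and `host_func_trampolines_registered` mirror names that appear in this PR. The sketch skips the lookup through wasm modules, returns an `Option` rather than assuming a match always exists, and assumes any trampoline registered for a signature is acceptable for that signature.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for wasmtime's internal types.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct SigIndex(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Trampoline(usize);

struct HostFunc {
    sig: SigIndex,
    trampoline: Trampoline,
}

#[derive(Default)]
struct Store {
    // Every host function inserted into the store, in insertion order.
    host_funcs: Vec<HostFunc>,
    // Side table from signature to trampoline, filled lazily.
    host_trampolines: HashMap<SigIndex, Trampoline>,
    // How many entries of `host_funcs` have been copied into the map so far.
    host_func_trampolines_registered: usize,
}

impl Store {
    // Insertion is now just a vector push: no hash map work at
    // instantiation time.
    fn insert_host_func(&mut self, sig: SigIndex, trampoline: Trampoline) {
        self.host_funcs.push(HostFunc { sig, trampoline });
    }

    // On a map miss, resume scanning `host_funcs` where the previous scan
    // stopped, registering each trampoline until the signature is found.
    fn lookup_trampoline(&mut self, sig: SigIndex) -> Option<Trampoline> {
        if let Some(t) = self.host_trampolines.get(&sig) {
            return Some(*t);
        }
        for f in self
            .host_funcs
            .iter()
            .skip(self.host_func_trampolines_registered)
        {
            self.host_func_trampolines_registered += 1;
            self.host_trampolines.insert(f.sig, f.trampoline);
            if f.sig == sig {
                return Some(f.trampoline);
            }
        }
        None
    }
}
```

Because the cursor only moves forward, each host function is registered at most once across all lookups, so the total scanning cost is bounded by the number of host functions in the store.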
Subscribe to Label Action
cc @peterhuene
This issue or pull request has been labeled: "wasmtime:api"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
Thanks for the detailed comments! Very helpful. I nitpicked a bunch, but I hope my nitpicks will collectively make this documentation even more helpful for future spelunkers.
.skip(self.host_func_trampolines_registered)
{
    self.host_func_trampolines_registered += 1;
    self.host_trampolines.insert(f.sig_index(), f.trampoline());
Suggested change:
- self.host_trampolines.insert(f.sig_index(), f.trampoline());
+ let old_entry = self.host_trampolines.insert(f.sig_index(), f.trampoline());
+ debug_assert!(old_entry.is_none());
Ah, actually, doing this causes a test failure: as the array of functions is scanned, we might find duplicates of signatures we're not looking for.
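A small sketch of why the assertion can fire: the benchmark in the description uses 100 imports with only ~80 unique signatures, so several host functions share a signature, and `HashMap::insert` returns the previous value whenever the key is already present. The helper `count_overwrites` and the plain `(u32, usize)` pairs below are hypothetical and only illustrate that behavior; they are not wasmtime code.

```rust
use std::collections::HashMap;

// Registers (signature index, trampoline) pairs in order and counts how
// many insertions replaced an existing entry for the same signature.
fn count_overwrites(funcs: &[(u32, usize)]) -> usize {
    let mut map: HashMap<u32, usize> = HashMap::new();
    let mut overwrites = 0;
    for &(sig, trampoline) in funcs {
        // `insert` returns `Some(old)` if the key was already present, which
        // is exactly the case `debug_assert!(old_entry.is_none())` rejects.
        if map.insert(sig, trampoline).is_some() {
            overwrites += 1;
        }
    }
    overwrites
}
```

With two functions sharing signature `0`, for example `[(0, 100), (1, 200), (0, 101)]`, the third insertion replaces an entry, so an assertion that the old entry is `None` would fail during the scan.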
* Lazily populate a store's trampoline map
* Review comments
* Remove debug assert