-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
build: add "fastest-runtime" profile for runtime optimization #8554
build: add "fastest-runtime" profile for runtime optimization #8554
Conversation
93a0b10
to
01781b2
Compare
01781b2
to
1d88acc
Compare
Thanks for the PR here! Is it possible perhaps for CMake to set environment variables for other steps? That's probably the best way to do this rather than adding a static profile. That way you'd be able to set Otherwise though I also wanted to poke a bit about what the benchmark is. This is just starting a program that's not actually using Wasmtime at all? It's just changing how Wasmtime itself is linked in? If so it sounds like the main issue is that Wasmtime has lots of symbols and they may be all getting processed by the dynamic linker at startup. LTO will reduce the number of symbols, but there should be other knobs to internalize symbols in an executable or dynamic library such that LTO wouldn't be required to get the same performance benefits. |
That's what we did before, but we thought that the preset could be used by other projects with similar needs.
That is correct.
This is our thinking as well; if you know of such knobs, we'd be more than happy for the hint. Our own interface to wasmtime is extremely small (but we do link to tree-sitter, which uses more of it). |
Ok sounds good, and a preset also sounds good to me! Is this on macOS, Linux, Windows, or all three? This would likely be flags to the linker invocation that creates the neovim executable itself which would be along the lines of "only export this list of symbols and no more" and that would internalize everything else. Is it possible to construct a list like that, for example is |
All three and more (as many as we can build on), with a wide range of compilers. We use only a very limited set (basically, |
Oh sorry, to clarify, is the performance regression you're seeing on all platforms? Or only some? |
So far we have mostly tested with macOS; of course the exact impact varies with platform and toolchain. |
I'm trying to test my theory with a small script but it's not too successful. The big binaries are slower but they don't get faster when removing the symbols so I don't think this going to bear fruit. Regardless though I think it's fine to land this, but on reviewing it I think it might be best to call this "lto" in the various places instead of "optimized". I want to avoid confusion between "optimized" and |
I appreciate your testing! Not married to the name, of course, but note that full LTO is only half the game here -- you also need the other settings for the "optimal" benefit. Unless you consider codegen-units to be part of LTO? Then it'd obviously make sense. |
That's true yeah. I suppose I'm mostly looking for a name that evokes something like "pull out all the stops on optimizing this I'm willing to wait for an hour". That typically includes lto and codegen-units=1. I personally consider "lto" as "take your time please optimize this as much as you can" but that's also just me. Also, question for y'all, does codegen-units=1 help much here in isolation? If that setting is dropped do you see a difference in perf? I ask because that should only affect generated code quality, not symbol visibility. If codegen-units=1 does help then the mystery deepens in my head for what's going on here (PR can land either way of course, I'm just curious now!) |
Yes it does! (We benchmarked each option -- @dundargoc could you edit the PR description to add the individual steps?) And it's very weird -- but then I don't know any other large C project that statically links a Rust dependency (and we have been quite reticent about it im the past). If LTO has the full connotation for you, that's fine with us of course! |
Odd! Well in any case happy to r+ and flag for merge with a rename. Otherwise though if y'all have a series of instructions to reproduce this slowdown I'd be happy to try to dig further in my spare time later. |
This is the benchmarking we did to determine which of the flags seemed to make a difference:
So from this we were able to see that As for the profile name, I am fine with whatever. |
Oh, right. Reproduction steps. We build neovim in this PR with the following command: make distclean #just to remove old builds
make CMAKE_EXTRA_FLAGS=-DENABLE_WASMTIME=ON DEPS_CMAKE_FLAGS=-DENABLE_WASMTIME=ON CMAKE_BUILD_TYPE=Release
cmake --install build --prefix bin
hyperfine --warmup 2 "bin/bin/nvim +'quit'" I point to a local path of wasmtime in
It's not the most graceful reproductions steps I'm afraid but that's the current tech we're working with at the moment 😅 |
"fast-runtime" sounds reasonable to me, although I might bikeshed to "fastest-runtime" perhaps to help head off confusion between the default |
And no worries! I'll try to kick the tires there and see if I can't bottom something out as to how Wasmtime is affecting things |
This is extremely useful for cases where the default optimizations just are not enough. Background: [neovim](https://github.com/neovim/neovim) is interested to add wasmtime support in neovim/neovim#28415 but we noticed that including wasmtime, even when not using wasmtime directly, heavily affects runtime performance. This is not only reflected in the increased startuptime but affects the runtime performance overall. Here are the benchmarks for startuptimes for different configurations. Important to note is that all of runtime is affected, but the startuptime is a decent proxy to measure runtime performance: ``` No wasm Time (mean ± σ): 50.5 ms ± 1.5 ms [User: 32.8 ms, System: 12.3 ms] Range (min … max): 48.3 ms … 54.4 ms 56 runs Wasm, lto=thin Time (mean ± σ): 104.9 ms ± 3.5 ms [User: 86.5 ms, System: 12.7 ms] Range (min … max): 99.5 ms … 111.1 ms 26 runs Wasm, lto=true Time (mean ± σ): 83.8 ms ± 2.5 ms [User: 65.8 ms, System: 12.1 ms] Range (min … max): 80.5 ms … 93.3 ms 31 runs Wasm, lto=true, strip="none", incremental=false, codegen-units=1, panic="abort" Time (mean ± σ): 53.1 ms ± 1.0 ms [User: 35.5 ms, System: 12.5 ms] Range (min … max): 50.6 ms … 55.5 ms 54 runs ```
1d88acc
to
5d06b1f
Compare
…dealliance#8554) This is extremely useful for cases where the default optimizations just are not enough. Background: [neovim](https://github.com/neovim/neovim) is interested to add wasmtime support in neovim/neovim#28415 but we noticed that including wasmtime, even when not using wasmtime directly, heavily affects runtime performance. This is not only reflected in the increased startuptime but affects the runtime performance overall. Here are the benchmarks for startuptimes for different configurations. Important to note is that all of runtime is affected, but the startuptime is a decent proxy to measure runtime performance: ``` No wasm Time (mean ± σ): 50.5 ms ± 1.5 ms [User: 32.8 ms, System: 12.3 ms] Range (min … max): 48.3 ms … 54.4 ms 56 runs Wasm, lto=thin Time (mean ± σ): 104.9 ms ± 3.5 ms [User: 86.5 ms, System: 12.7 ms] Range (min … max): 99.5 ms … 111.1 ms 26 runs Wasm, lto=true Time (mean ± σ): 83.8 ms ± 2.5 ms [User: 65.8 ms, System: 12.1 ms] Range (min … max): 80.5 ms … 93.3 ms 31 runs Wasm, lto=true, strip="none", incremental=false, codegen-units=1, panic="abort" Time (mean ± σ): 53.1 ms ± 1.0 ms [User: 35.5 ms, System: 12.5 ms] Range (min … max): 50.6 ms … 55.5 ms 54 runs ```
I did a bit of debugging and unfortunately I can't figure out what's going on. The slowdown seems related to symbols but also not related to symbols. I suspect there's bits-and-pieces of the mach-o format that I don't understand which are coming into play here. I couldn't reproduce this slowdown on Linux, only on macOS, and profiling there I find a bit more difficult through instruments rather than |
* fix(c-api): give a cleaner error message if cargo isn't found (#8497) Instead of > Performing build step for 'wasmtime-crate''WASMTIME_CARGO_BINARY-NOTFOUND' is not recognized as an internal or external command, operable program or batch file. this will now instead output > "cargo" was not found. Ensure "cargo" is in PATH. Aborting... * c-api: use `--release` when MinSizeRel and RelWithDebInfo is used (#8549) * build: add "fastest-runtime" profile for runtime optimization (#8554) This is extremely useful for cases where the default optimizations just are not enough. Background: [neovim](https://github.com/neovim/neovim) is interested to add wasmtime support in neovim/neovim#28415 but we noticed that including wasmtime, even when not using wasmtime directly, heavily affects runtime performance. This is not only reflected in the increased startuptime but affects the runtime performance overall. Here are the benchmarks for startuptimes for different configurations. Important to note is that all of runtime is affected, but the startuptime is a decent proxy to measure runtime performance: ``` No wasm Time (mean ± σ): 50.5 ms ± 1.5 ms [User: 32.8 ms, System: 12.3 ms] Range (min … max): 48.3 ms … 54.4 ms 56 runs Wasm, lto=thin Time (mean ± σ): 104.9 ms ± 3.5 ms [User: 86.5 ms, System: 12.7 ms] Range (min … max): 99.5 ms … 111.1 ms 26 runs Wasm, lto=true Time (mean ± σ): 83.8 ms ± 2.5 ms [User: 65.8 ms, System: 12.1 ms] Range (min … max): 80.5 ms … 93.3 ms 31 runs Wasm, lto=true, strip="none", incremental=false, codegen-units=1, panic="abort" Time (mean ± σ): 53.1 ms ± 1.0 ms [User: 35.5 ms, System: 12.5 ms] Range (min … max): 50.6 ms … 55.5 ms 54 runs ```
…dealliance#8554) This is extremely useful for cases where the default optimizations just are not enough. Background: [neovim](https://github.com/neovim/neovim) is interested to add wasmtime support in neovim/neovim#28415 but we noticed that including wasmtime, even when not using wasmtime directly, heavily affects runtime performance. This is not only reflected in the increased startuptime but affects the runtime performance overall. Here are the benchmarks for startuptimes for different configurations. Important to note is that all of runtime is affected, but the startuptime is a decent proxy to measure runtime performance: ``` No wasm Time (mean ± σ): 50.5 ms ± 1.5 ms [User: 32.8 ms, System: 12.3 ms] Range (min … max): 48.3 ms … 54.4 ms 56 runs Wasm, lto=thin Time (mean ± σ): 104.9 ms ± 3.5 ms [User: 86.5 ms, System: 12.7 ms] Range (min … max): 99.5 ms … 111.1 ms 26 runs Wasm, lto=true Time (mean ± σ): 83.8 ms ± 2.5 ms [User: 65.8 ms, System: 12.1 ms] Range (min … max): 80.5 ms … 93.3 ms 31 runs Wasm, lto=true, strip="none", incremental=false, codegen-units=1, panic="abort" Time (mean ± σ): 53.1 ms ± 1.0 ms [User: 35.5 ms, System: 12.5 ms] Range (min … max): 50.6 ms … 55.5 ms 54 runs ```
…#8575) This is extremely useful for cases where the default optimizations just are not enough. Background: [neovim](https://github.com/neovim/neovim) is interested to add wasmtime support in neovim/neovim#28415 but we noticed that including wasmtime, even when not using wasmtime directly, heavily affects runtime performance. This is not only reflected in the increased startuptime but affects the runtime performance overall. Here are the benchmarks for startuptimes for different configurations. Important to note is that all of runtime is affected, but the startuptime is a decent proxy to measure runtime performance: ``` No wasm Time (mean ± σ): 50.5 ms ± 1.5 ms [User: 32.8 ms, System: 12.3 ms] Range (min … max): 48.3 ms … 54.4 ms 56 runs Wasm, lto=thin Time (mean ± σ): 104.9 ms ± 3.5 ms [User: 86.5 ms, System: 12.7 ms] Range (min … max): 99.5 ms … 111.1 ms 26 runs Wasm, lto=true Time (mean ± σ): 83.8 ms ± 2.5 ms [User: 65.8 ms, System: 12.1 ms] Range (min … max): 80.5 ms … 93.3 ms 31 runs Wasm, lto=true, strip="none", incremental=false, codegen-units=1, panic="abort" Time (mean ± σ): 53.1 ms ± 1.0 ms [User: 35.5 ms, System: 12.5 ms] Range (min … max): 50.6 ms … 55.5 ms 54 runs ``` Co-authored-by: dundargoc <33953936+dundargoc@users.noreply.github.com>
This should lead to significat speedups for users. See bytecodealliance/wasmtime#8554. While we're here, let's clean up the `cmake` invocation a bit.
This is extremely useful for cases where the default optimizations just
are not enough.
Background: neovim is interested to
add wasmtime support in neovim/neovim#28415 but
we noticed that including wasmtime, even when not using wasmtime
directly, heavily affects runtime performance. This is not only
reflected in the increased startuptime but affects the runtime
performance overall.
Here are the benchmarks for startuptimes for different configurations.
Important to note is that all of runtime is affected, but the
startuptime is a decent proxy to measure runtime performance: