Mark a lot of functions inline #347
Conversation
This fixes a lot of the performance regression reported in #339
Right now with
The remaining regression (compared to
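For context, here is a minimal sketch of the kind of annotation this PR adds. It is only an illustration; the function name and body are hypothetical stand-ins, not the actual functions touched by this PR:

```rust
// Hypothetical example: marking a small, hot helper as an inlining
// candidate. Without `#[inline]`, a non-generic function is normally
// not inlined across crate boundaries unless LTO is enabled, which is
// why such annotations can recover performance in non-LTO builds.
#[inline]
fn wrapping_add_u32(lhs: u32, rhs: u32) -> u32 {
    lhs.wrapping_add(rhs)
}

fn main() {
    // Tiny usage example so the sketch compiles on its own.
    assert_eq!(wrapping_add_u32(u32::MAX, 1), 0);
}
```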
We could also try to use
Good idea! I just tried that out on my computer and benchmarked it. The rest of the profile config (
@tavianator thanks a lot for this PR and for pasting the benchmark results! Can you share more about how you selected the functions that you annotated with `#[inline]`, and whether you also benchmarked with the following release profile?

```toml
[profile.release]
lto = "fat"
codegen-units = 1
```

I am asking because we should definitely make sure that the best possible performance won't get pessimized by any "optimization" that we apply. We only want to close the gap between both without pessimizing the best performance. If we are lucky, the optimizations further improve the best possible outcome. I am sure there is still a lot of room for improvement even for the best results we have seen so far. For example, from what I can tell, the Rust compiler currently does not optimize our switch-based instruction dispatch in the way that would be most optimal for it, although it could in theory do so. This could yield a potential 30% gain in performance on top of the best results we have seen so far. I also have a more in-depth answer here.
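For readers who have not seen the term, "switch-based instruction dispatch" here means the interpreter's hot loop `match`ing on an instruction enum. The following is only an illustrative sketch with a made-up instruction set, not wasmi's actual dispatch loop:

```rust
// Illustrative switch-based dispatch: the hot loop matches on the current
// instruction and goes back to the top of the loop after executing it.
// The instruction set and machine state here are hypothetical.
#[derive(Copy, Clone)]
enum Instr {
    Push(i64),
    Add,
    Halt,
}

fn run(code: &[Instr]) -> Option<i64> {
    let mut stack = Vec::new();
    let mut pc = 0;
    loop {
        match code[pc] {
            Instr::Push(value) => stack.push(value),
            Instr::Add => {
                let rhs = stack.pop()?;
                let lhs = stack.pop()?;
                stack.push(lhs.wrapping_add(rhs));
            }
            Instr::Halt => return stack.pop(),
        }
        pc += 1;
    }
}

fn main() {
    let code = [Instr::Push(1), Instr::Push(2), Instr::Add, Instr::Halt];
    assert_eq!(run(&code), Some(3));
}
```

How well the compiler lowers such a `match` (for example, into a jump table with one indirect branch per executed instruction) is presumably what the 30% estimate above refers to; the sketch only shows the shape of the code, not the optimization.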
I just compared this branch against the baseline, both compiled with:

```toml
[profile.release]
lto = "fat"
codegen-units = 1
```

Unfortunately we can see slowdowns in the range of 20-33%, especially for benchmarks that evaluate the performance of function dispatch and module instantiation. Instruction dispatch is also affected.

tl;dr: While this PR closes the gap between best and worst performance, it also pessimizes the best performance.
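As a rough illustration of how such comparisons are typically made (this is not the actual wasmi benchmark suite), a minimal Criterion benchmark could look like the sketch below, where `instantiate_module` is a hypothetical stand-in for the measured operation and the `criterion` crate is assumed as a dev-dependency:

```rust
// Minimal Criterion benchmark sketch. `instantiate_module` is a
// hypothetical placeholder for the operation under test (e.g. module
// instantiation); it is not a real wasmi API.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn instantiate_module(size: usize) -> Vec<u8> {
    // Placeholder workload standing in for real module instantiation.
    vec![0u8; size]
}

fn bench_instantiate(c: &mut Criterion) {
    c.bench_function("instantiate/dummy", |b| {
        b.iter(|| instantiate_module(black_box(4096)))
    });
}

criterion_group!(benches, bench_instantiate);
criterion_main!(benches);
```

Running the same benchmarks on the baseline and on this branch, with the profile above applied to both, is one way to obtain deltas like the ones quoted above.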
Did some tests where I only added some `#[inline]` annotations.
You're welcome! :)
I got a
That's the "remaining regression" in my above post. I should do a before and after comparison with that configuration but I haven't yet.
Right, agreed. And this is all only looking at one benchmark (execute/tiny_keccak/v1); presumably I should look at a wider range of benchmarks to find the appropriate candidates for inlining.
Thanks, I'll read that now.
@tavianator Would you like to revert the `#[inline]` annotations that cause the regressions? And yes, ideally you would test performance for all the benchmarks we have.
@tavianator would you be willing to do this? Otherwise I'd like to close this PR.
Closing this due to inactivity and also because we just merged #348, which does something similar but without the regressions for the optimized build profile. However, thank you a lot for researching in this direction again, even if it did not really lead to a solution of the original problem! :)