-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Language support improvements #161
Conversation
dfd980b
to
7d953dc
Compare
Hey, this is pretty cool. When first reading, I was wondering why I introduced some of these indirections in the first place, I hope these optimizations do not break the rules from upstream.(edit: obviously, the tests are passing so that seems all right) I think I can see another optimization reading this. In the implementation of |
- Replace `Result` parameters with `Option` - Replace `HashSet` for deduplication with linear search based approach - Avoid intermediate allocations This reduces the instruction count in release mode by almost 50%.
7d953dc
to
127124e
Compare
I don't think so, I tried to ensure that the optimizations didnt' change the logic. And the tests should cover all cases, I think.
The items are cloned, but only once (to add them to Regarding the draining of the Vec, I'm not sure I understand you correctly. The vec ( |
This is how I would implement it, there are no clones. |
@niklasmohrin hah, this is quite nice! For the test case above, it increases CPU instructions from 3900 to 3997 (probably because there are no duplicates, so the |
Co-authored-by: Niklas Mohrin <niklas.mohrin@gmail.com>
After #125, I used the occasion to dig out valgrind and kcachegrind to do some performance optimization (something that's fun, but that I do way too seldom...)
To profile, I created a new standalone binary that included nothing but
get_languages
and thismain
:I then built in release mode and counted the instructions for
main
(since the function call toget_languages
was inlined).Results (in CPU instructions and percentage reduced compared to previous version):
HashSet
for deduplication with linear search based approach: 4801 (-31%)language
variable: 4796 (-0.1%)lang_list
, notlocales
: 4543 (-5.3%)locales
into intermediate vec, instead use a chained iterator: 3900 (-14.2%)Callees before and after:
All in all, a reduction in instruction count (for the specified env variables) by 44%, nice!
(Might be of interest to you, @niklasmohrin)