Tweak `VecCache` to improve performance #138405
base: master
r? @Noratrieb

rustbot has assigned @Noratrieb.
@bors try @rust-timer queue
Tweak `VecCache` to improve performance

This has some tweaks to `VecCache` to improve performance.

- It saves a `compare_exchange` in `complete` using the new `put_unique` function.
- It removes bound checks on entries. These are instead checked in the `slot_index_exhaustive` test.
- `initialize_bucket` is outlined and tuned for that.

cc `@Mark-Simulacrum`
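To illustrate the first bullet, here is a minimal sketch of why a writer that is known to be unique can skip the `compare_exchange`. It is not the code from this PR: the `Slot` layout, the `state` encoding (0 = empty, 1 = being written, 2 = ready), and the `u64` payload are assumptions made only for the example; only the names `put_unique` and `compare_exchange` come from the description above.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};

/// Hypothetical slot. `state` is 0 when empty, 1 while a writer holds it, 2 when ready.
pub struct Slot {
    value: AtomicU64,
    state: AtomicU32,
}

impl Slot {
    pub const fn new() -> Self {
        Slot { value: AtomicU64::new(0), state: AtomicU32::new(0) }
    }

    /// General insert: several writers may race, so the slot is claimed
    /// with a `compare_exchange` before the value is written.
    pub fn put(&self, v: u64) -> bool {
        if self
            .state
            .compare_exchange(0, 1, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            return false; // another writer claimed or filled the slot first
        }
        self.value.store(v, Ordering::Relaxed);
        self.state.store(2, Ordering::Release);
        true
    }

    /// Insert for a caller that guarantees it is the only writer of this slot.
    /// No claim step is needed, so the read-modify-write is skipped and only
    /// the publishing `Release` store remains.
    pub fn put_unique(&self, v: u64) {
        self.value.store(v, Ordering::Relaxed);
        self.state.store(2, Ordering::Release);
    }

    /// Readers only see the value once `state` has been published as ready.
    pub fn get(&self) -> Option<u64> {
        if self.state.load(Ordering::Acquire) == 2 {
            Some(self.value.load(Ordering::Relaxed))
        } else {
            None
        }
    }
}
```

The saving only holds if uniqueness is guaranteed by the caller; in this sketch, calling `put_unique` on a slot that another thread may also write would silently overwrite the earlier value.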
☀️ Try build successful - checks-actions
```diff
     }

     #[cold]
-    fn initialize_bucket<V>(&self, bucket: &AtomicPtr<Slot<V>>) -> *mut Slot<V> {
+    #[inline(never)]
+    fn initialize_bucket<V>(bucket: &AtomicPtr<Slot<V>>, bucket_idx: usize) -> *mut Slot<V> {
```
Why this change?
It avoids `Self` needing to exist when not inlined, as the needed information can be passed in registers instead.
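The shape described in this reply can be sketched as follows: the hot path loads the bucket pointer inline, and the cold path is an associated function that receives only the bucket reference and its index rather than `&self`. This is an illustrative sketch, not the PR's implementation; the `Buckets` type, the `bucket_ptr` helper, the `u32` element type, and the `1 << bucket_idx` sizing are all assumptions.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Hypothetical container with lazily allocated buckets.
pub struct Buckets {
    buckets: [AtomicPtr<u32>; 4],
}

impl Buckets {
    pub fn new() -> Self {
        Buckets { buckets: std::array::from_fn(|_| AtomicPtr::new(ptr::null_mut())) }
    }

    /// Hot path: a single load, falling back to the outlined slow path.
    #[inline]
    pub fn bucket_ptr(&self, bucket_idx: usize) -> *mut u32 {
        let bucket = &self.buckets[bucket_idx];
        let ptr = bucket.load(Ordering::Acquire);
        if ptr.is_null() { Self::initialize_bucket(bucket, bucket_idx) } else { ptr }
    }

    /// Cold path: takes only the two values it needs, not `&self`.
    #[cold]
    #[inline(never)]
    fn initialize_bucket(bucket: &AtomicPtr<u32>, bucket_idx: usize) -> *mut u32 {
        let len = 1usize << bucket_idx; // assumed sizing, for illustration only
        let fresh = Box::into_raw(vec![0u32; len].into_boxed_slice()) as *mut u32;
        match bucket.compare_exchange(ptr::null_mut(), fresh, Ordering::AcqRel, Ordering::Acquire) {
            Ok(_) => fresh,
            Err(existing) => {
                // Another thread installed its allocation first: free ours.
                unsafe { drop(Box::from_raw(ptr::slice_from_raw_parts_mut(fresh, len))) };
                existing
            }
        }
    }
}

impl Drop for Buckets {
    fn drop(&mut self) {
        for (idx, bucket) in self.buckets.iter_mut().enumerate() {
            let ptr = *bucket.get_mut();
            if !ptr.is_null() {
                // Rebuild the boxed slice with the same length used at allocation.
                unsafe { drop(Box::from_raw(ptr::slice_from_raw_parts_mut(ptr, 1usize << idx))) };
            }
        }
    }
}
```

Whether this actually improves the generated code depends on the compiler and target; the perf runs in this thread are the evidence for the real change.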
Finished benchmarking commit (0b0612c): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

Max RSS (memory usage)

Results (primary -2.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles

Results (primary 4.1%, secondary -3.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 779.667s -> 779.823s (0.02%)
Local results:
Looks like something might be up with
I can't reproduce the
Cycles are noisy, although this was well above the threshold. Let's try again, just in case it was a fluke.

@bors try @rust-timer queue
☀️ Try build successful - checks-actions
Finished benchmarking commit (d06bb14): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results (primary -1.5%, secondary -2.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 779.458s -> 779.323s (-0.02%)
Looks like it was a fluke after all.
This has some tweaks to `VecCache` to improve performance.

- It saves a `compare_exchange` in `complete` using the new `put_unique` function.
- It removes bound checks on entries. These are instead checked in the `slot_index_exhaustive` test.
- `initialize_bucket` is outlined and tuned for that.

cc @Mark-Simulacrum
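The idea of moving bound checks out of the hot path and into a test such as `slot_index_exhaustive` can be sketched as below. The key-to-bucket mapping, the constants `FIRST_BUCKET_SHIFT` and `BUCKETS`, and the helper `check_slot_indices` are assumptions chosen for the example and are not taken from this PR. The point is only that if a test proves every key maps inside its bucket, the lookup itself no longer has to re-check that.

```rust
/// Assumed layout: keys below 2^12 share bucket 0; each later bucket
/// holds the keys that have the same highest set bit.
const FIRST_BUCKET_SHIFT: u32 = 12;
const BUCKETS: usize = 21;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SlotIndex {
    pub bucket_idx: usize,
    pub index_in_bucket: usize,
    /// Number of entries in the bucket this key maps to.
    pub entries: usize,
}

pub fn slot_index(key: u32) -> SlotIndex {
    // Position of the highest set bit; 0 is treated like 1.
    let bits = key.checked_ilog2().unwrap_or(0);
    if bits < FIRST_BUCKET_SHIFT {
        SlotIndex {
            bucket_idx: 0,
            index_in_bucket: key as usize,
            entries: 1 << FIRST_BUCKET_SHIFT,
        }
    } else {
        let entries = 1usize << bits;
        SlotIndex {
            bucket_idx: (bits - FIRST_BUCKET_SHIFT + 1) as usize,
            index_in_bucket: key as usize - entries,
            entries,
        }
    }
}

/// Checks the invariants that a bounds check would otherwise enforce
/// on every lookup: the bucket exists and the index fits inside it.
pub fn check_slot_indices(keys: impl Iterator<Item = u32>) {
    for key in keys {
        let s = slot_index(key);
        assert!(s.bucket_idx < BUCKETS, "bucket out of range for key {key}");
        assert!(s.index_in_bucket < s.entries, "index out of range for key {key}");
    }
}
```

An exhaustive test would call `check_slot_indices(0..=u32::MAX)`, which is slow in debug builds; smaller ranges around the bucket boundaries give a quicker partial check.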