-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perf - use ArrayVec
instead of Vec
for internal AudioParam
buffer
#363
Conversation
/bench |
|
src/param.rs
Outdated
@@ -1088,7 +1088,11 @@ impl AudioParamProcessor { | |||
match some_event { | |||
None => { | |||
if is_a_rate { | |||
self.buffer.resize(count, self.intrinsic_value); | |||
let buffer = [self.intrinsic_value; RENDER_QUANTUM_SIZE]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's more succinct to write as:
for _ in self.buffer.len() .. count {
self.buffer.try_insert(self.intrinsic_value).unwrap();
}
Or did you benchmark this to be the fastest way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I just replaced it with a simple push
.
I actually didn't do any particular benchmark, but I think this doesn't worth the hassle for now. I don't think this is a really hot path and more important issues should be considered before focusing on such details in my opinion. Let's just prefer simplicity and readability (I left a comment to keep the idea around though)
note: there is this weird L2 Accesses: 307843 (+12.67221%)
in the bench_stereo_positional
(I have the impression this particular bench is often very unstable, but maybe this is just in my head...). That's a bit confusing, but all other numbers are very similar between the two versions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the L2 and RAM accesses are quite unstable. But since these numbers are so low I don't think it is very important (and that's probably also the reason for the large deviations). I think instruction count is the main metric to track.
I have read https://kobzol.github.io/rust/rustc/2023/09/23/rustc-runtime-benchmarks.html recently with some tips to look further into:
After the initial refactoring was completed, I needed to decide how will we actually define the benchmarks and what tool we should use to gather the execution metrics. Both cargo bench and criterion are not a bad choice for running benchmarks, but they only measure wall-time, while I also wanted to measure hardware counters. I was considering to use iai for a while. However, it uses Cachegrind for the measurements, while I wanted the benchmarks to be executed natively, without simulation. Also, using Cachegrind wouldn’t produce realistic wall-time results.
In the end, I decided to write a small library called benchlib, so that we would have ultimate control of defining, executing and measuring the benchmarks, instead of relying on external crates. benchlib uses Linux perf events to gather hardware metrics, using the perf-event crate. I also took bits and pieces from other mentioned tools, like the black_box function from iai.
Thanks, looks good to me except the mentioned nitpick! |
/bench |
|
cf. #359