Perf - use `ArrayVec` instead of `Vec` for internal `AudioParam` buffer #363

b-ma · 2023-09-22T17:38:07Z

b-ma · 2023-09-22T17:42:43Z

/bench

github-actions · 2023-09-22T17:45:56Z

Benchmark result:


bench_ctor
  Instructions:             4533261 (-0.570421%)
  L1 Accesses:              6769687 (-0.571688%)
  L2 Accesses:                54312 (-0.038651%)
  RAM Accesses:               61621 (+0.108848%)
  Estimated Cycles:         9197982 (-0.397242%)

bench_sine
  Instructions:            70946851 (-0.159726%)
  L1 Accesses:            103533921 (-0.125667%)
  L2 Accesses:               263290 (-1.509030%)
  RAM Accesses:               62493 (+0.107327%)
  Estimated Cycles:       107037626 (-0.138169%)

bench_sine_gain
  Instructions:            75976565 (-0.244167%)
  L1 Accesses:            111134791 (-0.197463%)
  L2 Accesses:               268087 (-1.294900%)
  RAM Accesses:               62589 (-0.094177%)
  Estimated Cycles:       114665841 (-0.208462%)

bench_sine_gain_delay
  Instructions:           150978642 (-0.202723%)
  L1 Accesses:            213650631 (-0.173822%)
  L2 Accesses:               566736 (-2.148217%)
  RAM Accesses:               64193 (+0.077951%)
  Estimated Cycles:       218731066 (-0.197331%)

bench_buffer_src
  Instructions:            17508909 (-0.742848%)
  L1 Accesses:             25451477 (-0.627479%)
  L2 Accesses:                87862 (+0.702587%)
  RAM Accesses:              100775 (+0.057587%)
  Estimated Cycles:        29417912 (-0.526200%)

bench_buffer_src_delay
  Instructions:            91164983 (-0.304087%)
  L1 Accesses:            126148642 (-0.282404%)
  L2 Accesses:               163146 (-2.265062%)
  RAM Accesses:              100922 (+0.033701%)
  Estimated Cycles:       130496642 (-0.286520%)

bench_buffer_src_iir
  Instructions:            41930928 (+0.298410%)
  L1 Accesses:             60575017 (-0.403218%)
  L2 Accesses:                87029 (-0.810349%)
  RAM Accesses:              100756 (-0.045634%)
  Estimated Cycles:        64536622 (-0.386502%)

bench_buffer_src_biquad
  Instructions:            37529097 (-0.846618%)
  L1 Accesses:             52768075 (-0.667674%)
  L2 Accesses:               117794 (-1.586559%)
  RAM Accesses:              100972 (+0.022784%)
  Estimated Cycles:        56891065 (-0.634670%)

bench_stereo_positional
  Instructions:            44850469 (-1.726766%)
  L1 Accesses:             67166221 (-1.231903%)
  L2 Accesses:               290895 (+2.244209%)
  RAM Accesses:              100958 (-0.077200%)
  Estimated Cycles:        72154226 (-1.108165%)

bench_stereo_panning_automation
  Instructions:            32358310 (+0.399506%)
  L1 Accesses:             48579620 (+0.950843%)
  L2 Accesses:               134788 (-3.874598%)
  RAM Accesses:              100891 (+0.061490%)
  Estimated Cycles:        52784745 (+0.826269%)

bench_analyser_node
  Instructions:            39636423 (-0.368031%)
  L1 Accesses:             55489272 (-0.331125%)
  L2 Accesses:               185066 (+1.474418%)
  RAM Accesses:              101311 (+0.070130%)
  Estimated Cycles:        59960487 (-0.280097%)
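
Side note for reading these reports: in every block above, the `Estimated Cycles` line equals a weighted sum of the three access counts, `L1 + 5 * L2 + 35 * RAM`. The weights are inferred from the figures in this report (they match bench_ctor, bench_sine and bench_sine_gain exactly) and are not taken from the benchmark tool's source, so treat them as an assumption. The helper name `estimated_cycles` is hypothetical. A minimal sketch:

```rust
/// Estimated cycle count derived from simulated cache accesses.
/// The weights 1 / 5 / 35 are an assumption inferred from the report above.
fn estimated_cycles(l1_accesses: u64, l2_accesses: u64, ram_accesses: u64) -> u64 {
    l1_accesses + 5 * l2_accesses + 35 * ram_accesses
}
```

With the bench_ctor figures, `estimated_cycles(6_769_687, 54_312, 61_621)` gives 9197982, the value reported above. Because RAM accesses carry the largest weight, a small change in that counter can move the estimate more than a much larger change in L1 accesses.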

orottier · 2023-10-02T18:23:42Z

src/param.rs

@@ -1088,7 +1088,11 @@ impl AudioParamProcessor {
            match some_event {
                None => {
                    if is_a_rate {
-                        self.buffer.resize(count, self.intrinsic_value);
+                        let buffer = [self.intrinsic_value; RENDER_QUANTUM_SIZE];


Maybe it's more succinct to write as:

for _ in self.buffer.len()..count { self.buffer.try_push(self.intrinsic_value).unwrap(); }

Or did you benchmark this to be the fastest way?

ok, I just replaced it with a simple push.

I didn't actually run any particular benchmark, but I don't think this is worth the hassle for now. I don't think this is a really hot path, and in my opinion more important issues should be addressed before focusing on such details. Let's just prefer simplicity and readability (I left a comment to keep the idea around, though).
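
For readers following the thread, here is a minimal sketch of the "simple push" approach. It uses only the standard library: `FixedBuf` is a hypothetical stand-in for a fixed-capacity vector such as `ArrayVec`, `fill_to` is an illustrative name, and the value 128 for `RENDER_QUANTUM_SIZE` is an assumption. It is not the code merged in this PR.

```rust
/// Assumed render quantum size; the real constant lives in the crate.
const RENDER_QUANTUM_SIZE: usize = 128;

/// Hypothetical stand-in for a fixed-capacity, stack-allocated vector.
struct FixedBuf {
    data: [f32; RENDER_QUANTUM_SIZE],
    len: usize,
}

impl FixedBuf {
    fn new() -> Self {
        Self { data: [0.0; RENDER_QUANTUM_SIZE], len: 0 }
    }

    fn len(&self) -> usize {
        self.len
    }

    /// Appends a value without allocating; panics if the buffer is full.
    fn push(&mut self, value: f32) {
        assert!(self.len < RENDER_QUANTUM_SIZE, "buffer is full");
        self.data[self.len] = value;
        self.len += 1;
    }

    fn as_slice(&self) -> &[f32] {
        &self.data[..self.len]
    }
}

/// Pads `buffer` with `value` until it holds `count` samples.
/// Does nothing when the buffer already holds `count` or more.
fn fill_to(buffer: &mut FixedBuf, count: usize, value: f32) {
    for _ in buffer.len()..count {
        buffer.push(value);
    }
}
```

Since the storage is a fixed array, padding the buffer never touches the allocator, which is the property the render thread cares about. The loop is trivial enough that favoring readability over a hand-tuned bulk copy seems reasonable here.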

note: there is this weird `L2 Accesses: 307843 (+12.67221%)` in bench_stereo_positional (I have the impression this particular bench is often very unstable, but maybe that's just in my head...). That's a bit confusing, but all the other numbers are very similar between the two versions.

Yeah, the L2 and RAM accesses are quite unstable. But since these numbers are so low I don't think it is very important (and that's probably also the reason for the large deviations). I think instruction count is the main metric to track.
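
To put the deviation in perspective, the absolute change behind a reported percentage can be recovered from the value and its relative change. The sketch below is a hedged illustration; `baseline_and_delta` is a hypothetical helper, not part of the benchmark tooling.

```rust
/// Recovers the baseline count and the absolute delta from a reported
/// value and its relative change in percent.
fn baseline_and_delta(value: f64, percent_change: f64) -> (f64, f64) {
    let baseline = value / (1.0 + percent_change / 100.0);
    (baseline, value - baseline)
}
```

For the L2 figure in bench_stereo_positional, `baseline_and_delta(307_843.0, 12.67221)` works out to a shift of roughly 35 thousand accesses. The instruction count of the same benchmark (`44858702`, `-1.698429%`) moved by several hundred thousand, so a double-digit percentage on a small counter is a much smaller absolute change than a one or two percent change on the instruction count.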

I recently read https://kobzol.github.io/rust/rustc/2023/09/23/rustc-runtime-benchmarks.html, which has some tips worth looking into:

> After the initial refactoring was completed, I needed to decide how will we actually define the benchmarks and what tool we should use to gather the execution metrics. Both cargo bench and criterion are not a bad choice for running benchmarks, but they only measure wall-time, while I also wanted to measure hardware counters. I was considering to use iai for a while. However, it uses Cachegrind for the measurements, while I wanted the benchmarks to be executed natively, without simulation. Also, using Cachegrind wouldn’t produce realistic wall-time results.
>
> In the end, I decided to write a small library called benchlib, so that we would have ultimate control of defining, executing and measuring the benchmarks, instead of relying on external crates. benchlib uses Linux perf events to gather hardware metrics, using the perf-event crate. I also took bits and pieces from other mentioned tools, like the black_box function from iai.

orottier · 2023-10-02T18:24:03Z

Thanks, looks good to me except the mentioned nitpick!

b-ma · 2023-10-03T08:26:54Z

/bench

github-actions · 2023-10-03T08:29:42Z

Benchmark result:


bench_ctor
  Instructions:             4533283 (-0.569938%)
  L1 Accesses:              6769740 (-0.570924%)
  L2 Accesses:                54303 (-0.038657%)
  RAM Accesses:               61603 (+0.066600%)
  Estimated Cycles:         9197360 (-0.406523%)

bench_sine
  Instructions:            70948434 (-0.157499%)
  L1 Accesses:            103535456 (-0.123113%)
  L2 Accesses:               263720 (-1.755013%)
  RAM Accesses:               62479 (+0.073679%)
  Estimated Cycles:       107040821 (-0.139535%)

bench_sine_gain
  Instructions:            75978926 (-0.235109%)
  L1 Accesses:            111130496 (-0.197582%)
  L2 Accesses:               275308 (+2.122135%)
  RAM Accesses:               62575 (+0.065564%)
  Estimated Cycles:       114697161 (-0.165352%)

bench_sine_gain_delay
  Instructions:           150981779 (-0.200649%)
  L1 Accesses:            213619925 (-0.188222%)
  L2 Accesses:               601328 (+3.846439%)
  RAM Accesses:               64178 (+0.040529%)
  Estimated Cycles:       218872795 (-0.132578%)

bench_buffer_src
  Instructions:            17508933 (-0.742734%)
  L1 Accesses:             25451728 (-0.626669%)
  L2 Accesses:                87646 (+0.505705%)
  RAM Accesses:              100769 (+0.045670%)
  Estimated Cycles:        29416873 (-0.529828%)

bench_buffer_src_delay
  Instructions:            91167238 (-0.300114%)
  L1 Accesses:            126152775 (-0.280860%)
  L2 Accesses:               162184 (-0.309795%)
  RAM Accesses:              100916 (+0.031720%)
  Estimated Cycles:       130495755 (-0.272605%)

bench_buffer_src_iir
  Instructions:            41934521 (+0.307074%)
  L1 Accesses:             60578354 (-0.398699%)
  L2 Accesses:                88469 (+1.563594%)
  RAM Accesses:              100868 (+0.055549%)
  Estimated Cycles:        64551079 (-0.360767%)

bench_buffer_src_biquad
  Instructions:            37537474 (-0.824486%)
  L1 Accesses:             52785985 (-0.640675%)
  L2 Accesses:               115575 (-0.449624%)
  RAM Accesses:              100959 (+0.003962%)
  Estimated Cycles:        56897425 (-0.598944%)

bench_stereo_positional
  Instructions:            44858702 (-1.698429%)
  L1 Accesses:             67159790 (-1.247776%)
  L2 Accesses:               307843 (+12.67221%)
  RAM Accesses:              101064 (+0.037614%)
  Estimated Cycles:        72236245 (-0.924624%)

bench_stereo_panning_automation
  Instructions:            32358700 (+0.400629%)
  L1 Accesses:             48578465 (+0.943108%)
  L2 Accesses:               136422 (-0.929544%)
  RAM Accesses:              100877 (+0.039668%)
  Estimated Cycles:        52791270 (+0.857559%)

bench_analyser_node
  Instructions:            39640027 (-0.368147%)
  L1 Accesses:             55496034 (-0.328541%)
  L2 Accesses:               183103 (+0.664673%)
  RAM Accesses:              101414 (+0.048340%)
  Estimated Cycles:        59961039 (-0.291285%)

perf: use ArrayVec instead of Vec for param buffer

57c945d

typo

37ae1fe

orottier reviewed Oct 2, 2023

refactor: simplify filling buffer when no event left

b07f4b9

doc: comment on using count

4a5e4dc

orottier merged commit 910e238 into orottier:main Oct 3, 2023

orottier mentioned this pull request Oct 3, 2023

(de-)allocation in render thread #359

Open

11 tasks

b-ma deleted the perf/param-array-vec branch November 4, 2023 06:42
