Experiment: store nodes in contiguous Vec instead of HashMap #379

Merged
orottier merged 11 commits into main from feature/store-nodes-in-vec-not-hashmap on Oct 30, 2023

orottier
Owner

Don't mind the unreadable code; I'm seeing performance gains of around 10% in some benchmarks.

A big todo is to reuse deallocated node slots; otherwise the Vec may grow indefinitely in long-running processes.

@orottier
Owner Author

/bench

@github-actions

Benchmark result:


bench_ctor
  Instructions:             4412318 (-4.779829%)
  L1 Accesses:              6711114 (-3.665985%)
  L2 Accesses:                54029 (+0.031474%)
  RAM Accesses:               61396 (-0.056974%)
  Estimated Cycles:         9130119 (-2.732935%)

bench_sine
  Instructions:            70073875 (-1.565877%)
  L1 Accesses:            102101922 (-1.781494%)
  L2 Accesses:               270035 (-0.704904%)
  RAM Accesses:               62251 (-0.085067%)
  Estimated Cycles:       105630882 (-1.733462%)

bench_sine_gain
  Instructions:            74635985 (-2.206900%)
  L1 Accesses:            109148671 (-2.303750%)
  L2 Accesses:               279157 (+0.193095%)
  RAM Accesses:               62346 (-0.086538%)
  Estimated Cycles:       112726566 (-2.231584%)

bench_sine_gain_delay
  Instructions:           148096674 (-1.641238%)
  L1 Accesses:            208241762 (-1.656738%)
  L2 Accesses:               520474 (-0.413672%)
  RAM Accesses:               63934 (-0.084391%)
  Estimated Cycles:       213081822 (-1.625484%)

bench_buffer_src
  Instructions:            16625886 (-6.229234%)
  L1 Accesses:             24479257 (-5.306828%)
  L2 Accesses:                88737 (+1.195133%)
  RAM Accesses:              100416 (-0.034843%)
  Estimated Cycles:        28437502 (-4.589308%)

bench_buffer_src_delay
  Instructions:            88702199 (-2.188368%)
  L1 Accesses:            121640392 (-1.982771%)
  L2 Accesses:               161267 (-0.311552%)
  RAM Accesses:              100566 (-0.022865%)
  Estimated Cycles:       125966537 (-1.918519%)

bench_buffer_src_iir
  Instructions:            40559605 (-3.233541%)
  L1 Accesses:             59421855 (-2.742446%)
  L2 Accesses:                89447 (+1.771533%)
  RAM Accesses:              100516 (-0.027849%)
  Estimated Cycles:        63387150 (-2.565115%)

bench_buffer_src_biquad
  Instructions:            35749128 (-5.914751%)
  L1 Accesses:             50709230 (-5.336838%)
  L2 Accesses:               146040 (+24.97005%)
  RAM Accesses:              100610 (-0.051658%)
  Estimated Cycles:        54960780 (-4.706954%)

bench_stereo_positional
  Instructions:            41257633 (-10.06222%)
  L1 Accesses:             62919067 (-8.577324%)
  L2 Accesses:               258209 (-17.57779%)
  RAM Accesses:              100753 (+0.011912%)
  Estimated Cycles:        67736467 (-8.358328%)

bench_stereo_panning_automation
  Instructions:            30726394 (-5.145265%)
  L1 Accesses:             46433775 (-4.279755%)
  L2 Accesses:               142868 (+0.315971%)
  RAM Accesses:              100535 (-0.028837%)
  Estimated Cycles:        50666840 (-3.934019%)

bench_analyser_node
  Instructions:            38543208 (-3.389711%)
  L1 Accesses:             54284091 (-2.986209%)
  L2 Accesses:               185219 (+2.010817%)
  RAM Accesses:              101041 (-0.051438%)
  Estimated Cycles:        58746621 (-2.739186%)

bench_hrtf_panners
  Instructions:          1772814308 (-0.537937%)
  L1 Accesses:           2546265447 (-0.474399%)
  L2 Accesses:             25961946 (-1.879364%)
  RAM Accesses:              175023 (+0.049732%)
  Estimated Cycles:      2682200982 (-0.542132%)
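
A note for reading these reports: the Estimated Cycles figure appears to be a weighted sum of the three access counts. The weights used below (1 for L1, 5 for L2, 35 for RAM) are an assumption inferred from the rows above, not something stated in this thread, and `estimated_cycles` is a hypothetical helper name. A minimal sketch that checks the assumption against two rows of the report:

```rust
/// Estimated cycles as a weighted sum of simulated memory accesses.
/// The weights (L1 = 1, L2 = 5, RAM = 35) are an assumption that happens
/// to reproduce the numbers in the report above.
fn estimated_cycles(l1: u64, l2: u64, ram: u64) -> u64 {
    l1 + 5 * l2 + 35 * ram
}

fn main() {
    // bench_ctor row of the first report
    assert_eq!(estimated_cycles(6_711_114, 54_029, 61_396), 9_130_119);
    // bench_sine row of the first report
    assert_eq!(estimated_cycles(102_101_922, 270_035, 62_251), 105_630_882);
    println!("weights are consistent with the report");
}
```

Under this weighting, L2 accesses contribute little to the total, which would explain why a large relative swing in L2 accesses (such as in bench_buffer_src_biquad) barely moves the estimated cycle count.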


No functional changes compared to the previous commit

@orottier
Owner Author

/bench

@github-actions

Benchmark result:


bench_ctor
  Instructions:             4405809 (-4.920297%)
  L1 Accesses:              6689587 (-3.974992%)
  L2 Accesses:                54167 (+0.285117%)
  RAM Accesses:               61460 (+0.048836%)
  Estimated Cycles:         9111522 (-2.930746%)

bench_sine
  Instructions:            70220929 (-1.359308%)
  L1 Accesses:            102313719 (-1.577753%)
  L2 Accesses:               265364 (-2.422845%)
  RAM Accesses:               62306 (+0.004815%)
  Estimated Cycles:       105821249 (-1.556339%)

bench_sine_gain
  Instructions:            74861554 (-1.911345%)
  L1 Accesses:            109474714 (-2.011917%)
  L2 Accesses:               287355 (+3.135094%)
  RAM Accesses:               62396 (-0.004808%)
  Estimated Cycles:       113095349 (-1.911711%)

bench_sine_gain_delay
  Instructions:           148415464 (-1.429513%)
  L1 Accesses:            208700227 (-1.440225%)
  L2 Accesses:               530302 (+1.466601%)
  RAM Accesses:               63982 (-0.007814%)
  Estimated Cycles:       213591107 (-1.390346%)

bench_buffer_src
  Instructions:            16771432 (-5.408305%)
  L1 Accesses:             24689007 (-4.495407%)
  L2 Accesses:                88370 (+0.776608%)
  RAM Accesses:              100465 (+0.013937%)
  Estimated Cycles:        28647132 (-3.885940%)

bench_buffer_src_delay
  Instructions:            88944174 (-1.921469%)
  L1 Accesses:            121987664 (-1.702882%)
  L2 Accesses:               160893 (-0.544587%)
  RAM Accesses:              100615 (+0.026842%)
  Estimated Cycles:       126313654 (-1.648171%)

bench_buffer_src_iir
  Instructions:            40719841 (-2.851244%)
  L1 Accesses:             59656903 (-2.357725%)
  L2 Accesses:                89572 (+1.910277%)
  RAM Accesses:              100564 (+0.021881%)
  Estimated Cycles:        63624503 (-2.200177%)

bench_buffer_src_biquad
  Instructions:            36017388 (-5.208699%)
  L1 Accesses:             51089640 (-4.626664%)
  L2 Accesses:               142651 (+22.07104%)
  RAM Accesses:              100683 (+0.021856%)
  Estimated Cycles:        55326800 (-4.072240%)

bench_stereo_positional
  Instructions:            41824558 (-8.826328%)
  L1 Accesses:             63713630 (-7.422777%)
  L2 Accesses:               261674 (-16.47174%)
  RAM Accesses:              100780 (+0.041692%)
  Estimated Cycles:        68549300 (-7.258474%)

bench_stereo_panning_automation
  Instructions:            30944374 (-4.472344%)
  L1 Accesses:             46760900 (-3.605410%)
  L2 Accesses:               138841 (-2.511621%)
  RAM Accesses:              100584 (+0.020882%)
  Estimated Cycles:        50975545 (-3.348642%)

bench_analyser_node
  Instructions:            38697779 (-3.002291%)
  L1 Accesses:             54517709 (-2.568725%)
  L2 Accesses:               181257 (-0.169636%)
  RAM Accesses:              101100 (+0.006924%)
  Estimated Cycles:        58962494 (-2.381787%)

bench_hrtf_panners
  Instructions:          1775999964 (-0.359209%)
  L1 Accesses:           2551380900 (-0.274459%)
  L2 Accesses:             26223066 (-0.891846%)
  RAM Accesses:              175024 (+0.049732%)
  Estimated Cycles:      2688622070 (-0.304010%)


impl NodeCollection {
    pub fn new() -> Self {
        let mut instance = Self {
            nodes: Vec::with_capacity(64),
Collaborator

I wonder if we should be more aggressive here and grab more memory at startup, e.g. Vec::with_capacity(2048).

Owner Author

std::mem::size_of::<RefCell<Node>>() is 160 bytes, so that would mean over 300 kB allocated (2048 × 160 = 327,680 bytes).
That's not terrible of course, but I think most audio graphs really don't need that many nodes.

Whatever the initial size, we do need to think about the performance impact of outgrowing that size.
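
To make the trade-off concrete, here is a minimal sketch of the arithmetic and of what outgrowing the initial capacity looks like. The 160-byte element size is taken from the comment above; the `[u8; 160]` array is only a stand-in for RefCell<Node>, and `upfront_bytes` is a hypothetical helper.

```rust
/// Bytes reserved up front by `Vec::with_capacity(capacity)` when each
/// element occupies `elem_size` bytes.
fn upfront_bytes(capacity: usize, elem_size: usize) -> usize {
    capacity * elem_size
}

fn main() {
    // Element size as reported in the comment above.
    const ELEM_SIZE: usize = 160;

    assert_eq!(upfront_bytes(64, ELEM_SIZE), 10_240); // 10 KiB
    assert_eq!(upfront_bytes(2048, ELEM_SIZE), 327_680); // 320 KiB

    // Stand-in element type of the same size as the real RefCell<Node>.
    let mut nodes: Vec<[u8; 160]> = Vec::with_capacity(64);
    let initial = nodes.capacity();

    // Pushing one element more than the capacity forces a reallocation,
    // which moves every existing element to a new heap buffer.
    for _ in 0..=initial {
        nodes.push([0; 160]);
    }
    assert!(nodes.capacity() > initial);
}
```

The reallocation is the part that matters for the question above: if it happens on the render thread it involves the allocator and a copy of all existing slots, so how it is triggered is arguably more important than the exact starting capacity.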

@@ -126,6 +125,7 @@ impl Graph {
        let inputs = vec![AudioRenderQuantum::from(self.alloc.silence()); number_of_inputs];
        let outputs = vec![AudioRenderQuantum::from(self.alloc.silence()); number_of_outputs];

        let index = index.0 as usize;
Collaborator

Hey, really interesting perf improvements!
Just thinking (maybe I'm mistaken, and maybe this is what you already have in mind, but...): renaming these to id rather than index could help with further refactoring, I think.
Because then we could have two parallel Vecs, a Vec<Option<AudioNodeId>> and a Vec<Option<RefCell<Node>>>, whose indexes would correspond. Then we could reuse the None slots in the Vec to prevent it from growing indefinitely (this might imply a third Vec<Option<usize>> containing the dropped positions for housekeeping).
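
A minimal sketch of the slot-reuse idea suggested above, simplified to a single Vec of Option slots plus a free list of vacated indexes. The `Slots` type and its methods are hypothetical names for illustration and are not taken from the PR.

```rust
/// Hypothetical slot store: values live in a Vec, and the indexes of
/// removed values are remembered so that later inserts can reuse them.
struct Slots<T> {
    items: Vec<Option<T>>,
    free: Vec<usize>,
}

impl<T> Slots<T> {
    fn new() -> Self {
        Self { items: Vec::new(), free: Vec::new() }
    }

    /// Stores `value`, preferring a vacated slot over growing the Vec.
    fn insert(&mut self, value: T) -> usize {
        match self.free.pop() {
            Some(index) => {
                self.items[index] = Some(value);
                index
            }
            None => {
                self.items.push(Some(value));
                self.items.len() - 1
            }
        }
    }

    /// Empties the slot and records it as reusable. Returns None if the
    /// index is out of bounds or the slot was already empty.
    fn remove(&mut self, index: usize) -> Option<T> {
        let value = self.items.get_mut(index)?.take();
        if value.is_some() {
            self.free.push(index);
        }
        value
    }

    /// Number of slots, occupied or not.
    fn slot_count(&self) -> usize {
        self.items.len()
    }
}

fn main() {
    let mut slots = Slots::new();
    let osc = slots.insert("oscillator");
    let gain = slots.insert("gain");
    assert_eq!((osc, gain), (0, 1));

    assert_eq!(slots.remove(osc), Some("oscillator"));
    // The next insert lands in the vacated slot instead of growing the Vec.
    let delay = slots.insert("delay");
    assert_eq!(delay, 0);
    assert_eq!(slots.slot_count(), 2);
}
```

In this form the housekeeping list stores plain usize positions; whether it should live next to the nodes or on the control side is exactly the design question discussed in the reply below.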

Owner Author

Hey, thanks for the suggestion. I actually have another solution in mind, where the audio graph lets the AudioContext know (in a realtime-safe way) which AudioNodeIds are available again. It would be interesting to benchmark that against your solution.

@orottier
Owner Author

I will have to evaluate the performance regression between the first and second commit

@orottier
Owner Author

/bench

@github-actions

Benchmark result:


bench_ctor
  Instructions:             4415186 (-4.717936%)
  L1 Accesses:              6712612 (-3.644454%)
  L2 Accesses:                54178 (+0.305482%)
  RAM Accesses:               61519 (+0.141620%)
  Estimated Cycles:         9136667 (-2.663570%)

bench_sine
  Instructions:            70003866 (-1.664220%)
  L1 Accesses:            102445590 (-1.452945%)
  L2 Accesses:               260442 (-3.465275%)
  RAM Accesses:               62364 (+0.094695%)
  Estimated Cycles:       105930540 (-1.446802%)

bench_sine_gain
  Instructions:            74481900 (-2.408793%)
  L1 Accesses:            109337493 (-2.137422%)
  L2 Accesses:               291892 (+5.927224%)
  RAM Accesses:               62466 (+0.107374%)
  Estimated Cycles:       112983263 (-1.998517%)

bench_sine_gain_delay
  Instructions:           147853056 (-1.803040%)
  L1 Accesses:            208247988 (-1.645851%)
  L2 Accesses:               555203 (+2.863373%)
  RAM Accesses:               64061 (+0.112519%)
  Estimated Cycles:       213266138 (-1.571511%)

bench_buffer_src
  Instructions:            16545078 (-6.685021%)
  L1 Accesses:             24329803 (-5.885015%)
  L2 Accesses:                86525 (-1.310537%)
  RAM Accesses:              100535 (+0.080634%)
  Estimated Cycles:        28281153 (-5.114018%)

bench_buffer_src_delay
  Instructions:            88357311 (-2.576728%)
  L1 Accesses:            121021472 (-2.488780%)
  L2 Accesses:               162332 (+1.466378%)
  RAM Accesses:              100692 (+0.101402%)
  Estimated Cycles:       125357352 (-2.393142%)

bench_buffer_src_iir
  Instructions:            40356776 (-3.717434%)
  L1 Accesses:             59068937 (-3.320739%)
  L2 Accesses:                88188 (+0.824311%)
  RAM Accesses:              100634 (+0.092500%)
  Estimated Cycles:        63032067 (-3.108243%)

bench_buffer_src_biquad
  Instructions:            35455095 (-6.688588%)
  L1 Accesses:             50223669 (-6.242442%)
  L2 Accesses:               100993 (-13.92691%)
  RAM Accesses:              100753 (+0.090402%)
  Estimated Cycles:        54254989 (-5.933769%)

bench_stereo_positional
  Instructions:            40699885 (-11.27798%)
  L1 Accesses:             61863902 (-10.14594%)
  L2 Accesses:               294245 (+2.848704%)
  RAM Accesses:              100857 (+0.117134%)
  Estimated Cycles:        66865122 (-9.403788%)

bench_stereo_panning_automation
  Instructions:            30709023 (-5.220852%)
  L1 Accesses:             46391825 (-4.382311%)
  L2 Accesses:               138272 (-2.466689%)
  RAM Accesses:              100657 (+0.092478%)
  Estimated Cycles:        50606180 (-4.057968%)

bench_analyser_node
  Instructions:            38500547 (-3.496696%)
  L1 Accesses:             54206782 (-3.125859%)
  L2 Accesses:               183296 (+1.417553%)
  RAM Accesses:              101187 (+0.093974%)
  Estimated Cycles:        58664807 (-2.869256%)

bench_hrtf_panners
  Instructions:          1773193214 (-0.516679%)
  L1 Accesses:           2547262342 (-0.435362%)
  L2 Accesses:             26077189 (-1.450493%)
  RAM Accesses:              175085 (+0.062866%)
  Estimated Cycles:      2683776262 (-0.484033%)


@orottier
Owner Author

/bench

@github-actions

Benchmark result:


bench_ctor
  Instructions:             4415186 (-4.717936%)
  L1 Accesses:              6712609 (-3.644497%)
  L2 Accesses:                54175 (+0.299928%)
  RAM Accesses:               61525 (+0.151387%)
  Estimated Cycles:         9136859 (-2.661524%)

bench_sine
  Instructions:            70000116 (-1.669488%)
  L1 Accesses:            102443278 (-1.455175%)
  L2 Accesses:               258997 (-4.000163%)
  RAM Accesses:               62371 (+0.112358%)
  Estimated Cycles:       105921248 (-1.455314%)

bench_sine_gain
  Instructions:            74478150 (-2.413707%)
  L1 Accesses:            109349560 (-2.126626%)
  L2 Accesses:               276068 (+0.184352%)
  RAM Accesses:               62473 (+0.128220%)
  Estimated Cycles:       112916455 (-2.056296%)

bench_sine_gain_delay
  Instructions:           147849306 (-1.805531%)
  L1 Accesses:            208243459 (-1.647992%)
  L2 Accesses:               555978 (+3.006768%)
  RAM Accesses:               64065 (+0.126594%)
  Estimated Cycles:       213265624 (-1.571673%)

bench_buffer_src
  Instructions:            16541308 (-6.706326%)
  L1 Accesses:             24325938 (-5.900025%)
  L2 Accesses:                86622 (-1.196519%)
  RAM Accesses:              100533 (+0.084621%)
  Estimated Cycles:        28277703 (-5.124928%)

bench_buffer_src_delay
  Instructions:            88353557 (-2.580854%)
  L1 Accesses:            121017569 (-2.491920%)
  L2 Accesses:               162478 (+1.558906%)
  RAM Accesses:              100693 (+0.106377%)
  Estimated Cycles:       125354214 (-2.395466%)

bench_buffer_src_iir
  Instructions:            40353030 (-3.726399%)
  L1 Accesses:             59065095 (-3.327045%)
  L2 Accesses:                88283 (+0.930615%)
  RAM Accesses:              100637 (+0.095483%)
  Estimated Cycles:        63028805 (-3.113289%)

bench_buffer_src_biquad
  Instructions:            35451341 (-6.698458%)
  L1 Accesses:             50218639 (-6.251828%)
  L2 Accesses:               102263 (-12.84527%)
  RAM Accesses:              100762 (+0.104315%)
  Estimated Cycles:        54256624 (-5.930654%)

bench_stereo_positional
  Instructions:            40692385 (-11.29438%)
  L1 Accesses:             61858175 (-10.15429%)
  L2 Accesses:               292463 (+2.226549%)
  RAM Accesses:              100866 (+0.125074%)
  Estimated Cycles:        66850800 (-9.423257%)

bench_stereo_panning_automation
  Instructions:            30705253 (-5.232546%)
  L1 Accesses:             46388075 (-4.390095%)
  L2 Accesses:               138243 (-2.488520%)
  RAM Accesses:              100661 (+0.101433%)
  Estimated Cycles:        50602425 (-4.064838%)

bench_analyser_node
  Instructions:            38496821 (-3.506123%)
  L1 Accesses:             54202558 (-3.133472%)
  L2 Accesses:               183798 (+1.694746%)
  RAM Accesses:              101190 (+0.095951%)
  Estimated Cycles:        58663198 (-2.872044%)

bench_hrtf_panners
  Instructions:          1773185720 (-0.517099%)
  L1 Accesses:           2547203760 (-0.437657%)
  L2 Accesses:             26128242 (-1.257183%)
  RAM Accesses:              175122 (+0.097170%)
  Estimated Cycles:      2683974240 (-0.476648%)


@orottier orottier marked this pull request as ready for review October 27, 2023 11:26
@orottier
Owner Author

Alright, very happy with the results. This is ready for review @b-ma, but take your time. I'm not in a hurry

-        self.nodes
-            .get_mut(&source.0)
-            .unwrap_or_else(|| panic!("cannot connect {:?} to {:?}", source, dest))
+        self.nodes[source.0 .0 as usize]
Collaborator

I'm a bit confused by this syntax: source.0 .0 as usize. What does it mean?

Owner Author

Yeah, that looks weird. It is a nested tuple field access (the .0 field of source.0), for which rustfmt has added an extra space. The space appears to no longer be necessary if we change the config: rust-lang/rustfmt#5745
I will pick that up in a separate PR

Then again I agree the nested tuple is not nice to read. I will change the NodeCollection interface to take AudioNodeId instead of usize for indexing. This will make the code nicer!
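
To illustrate both the nested field access and the proposed interface change, here is a minimal sketch. The types are simplified stand-ins: the real AudioNodeId and NodeCollection differ, the String payload replaces the actual node type, and treating `source` as an (AudioNodeId, usize) pair is an assumption.

```rust
use std::ops::{Index, IndexMut};

/// Stand-in for the real id type: a tuple struct wrapping a raw number.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AudioNodeId(u64);

/// Stand-in collection; the real one stores nodes, not Strings.
struct NodeCollection {
    nodes: Vec<String>,
}

// Indexing by AudioNodeId keeps the `as usize` conversion in one place.
impl Index<AudioNodeId> for NodeCollection {
    type Output = String;

    fn index(&self, id: AudioNodeId) -> &String {
        &self.nodes[id.0 as usize]
    }
}

impl IndexMut<AudioNodeId> for NodeCollection {
    fn index_mut(&mut self, id: AudioNodeId) -> &mut String {
        &mut self.nodes[id.0 as usize]
    }
}

fn main() {
    let mut collection = NodeCollection {
        nodes: vec!["oscillator".to_string(), "gain".to_string()],
    };
    // Assumed shape of `source`: (node id, output port).
    let source = (AudioNodeId(1), 0usize);

    // Before: `source.0` is the AudioNodeId, and `.0` on that is the raw u64.
    assert_eq!(collection.nodes[source.0 .0 as usize], "gain");

    // After: the collection is indexed by the id itself.
    assert_eq!(collection[source.0], "gain");
    collection[source.0].push('!');
    assert_eq!(collection[AudioNodeId(1)], "gain!");
}
```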

Collaborator

> I will change the NodeCollection interface to take AudioNodeId instead of usize for indexing. This will make the code nicer!

+1

@b-ma
Collaborator

b-ma commented Oct 29, 2023

I think I got it, no special comment on my side, looks very nice, great work!

///
/// It reuses the ids of decommissioned nodes to prevent unbounded growth of the audio graph's node
/// list (which is stored in a Vec indexed by the AudioNodeId).
struct AudioNodeIdIssuer {
Collaborator

@b-ma b-ma Oct 29, 2023

I find audio_node_id_issuer.issue() a bit weird.
Maybe rename to AudioNodeIdProvider? i.e. audio_node_id_provider.get()

...But really no big deal
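
For readers who have not opened the diff, a minimal sketch of how an issuer that reuses decommissioned ids could work. This is not the implementation from the PR: the fields, the constructor, and the use of std::sync::mpsc as the reclaim path are all assumptions made for illustration.

```rust
use std::sync::mpsc::{channel, Receiver};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AudioNodeId(u64);

/// Hypothetical issuer: hands out fresh ids from a counter, but prefers ids
/// that were reported as free again through the `reclaimed` channel.
struct AudioNodeIdIssuer {
    next: u64,
    reclaimed: Receiver<AudioNodeId>,
}

impl AudioNodeIdIssuer {
    fn new(reclaimed: Receiver<AudioNodeId>) -> Self {
        Self { next: 0, reclaimed }
    }

    fn issue(&mut self) -> AudioNodeId {
        match self.reclaimed.try_recv() {
            // An id was returned by the other side: reuse it.
            Ok(id) => id,
            // Nothing to reuse (or the sender is gone): mint a new id.
            Err(_) => {
                let id = AudioNodeId(self.next);
                self.next += 1;
                id
            }
        }
    }
}

fn main() {
    // In this sketch the sender half would live with the audio graph.
    let (reclaim_tx, reclaim_rx) = channel();
    let mut issuer = AudioNodeIdIssuer::new(reclaim_rx);

    let first = issuer.issue();
    let second = issuer.issue();
    assert_eq!((first, second), (AudioNodeId(0), AudioNodeId(1)));

    // The graph reports that `first` was dropped; it is handed out again.
    reclaim_tx.send(first).unwrap();
    assert_eq!(issuer.issue(), first);
    assert_eq!(issuer.issue(), AudioNodeId(2));
}
```

One caveat on the design: sending on a std::sync::mpsc channel may allocate, so a realtime-safe version would more likely use a preallocated or lock-free queue on the render side.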

@orottier
Owner Author

/bench

@github-actions

Benchmark result:


bench_ctor
  Instructions:             4433872 (-4.314930%)
  L1 Accesses:              6727608 (-3.429445%)
  L2 Accesses:                54184 (+0.318448%)
  RAM Accesses:               61507 (+0.117197%)
  Estimated Cycles:         9151273 (-2.509192%)

bench_sine
  Instructions:            70214192 (-1.368588%)
  L1 Accesses:            102312736 (-1.580679%)
  L2 Accesses:               262502 (-2.647233%)
  RAM Accesses:               62350 (+0.073832%)
  Estimated Cycles:       105807496 (-1.560491%)

bench_sine_gain
  Instructions:            74842079 (-1.937078%)
  L1 Accesses:            109460314 (-2.027991%)
  L2 Accesses:               285431 (+3.690532%)
  RAM Accesses:               62446 (+0.080133%)
  Estimated Cycles:       113073079 (-1.919785%)

bench_sine_gain_delay
  Instructions:           148311943 (-1.527527%)
  L1 Accesses:            208516131 (-1.555471%)
  L2 Accesses:               565866 (+5.509615%)
  RAM Accesses:               64033 (+0.071890%)
  Estimated Cycles:       213586616 (-1.451240%)

bench_buffer_src
  Instructions:            16773163 (-5.398115%)
  L1 Accesses:             24699697 (-4.456634%)
  L2 Accesses:                87911 (+1.249626%)
  RAM Accesses:              100525 (+0.074664%)
  Estimated Cycles:        28657627 (-3.838946%)

bench_buffer_src_delay
  Instructions:            88956262 (-1.867799%)
  L1 Accesses:            122015757 (-1.628351%)
  L2 Accesses:               164893 (+3.009196%)
  RAM Accesses:              100676 (+0.086491%)
  Estimated Cycles:       126363882 (-1.552398%)

bench_buffer_src_iir
  Instructions:            40717967 (-2.856072%)
  L1 Accesses:             59664404 (-2.346751%)
  L2 Accesses:                88755 (+1.639889%)
  RAM Accesses:              100624 (+0.082553%)
  Estimated Cycles:        63630019 (-2.188586%)

bench_buffer_src_biquad
  Instructions:            36016950 (-5.209308%)
  L1 Accesses:             51139449 (-4.531395%)
  L2 Accesses:               118248 (+0.393089%)
  RAM Accesses:              100712 (+0.048677%)
  Estimated Cycles:        55255609 (-4.201349%)

bench_stereo_positional
  Instructions:            41758291 (-8.971637%)
  L1 Accesses:             63635785 (-7.573622%)
  L2 Accesses:               285383 (-0.179785%)
  RAM Accesses:              100841 (+0.096284%)
  Estimated Cycles:        68592135 (-7.063990%)

bench_stereo_panning_automation
  Instructions:            30934299 (-4.525077%)
  L1 Accesses:             46764098 (-3.613296%)
  L2 Accesses:               133928 (-5.923673%)
  RAM Accesses:              100636 (+0.074582%)
  Estimated Cycles:        50955998 (-3.398401%)

bench_analyser_node
  Instructions:            38695608 (-3.008117%)
  L1 Accesses:             54517898 (-2.571152%)
  L2 Accesses:               187399 (+3.973080%)
  RAM Accesses:              101164 (+0.072212%)
  Estimated Cycles:        58995633 (-2.318647%)

bench_hrtf_panners
  Instructions:          1774802856 (-0.426342%)
  L1 Accesses:           2548950578 (-0.370688%)
  L2 Accesses:             26426413 (+0.000034%)
  RAM Accesses:              175053 (+0.056586%)
  Estimated Cycles:      2687209498 (-0.351553%)


@orottier orottier merged commit 6cfc2d7 into main Oct 30, 2023
@orottier orottier deleted the feature/store-nodes-in-vec-not-hashmap branch October 30, 2023 09:37