Experimenting towards allocator-free rendering #368

orottier · 2023-10-05T18:55:15Z

For now, the goal is to run `cargo run --example audio_buffer_source_events` without failing.
It shows that #366 works as intended: no deallocation of the audio buffer takes place.
But this PR made me realize we also need to look into all Arc<..> items of the renderers, because the example now fails with

  13: <alloc::sync::Arc<T> as core::ops::drop::Drop>::drop
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/alloc/src/sync.rs:1897:13
  14: core::ptr::drop_in_place<alloc::sync::Arc<web_audio_api::AtomicF64>>
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/ptr/mod.rs:497:1

And then there are still more issues to look at (Box<dyn AudioProcessor>, for example).
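To illustrate what the backtrace points at: an `Arc` only releases its allocation when the last strong reference is dropped. The sketch below is an illustration under assumptions, not what this PR does. `AtomicU64` stands in for the crate's `AtomicF64`, and the function name is hypothetical. As long as the control side still holds a clone, dropping the render-side handle is just an atomic decrement.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Drops one `Arc` handle on a separate ("render") thread while the caller
/// keeps another one. Returns the remaining strong count and the stored value.
fn drop_on_render_thread() -> (usize, u64) {
    // Stand-in for `Arc<AtomicF64>` (assumption, the real type is crate-internal)
    let control = Arc::new(AtomicU64::new(42));
    let render = Arc::clone(&control);

    thread::spawn(move || {
        render.store(7, Ordering::Relaxed);
        // Not the last handle: this only decrements the reference count,
        // the allocation stays alive because `control` still exists.
        drop(render);
    })
    .join()
    .expect("render thread panicked");

    (Arc::strong_count(&control), control.load(Ordering::Relaxed))
}
```

If the render side happens to hold the last handle instead, the same `drop` frees the allocation on the render thread, which is the failure shown above.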

orottier · 2023-10-05T18:56:18Z

src/param.rs

@@ -128,7 +128,7 @@ struct AudioParamEventTimeline {
 impl AudioParamEventTimeline {
    fn new() -> Self {
        Self {
-            inner: Vec::new(),
+            inner: Vec::with_capacity(5),


This is not a real solution to the problem

orottier · 2023-10-05T18:56:36Z

src/render/graph.rs

    /// Reusable output buffers, consumed by subsequent Nodes in this graph
-    outputs: Vec<AudioRenderQuantum>,
+    outputs: SmallVec<[AudioRenderQuantum; 2]>,


These are not real solutions to the deallocation problems of this struct

Actually, maybe I'm missing something but it seems to me that the only nodes that have multiple inputs / outputs are the ChannelSplitterNode and ChannelMergerNode which are clamped to MAX_CHANNELS, cf. https://webaudio.github.io/web-audio-api/#dom-baseaudiocontext-createchannelmerger, no?

Then maybe we could simply use an ArrayVec<AudioRenderQuantum, MAX_CHANNELS> to fix this?

~~True that, but of course we also allow users to implement their own AudioNodes which I think could use an unbounded number of inputs/outputs.~~

Scratch that, in the JS interface the https://webaudio.github.io/web-audio-api/#AudioWorkletNode is bound to 1 input and 1 output. So technically I could restrict the raw AudioNode trait and panic when the user supplies a 32+ input/output count.

We need to check performance though: ArrayVec<AudioRenderQuantum, MAX_CHANNELS> is a rather large object to have on the stack, and Rust will probably memcpy it around a lot, so we may want to have it on the heap anyway. We can experiment with pre-allocating that Vec on the control thread.
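The size trade-off can be made concrete with a small std-only sketch. Everything here is hypothetical: `FixedVec` is a minimal stand-in for `ArrayVec`, `Quantum` is an arbitrary payload (the real `AudioRenderQuantum` has a different size), and `MAX_CHANNELS` is assumed to be 32 as in the discussion above.

```rust
use std::mem::size_of;

/// Hypothetical stand-ins, the real definitions live in the crate.
const MAX_CHANNELS: usize = 32;
type Quantum = [f32; 128];

/// Minimal fixed-capacity vector stored inline, without heap allocation.
struct FixedVec<T, const N: usize> {
    items: [Option<T>; N],
    len: usize,
}

impl<T, const N: usize> FixedVec<T, N> {
    fn new() -> Self {
        Self { items: std::array::from_fn(|_| None), len: 0 }
    }

    /// Panics when the fixed capacity is exceeded, instead of reallocating.
    fn push(&mut self, item: T) {
        assert!(self.len < N, "input/output count exceeds capacity {}", N);
        self.items[self.len] = Some(item);
        self.len += 1;
    }

    fn len(&self) -> usize {
        self.len
    }
}

/// Bytes moved whenever the inline container is passed by value.
fn inline_size() -> usize {
    size_of::<FixedVec<Quantum, MAX_CHANNELS>>()
}

/// Bytes moved when the container is boxed once (e.g. on the control thread).
fn boxed_size() -> usize {
    size_of::<Box<FixedVec<Quantum, MAX_CHANNELS>>>()
}
```

With these stand-ins the inline container is at least `MAX_CHANNELS * size_of::<Quantum>()` bytes, while the boxed one is a single pointer. Real numbers would have to be measured on the actual types.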

b-ma · 2023-10-06T06:36:21Z

src/render/graph.rs

@@ -11,6 +11,9 @@ use super::{Alloc, AudioParamValues, AudioProcessor, AudioRenderQuantum};
 use crate::node::ChannelConfig;
 use crate::render::RenderScope;

+const INITIAL_GRAPH_SIZE: usize = 16;
+const INITIAL_CHANNEL_DATA_COUNT: usize = INITIAL_GRAPH_SIZE * 4;


This would not solve the allocation problem per se, but taking a very large value for INITIAL_GRAPH_SIZE (4096 or 8185) could help, as vector growth seems to be exponential (cf. https://nnethercote.github.io/perf-book/heap-allocations.html?highlight=borrow#vec-growth)
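A quick way to check this is to count how often a `Vec` changes capacity while being filled. This is only a sketch, with a hypothetical helper name and `u32` as a placeholder element type. The exact growth factor is an implementation detail of the standard library, so the helper reports what it observes rather than assuming doubling.

```rust
/// Pushes `pushes` items into a Vec created with `initial` capacity and
/// returns how many times its capacity changed, i.e. how often it reallocated.
fn count_reallocations(initial: usize, pushes: usize) -> usize {
    let mut v: Vec<u32> = Vec::with_capacity(initial);
    let mut capacity = v.capacity();
    let mut reallocations = 0;
    for i in 0..pushes {
        v.push(i as u32);
        if v.capacity() != capacity {
            capacity = v.capacity();
            reallocations += 1;
        }
    }
    reallocations
}
```

Staying within the initial capacity never reallocates, so a large `INITIAL_GRAPH_SIZE` pushes the first reallocation far out without removing it.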

b-ma · 2023-10-06T06:38:00Z

Cool, this is very interesting! I will try to have a closer look at this and the previous PR this weekend (I'm struggling a bit to find time these weeks...)

b-ma · 2023-11-05T09:52:36Z

Hey, I just made a small hacky experiment on how a renderer could just drop itself in GC with llq and impl Drop: https://gist.github.com/b-ma/53094341fd51c1ed1ab165b840ea691b

That's rather weird I must confess :), but it seems to work... maybe this can give some ideas

orottier · 2023-11-06T19:12:31Z

> Hey, I just made a small hacky experiment on how a renderer could [...]

Well, this is definitely a new use case for recursive data structures. Thanks ;)

However, I'm getting more and more convinced that the dyn AudioProcessor should probably drop inside the render thread, and then defer the actual deallocation of the Box to the garbage collector thread. Something like this (warning, unsafe and untested)

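use std::alloc::{dealloc, Layout};
use std::ptr;

pub fn deallocate_audio_processor(&self, value: Box<dyn AudioProcessor>) {
    let p = Box::into_raw(value);
    unsafe {
        let layout = Layout::for_value(&*p); // layout of the concrete processor, not of the Box itself
        ptr::drop_in_place(p); // drop the actual AudioProcessor fields
        if layout.size() != 0 { // zero-sized processors were never allocated
            dealloc(p as *mut u8, layout); // TODO - perform this call by GC thread
        }
    }
}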

The reason I want the processor to drop inside the render thread is that I don't want to put the burden of clearing your Vec<AudioRenderQuantum> etc. on the users of our library. If they forget, and we ship the non-empty Vec outside the render thread, a data race on the inner Rc may occur, resulting in undefined behaviour.
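A rough sketch of how the two halves could be split across threads, under several assumptions: `Processor` and `Gain` are hypothetical stand-ins for `AudioProcessor` and a concrete renderer, `Garbage`, `retire` and `retire_and_collect` are made-up names, and `std::sync::mpsc` is only a placeholder for the queue (its `send` may allocate, so a real render thread would need a wait-free queue such as the llq mentioned earlier).

```rust
use std::alloc::{dealloc, Layout};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the crate's `AudioProcessor` trait.
trait Processor {}

static DROPPED: AtomicBool = AtomicBool::new(false);

struct Gain {
    _gain: f32,
}

impl Processor for Gain {}

impl Drop for Gain {
    fn drop(&mut self) {
        DROPPED.store(true, Ordering::SeqCst);
    }
}

/// An allocation whose contents are already dropped, waiting to be freed.
struct Garbage {
    ptr: *mut u8,
    layout: Layout,
}

// SAFETY: the pointee has been dropped, only the raw memory block is sent.
unsafe impl Send for Garbage {}

/// Runs the destructor on the calling thread, defers the free to the GC.
fn retire(value: Box<dyn Processor>, gc: &mpsc::Sender<Garbage>) {
    let p = Box::into_raw(value);
    unsafe {
        let layout = Layout::for_value(&*p); // read before the value is dropped
        std::ptr::drop_in_place(p);
        if layout.size() != 0 {
            gc.send(Garbage { ptr: p as *mut u8, layout })
                .expect("GC thread is gone");
        }
    }
}

/// Retires one processor and returns how many blocks the GC thread freed.
fn retire_and_collect() -> usize {
    let (tx, rx) = mpsc::channel::<Garbage>();
    let gc = thread::spawn(move || {
        let mut freed = 0usize;
        for g in rx {
            // SAFETY: `ptr` came from a Box with this exact layout.
            unsafe { dealloc(g.ptr, g.layout) };
            freed += 1;
        }
        freed
    });
    retire(Box::new(Gain { _gain: 1.0 }), &tx);
    drop(tx); // closing the channel lets the GC thread finish
    gc.join().expect("GC thread panicked")
}
```

Note that any heap-owning field of the processor would still free its memory inside `drop_in_place`, so this only moves the deallocation of the outer Box.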

Let me share my current plans:
The Send bound on AudioProcessor bugs me and already requires us to use unsafe, e.g. https://github.com/b-ma/web-audio-api-rs/blob/main/src/node/delay.rs#L289 and https://github.com/b-ma/web-audio-api-rs/blob/main/src/node/dynamics_compressor.rs#L264
This means any user of our library writing advanced processors will run into this too - at least for any processor that uses a ring buffer.
I intend to remove it and come up with a way to instantiate the renderer inside the render thread, as opposed to the current situation where the renderer is instantiated inside the control thread and then shipped.
This will make the AudioProcessor trait more complex. However, I am making progress on a true Rust AudioWorkletNode API, which will be easier to use and align better with the specification - and which will be useful for ircam-ismm/node-web-audio-api#28
Then only expert users will need to use the AudioProcessor trait - and perhaps we can even remove it from the public API altogether.
I will open a draft PR around this shortly.
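One possible shape for "instantiate the renderer inside the render thread" is to ship a `Send` constructor instead of the processor itself. This is a guess at the idea, not the planned API: `Processor`, `RcProcessor`, `Constructor`, `render_on_thread` and `demo` are all hypothetical names.

```rust
use std::rc::Rc;
use std::thread;

/// Hypothetical processor trait without a `Send` bound.
trait Processor {
    fn process(&mut self) -> f32;
}

/// Holds an `Rc`, so it is not `Send` and cannot be moved across threads.
struct RcProcessor {
    shared: Rc<f32>,
}

impl Processor for RcProcessor {
    fn process(&mut self) -> f32 {
        *self.shared * 2.0
    }
}

/// Only the constructor crosses the thread boundary, so only it must be `Send`.
type Constructor = Box<dyn FnOnce() -> Box<dyn Processor> + Send>;

fn render_on_thread(ctor: Constructor) -> f32 {
    thread::spawn(move || {
        let mut processor = ctor(); // built inside the "render" thread
        processor.process()
    })
    .join()
    .expect("render thread panicked")
}

fn demo() -> f32 {
    let gain = 0.5_f32; // plain data captured by the constructor is Send
    render_on_thread(Box::new(move || {
        Box::new(RcProcessor { shared: Rc::new(gain) }) as Box<dyn Processor>
    }))
}
```

The catch in this sketch is that the constructor runs, and therefore allocates, on the render thread, so it trades the `Send` problem for an allocation at node creation time.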

b-ma · 2023-11-06T20:08:38Z

ok, not sure I understand every detail (...pretty sure I actually don't, let's be honest), but looking forward to seeing the thing :)

b-ma · 2023-11-06T20:25:33Z

(my experiment is quite fun anyway! :)

orottier added 3 commits October 3, 2023 16:49

Expose garbage collector thread via RenderScope to all AudioProcessors

4b19874

Proof of concept (untested) to prevent deallocation in node cleanup

23b4d6a

Experimenting with allocator-free rendering - not meant to be merged

614abc1

The goal is to run `cargo run --example audio_buffer_source_events` without failing for now

orottier requested a review from b-ma October 5, 2023 18:55

orottier changed the base branch from main to feature/async-deallocations October 5, 2023 18:55

orottier commented Oct 5, 2023

b-ma reviewed Oct 6, 2023

orottier mentioned this pull request Oct 27, 2023

Reduce size_of Node for better cache performance in render loop #208

Open

5 tasks

orottier changed the base branch from feature/async-deallocations to main October 27, 2023 11:30

orottier added 4 commits October 27, 2023 13:41

Merge remote-tracking branch 'origin/feature/towards-allocation-free-render' into feature/store-nodes-in-vec-not-hashmap

9d66b2a

Introduce NodeCollection::with_capacity

c6146a9

cargo fmt

b71f3cc

Move GarbageCollector to a separate mod

4a1bb98

orottier changed the base branch from main to feature/store-nodes-in-vec-not-hashmap October 27, 2023 12:05

Base automatically changed from feature/store-nodes-in-vec-not-hashmap to main October 30, 2023 09:37

Merge branch 'main' into feature/towards-allocation-free-render

de08796

orottier changed the title ~~Don't merge: experimenting towards allocator-free rendering~~ Experimenting towards allocator-free rendering Oct 30, 2023

orottier mentioned this pull request Oct 30, 2023

Async deallocations #366

Closed

orottier mentioned this pull request Nov 20, 2023

(de-)allocation in render thread #359

Open

11 tasks

orottier mentioned this pull request Apr 10, 2024

Introduce AudioProcessor::before_drop pre-destructor hook #490

Merged
