Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggest adding StreamSplitter::pullInto() to reduce memory allocation when recording samples #395

Closed
dpgeorge opened this issue Nov 14, 2023 · 8 comments
Assignees
Milestone

Comments

@dpgeorge
Copy link

When recording a stream of audio from the microphone using StreamSplitter, there is potentially an unnecessary allocating of intermediate buffer space, and unnecessary copying of data.

In MicroPython, to record audio one first creates a big buffer to hold the entire sample (eg 5 seconds) and then records into that buffer. It would be convenient and more efficient if CODAL could be told "record into this buffer directly".

With the current architecture of streaming and pullRequest() methods and different consumers with different sample rates, it might be difficult to add the functionality for a zero-copy interface to record samples. So instead I suggest adding a method StreamSplitter::pullInto(uint8_t *buf, size_t len). This is the same as StreamSplitter::pull() except that it uses the given buffer instead of allocating the outData buffer and returning that. This will eliminate a memory allocation for each call to this function, and also eliminate a subsequent memcpy().

@JohnVidler
Copy link
Collaborator

I think this is a pretty good suggestion for cases where 'userland' runtimes want more direct C-alike access to memory from a stream. To this end, I plan to roll this in with the changes required to support/fix #394 and #321 and squash all three bugs at once.

Unfortunately I'm on leave for the next week, and will be back in the office on Monday 27th, so I'll pick this up then as a priority.

@JohnVidler
Copy link
Collaborator

Related: #365 and #284

@JohnVidler JohnVidler added the p0 label Nov 16, 2023
@JohnVidler
Copy link
Collaborator

Note: Using this issue to track the discussion for implementation to address all the linked issues here.

@JohnVidler
Copy link
Collaborator

With the changes currently pending merge to address the sample rate change issues I've also now included a uint8_t * SplitterChannel::pullInto( uint8_t * rawBuffer, int length ) method as part of the SplitterChannel.

This lets you supply an arbitrary block of data, and have the benefits of automatic resampling along with keeping the 'contract' between stream classes to keep the microphone auto-wake functions working. The return pointer is set to the next byte in memory if you're using a large buffer with multiple samples stored sequentially, so by tracking the return between calls you can automatically wander the larger buffer.

If the supplied memory is smaller than required, it will be completely filled and the data will be truncated; if the memory is longer than required, the complete upstream buffer will be copied in and the rest of the memory will be untouched.

Thanks for everyone's patience on this issue, its hit quite a few things at once :)

@dpgeorge
Copy link
Author

if the memory is longer than required, the complete upstream buffer will be copied in and the rest of the memory will be untouched.

Testing out this new pullInto() method, I'm not sure the above behaviour is implemented correctly. Passing in a large buffer, it seems to just keep on sampling the input buffer past its end, and only stops once it has filled the given output buffer. This means that you have to request exactly the number of bytes that are available, and it's not really possible to know this.

In the code you can see that the loop to copy the samples from input to output uses while( outPtr - output < length ), and if a buffer is passed in then length is just taken as the passed-in length. The following patch gets it working for me:

--- a/source/streams/StreamSplitter.cpp
+++ b/source/streams/StreamSplitter.cpp
@@ -74,6 +74,10 @@ ManagedBuffer SplitterChannel::resample( ManagedBuffer _in, uint8_t * buffer, in
     if( output == NULL ) {
         output = (uint8_t *)malloc( samplesPerOut * bytesPerSample );
         length = samplesPerOut * bytesPerSample;
+    } else {
+        if (length > samplesPerOut * bytesPerSample) {
+            length = samplesPerOut * bytesPerSample;
+        }
     }
 
     int oversample_offset = 0;

@JohnVidler
Copy link
Collaborator

Testing out this new pullInto() method, I'm not sure the above behaviour is implemented correctly. Passing in a large buffer, it seems to just keep on sampling the input buffer past its end, and only stops once it has filled the given output buffer.

Ah! Good catch! The pullInto() method was less completely tested (I don't have a use case for this inside CODAL itself) so I missed that one. I'll merge your proposed patch into master and can you let me know if everything then works with Python when built against it all? If so, we'll call this the next minor rev with a couple of other smaller merges.

@JohnVidler JohnVidler reopened this Feb 26, 2024
@microbit-carlos microbit-carlos modified the milestones: v0.2.65, v0.2.66 Feb 26, 2024
@JohnVidler
Copy link
Collaborator

Aaand patched, pushed, tagged and released. Might as well get this out the door ASAP so you have everything you need for Python @dpgeorge 👍

For ref: https://github.com/lancaster-university/codal-microbit-v2/releases/tag/v0.2.66

@dpgeorge
Copy link
Author

Thanks for the quick patch and release! I have tested it and it works. I have updated all our code to use the new v0.2.66 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants