Add MicrophoneFeed with direct access to the microphone input buffer #108773
Conversation
```cpp
	return buf.size() / 2;
}

PackedVector2Array MicrophoneFeed::get_frames(int p_frames) {
```
This is wrong, since it mimics the audio capture API but does not use a ring buffer. You must use a ring buffer.
The AudioDriver input_buffer is already a ring buffer, so I'm not sure where another ring buffer would come into play.

The current audio capture APIs (the capture effect and record effect) use separate ring buffers because they receive a fixed, transient batch of data from the AudioDriver input_buffer (pulled off it in AudioStreamMicrophone as part of the audio server _mix_step) and need to store it until the user can request it in their own code, outside the audio server thread/loop. This implementation bypasses that, allowing users to pull from the audio driver ring buffer directly. Each feed can have its own 'buffer_ofs', so multiple places in the code can retrieve microphone data from the same device without stepping on each other's toes.
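For illustration, a minimal sketch of the per-feed read pattern described above (not the PR's actual implementation; it assumes the existing AudioDriver accessors get_input_buffer() and get_input_position(), and a per-feed buffer_ofs member):

```cpp
// Sketch only: each feed keeps a private read cursor (buffer_ofs) into the
// shared AudioDriver ring buffer; only the driver advances the write index,
// so several feeds can read the same device without interfering.
PackedVector2Array MicrophoneFeed::get_frames(int p_frames) {
	AudioDriver *ad = AudioDriver::get_singleton();
	Vector<int32_t> buf = ad->get_input_buffer();
	unsigned int write_pos = ad->get_input_position();

	PackedVector2Array frames;
	if (buf.is_empty()) {
		return frames;
	}
	while (p_frames > 0 && buffer_ofs != write_pos) {
		// Samples are interleaved left/right signed 32-bit integers.
		float l = (buf[buffer_ofs] >> 16) / 32768.f;
		buffer_ofs = (buffer_ofs + 1) % buf.size();
		float r = (buf[buffer_ofs] >> 16) / 32768.f;
		buffer_ofs = (buffer_ofs + 1) % buf.size();
		frames.push_back(Vector2(l, r));
		p_frames--;
	}
	return frames;
}
```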
The issue I have with this is that it needs to lock the AudioDriver, since input_buffer is the single buffer being written to by the driver in the AudioDriver thread. As far as I can tell, not locking will lead to race conditions in multithreaded environments.
So, the locking in the current implementation is inconsistent.

There is no locking applied when samples are added to the buffer by AudioDriver::input_buffer_write(int32_t sample), but there is a lock applied when samples are taken from the buffer by AudioStreamPlaybackMicrophone::_mix_internal(AudioFrame *p_buffer, int p_frames).

That means the locking isn't doing anything useful, since it covers only one half of the transaction. Assuming it was meant to do something, I added a corresponding lock to the input_buffer_write() function in my first PR. But there were complaints that I was adding a lock to the very time-sensitive audio thread, which could be potentially harmful.
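Presumably the corresponding lock looked something like this (a sketch, not necessarily the code from the earlier PR; it assumes the driver's lock()/unlock() pair is recursive, since platform capture threads may already hold it around their chunk loop):

```cpp
// Sketch: pair the write side with the lock already taken on the read side
// in AudioStreamPlaybackMicrophone::_mix_internal().
void AudioDriver::input_buffer_write(int32_t sample) {
	lock();
	input_buffer.write[input_position++] = sample;
	if ((int)input_position >= input_buffer.size()) {
		input_position = 0;
	}
	unlock();
}
```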
The lack of a lock in input_buffer_write() evidently caused an index out-of-range crash that was mitigated by inserting an extra boundary test on the index, instead of finding the root cause, which could only be two threads executing input_position++ at the same time:
```cpp
void AudioDriver::input_buffer_write(int32_t sample) {
	if ((int)input_position < input_buffer.size()) {
		input_buffer.write[input_position++] = sample;
		if ((int)input_position >= input_buffer.size()) {
			input_position = 0;
		}
	} else {
		WARN_PRINT("input_buffer_write: Invalid input_position=" + itos(input_position) + " input_buffer.size()=" + itos(input_buffer.size()));
	}
}
```

In any case, there should never be two threads entering this function, since it would result in choppy out-of-order audio chunks being pulled from the operating system and buffered. I've only seen it happen in circumstances when AudioDriverPulseAudio::input_start() was called a second time. Some of the code in my current PR protects this from happening again on the various different platforms.
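To make the race concrete, here is one hypothetical interleaving of two threads inside this function (illustrative, not an observed trace):

```cpp
// buffer size = 1024, input_position = 1023
//
// Thread A: checks (int)input_position < 1024    -> true
// Thread B: checks (int)input_position < 1024    -> true (still 1023)
// Thread A: input_buffer.write[input_position++] -> writes [1023], position = 1024
// Thread B: input_buffer.write[input_position++] -> writes [1024], out of range
//
// The extra boundary test only narrows the window; the unsynchronized
// input_position++ remains the root cause.
```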
With regards to the race condition, I think it is safe, since the input_buffer is never reallocated, and the MicrophoneFeed::get_frames() function doesn't write to any shared values and can tolerate an out-of-date input_position value.
Indeed, there is no point in adding a lock to this function without adding the corresponding lock to the input_buffer_write() function. Unfortunately, we don't know what the consequences of acquiring a lock at a rate of 88.2 kHz in the audio thread would be, and the code has persisted this long without locking being a problem, so it would be a risk to change it.
The drivers lock themselves in their own threads before making changes to input_buffer (except maybe the Android driver):

- `ad->lock();` (…)
- `ad->lock();` in godot/drivers/wasapi/audio_driver_wasapi.cpp, line 765 at 71a9948
- `driver->lock();` in godot/platform/web/audio_driver_web.cpp, line 405 at 71a9948
You may be right that accessing input_buffer from multiple threads is fine (due to no reallocation), but it really doesn't feel correct.
I stand corrected. Note that the lock is per chunk of audio, not per sample.
Yes, it was the Android version which had all the bugs I have been most interested in fixing.
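For reference, the per-chunk pattern in the platform capture threads looks roughly like this (a simplified sketch; AudioDriverExample and read_chunk_from_os() are hypothetical names modeled loosely on the WASAPI and PulseAudio drivers, not verbatim engine code):

```cpp
// Sketch: the driver thread takes the lock once per captured chunk and
// releases it before blocking again, so the cost is per buffer of samples,
// not per sample.
void AudioDriverExample::thread_func(void *p_udata) {
	AudioDriverExample *ad = static_cast<AudioDriverExample *>(p_udata);
	while (!ad->exit_thread) {
		int32_t chunk[512];
		int read = ad->read_chunk_from_os(chunk, 512); // hypothetical OS capture call

		ad->lock();
		for (int i = 0; i < read; i++) {
			ad->input_buffer_write(chunk[i]);
		}
		ad->unlock();
	}
}
```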
Many GH checks failed.
Work to do: Do the docs again using https://docs.godotengine.org/en/stable/contributing/documentation/updating_the_class_reference.html#updating-class-reference-when-working-on-the-engine

Try using: …

Remove the vector& lock in the Android version and the locking in the get buffer.
Force-pushed: aff6747 to a807709.
The deeper proposal is to separate the Audio Input (microphone code) from Audio Output (speakers) and move this code into a default microphone feed class. Unfortunately the Godot codebase does not make this separation easy to do. Here is why. The Godot AudioDriver has a separate implementation for each platform. Each of these classes manages the single input and single output for that platform -- sometimes with the same function (for example, in …).

Here are the options of what can be done.

Option 1: Separate the Audio Input and Output at a deep level

This requires me to write a new … The case of …

Option 2 (what I have implemented): Touch the AudioDriver code as little as humanly possible so nothing breaks

Leave all the code and its technical debt (in relation to the feature of multiple microphones) in place, including its single microphone input buffer on which the 7 different platform implementations depend, and simply access this single buffer from the one and only MicrophoneFeed (see the sketch after this comment).

Option 3: Something in between

Leave the … Since none of the platforms implement multiple microphones, the value of …

In my opinion, the appropriate time to implement the code that can manage multiple MicrophoneFeeds is when at least one of the platforms' AudioDrivers has been extended to support it, because otherwise its implementation will be speculative and likely to be wrong.
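A rough sketch of the Option 2 shape, modeled on the CameraServer/CameraFeed pairing mentioned in the PR description (names beyond MicrophoneServer, MicrophoneFeed, get_frames(), and get_buffer_length_frames() are hypothetical, and GDCLASS boilerplate is omitted):

```cpp
// Sketch: one server singleton owning one default feed; the feed reads the
// existing AudioDriver::input_buffer instead of introducing a new driver.
class MicrophoneFeed : public RefCounted {
	unsigned int buffer_ofs = 0; // private read cursor into the shared ring buffer

public:
	PackedVector2Array get_frames(int p_frames);
	int get_buffer_length_frames() const; // capacity, for overflow prediction
};

class MicrophoneServer : public Object {
	static MicrophoneServer *singleton;
	Ref<MicrophoneFeed> default_feed; // a single feed until drivers support more

public:
	static MicrophoneServer *get_singleton() { return singleton; }
	Ref<MicrophoneFeed> get_default_feed() { return default_feed; } // hypothetical accessor
};
```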
I made a similar implementation of a … My implementation has the … However, since only one real microphone exists in the …
I talked to @goatchurchprime at GodotFest and found out about this PR that way 🙂 I implemented a VoIP GDExtension for our next game too (still very much WIP, but it can be found here: https://github.com/mrTag/listenclosely ), and I share the pain of the currently very cumbersome and delay-inducing way of accessing the microphone stream in a GDExtension. With the current wave of indie "Friend Slop" games (Peak, R.E.P.O. and the like), it is very important for Godot to have a good VoIP solution ready, and for that we need a good way to access the microphone stream directly. I think the …

All this to say: please don't forget about this PR, as the issue is more important than you might think. Thank you!
Although this PR is related to a number of existing issues that are linked in the main description, it does not fix or close most of them; instead, it simply provides a workaround. The primary purpose of this PR is to close the proposal for direct access to microphone input for VoIP. The other issues will still remain open. (Said differently, this PR description should not use the "fixes #…" keyword for those issues.)
I had other words there originally, but then someone on RocketChat told me: …

Would you like me to change them to "Doesn't exactly fix #…"?
Yes, that was me; my intent was that any issues specifically fixed by this PR should be linked as "fixes #…". My mistake! Sorry!
Direct access to the microphone buffer is required to reduce audio latency for the use-case of VoIP, and to repair the fundamental design flaw of connecting the audio input stream to the audio output stream under the mistaken assumption that they will always proceed at exactly the same rate for all time on every platform. A diagram of the current design and the improved situation can be seen here.
The following issues are illustrative of the flaw in the current design.
This PR is a re-implementation of two previous attempts to find an appropriate place for the get_microphone_frames() function:

- … AudioStreamPlaybackMicrophone to access the microphone AudioDriver::input_buffer independently of the rest of the Audio system #100508
- … Input #105244

This also resolves godotengine/godot-proposals#11347 (proposal).
A comprehensive demo is at:
https://github.com/goatchurchprime/godot-demo-projects/tree/gtch/micplotfeed/audio/mic_feed
As discussed at length in the Audio Group meetings, this API is designed to future-proof for the case when there might be more than one microphone data stream, even though only a single microphone input has been hard-coded throughout the AudioDriver code on all platforms.
Accordingly, I have modeled a framework based on the CameraServer and CameraFeed objects and created the new object MicrophoneServer that has a single MicrophoneFeed.

This MicrophoneFeed contains the following functions: …

In the class AudioEffectCapture, the function that is equivalent to get_frames() is known as get_buffer().

The function get_buffer_length_frames() is needed to predict when the overflow condition is likely.
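To illustrate the overflow-prediction use, here is a hypothetical polling loop; only get_frames() and get_buffer_length_frames() come from this PR, while get_frames_available() and process_voip_packet() are invented names for whatever counterpart functions a caller would use:

```cpp
// Sketch: drain the feed before unread frames can be overwritten. If we poll
// less often than capacity / mix_rate seconds, the ring buffer will wrap over
// frames we have not read yet.
void poll_microphone(Ref<MicrophoneFeed> feed) {
	int capacity = feed->get_buffer_length_frames();
	int available = feed->get_frames_available(); // hypothetical counterpart
	if (available > capacity / 2) {
		PackedVector2Array frames = feed->get_frames(available);
		process_voip_packet(frames); // hypothetical consumer
	}
}
```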