
Experimental microphone support #19106

Merged Aug 11, 2018 (18 commits)

Conversation
Conversation

SaracenOne
Member

@SaracenOne SaracenOne commented May 22, 2018

Okay, following @reduz’s advice, I decided to post a PR for this feature even though it is not complete and definitely not ready for merging. I knew next to nothing about audio encoding before starting, so this has largely been a learning experience, but given some walls I have run into while trying to make this feature work correctly, I feel I should share what I have done so far in an attempt to get feedback, and perhaps also support from people with more experience in this particular field. I'm led to understand that @marcelofg55 in particular might be able to help out with this.

In my discussion with @reduz, the idea behind this particular approach was to add microphone support directly through the AudioStream interface, which could then in theory go through AudioMixer for processing. This actually introduced somewhat more complexity than my original attempt at the interface, and necessitated the introduction of a kind of proxy layer between the audio driver, the server, and individual streams. This comes in the form of MicrophoneRecievers and MicrophoneDeviceOutputs. The MicrophoneDeviceOutputs are the primary endpoints that the actual drivers feed for each capture device. MicrophoneRecievers are invisibly assigned by the new AudioStreamMicrophone classes when playback is requested, which then allows these classes to decode the microphone buffers themselves and go through the standard audio mixing process. The presence of receivers also automatically controls whether a capture device should be active or not. These classes are also heavily virtualised, as they support non-physical capture devices such as the ‘default’ endpoint, which can be changed while the engine is already running.
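A minimal sketch of the receiver/output proxy relationship described above might look like the following; the class and method names here are illustrative only and do not match the PR's actual code:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sketch: a device output is "active" (i.e. capture should run)
// only while at least one receiver is attached, mirroring the idea that the
// presence of receivers controls whether a capture device is active.
class MicrophoneReceiver;

class MicrophoneDeviceOutput {
    std::vector<MicrophoneReceiver *> receivers;

public:
    bool active() const { return !receivers.empty(); }
    void add_receiver(MicrophoneReceiver *r) { receivers.push_back(r); }
    void remove_receiver(MicrophoneReceiver *r) {
        receivers.erase(std::remove(receivers.begin(), receivers.end(), r),
                        receivers.end());
    }
};

class MicrophoneReceiver {
    MicrophoneDeviceOutput *output = nullptr;

public:
    // Attaching a receiver is what an AudioStream playback request would do.
    void attach(MicrophoneDeviceOutput *o) {
        output = o;
        o->add_receiver(this);
    }
    void detach() {
        if (output) {
            output->remove_receiver(this);
            output = nullptr;
        }
    }
};
```

The point of the indirection is that streams never talk to a driver directly; they only attach to an output, and the output's active state tells the driver whether to capture.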

I will stress further that this is NOT a fully functional implementation of microphones; it is merely a proof of concept which has severe audio clipping and which will actually crash 10 seconds after starting a microphone stream due to a buffer overflow. It is only meant to demonstrate the underlying design of the interface and only currently supports one audio driver. It also probably requires some further cleanup and refactoring.

While most of the issues can be addressed fairly easily, the main thing I am having trouble with right now is how to solve the audio clipping problem. The main issue lies in synchronising the audio capture with the audio stream processing: audio capture packets currently come in at a larger size than they are decoded at, meaning the only way to keep them in sync so far has been to process only part of each packet in the AudioStream, clipping off the end; otherwise more and more latency builds up between capture and playback. If anyone can figure out how to handle this issue, please let me know, or even send me another pull request.
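One common way to handle the kind of capture/playback rate mismatch described above is a bounded buffer that drops the oldest samples once the backlog exceeds a latency budget, rather than clipping the tail of every capture packet. This is an illustrative sketch, not the fix the PR ended up using:

```cpp
#include <cstddef>
#include <deque>

// Illustrative sketch: bound capture->playback latency by dropping the
// oldest samples when the backlog grows past a threshold. The playback
// side reads whatever is available and fills the rest with silence.
class CaptureBuffer {
    std::deque<float> samples;
    size_t max_backlog; // latency budget, in samples

public:
    explicit CaptureBuffer(size_t max_backlog) : max_backlog(max_backlog) {}

    // Called from the capture side with each incoming packet.
    void write(const float *data, size_t count) {
        samples.insert(samples.end(), data, data + count);
        while (samples.size() > max_backlog) // drop oldest, keep latency bounded
            samples.pop_front();
    }

    // Called from the mixing side; returns how many samples were filled.
    size_t read(float *out, size_t count) {
        size_t n = samples.size() < count ? samples.size() : count;
        for (size_t i = 0; i < n; i++) {
            out[i] = samples.front();
            samples.pop_front();
        }
        return n; // caller zero-fills out[n..count) as silence
    }

    size_t available() const { return samples.size(); }
};
```

The tradeoff is explicit: a larger `max_backlog` means fewer drops but more latency.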

I’ll list a couple of the things I feel still need to be addressed before this PR would be ready for merging:

  • Keep captured audio in sync with output without clipping
  • Default audio endpoints which can be changed while the engine is running
  • Allow opening of arbitrary microphone devices
  • AudioStreamMicrophone resampling
  • Fix editor crash on exit
  • WASAPI support
  • PulseAudio support
  • CoreAudio support

While there are many local applications for microphone support, the main motivation for this feature is to provide a way of supporting VOIP. This would likely be achieved by compressing audio packets through Opus, sending them through the networking interface, and then decoding them on the clients. While it would likely take the form of another interface sitting atop the basic microphone input interface, it would be nice to have a sample implementation of such a feature that developers could easily integrate into their own games to instantly have VOIP support.
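The pipeline sketched above (capture, compress, send, decode) also needs some framing so receivers can detect loss and reordering. A hedged illustration, with the Opus-encoded payload replaced by raw bytes and a made-up header layout:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical framing sketch for the VOIP idea: each network packet
// carries a sequence number so the receiver can detect loss/reordering.
// A real implementation would put an Opus-encoded frame in the payload;
// here the payload is raw bytes for illustration.
struct VoicePacket {
    uint32_t seq;
    std::vector<uint8_t> payload;
};

std::vector<uint8_t> pack(const VoicePacket &p) {
    std::vector<uint8_t> out(4 + p.payload.size());
    std::memcpy(out.data(), &p.seq, 4); // host byte order, for brevity
    if (!p.payload.empty())
        std::memcpy(out.data() + 4, p.payload.data(), p.payload.size());
    return out;
}

VoicePacket unpack(const std::vector<uint8_t> &raw) {
    VoicePacket p;
    std::memcpy(&p.seq, raw.data(), 4);
    p.payload.assign(raw.begin() + 4, raw.end());
    return p;
}
```

A real wire format would also fix the byte order and likely add a timestamp, but the sequence number alone is enough for a jitter buffer to reorder and conceal lost frames.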

Lastly, I have also included an extremely basic sample project to demonstrate this feature. Note that it currently only works with the drivers listed as supported above.
godot_mic_test.zip

@marcelofg55
Contributor

Note: this will fix #13268 once merged.

@SaracenOne
Member Author

Special thanks to @marcelofg55 for providing a patch which seems to have more or less fixed the audio clipping problems.

@SaracenOne SaracenOne force-pushed the audio_mic branch 2 times, most recently from 8419761 to 6b9ddd5 Compare May 24, 2018 21:03
@marcelofg55
Contributor

As discussed on IRC a few days ago, it would be good to convert this to use a single microphone device at a time; it's really not necessary to have multiple mics active simultaneously, and it will make the code simpler. So instead of a Vector of microphone_device_outputs it can be a single object.
Also @reduz suggested to rename AudioStreamMicrophone to AudioStreamInput since we could also input from line-in inputs.
If I can help you out with this let me know :).

@hungrymonkey
Contributor

@SaracenOne I solved the audio clipping by adding a lockless buffer.

I am only trained in C++11, so I cannot provide my lame implementation of a multi-producer single-consumer queue.

I found it to be simpler to just make the audio output a bindable signal.
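The lockless buffer idea mentioned above can be sketched in C++11 as a single-producer single-consumer ring using atomics; this is a generic illustration, not @hungrymonkey's actual code:

```cpp
#include <atomic>
#include <cstddef>

// Sketch of a single-producer single-consumer lockless ring buffer
// (C++11 atomics). Capacity N must be a power of two so that the
// monotonically increasing counters can be masked into indices.
template <size_t N>
class SpscRing {
    float buf[N];
    std::atomic<size_t> head{0}; // written only by the producer
    std::atomic<size_t> tail{0}; // written only by the consumer

public:
    bool push(float v) { // producer side (e.g. capture callback)
        size_t h = head.load(std::memory_order_relaxed);
        if (h - tail.load(std::memory_order_acquire) == N)
            return false; // full
        buf[h & (N - 1)] = v;
        head.store(h + 1, std::memory_order_release);
        return true;
    }

    bool pop(float *v) { // consumer side (e.g. mixer thread)
        size_t t = tail.load(std::memory_order_relaxed);
        if (head.load(std::memory_order_acquire) == t)
            return false; // empty
        *v = buf[t & (N - 1)];
        tail.store(t + 1, std::memory_order_release);
        return true;
    }
};
```

With exactly one producer thread and one consumer thread, the acquire/release pairs on `head` and `tail` are sufficient; a multi-producer variant would need compare-and-swap loops.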

@SaracenOne
Member Author

@marcelofg55 Sorry, I've been a bit busy over the past few weeks. I'll return to this PR soon. Interestingly, an earlier attempt actually did use this naming convention.
@hungrymonkey Hmm, I'll look more into it.

@mhilbrunner
Member

What's the status of this? :)

@marcelofg55
Contributor

I could continue with this if @SaracenOne is busy, but I don't want to step on his toes if he wants to continue it by himself :)

@mhilbrunner
Member

mhilbrunner commented Jul 3, 2018

Just saying: Right now there's still a little time left to get this into a 3.1 release ;)

@SaracenOne
Member Author

Sorry, I actually put this to one side while I started working on a high level networking layer which would actually incorporate the VOIP aspect of this eventually. I do intend to return to this soon, but I have no problem if someone else wants to, say for example, write more driver backends in the meantime.

@marcelofg55
Contributor

@SaracenOne I'll work on the CoreAudio driver then.

@SaracenOne
Member Author

@marcelofg55 That would be awesome! I actually don't have immediate access to a Mac at the moment, so that would be especially useful. One thing though: I'm not sure if iOS has a similar permissions model to Android, but I know that on Android at least, it will be necessary to poll for permission to access the mic.

@marcelofg55
Contributor

Mind if I convert the code to use a single receiver instead of multiple receivers?

@SaracenOne
Member Author

@marcelofg55 I guess if you want to do that too, sure thing ;)

@hungrymonkey
Contributor

@SaracenOne If you want, I can help with the VOIP stuff. Somebody needs to refactor the high-level networking to allow packets for different purposes. Currently, the SceneTree basically takes control of the p2p networking.

@SaracenOne
Member Author

@hungrymonkey The multiplayer API does take control of the SceneTree and deals primarily with direct RPC calls on nodes, with path caching. This system makes it very easy to get networking up and running, but hard to do anything more complicated than that. For example, there's no way to do replication, meaning that players can't join servers where the game has already started. However, it is still possible to send generic packets through this system too, which can be parsed however you like.

The way I'm approaching networking myself is writing a higher-level library on top of the existing multiplayer API which is more specific to a particular type of game logic, but still pretty broad (basically any kind of game which involves one map per server and a bunch of networked entities moving around, but which doesn't cover more specific cases like fighting or RTS games). It's currently up and running, but my intention is to go further and implement the networked occlusion system described in this Halo talk (https://www.gdcvault.com/play/1014345/I-Shot-You-First-Networking) to make it highly scalable. Personally, I'm happy to just manually append extra data to the packets to send VOIP data, but it's @reduz's call whether he might want to make VOIP data part of the standard multiplayer API, which would still be pretty easy to implement and would probably make things easier for developers working with the vanilla multiplayer API who simply want to get VOIP up and running as soon as possible.

The main thing we need at this point is code to get audio data out of the mixer and back into the game logic so we can start prepping it for sending over the network. We will also need to integrate the Opus codec for properly compressing the audio.

@hungrymonkey
Contributor

hungrymonkey commented Jul 3, 2018

@SaracenOne You really don't need to make it part of the multiplayer API. I just need control of a few extra channels in which I can send and receive packets. If that is possible, then I can remove my crash connect code, because the VOIP will be self-contained in its own module.

I am currently using protobufs to create my own Mumble-like VOIP protocol.

Oh OK, you want a generic VOIP protocol in the network?

Use something like protobufs or flatbuffers.

Edit: If you guys fix the networking issue, I would donate code for VOIP.

@SaracenOne
Member Author

@hungrymonkey In which case, I THINK it should already be possible, but you might still have to run that by @reduz. I know you can manipulate the channel you want to use by directly interacting with the ENet peer (since the concept of 'channels' is not universal to all networking models, Bluetooth for example), but I think one of the reasons you might still have to go through the MultiplayerAPI's send_bytes method is that I believe the MultiplayerAPI looks at the first few bytes to determine whether or not the incoming command is an RPC call. Beyond that, the only other alternative I can think of would be to establish a secondary connection on a completely different port.
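The first-byte dispatch described above can be illustrated with a simple tag/classify pair; the tag values below are hypothetical and not Godot's actual wire format:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: tag each raw packet with a leading command byte so
// RPC traffic and custom (e.g. VOIP) data can share one connection, and the
// receiving side can route packets by inspecting the first byte.
enum PacketKind : uint8_t { PACKET_RPC = 0, PACKET_VOIP = 1 };

std::vector<uint8_t> tag(PacketKind kind, const std::vector<uint8_t> &body) {
    std::vector<uint8_t> out;
    out.reserve(body.size() + 1);
    out.push_back(static_cast<uint8_t>(kind)); // command byte first
    out.insert(out.end(), body.begin(), body.end());
    return out;
}

PacketKind classify(const std::vector<uint8_t> &packet) {
    // An empty packet has no tag; treat it as RPC by convention here.
    return packet.empty() ? PACKET_RPC : static_cast<PacketKind>(packet[0]);
}
```

This is the same pattern as dispatching on the command byte of an incoming multiplayer packet, just reduced to its smallest form.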

@hungrymonkey
Contributor

@SaracenOne I don't need to manipulate channels. I just need a few extra channels to mess around with. I attempted to do it with the current implementation and it didn't work. It seems like @Faless made a few changes lately.

The largest issue is that the SceneTree seems to dequeue all packets. When I try to grab the packet I need, the SceneTree has already dequeued it.

@SaracenOne
Member Author

@hungrymonkey Hmm, in which case I can't be sure. I'll see if I can find any more info.

@SaracenOne
Member Author

@hungrymonkey By the way, I'm having a look through your VOIP implementation right now...

@hungrymonkey
Contributor

@SaracenOne Look into the happytree_master branch.

Some of the branches are just old code. I should delete them, but I am procrastinating.

I would have to refactor it into a Godot server.

@mhilbrunner
Member

cc @Faless ^ networking VOIP discussion, see above

@SaracenOne
Member Author

I think @marcelofg55 and I have decided that this PR is probably in a suitable enough state to get merged now. There might be issues that could still be ironed out, but we're somewhat limited in what we can collectively test due to not having access to a wide array of capture hardware. At the very least, this now covers at least one driver for each of the three major supported desktop platforms and seems to be working.

The only thing I would like to know now is how to get the audio data out of the mixer and back into the code so I can start looking into integrating @hungrymonkey's VOIP example.

@reduz
Member

reduz commented Aug 10, 2018

@akien-mga this is OK with me, but is the PR in the right shape format-wise for you?

@akien-mga
Member

Seems good to me too; quite a few commits compared to our usual standard, but I like that it shows the collaboration between two contributors :)

@akien-mga akien-mga merged commit 73cf0fd into godotengine:master Aug 11, 2018
@akien-mga akien-mga changed the title [WIP] Experimental microphone support Experimental microphone support Aug 13, 2018
@akien-mga akien-mga mentioned this pull request Aug 13, 2018
@hungrymonkey
Contributor

@SaracenOne I took a look at the mic; I guess I will have to refactor my code again, since I need to add an explicit poll to my VOIP tree.

@Chaosus Chaosus mentioned this pull request Sep 9, 2018
@nicoechaniz

Is mic support already usable from GDScript in beta 9? Any docs?

@LinuxUserGD
Contributor

LinuxUserGD commented Mar 5, 2019

(*) beta 11

@marcelofg55
Contributor

@ZX-WT
Contributor

ZX-WT commented Mar 21, 2019

@marcelofg55 Would you document this feature? If not, I would like to write a simple tutorial for this section, but I would like to know whether I can record from the microphone directly in the editor. I looked at the demo and understand that it can record in-game.

@marcelofg55
Contributor

> @marcelofg55 Would you document this feature? If not I would like to write a simple tutorial for this section but I would like to know whether I can record from microphone just from the editor, I looked at the demo and understand that it can record in game.

If you want to document it go ahead. The feature is meant for in-game recording.

@ArdaE
Contributor

ArdaE commented Jul 4, 2019

Microphone recording doesn't work on iOS. I've tried the demo project from https://github.com/godotengine/godot-demo-projects/tree/master/audio/mic_record, but the audio data that is recorded is all zeros.

I added "Privacy - Microphone Usage Description", aka NSMicrophoneUsageDescription, to the app's ...-Info.plist, which is required on iOS before one can use the microphone. While that did trigger a permission request question to the user, as expected, the data that is recorded is still all zeros even after I grant the app permission to use the microphone. Should I create a new bug report for this or would it be tracked here?

There are two issues that need to be fixed for iOS:

  1. The NSMicrophoneUsageDescription entry should be generated automatically, similar to the "Camera usage description" and "Photolibrary Usage Description" that exist in the iOS export settings.
  2. The audio recording should work like on other platforms.
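For point 1, the entry in question is the standard NSMicrophoneUsageDescription key. Added manually, it looks like this in the app's Info.plist (the description string below is just an example):

```xml
<key>NSMicrophoneUsageDescription</key>
<string>This game uses the microphone for voice chat and recording.</string>
```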

P.S. I'm assuming I haven't forgotten a step. If you can get the demo project working on iOS, please let me know. By the way, the demo works fine on macOS, but iOS and macOS still have slightly different CoreAudio implementations as far as I know despite recent attempts to make them more similar (though I'm not an expert on macOS).

@fire
Member

fire commented Jul 4, 2019

I would create a specific bug for iOS's microphone not working.
