Feature/noise gate with history frames #3
| public int SendHistoryMillis = 200; | ||
| public int FrameMillis = 30; | ||
| [Range(1, 32)] | ||
| public int HistoryFrames = 5; |
This is a rather small number of history frames, which is good for responsiveness but might make it hard to tune the VAD for good accuracy. As long as this works, I'm fine with the parameters; I'm just pointing out that this could be higher.
I was thinking the same, although it should match and in practice it works rather well. There was one occasion with missing leading audio, so if you don't mind, I'll add separate settings for the total history (which will be sent upon activation) and for the number of frames used for activity analysis.
Added VADHistory to control the number of frames analyzed, and HistoryFrames, which is now the total history size.
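For illustration, the split between the two settings could look roughly like the sketch below. This is not the code from the PR: the FrameHistory class and its method names are hypothetical, and it only assumes that frames arrive as float arrays, that HistoryFrames is the total number of frames kept, and that VADHistory is the number of newest frames analyzed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not taken from the PR: keeps the most recent frames
// and exposes two views of them.
public class FrameHistory
{
    private readonly Queue<float[]> frames = new Queue<float[]>();
    private readonly int historyFrames; // total frames kept (sent on activation)
    private readonly int vadHistory;    // newest frames used for activity analysis

    public FrameHistory(int historyFrames, int vadHistory)
    {
        if (vadHistory < 1 || vadHistory > historyFrames)
            throw new ArgumentOutOfRangeException(nameof(vadHistory));
        this.historyFrames = historyFrames;
        this.vadHistory = vadHistory;
    }

    // Add the newest frame, dropping the oldest once the history is full.
    public void Push(float[] frame)
    {
        frames.Enqueue(frame);
        while (frames.Count > historyFrames) frames.Dequeue();
    }

    // Frames the VAD would look at: the newest vadHistory frames, oldest first.
    public List<float[]> AnalysisWindow()
    {
        return frames.Skip(Math.Max(0, frames.Count - vadHistory)).ToList();
    }

    // Everything to send on activation: the whole history, oldest first.
    public List<float[]> FullHistory()
    {
        return frames.ToList();
    }
}
```

With this shape, a short analysis window keeps the gate responsive while a longer total history reduces the chance of missing leading audio.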
| // Start audio capture | ||
| clip = Microphone.Start(CaptureDeviceName, true, MicBufferLengthMillis / 1000, MicSampleRate); | ||
| int micBufferMillis = FrameMillis * HistoryFrames + 500; |
Is this 500 the context (in milliseconds) that will be sent after VAD activation?
The 500 ms is the buffer reserved for the incoming audio signal; the history comes in addition to this. Audio is received (at least on desktop) in chunks of 300..400 samples, so 500 ms should be plenty. For reasons unknown to me, Unity wants the total buffer length to be a whole number of seconds.
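A minimal sketch of that sizing, assuming the values from the diff (FrameMillis = 30, HistoryFrames = 5). The MicBuffer class and BufferSeconds method are hypothetical names used only for illustration. Because Microphone.Start takes the length as an int number of seconds, the millisecond total is rounded up here; plain integer division would truncate 650 ms to 0.

```csharp
using System;

// Hypothetical helper, not taken from the PR.
public static class MicBuffer
{
    // Headroom for incoming audio, in milliseconds (the 500 from the diff).
    public const int IncomingMillis = 500;

    // Total buffer length in whole seconds, rounded up so that the history
    // plus the incoming-audio headroom always fits.
    public static int BufferSeconds(int frameMillis, int historyFrames)
    {
        int micBufferMillis = frameMillis * historyFrames + IncomingMillis;
        return (micBufferMillis + 999) / 1000; // integer ceiling division
    }
}
```

The result could then be passed as the third argument to Microphone.Start in place of a truncating division.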