This repository was archived by the owner on Jan 7, 2025. It is now read-only.

@arzga arzga commented Mar 23, 2022

  • The energy threshold that triggers the VAD is now calculated from history frames
  • Redesigned control parameters
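As a rough illustration of the first bullet, an energy-threshold VAD computed over history frames might look like the sketch below. The class name `EnergyVadSketch`, the mean-square energy measure, and the choice to average the frame energies are all assumptions made for illustration; the PR's actual computation is not shown in this conversation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not the code from this PR.
public static class EnergyVadSketch
{
    // Mean-square energy of one frame of samples in the range [-1, 1].
    public static float FrameEnergy(float[] samples)
    {
        if (samples.Length == 0) return 0f;
        double sum = 0;
        foreach (float s in samples) sum += s * s;
        return (float)(sum / samples.Length);
    }

    // True when the average energy of the given history frames exceeds the threshold.
    // Averaging is an assumption; other rules (e.g. counting loud frames) are possible.
    public static bool IsActive(IReadOnlyList<float> frameEnergies, float threshold)
    {
        if (frameEnergies.Count == 0) return false;
        float total = 0f;
        foreach (float e in frameEnergies) total += e;
        return total / frameEnergies.Count > threshold;
    }
}
```

Averaging over several frames trades responsiveness for stability: a single loud frame is less likely to trigger activation, but speech onset is detected slightly later.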

@arzga arzga marked this pull request as ready for review March 23, 2022 13:36
public int SendHistoryMillis = 200;
public int FrameMillis = 30;
[Range(1, 32)]
public int HistoryFrames = 5;
Contributor

This is a rather small number of history frames, which is good for responsiveness but might make it hard to tune the VAD for good accuracy. As long as this works, I'm fine with the parameters; I'm just pointing out that this could be higher.

Contributor Author

@arzga arzga Mar 23, 2022

I was thinking the same, although it should match and in practice works rather well. There was one occasion with missing leading audio, so if you don't mind, I'll add separate settings for the total history (which will be sent upon activation) and for the number of frames used for activity analysis.

Contributor Author

Added VADHistory to control the number of frames analyzed, and HistoryFrames which is the total history size.
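A minimal sketch of how the two settings could relate, assuming a simple queue of frames: `historyFrames` bounds the total history that is sent on activation, while `vadHistory` selects only the newest frames for activity analysis. The class `FrameHistorySketch` and its method names are hypothetical and are not taken from the PR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the two settings; not the code from this PR.
public class FrameHistorySketch
{
    private readonly Queue<float[]> frames = new Queue<float[]>();
    private readonly int historyFrames; // total frames kept (sent on activation)
    private readonly int vadHistory;    // most recent frames analyzed for activity

    public FrameHistorySketch(int historyFrames, int vadHistory)
    {
        if (historyFrames < 1) throw new ArgumentOutOfRangeException(nameof(historyFrames));
        this.historyFrames = historyFrames;
        // Analysis can never look at more frames than are stored.
        this.vadHistory = Math.Max(1, Math.Min(vadHistory, historyFrames));
    }

    // Store a frame, dropping the oldest one once the history is full.
    public void Push(float[] frame)
    {
        if (frames.Count == historyFrames) frames.Dequeue();
        frames.Enqueue(frame);
    }

    // The newest frames, oldest first, used for activity analysis.
    public List<float[]> FramesForAnalysis()
    {
        return frames.Skip(Math.Max(0, frames.Count - vadHistory)).ToList();
    }

    // The whole history, oldest first, to send as leading audio on activation.
    public List<float[]> FramesToSend()
    {
        return frames.ToList();
    }
}
```

Keeping the two sizes separate lets the analysis window stay short for responsiveness while a longer history protects against clipped leading audio.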

// Start audio capture
clip = Microphone.Start(CaptureDeviceName, true, MicBufferLengthMillis / 1000, MicSampleRate);

int micBufferMillis = FrameMillis * HistoryFrames + 500;
Contributor

Is this 500 the context (in milliseconds) that will be sent after VAD activation?

Contributor Author

The 500 ms is the buffer reserved for the incoming audio signal. The history comes in addition to this. Audio is received (at least on desktop) in chunks of 300..400 samples, so 500 ms should be plenty. For reasons unknown to me, Unity wants the total buffer length to be an integer number of seconds.
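The sizing described here can be sketched as follows. Unity's `Microphone.Start` takes its `lengthSec` argument as an `int`, so the millisecond total has to be converted to whole seconds. Rounding up is an assumption about how that conversion would be done, and the helper `MicBufferSketch` is hypothetical; the PR's actual code after the `micBufferMillis` line is not shown.

```csharp
using System;

// Hypothetical helper; not the code from this PR.
public static class MicBufferSketch
{
    // Extra room for incoming audio, taken from the 500 ms in the diff.
    public const int IncomingAudioMillis = 500;

    // Whole seconds needed to hold the history plus the incoming-audio margin.
    public static int BufferLengthSeconds(int frameMillis, int historyFrames)
    {
        int micBufferMillis = frameMillis * historyFrames + IncomingAudioMillis;
        // Integer ceiling division: round up to the next whole second (assumption).
        return (micBufferMillis + 999) / 1000;
    }
}
```

With the values shown in the diff (`FrameMillis = 30`, `HistoryFrames = 5`), the total is 650 ms, which rounds up to a 1-second buffer. At the upper end of the `[Range(1, 32)]` attribute, 32 frames give 1460 ms, or 2 seconds.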

@arzga arzga merged commit 94f3fc3 into main Mar 23, 2022
@arzga arzga deleted the feature/noise-gate-with-history-frames branch March 23, 2022 17:12
3 participants