New .NET/Unity SpeechlyClient audio pipeline #4

arzga · 2022-04-04T14:27:40Z

What

New SpeechlyClient audio pipeline

IN: Constantly feed new audio to SpeechlyClient with SpeechlyClient.ProcessAudio().
Downsample audio to 16kHz if needed (controlled by inputSampleRate in constructor)
Add audio to history ringbuffer (controlled by HistoryFrames and FrameSamples in constructor)
Energy threshold calculation (enabled by EnergyTresholdVAD in constructor)
Automatic VAD Start/StopContext control (enabled by EnergyTresholdVAD.ControlListening = true)
OUT: Send utterances to files (enabled by SaveToFolder = "folder" in constructor)
OUT: Send utterances to Speechly cloud SLU (controlled by UseCloudProcessing = true in constructor)
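
The downsampling step listed above can be pictured with a minimal sketch. It assumes plain block averaging and an input rate that is an integer multiple of 16 kHz; the PR does not state which resampling method it uses, and `DownsampleSketch` and `ToTargetRate` are hypothetical names, not part of SpeechlyClient.

```csharp
using System;

// Hypothetical helper, NOT part of SpeechlyClient: averages each group of
// `ratio` input samples into one output sample (crude low-pass + decimation).
public static class DownsampleSketch
{
    public const int TargetSampleRate = 16000;

    public static float[] ToTargetRate(float[] input, int inputSampleRate)
    {
        // Nothing to do when the audio is already at 16 kHz.
        if (inputSampleRate == TargetSampleRate) return input;

        // Assumption of this sketch: only integer ratios such as 32 kHz or 48 kHz.
        if (inputSampleRate < TargetSampleRate || inputSampleRate % TargetSampleRate != 0)
            throw new ArgumentException("This sketch only handles integer multiples of 16 kHz.");

        int ratio = inputSampleRate / TargetSampleRate;
        var output = new float[input.Length / ratio];
        for (int i = 0; i < output.Length; i++)
        {
            float sum = 0f;
            for (int j = 0; j < ratio; j++) sum += input[i * ratio + j];
            output[i] = sum / ratio; // mean of the group
        }
        return output;
    }
}
```

Rates such as 44.1 kHz would need a proper fractional resampler, which this sketch deliberately leaves out.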

SpeechlyClient now prefers constant audio streams

SpeechlyClient for .NET/Unity is refactored to constantly process audio chunks with ProcessAudio(float[] inputSamples, start, length). This enables it to automatically control listening using a VAD implementation.

StartContext/StopContext calls can be used to control listening when VAD is not in use. However, streaming audio constantly is still recommended. The old way of streaming audio only after StartContext() still works, but is discouraged.
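
As a rough illustration of constant streaming, the sketch below walks a sample buffer in fixed-size chunks and hands each chunk to a callback. The callback stands in for `SpeechlyClient.ProcessAudio(inputSamples, start, length)`; the `int` types for `start` and `length`, and the names `ChunkFeederSketch` and `FeedInChunks`, are assumptions made for this example.

```csharp
using System;

// Hypothetical helper, not part of SpeechlyClient.
public static class ChunkFeederSketch
{
    // `processAudio` is a stand-in for SpeechlyClient.ProcessAudio(samples, start, length).
    // Returns the number of chunks that were handed over.
    public static int FeedInChunks(float[] samples, int chunkSize, Action<float[], int, int> processAudio)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        int calls = 0;
        for (int start = 0; start < samples.Length; start += chunkSize)
        {
            // The last chunk may be shorter than chunkSize.
            int length = Math.Min(chunkSize, samples.Length - start);
            processAudio(samples, start, length);
            calls++;
        }
        return calls;
    }
}
```

With a real client, the lambda would presumably be `(s, start, length) => client.ProcessAudio(s, start, length)`, called for every chunk regardless of whether a context is open.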

Adaptive energy threshold VAD controls hands-free listening

SpeechlyClient comes with an optional adaptive energy threshold VAD. It is configured with a minimum energy level, a signal-to-noise ratio, a minimum activation time and an activation/deactivation threshold (ratio of loud to silent frames). When enough loud frames have been detected, the VAD activates and calls StartContext automatically. When enough silent frames have been detected, the VAD deactivates after the sustain time and StopContext is called automatically. The background noise energy gradually adapts when the VAD is not active.
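
The behaviour described above can be approximated with a simplified sketch. It is not the PR's `EnergyTresholdVAD`: the class name, field names and default values are illustrative assumptions, and the minimum activation time is modelled as a sliding window of recent frames.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified VAD; names and defaults are illustrative only.
public class EnergyVadSketch
{
    public float MinEnergy = 0.005f;        // floor for the loudness threshold (RMS)
    public float SignalToNoise = 3.0f;      // a frame is loud if energy > noise * this ratio
    public int WindowFrames = 5;            // number of recent frames considered
    public float ActivationRatio = 0.7f;    // share of loud frames needed to activate
    public float DeactivationRatio = 0.2f;  // share of loud frames at or below which we count quiet frames
    public int SustainFrames = 3;           // quiet frames required before deactivating
    public float NoiseLearnRate = 0.05f;    // how quickly the noise estimate follows the signal

    public bool IsActive { get; private set; }
    public float NoiseEnergy { get; private set; }

    private readonly Queue<bool> recent = new Queue<bool>();
    private int quietRun;

    // Returns true when the active state changed, so the caller could
    // call StartContext or StopContext in response.
    public bool ProcessFrame(float[] frame)
    {
        double sumSquares = 0;
        foreach (float s in frame) sumSquares += s * s;
        float energy = frame.Length > 0 ? (float)Math.Sqrt(sumSquares / frame.Length) : 0f;

        bool loud = energy > Math.Max(MinEnergy, NoiseEnergy * SignalToNoise);
        recent.Enqueue(loud);
        if (recent.Count > WindowFrames) recent.Dequeue();

        int loudCount = 0;
        foreach (bool b in recent) if (b) loudCount++;
        float loudRatio = (float)loudCount / WindowFrames;

        bool wasActive = IsActive;
        if (!IsActive)
        {
            // The noise estimate adapts only while the VAD is inactive.
            NoiseEnergy += (energy - NoiseEnergy) * NoiseLearnRate;
            if (loudRatio >= ActivationRatio) { IsActive = true; quietRun = 0; }
        }
        else
        {
            quietRun = loudRatio <= DeactivationRatio ? quietRun + 1 : 0;
            if (quietRun >= SustainFrames) IsActive = false;
        }
        return IsActive != wasActive;
    }
}
```

Because activation needs several loud frames in the window, the VAD always fires a few frames after speech actually begins, which is the delay the history buffer compensates for.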

History buffer captures the beginning of utterances

SpeechlyClient maintains a configurable ringbuffer of recent audio frames. The size of the history is determined by historyFrames, each frame containing frameSamples samples. The history is sent upon StartContext to capture the start of the utterance, which is especially important with VAD, as it activates with a constant delay.
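
A history buffer of this kind could be sketched as follows. This is a generic ring buffer of fixed-size frames written for illustration, not the PR's implementation; `FrameHistorySketch`, `Push` and `Drain` are hypothetical names.

```csharp
using System;

// Hypothetical ring buffer of fixed-size frames; not the PR's actual code.
public class FrameHistorySketch
{
    private readonly float[][] frames;
    private readonly int frameSamples;
    private int next;   // slot the next frame will overwrite
    private int count;  // number of valid frames currently stored

    public FrameHistorySketch(int historyFrames, int frameSamples)
    {
        if (historyFrames <= 0 || frameSamples <= 0) throw new ArgumentOutOfRangeException();
        this.frameSamples = frameSamples;
        frames = new float[historyFrames][];
        for (int i = 0; i < historyFrames; i++) frames[i] = new float[frameSamples];
    }

    // Stores a copy of the frame, overwriting the oldest one when full.
    public void Push(float[] frame)
    {
        if (frame.Length != frameSamples) throw new ArgumentException("Unexpected frame length.");
        Array.Copy(frame, frames[next], frameSamples);
        next = (next + 1) % frames.Length;
        if (count < frames.Length) count++;
    }

    // Returns the stored frames oldest-first as one array and empties the history,
    // e.g. to send the audio preceding a StartContext.
    public float[] Drain()
    {
        var result = new float[count * frameSamples];
        int oldest = (next - count + frames.Length) % frames.Length;
        for (int i = 0; i < count; i++)
            Array.Copy(frames[(oldest + i) % frames.Length], 0, result, i * frameSamples, frameSamples);
        count = 0;
        return result;
    }
}
```

As a worked example with made-up numbers (not the library's defaults): 5 frames of 512 samples at 16 kHz hold 2560 samples, or 160 ms of audio preceding the activation.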

All SpeechlyClient features can be dry-run on the command line with `dotnet`

Only the microphone implementation (MicToSpeechly.cs) is Unity-specific. SpeechlyClient features can be run with prerecorded audio on the command line with little setup. Some example command line setups can be tried in the speechly-client-net-standard-2.0/ folder:

`dotnet run test` processes an example file, sends it to Speechly cloud SLU and prints the received results to the console.
`dotnet run vad` processes an example file, saves the audio of each utterance to files in the temp/ folder as 16-bit raw and creates an utterance timestamp .tsv (tab-separated values) file for each audio file processed.
`dotnet run vad myaudiofiles/*.raw` processes a set of files with VAD.

New functions and callbacks

StartStream should be called at the start of a continuous audio stream. It resets the stream sample counters and history. For backwards compatibility, ProcessAudio and StartContext ensure it has been called.
StopStream should be called at the end of a continuous audio stream.
OnStreamStart is triggered upon a call to StartStream
OnStreamStop is triggered upon a call to StopStream
OnContextStart is triggered upon a call to StartContext. This can be used to track VAD activation.
OnContextStop is triggered upon a call to StopContext. This can be used to track VAD deactivation.

`dotnet run vad` logging (see the previous section) is implemented with the OnStreamStart, OnStreamStop, OnContextStart and OnContextStop callbacks. These can also be used to log usage metrics.
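
To give an idea of how such logging could be built on the callbacks, here is a minimal sketch of an utterance timestamp logger. The callback signatures are not given in this description, so the sketch exposes two plain methods meant to be called from OnContextStart and OnContextStop handlers; the sample-position arguments, the column names and the class name `UtteranceLogSketch` are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical logger; wire its methods to OnContextStart / OnContextStop handlers.
public class UtteranceLogSketch
{
    private readonly int sampleRate;
    private readonly List<string> rows = new List<string>();
    private long? startSample;

    public UtteranceLogSketch(int sampleRate) { this.sampleRate = sampleRate; }

    // Assumed to receive the stream position (in samples) at which the context started.
    public void ContextStarted(long streamSamplePos) { startSample = streamSamplePos; }

    // Assumed to receive the stream position (in samples) at which the context stopped.
    public void ContextStopped(long streamSamplePos)
    {
        if (startSample == null) return; // stop without a matching start: ignore
        double startMs = startSample.Value * 1000.0 / sampleRate;
        double endMs = streamSamplePos * 1000.0 / sampleRate;
        rows.Add(string.Format(CultureInfo.InvariantCulture, "{0:F0}\t{1:F0}", startMs, endMs));
        startSample = null;
    }

    // Header row plus one tab-separated row per utterance (illustrative columns).
    public string ToTsv() => "start_ms\tend_ms\n" + string.Join("\n", rows);
}
```

The same pattern would work for usage metrics, for instance by summing utterance durations instead of formatting rows.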

IDecoder interface

CloudDecoder is now an optional module, passed with Initialize(). It implements the IDecoder interface.

Death of ClientState

Removed ClientState that was a 1-to-1 port from browser-client. It did not reflect the internal workings of Unity SpeechlyClient as there are several differences:

Initializing the SLU engine is an optional step, rendering the Connecting and Preinitialized states optional. If you use SLU, you can do await Initialize(new CloudDecoder(...)); or check the state of the new IsReady boolean.
The microphone implementation is external to Unity SpeechlyClient. Thus SpeechlyClient has no knowledge of it, rendering the Initializing state useless.
IsListening boolean has replaced Starting/Recording/Stopping states both in browser-client and Unity SpeechlyClient.

Why

To enable use of all SpeechlyClient features with any audio source, not just the microphone.

Help connect SpeechlyClient to existing audio stacks that already use the microphone, e.g. Vivox
Enable batch processing of files, e.g. to test the performance of SpeechlyClient features like VAD.
Abstract the decoder interface to enable plugging in an OnDeviceDecoder (not part of this PR) instead of a CloudDecoder.


…tch processing. A .tsv (tab-separated values) file is written if the logUtteranceFolder parameter is provided to the SpeechlyClient initializer. BeginStream and EndStream calls wrap Start/StopContext to ensure the log file is written and to provide a meaningful stream identifier, e.g. an audio file name.


arzga and others added 17 commits March 25, 2022 09:41

Data streaming is now frame-based. Refactored MicToSpeechly code into AnalyzeFrame function.

e61c8eb

Added ProcessFrame to SpeechlyClient. SendAudioFile uses ProcessFrame instead of SendAudio.

22cbd81

WIP Write files without sending them to cloud.

d813917

Local file saving can be enabled by defining saveToFolder. Removed SpeechlyConfig class from constructor.

e6c269b

Initial VAD implementation in dotnet code

7cfa6dc

Refactored MicToSpeechly to use EnergyTresholdVAD thru SpeechlyClient. Preliminarily added UseCouldSpeechProcessing flag.

1791795

Callbacks for StartStream, StartContext. CLI accepts test and vad commands.

96f4f30

Fixed Unity MicToSpeechly and concurrency problems when using VAD: StartContext task wasn't completed until next update round and resulted in thread freeze.

14aed4e

WIP moved audio frame processing out of MicToSpeechly and to SpeechlyClient.

04d3f6b

Introduced a history ring buffer in SpeechlyClient. Using it when VAD is activated.

d076168

Synced NET and Unity folders

d371502

Downsampling added to audio pipeline

3e0f635

Cleaned up and commented downsampling code

e5eb1d0

Exposed frameMillis and frameHistory in SpeechlyClient

1af7583

Moved utterance logging from SpeechlyClient to SpeechlyClientTest, leveraging new OnStreamStart/Stop and OnContextStart/Stop callbacks.

4153aa4

ProcessAudio ensures StartContext has been called if not done so earlier.

1f3b953

arzga changed the title ~~Feature/dotnet vad~~ New SpeechlyClient audio pipeline Apr 5, 2022

arzga changed the title ~~New SpeechlyClient audio pipeline~~ New .NET/Unity SpeechlyClient audio pipeline Apr 5, 2022

arzga marked this pull request as ready for review April 5, 2022 06:10

arzga added 10 commits April 7, 2022 17:04

SpeechlyClient can connect to both CloudDecoder and OnDeviceDecoder, both of which implement the IDecoder interface.

fd6d03f

Synced Unity and NET files

e3413a0

Fixed Android build

2358b33

Backported: Fix for occasional websocket breakage. Start/StopContext calls are now immediate and any async functionality goes thru websocket (WS) send queue. Introduced WS Send task. More graceful dismantling of WS. Debug print flag is honoured by decoders.

d46c233

Synced .NET files with Unity. Fixed missing Fetch implementation.

6a589c4

Updated README and unitypackage

339d03d

AutoControlListening method to skip waiting for contextIds (won't need them). Removed async signatures from methods that no longer need them.

47fee54

Did away with SpeechlyState which did not reflect the internal functioning of Unity client too well. Added IsReady to check the status of SLU Decoder instead.

4e3dab0

Added preliminary documentation using XML comments, DocFX and build-docs.sh script.

edfacdc

SpeechlyClientTest fix

47286ef

Updated unitypackage

49b905b

github-pages bot temporarily deployed to github-pages April 12, 2022 10:47 Inactive

Docs typo fixes

3a2380b

github-pages bot temporarily deployed to github-pages April 12, 2022 11:06 Inactive

Docs additions

ab05ad1

github-pages bot temporarily deployed to github-pages April 12, 2022 11:23 Inactive

Docs updates

16e94eb

github-pages bot temporarily deployed to github-pages April 12, 2022 11:56 Inactive

Rearranged .NET code into SLUClient, Tools and Types assemblies. Marked some fields and classes internal. Regenerated docs.

84f2bcc

github-pages bot temporarily deployed to github-pages April 12, 2022 20:14 Inactive

Regenerated unitypackage

d14529a

github-pages bot temporarily deployed to github-pages April 12, 2022 20:15 Inactive

Segment update methods are now internal

b654a08

github-pages bot temporarily deployed to github-pages April 12, 2022 20:24 Inactive

langma reviewed Apr 13, 2022


Changed IsListening to IsActive. Removed CloudDecoder LoginUrl is deducted from ApiUrl, which is now passed without path.

ff45a5a

github-pages bot temporarily deployed to github-pages April 14, 2022 07:48 Inactive

arzga merged commit 3765c08 into main Apr 14, 2022

arzga deleted the feature/dotnet-vad branch April 14, 2022 07:51
