This repository was archived by the owner on Jan 7, 2025. It is now read-only.

@arzga arzga commented Apr 4, 2022

What

New SpeechlyClient audio pipeline

  • IN: Constantly feed new audio to SpeechlyClient with SpeechlyClient.ProcessAudio().
  • Downsample audio to 16kHz if needed (controlled by inputSampleRate in constructor)
  • Add audio to history ringbuffer (controlled by HistoryFrames and FrameSamples in constructor)
  • Energy threshold calculation (enabled by EnergyTresholdVAD in constructor)
  • Automatic VAD Start/StopContext control (enabled by EnergyTresholdVAD.ControlListening = true)
  • OUT: Send utterances to files (enabled by SaveToFolder = "folder" in constructor)
  • OUT: Send utterances to Speechly cloud SLU (controlled by UseCloudProcessing = true in constructor)
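As an illustration of the downsampling stage listed above, here is a minimal sketch in Python. It is not the SpeechlyClient implementation: the function name `downsample` and its parameters are hypothetical, it uses plain linear interpolation without an anti-aliasing filter, and it handles a single buffer only (a streaming resampler would also need to carry state across chunk boundaries).

```python
def downsample(samples, input_rate, output_rate=16000):
    """Resample `samples` from input_rate to output_rate by linear interpolation.

    Hypothetical helper for illustration; no low-pass filtering is applied,
    so content above output_rate / 2 will alias.
    """
    if input_rate == output_rate:
        return [float(s) for s in samples]
    if input_rate < output_rate:
        raise ValueError("this sketch only downsamples")

    ratio = input_rate / output_rate          # input samples per output sample
    out_len = int(len(samples) / ratio)
    out = []
    for i in range(out_len):
        pos = i * ratio                       # fractional read position in the input
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For example, `downsample(chunk, 48000)` would keep roughly every third sample of a 48 kHz chunk.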

SpeechlyClient now prefers constant audio streams

SpeechlyClient for .NET/Unity is refactored to constantly process audio chunks with ProcessAudio(float[] inputSamples, start, length). This enables it to automatically control listening using a VAD implementation.

StartContext/StopContext calls can be used to control listening when VAD is not in use. However, it's still recommended to stream audio constantly. The old way of streaming audio only after StartContext() still works, but is discouraged.

Adaptive energy threshold VAD controls hands-free listening

SpeechlyClient comes with an optional adaptive energy threshold VAD. It is configured with a minimum energy level, a signal-to-noise ratio, a minimum activation time and an activation/deactivation threshold (ratio of loud to silent frames). When enough loud frames have been detected, the VAD activates and calls StartContext automatically. When enough silent frames have been detected, the VAD deactivates after the sustain time and StopContext is called automatically. The background noise energy gradually adapts while the VAD is not active.
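The behaviour described above can be sketched roughly as follows. This is an illustrative Python model, not the code in this PR: the class name, parameter names and default values are all assumptions, and the real VAD may compute energy and adapt the noise floor differently.

```python
import math
from collections import deque


class EnergyVadSketch:
    """Hypothetical adaptive energy threshold VAD, one decision per audio frame."""

    def __init__(self, min_energy_db=-60.0, snr_db=10.0, window_frames=5,
                 activation_ratio=0.7, deactivation_ratio=0.2,
                 sustain_frames=3, noise_learn_rate=0.05):
        self.min_energy_db = min_energy_db
        self.snr_db = snr_db
        self.activation_ratio = activation_ratio
        self.deactivation_ratio = deactivation_ratio
        self.sustain_frames = sustain_frames
        self.noise_learn_rate = noise_learn_rate
        self.noise_db = min_energy_db              # running background noise estimate
        self.recent = deque(maxlen=window_frames)  # loud/silent flags of recent frames
        self.active = False
        self.sustain_count = 0

    @staticmethod
    def frame_energy_db(frame):
        mean_square = sum(s * s for s in frame) / len(frame)
        return 10.0 * math.log10(max(mean_square, 1e-12))

    def process_frame(self, frame):
        """Return "start", "stop" or None for this frame."""
        energy = self.frame_energy_db(frame)
        threshold = max(self.min_energy_db, self.noise_db + self.snr_db)
        self.recent.append(energy > threshold)
        loud_ratio = sum(self.recent) / len(self.recent)

        if not self.active:
            # Adapt the noise floor only while the VAD is inactive.
            self.noise_db += self.noise_learn_rate * (energy - self.noise_db)
            window_full = len(self.recent) == self.recent.maxlen
            if window_full and loud_ratio >= self.activation_ratio:
                self.active = True
                self.sustain_count = 0
                return "start"   # the client would call StartContext here
        elif loud_ratio <= self.deactivation_ratio:
            self.sustain_count += 1
            if self.sustain_count >= self.sustain_frames:
                self.active = False
                return "stop"    # the client would call StopContext here
        else:
            self.sustain_count = 0
        return None
```

Requiring a full window before activation is one way to model the minimum activation time; counting quiet windows before deactivating models the sustain time.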

History buffer captures the beginning of utterances

SpeechlyClient maintains a configurable ringbuffer of recent audio frames. The size of the history is determined by historyFrames, each frame containing frameSamples samples. The history is sent upon StartContext to capture the start of the utterance, which is especially important with VAD, as it activates with a constant delay.
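A minimal sketch of such a history buffer, in Python for brevity. The class and method names are hypothetical and only mirror the historyFrames and frameSamples parameters mentioned above; the actual implementation in this PR may differ.

```python
from collections import deque


class HistoryBufferSketch:
    """Keeps the most recent `history_frames` frames of `frame_samples` samples each."""

    def __init__(self, history_frames, frame_samples):
        self.frame_samples = frame_samples
        # A bounded deque drops the oldest frame automatically when full.
        self.frames = deque(maxlen=history_frames)

    def push(self, frame):
        if len(frame) != self.frame_samples:
            raise ValueError("frame must contain exactly frame_samples samples")
        self.frames.append(list(frame))

    def drain(self):
        """Return buffered samples oldest first and clear the history.

        This is what would be sent ahead of live audio when a context starts.
        """
        samples = [s for frame in self.frames for s in frame]
        self.frames.clear()
        return samples
```

With illustrative values of 5 frames of 512 samples at 16 kHz (not necessarily the defaults), the history covers 2560 samples, or 160 ms of audio preceding the activation.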

All SpeechlyClient features can be dry-run on the command line with dotnet

Only the microphone implementation (MicToSpeechly.cs) is Unity-specific. SpeechlyClient features can be run with prerecorded audio on the command line with little setup. Some example command line setups can be tried in the speechly-client-net-standard-2.0/ folder:

  • dotnet run test processes an example file, sends to Speechly cloud SLU and prints the received results in console.
  • dotnet run vad processes an example file, saves the audio of each utterance to files in the temp/ folder as 16-bit raw and creates an utterance timestamp .tsv (tab-separated values) file for each audio file processed.
  • dotnet run vad myaudiofiles/*.raw processes a set of files with VAD.

New functions and callbacks

  • StartStream should be called at the start of a continuous audio stream. It resets the stream sample counters and history. For backwards compatibility, ProcessAudio and StartContext ensure it's been called.
  • StopStream should be called at the end of a continuous audio stream.
  • OnStreamStart is triggered upon a call to StartStream
  • OnStreamStop is triggered upon a call to StopStream
  • OnContextStart is triggered upon a call to StartContext. This can be used to track VAD activation.
  • OnContextStop is triggered upon a call to StopContext. This can be used to track VAD deactivation.

dotnet run vad logging (see the previous section) is implemented with the OnStreamStart, OnStreamStop, OnContextStart and OnContextStop callbacks. These can also be used to log usage metrics.
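To show how the four callbacks could drive utterance timestamp logging, here is a rough Python sketch. The callback signatures (a stream identifier and a sample position) and the .tsv column layout are assumptions made for illustration; they are not taken from the SpeechlyClient API or from the files written by dotnet run vad.

```python
import io


class UtteranceTsvLoggerSketch:
    """Writes one tab-separated row per utterance: stream, start_ms, end_ms."""

    def __init__(self, out, sample_rate=16000):
        self.out = out                  # any writable text stream
        self.sample_rate = sample_rate
        self.stream_id = None
        self.context_start = None

    def _to_ms(self, sample_pos):
        return sample_pos * 1000 // self.sample_rate

    def on_stream_start(self, stream_id):
        self.stream_id = stream_id
        self.out.write("stream\tstart_ms\tend_ms\n")

    def on_context_start(self, sample_pos):
        self.context_start = sample_pos

    def on_context_stop(self, sample_pos):
        start_ms = self._to_ms(self.context_start)
        end_ms = self._to_ms(sample_pos)
        self.out.write(f"{self.stream_id}\t{start_ms}\t{end_ms}\n")
        self.context_start = None

    def on_stream_stop(self):
        self.stream_id = None


# Usage: simulate one stream with a single utterance from 1.0 s to 2.5 s.
log = io.StringIO()
logger = UtteranceTsvLoggerSketch(log)
logger.on_stream_start("example.raw")
logger.on_context_start(16000)
logger.on_context_stop(40000)
logger.on_stream_stop()
```

The same hooks could increment counters instead of writing rows when the goal is usage metrics.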

IDecoder interface

CloudDecoder is now an optional module, passed with Initialize(). It implements an IDecoder interface.
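The idea of an optional, pluggable decoder can be sketched as below. This is a Python illustration of the pattern only: the method names on the interface are hypothetical and do not reproduce the actual IDecoder members, and `RecordingDecoder` is a made-up stand-in for a real decoder such as CloudDecoder.

```python
from typing import List, Optional, Protocol


class Decoder(Protocol):
    """Hypothetical decoder interface; the real IDecoder members may differ."""

    def start_context(self) -> None: ...
    def send_audio(self, samples: List[float]) -> None: ...
    def stop_context(self) -> None: ...


class RecordingDecoder:
    """Stand-in decoder that only records the calls it receives."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def start_context(self) -> None:
        self.events.append("start")

    def send_audio(self, samples: List[float]) -> None:
        self.events.append(f"audio:{len(samples)}")

    def stop_context(self) -> None:
        self.events.append("stop")


class ClientSketch:
    """Client that works with or without a decoder attached."""

    def __init__(self) -> None:
        self._decoder: Optional[Decoder] = None
        self.is_listening = False

    def initialize(self, decoder: Decoder) -> None:
        self._decoder = decoder

    @property
    def is_ready(self) -> bool:
        return self._decoder is not None

    def start_context(self) -> None:
        self.is_listening = True
        if self._decoder is not None:
            self._decoder.start_context()

    def process_audio(self, samples: List[float]) -> None:
        if self.is_listening and self._decoder is not None:
            self._decoder.send_audio(samples)

    def stop_context(self) -> None:
        self.is_listening = False
        if self._decoder is not None:
            self._decoder.stop_context()
```

Because the client depends only on the interface, a different decoder can be swapped in without touching the audio pipeline.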

Death of ClientState

Removed ClientState, which was a 1-to-1 port from browser-client. It did not reflect the internal workings of the Unity SpeechlyClient, which differs in several ways:

  • Initializing the SLU engine is an optional step, rendering the Connecting and Preinitialized states optional. If you use SLU, you can do await Initialize(new CloudDecoder(...)); or check the state of the new IsReady boolean.
  • The microphone implementation is external to the Unity SpeechlyClient. Thus SpeechlyClient has no knowledge of it, rendering the Initializing state useless.
  • IsListening boolean has replaced Starting/Recording/Stopping states both in browser-client and Unity SpeechlyClient.

Why

To enable use of all SpeechlyClient features with any audio source, not just the microphone.

  • Help connect SpeechlyClient to existing audio stacks that already use the microphone, e.g. Vivox
  • Enable batch processing files, e.g. to test performance of SpeechlyClient features like VAD.
  • Abstract the decoder interface to enable plugging in an OnDeviceDecoder (not part of this PR) instead of a CloudDecoder

arzga and others added 17 commits March 25, 2022 09:41
… Preliminarily added UseCouldSpeechProcessing flag.
…tch processing. a .tsv (tab separated values) file is written if logUtteranceFolder parameter is provided to SpeechlyClient initializer. BeginStream and EndStream calls wrap Start/StopContext to ensure log file is written and to provide a meaningful stream identifies, e.g. an audio file name.
…tartContext task wasn't completed until next update round and resulted in thread freeze.
…veraging new OnStreamStart/Stop and OnContextStart/Stop callbacks.
@arzga arzga changed the title from "Feature/dotnet vad" to "New SpeechlyClient audio pipeline" Apr 5, 2022
@arzga arzga changed the title from "New SpeechlyClient audio pipeline" to "New .NET/Unity SpeechlyClient audio pipeline" Apr 5, 2022
@arzga arzga marked this pull request as ready for review April 5, 2022 06:10
arzga added 10 commits April 7, 2022 17:04
…both of which implement the IDecoder interface.
…calls are now immediate and any async functionality goes thru websocket (WS) send queue. Introduced WS Send task. More graceful dismantling of WS. Debug print flag is honoured by decoders.
…d them). Removed async signatures from methods that no longer need them.
…oning of Unity client too well. Added IsReady to check the status of SLU Decoder instead.
…ed some fields and classes internal. Regenerated docs.
…ucted from ApiUrl, which is now passed without path.
@arzga arzga merged commit 3765c08 into main Apr 14, 2022
@arzga arzga deleted the feature/dotnet-vad branch April 14, 2022 07:51