This repository contains a simple service that receives audio data from clients and serves the results of Mozilla DeepSpeech inference over a websocket. The server code in this project is a modified version of this GitHub project.
Because speech-to-text (STT) transcription can typically be considered a "long-running task", using websockets for client-server communication provides several benefits:
- It avoids timeouts at the various points along the request path, for example at the client, server, load balancer, or proxy.
- It removes the need for the client to poll the server for the result, along with the complexity that a polling-based architecture typically introduces.
Server configuration is specified in the `application.conf` file.
Make sure your model and scorer files are present in the same directory as the `application.conf` file. Then execute:
```shell
python -m deepspeech_server.app
```
The client-server request-response process looks like the following:
- Client opens websocket W to server
- Client sends binary audio data via W
- Server responds with the transcribed text via W once the transcription process is complete. The server's response is in JSON format.
- Server closes W
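The four steps above map onto a small client helper. This is a sketch, assuming the websocket-client package and the endpoint URL used in the client example; `transcribe`, `default_connect`, and `STT_URL` are hypothetical names, not part of this repository. The connection factory is injectable so the exchange can be exercised without a running server.

```python
import json
from typing import Any, Callable

# Endpoint taken from the client example; adjust for your deployment.
STT_URL = "ws://localhost:8080/api/v1/stt"


def default_connect(url: str) -> Any:
    """Open a websocket using the websocket-client package (imported lazily)."""
    import websocket

    return websocket.create_connection(url)


def transcribe(audio: bytes, url: str = STT_URL,
               connect: Callable[[str], Any] = default_connect) -> dict:
    """Hypothetical helper: one request-response exchange with the service."""
    ws = connect(url)            # 1. client opens websocket W
    try:
        ws.send_binary(audio)    # 2. client sends binary audio data via W
        reply = ws.recv()        # 3. blocks until the server sends its JSON reply
    finally:
        ws.close()               # 4. server closes W; closing our end frees the socket too
    return json.loads(reply)
```

With a running server, calling `transcribe(audio)` on bytes read as in the client example performs the same exchange and returns the decoded JSON object.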
The time t taken by the transcription process depends on several factors, such as the duration of the audio and how busy the service is. Under normal circumstances, t is roughly equal to the duration of the submitted audio.
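Because t tracks the audio duration under normal load, a client can derive a receive timeout from the length of the audio it is about to send. This is a minimal sketch, assuming WAV input as in the client example; `wav_duration_seconds` and `suggested_timeout` are hypothetical helpers, and the `factor` and `minimum` defaults are arbitrary illustrative values, not settings of this service.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the playback duration of an in-memory WAV file in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def suggested_timeout(wav_bytes: bytes, factor: float = 2.0,
                      minimum: float = 30.0) -> float:
    """Hypothetical client timeout: a multiple of the audio duration, with a floor.

    factor and minimum are illustrative assumptions, not values used by the server.
    """
    return max(minimum, factor * wav_duration_seconds(wav_bytes))
```

The result could be passed as the `timeout` argument of `websocket.create_connection` in the websocket-client package, so that a stalled request fails instead of blocking indefinitely.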
Because this service uses websockets, it is currently not possible to interact with it using HTTP clients that do not support websockets, such as curl. The following example uses the Python websocket-client package.
```python
import websocket  # provided by the websocket-client package

ws = websocket.WebSocket()
ws.connect("ws://localhost:8080/api/v1/stt")

with open("audiofile.wav", mode="rb") as file:  # "rb" is important: read raw bytes, not text
    audio = file.read()

ws.send_binary(audio)
result = ws.recv()
print(result)  # Print the JSON text transcription received from the server
ws.close()
```
Example output:
```
{"text": "experience proves this", "time": 2.4083645999999987}
```
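The reply is a JSON object; in the example output it carries a `text` field with the transcription and a numeric `time` field. A sketch of turning it into a typed value, assuming only those two fields are needed; `Transcription` and `parse_result` are hypothetical names, and the units of `time` are not specified here.

```python
import json
from typing import NamedTuple


class Transcription(NamedTuple):
    text: str    # transcribed text returned by the server
    time: float  # numeric "time" field from the reply (units not documented here)


def parse_result(raw: str) -> Transcription:
    """Decode the server's JSON reply; raises KeyError if a field is missing."""
    payload = json.loads(raw)
    return Transcription(text=payload["text"], time=float(payload["time"]))
```

Failing loudly on a missing field makes it obvious when the server returns something other than a transcription.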
The `helm` directory contains an example Helm deployment that configures an Nginx ingress to expose the DeepSpeech service. The websocket timeout on the ingress is set to 1 hour.
Bug reports and merge requests are welcome.
To run the linter, execute:
```shell
pylint deepspeech_server
```
To run tests without coverage, execute:
```shell
python -m pytest tests/test_app.py
```
To run tests with coverage, printing a coverage summary to the terminal and writing XML coverage and JUnit test reports, execute:
```shell
python -m pytest -p pytest_cov --cov=deepspeech_server --cov-report=xml --cov-report=term \
    --junitxml=pytest-report.xml tests/test_app.py
```