EMS (External Media Server)

EMS is a service that enriches your calls with:

Speech recognition
Speech synthesis
Audio playback using WebSocket API
Call termination using WebSocket API
Dynamic speech recognition service configuration

EMS must be paired with an AudioSocket client, because EMS itself only receives extracted call audio and sends audio back for playback.

Currently, Asterisk is the only client that supports the AudioSocket protocol.

For configuration details, see the Asterisk documentation on external media and the AudioSocket protocol.

While EMS can work fully standalone, relying only on an AudioSocket client, features like Text-to-Speech or Speech-to-Text require an additional connection from your application via WebSocket. With a WebSocket stream configured, EMS can send you speech transcriptions and play text that you send over the connection.

Currently, only Google Cloud services are supported: Cloud Speech-to-Text and Cloud Text-to-Speech. If you have an implementation for similar services from other cloud providers, PRs are always appreciated.

Build

EMS uses rustup to manage the Rust toolchain, so ensure that you have it installed.

$ git clone https://github.com/ivan770/ems
$ cd ems
$ cargo build --release
$ ./target/release/ems --help

Usage

To start using EMS, configure the server via the ./ems.toml file (you can change the path with the --config flag):

audiosocket_addr = "0.0.0.0:12345"
websocket_addr = "0.0.0.0:12346"

recognition_driver = "google"
synthesis_driver = "google"

[gcs]
service_account_path = "./my_gcs_key.json"

[gctts]
service_account_path = "./my_gctts_key.json"

Then, launch EMS with ./ems.

Configuration

The config above uses the recommended parameters for optional keys. All available keys and their default values are listed below:

Name	Default value	Description
threads	All CPU threads	Maximum number of threads EMS can use at runtime
audiosocket_addr		Required. Address on which EMS listens for incoming AudioSocket messages
websocket_addr		Required. Address on which EMS listens for incoming WebSocket streams
message_timeout	3	Maximum number of seconds EMS waits between AudioSocket messages. If the elapsed time exceeds this value, EMS closes the AudioSocket connection
recognition_config_timeout	3	Maximum number of seconds EMS waits for a speech recognition config. If the elapsed time exceeds this value, EMS uses the config provided in `ems.toml`
recognition_driver		Speech recognition driver that EMS uses to generate call transcriptions. Supported values: `google`
synthesis_driver		Speech synthesis driver that EMS uses for voice synthesis during calls. Supported values: `google`
gcs		Google Cloud Speech-to-Text configuration
gctts		Google Cloud Text-to-Speech configuration
recognition_fallback		Speech recognition fallback config. Used if the WebSocket client misses the opportunity to send a recognition config. If empty, an `en-US` config is used as the fallback.
loopback_audio	false	Send all received audio back

WebSocket API

Every WebSocket message must include the UUID of the call. Usually, you specify this UUID when registering the external media server.

Requests

Terminate call:

{
    "id": "00000000-0000-0000-0000-000000000000",
    "data": "hangup"
}
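
Since every message carries the call UUID next to a `data` payload, a client might build requests through one small helper. The sketch below is only an assumption about how an application could do this: the `envelope` function is hypothetical, and actually sending the resulting string over the WebSocket connection is left to whichever client library you use.

```python
import json
import uuid


def envelope(call_id: str, data) -> str:
    """Serialize a request with the call UUID and the given data payload."""
    # uuid.UUID raises ValueError for malformed input; str() renders the
    # canonical hyphenated form. Whether EMS requires that exact form is
    # an assumption of this sketch.
    canonical_id = str(uuid.UUID(call_id))
    return json.dumps({"id": canonical_id, "data": data})


# The "Terminate call" request from above, built with the helper.
hangup = envelope("00000000-0000-0000-0000-000000000000", "hangup")
print(hangup)
```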

Synthesize speech:

gender, speaking_rate and pitch are optional.

{
    "id": "00000000-0000-0000-0000-000000000000",
    "data": {
        "synthesize": {
            "ssml": "Hello, world",
            "language_code": "en-US",
            "gender": "neutral",
            "speaking_rate": 2,
            "pitch": 1
        }
    }
}
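
Because gender, speaking_rate and pitch are optional, a client can simply leave them out of the request. The following sketch builds a synthesize request that includes only the optional fields the caller sets. The function name `synthesize_request` is hypothetical, and the sketch assumes that omitting an optional key is equivalent to not specifying it.

```python
import json


def synthesize_request(call_id, ssml, language_code,
                       gender=None, speaking_rate=None, pitch=None) -> str:
    """Build a synthesize request, skipping optional fields left as None."""
    params = {"ssml": ssml, "language_code": language_code}
    optional = {"gender": gender, "speaking_rate": speaking_rate, "pitch": pitch}
    # Keep only the optional fields the caller explicitly provided.
    params.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps({"id": call_id, "data": {"synthesize": params}})


request = synthesize_request(
    "00000000-0000-0000-0000-000000000000",
    "Hello, world",
    "en-US",
    gender="neutral",
)
print(request)
```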

Speech recognition config:

profanity_filter and punctuation are optional.

{
    "id": "00000000-0000-0000-0000-000000000000",
    "data": {
        "recognitionConfig": {
            "language": "en-US",
            "profanity_filter": false,
            "punctuation": false
        }
    }
}

Responses

Call transcription:

{
    "id": "00000000-0000-0000-0000-000000000000",
    "data": {
        "transcription": "Hello, world"
    }
}

Recognition config request:

{
    "id": "00000000-0000-0000-0000-000000000000",
    "data": "recognitionConfigRequest"
}
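
On the client side, the two responses can be told apart by the shape of `data`: a plain string for the recognition config request, an object for a transcription. The sketch below shows one possible way to dispatch on that and to answer a config request with a `recognitionConfig` message. The `handle_message` function is hypothetical, and the WebSocket transport is omitted. Note that the reply has to reach EMS within `recognition_config_timeout` seconds, otherwise EMS uses its fallback config.

```python
import json


def handle_message(raw: str, language: str = "en-US"):
    """Return (transcription, reply) for one message received from EMS.

    Exactly one element is set for the two known responses; both are None
    for anything this sketch does not recognize.
    """
    message = json.loads(raw)
    call_id, data = message["id"], message["data"]

    if data == "recognitionConfigRequest":
        # Answer with a recognition config for the same call.
        config = {"recognitionConfig": {"language": language}}
        return None, json.dumps({"id": call_id, "data": config})

    if isinstance(data, dict) and "transcription" in data:
        return data["transcription"], None

    return None, None


CALL_ID = "00000000-0000-0000-0000-000000000000"
incoming = json.dumps({"id": CALL_ID, "data": "recognitionConfigRequest"})
text, reply = handle_message(incoming)
print(reply)
```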
