Shared Voice Interface

This project provides a system-wide, shared speech-to-text engine that lets you:

Add voice commands to any desktop app, either through an extension, add-on, or plugin (e.g. Code) or by integrating the engine into your own app (e.g. TalkGPT).
Run your automation scripts and command-line commands with personalized voice commands (a JSON file links each voice command to the program it executes).
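To illustrate the JSON linking idea, here is a minimal Python sketch. It assumes a flat mapping from spoken phrase to shell command; the real schema of native-commands.json may well differ, and the phrases, commands, and function names below are hypothetical.

```python
import json
import shlex
import subprocess

# Hypothetical mapping for illustration only; not the project's actual schema.
EXAMPLE_COMMANDS_JSON = """
{
  "open terminal": "gnome-terminal",
  "say hello": "echo hello"
}
"""


def normalize(text):
    """Lowercase and drop punctuation so 'Say hello.' matches 'say hello'."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def load_commands(raw_json):
    """Parse the JSON text into a {normalized phrase: command} dictionary."""
    return {normalize(phrase): command
            for phrase, command in json.loads(raw_json).items()}


def find_command(transcript, commands):
    """Return the command linked to the transcript, or None if nothing matches."""
    return commands.get(normalize(transcript))


def run_voice_command(transcript, commands):
    """Run the linked command without a shell; return None when unmatched."""
    command = find_command(transcript, commands)
    if command is None:
        return None
    return subprocess.run(shlex.split(command), capture_output=True,
                          text=True, check=False)
```

Normalizing both the stored phrase and the transcript keeps the lookup tolerant of the capitalization and punctuation a transcriber tends to add.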

Watch Demo

Built-In Features

Real-time transcription: recording starts when the speaker says something and stops once the speaker has been silent for 0.5 seconds.
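The start/stop rule can be sketched in Python as follows. This is only an illustration of silence-based endpointing, not the project's actual implementation: it assumes audio has already been split into fixed-length frames and that each frame has been labeled speech or non-speech (for example by an energy threshold or a voice activity detector). The function name and the 100 ms frame length are assumptions.

```python
def find_utterance(frame_is_speech, frame_ms=100, silence_ms=500):
    """Return (start, end) frame indices of the first utterance, or None.

    Recording starts at the first speech frame and stops once at least
    silence_ms of consecutive non-speech frames have followed the last
    speech frame. The end index is exclusive and excludes trailing silence.
    """
    frames_needed = -(-silence_ms // frame_ms)  # ceiling division
    start = None
    last_speech = None
    for index, is_speech in enumerate(frame_is_speech):
        if is_speech:
            if start is None:
                start = index  # first speech frame: recording starts
            last_speech = index
        elif start is not None and index - last_speech >= frames_needed:
            return start, last_speech + 1  # enough silence: recording stops
    if start is None:
        return None  # nobody spoke
    return start, last_speech + 1  # input ended before the silence timeout
```

Working in whole milliseconds avoids the rounding surprises that come from accumulating floating-point frame durations.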

Google Anything: say "google" followed by your search query (e.g. "Google what's the weather outside?").
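As a rough sketch of how such a trigger word could be handled, the Python below extracts the query from a transcript and builds a search URL. The function names are hypothetical and the project's own parsing may differ; the URL uses Google's standard `search?q=` form.

```python
import re
from urllib.parse import quote_plus


def google_query(transcript):
    """Return the text after a leading 'google', or None if it is absent."""
    match = re.match(r"\s*google\b[\s,:]*(.+)", transcript,
                     flags=re.IGNORECASE | re.DOTALL)
    if not match:
        return None
    query = match.group(1).strip()
    return query or None


def google_url(transcript):
    """Build a search URL for the spoken query, or None if not a search."""
    query = google_query(transcript)
    if query is None:
        return None
    return "https://www.google.com/search?q=" + quote_plus(query)
```

The resulting URL could then be passed to `webbrowser.open` from the Python standard library to open it in the default browser.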

Architecture

Software Environment

Python 3.9.9
Node.js 18.9
OS: Linux. It will very likely work on macOS without any tweaks. On Windows, the bash scripts (in ./universal-commands/scripts and anywhere in src) will have to be converted into batch scripts.

Hardware Config used during Development and Execution

System RAM: 8 GB (2x4 GB) [Recommended: > 16 GB]
Graphics card: Nvidia Graphics MX350 (Pascal architecture, CUDA capability 6.1, 2 GB VRAM)
Microphone: external Bluetooth headset with a microphone arm (recommended). Avoid using your laptop's built-in microphone.

Installation

git clone git@github.com:UmangRajpara13/able.git
cd ./able/listen
python3 -m venv venv
source "venv/bin/activate"
pip install -r requirements.txt
deactivate
cd ..
npm install

The project uses Whisper by OpenAI, which requires the command-line tool ffmpeg to be installed on your system. It is available from most package managers:

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
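To confirm that ffmpeg is reachable before starting the engine, a small check like the following can help. It is a convenience sketch, not part of the project; `find_tool` is a hypothetical helper built on the standard library's `shutil.which`.

```python
import shutil


def find_tool(name="ffmpeg"):
    """Return the full path of an executable on PATH, or None if missing."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("ffmpeg was not found on PATH; install it with your package manager.")
    else:
        print("ffmpeg found at " + path)
```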

Run

In the first terminal window:

npm run engine

and in a second terminal window:

npm run listen

Avoid using your laptop's built-in microphone; an external headset with a microphone is recommended.
