This project provides a system-wide, shared speech-to-text engine for:
Real-time transcription - starts recording when a speaker begins talking and stops once the speaker has been silent for 0.5 seconds.
Google Anything - say "google" followed by your search query (e.g. "Google what's the weather outside?").
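The two behaviours above can be illustrated with a minimal sketch. It is not the project's actual implementation: the function names, the 100 ms frame length, the loudness threshold, and the Google search URL are assumptions chosen for illustration; only the 0.5 second silence timeout and the "google" keyword come from the description above.

```python
from typing import List, Optional, Sequence, Tuple
from urllib.parse import quote_plus

SILENCE_TIMEOUT_MS = 500   # from the description: stop after 0.5 s of silence
FRAME_MS = 100             # assumed length of one audio frame
LEVEL_THRESHOLD = 0.02     # assumed loudness above which a frame counts as speech


def segment_utterances(
    frame_levels: Sequence[float],
    frame_ms: int = FRAME_MS,
    threshold: float = LEVEL_THRESHOLD,
    silence_timeout_ms: int = SILENCE_TIMEOUT_MS,
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs, end exclusive, one per utterance.

    Recording starts at the first loud frame and stops once the frames
    since the last loud frame add up to the silence timeout.
    """
    segments = []
    start = None        # index of the first frame of the current utterance
    last_voiced = None  # index of the most recent loud frame
    for i, level in enumerate(frame_levels):
        if level >= threshold:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None:
            silent_ms = (i - last_voiced) * frame_ms
            if silent_ms >= silence_timeout_ms:
                segments.append((start, last_voiced + 1))
                start = None
    if start is not None:
        # Input ended while the speaker was still inside an utterance.
        segments.append((start, last_voiced + 1))
    return segments


def parse_google_command(transcript: str) -> Optional[str]:
    """Return a search URL if the transcript starts with 'google', else None."""
    words = transcript.strip().split(maxsplit=1)
    if len(words) == 2 and words[0].lower().strip(",.!?") == "google":
        # Assumed URL format; the project may open the search differently.
        return "https://www.google.com/search?q=" + quote_plus(words[1])
    return None
```

For example, a short pause of 200 ms inside a sentence keeps the recording open, while five silent frames (500 ms) close it. A transcript such as "open terminal" yields None, so it would not be routed to a search.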
Python 3.9.9
Node.js 18.9
OS: Linux (it will very likely work on macOS without any tweaks. On Windows, the bash scripts in ./universal-commands/scripts and anywhere in src will have to be converted into batch scripts)
System RAM: 8 GB (2x4 GB) [Recommended: > 16 GB]
Graphics card: NVIDIA GeForce MX350 (Pascal architecture, CUDA capability 6.1, 2 GB VRAM)
Microphone: external Bluetooth headset with a microphone arm (recommended). Avoid using your laptop's built-in microphone.
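To compare your own machine against the graphics card listed above, a small check like the following may help. It is a sketch that assumes PyTorch is available in the Python environment (Whisper depends on it, but whether this project's requirements.txt installs it is an assumption); describe_gpu is a hypothetical helper name, not part of the project.

```python
def describe_gpu() -> str:
    """Summarise the first CUDA device, or explain why none is visible."""
    try:
        # Assumption: PyTorch is installed in the active environment.
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"

    if not torch.cuda.is_available():
        return "No CUDA device visible; transcription would likely run on the CPU"

    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    return f"{name}: CUDA capability {major}.{minor}, {vram_gb:.1f} GB VRAM"


if __name__ == "__main__":
    print(describe_gpu())
```

Run it inside the activated virtual environment so that it sees the same packages as the listener.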
git clone git@github.com:UmangRajpara13/able.git
cd ./able/listen
python3 -m venv venv
source "venv/bin/activate"
pip install -r requirements.txt
deactivate
cd ..
npm install
The project uses Whisper by OpenAI, which requires the command-line tool ffmpeg
to be installed on your system. ffmpeg is available from most package managers:
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on Arch Linux
sudo pacman -S ffmpeg
# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg
# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
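After installing, it can be useful to confirm that ffmpeg is actually reachable from the environment the listener runs in. The snippet below is a minimal sketch using only the Python standard library; find_ffmpeg is a hypothetical helper name and is not part of the project.

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path:
        print(f"ffmpeg found at {path}")
    else:
        print("ffmpeg not found on PATH; install it with your package manager")
```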
npm run engine
npm run listen
Avoid using your laptop's built-in microphone; an external headset with a microphone is recommended.