Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2

whisper_ros

This repository provides a set of ROS 2 packages that integrate whisper.cpp into ROS 2 using audio_common 4.0.3. In addition, silero-vad is used for voice activity detection (VAD).

License: MIT

ROS 2 Distro   Branch   Build status    Docker Image   Documentation
Humble         main     Humble Build    Docker Image   Doxygen Deployment
Iron           main     Iron Build      Docker Image   Doxygen Deployment
Jazzy          main     Jazzy Build     Docker Image   Doxygen Deployment
Rolling        main     Rolling Build   Docker Image   Doxygen Deployment

Table of Contents

  1. Related Projects
  2. Installation
  3. Docker
  4. Usage
  5. Demos

Related Projects

  • chatbot_ros → This chatbot, integrated into ROS 2, uses whisper_ros to listen to people's speech and llama_ros to generate responses. The chatbot is controlled by a state machine created with YASMIN.

Installation

To run whisper_ros with CUDA, you must first install the CUDA Toolkit.

cd ~/ros2_ws/src
git clone https://github.com/mgonzs13/audio_common.git
git clone https://github.com/mgonzs13/whisper_ros.git
pip3 install -r whisper_ros/requirements.txt
cd ~/ros2_ws
rosdep install --from-paths src --ignore-src -r -y
colcon build --cmake-args -DGGML_CUDA=ON # omit --cmake-args -DGGML_CUDA=ON for a CPU-only build
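
The CUDA flag in the last step is optional, so the build command can be chosen per machine. The sketch below is one way to do that. The helper names colcon_args and detect_cuda are hypothetical, and checking for nvcc on the PATH is an assumption about how to detect the CUDA Toolkit, not something this repository prescribes.

```shell
#!/bin/sh
# Hypothetical helper: print the extra colcon arguments for a build.
# $1 is "on" for a CUDA build; anything else means a CPU-only build.
colcon_args() {
  if [ "$1" = "on" ]; then
    echo "--cmake-args -DGGML_CUDA=ON"
  fi
}

# Assumption: the CUDA Toolkit is considered installed if nvcc is on the PATH.
detect_cuda() {
  if command -v nvcc >/dev/null 2>&1; then
    echo "on"
  else
    echo "off"
  fi
}

# Print the command instead of running it, so it can be reviewed first.
echo "colcon build $(colcon_args "$(detect_cuda)")"
```

Run it from ~/ros2_ws and copy the printed command once it looks right.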

Docker

Build the whisper_ros Docker image. Optionally, you can build whisper_ros with CUDA (USE_CUDA) and choose the CUDA version (CUDA_VERSION). When building the image with CUDA, you must set DOCKER_BUILDKIT=0.

DOCKER_BUILDKIT=0 docker build -t whisper_ros --build-arg USE_CUDA=1 --build-arg CUDA_VERSION=12-6 .

Run the Docker container. To use CUDA, you must install the NVIDIA Container Toolkit and add --gpus all.

docker run -it --rm --gpus all whisper_ros
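
The build and run commands above both include the CUDA options. As a rough sketch, the two variants can be generated side by side. The helper names docker_build_cmd and docker_run_cmd are hypothetical, and the CPU-only forms assume that leaving out the CUDA build arguments and --gpus all gives a working non-CUDA image.

```shell
#!/bin/sh
# Hypothetical helper: print the docker build command.
# $1: "cuda" or "cpu"; $2: CUDA version in the 12-6 style (defaults to 12-6).
docker_build_cmd() {
  mode="$1"
  cuda_version="${2:-12-6}"
  if [ "$mode" = "cuda" ]; then
    echo "DOCKER_BUILDKIT=0 docker build -t whisper_ros --build-arg USE_CUDA=1 --build-arg CUDA_VERSION=${cuda_version} ."
  else
    # Assumption: without the build arguments the image is built for CPU only.
    echo "docker build -t whisper_ros ."
  fi
}

# Hypothetical helper: print the matching docker run command.
docker_run_cmd() {
  if [ "$1" = "cuda" ]; then
    echo "docker run -it --rm --gpus all whisper_ros"
  else
    echo "docker run -it --rm whisper_ros"
  fi
}

docker_build_cmd cuda 12-6
docker_run_cmd cuda
```

Both helpers only print the commands. Pipe the output to sh or paste it into a terminal to run them.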

Usage

Run Silero for VAD and Whisper for STT:

ros2 launch whisper_bringup whisper.launch.py

Demos

Send an action goal to start listening:

ros2 action send_goal /whisper/listen whisper_msgs/action/STT "{}"

Or try the example whisper client:

ros2 run whisper_demos whisper_demo_node