Refactor microphone pipeline to remove unnecessary dependencies and improve overall logic #785
neuml · Sep 27, 2024 · ea730ff
1 parent 05fe040
commit ea730ff
Showing 9 changed files with 275 additions and 84 deletions.
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
@@ -34,7 +34,7 @@ jobs:
           java-version: "8"
 
       - name: Install dependencies - Linux
-        run: sudo apt-get update && sudo apt-get install libsndfile1 portaudio19-dev
+        run: sudo apt-get update && sudo apt-get install libsndfile1 portaudio19
         if: matrix.os == 'ubuntu-latest'
 
       - name: Install dependencies - macOS

diff --git a/docker/base/Dockerfile b/docker/base/Dockerfile
@@ -21,7 +21,7 @@ ENV LANG=C.UTF-8
 RUN \
     # Install required packages
     apt-get update && \
-    apt-get -y --no-install-recommends install libgomp1 libsndfile1 portaudio19-dev gcc g++ python${PYTHON_VERSION} python${PYTHON_VERSION}-dev python3-pip && \
+    apt-get -y --no-install-recommends install libgomp1 libsndfile1 portaudio19 gcc g++ python${PYTHON_VERSION} python${PYTHON_VERSION}-dev python3-pip && \
     rm -rf /var/lib/apt/lists && \
     \
     # Install txtai project and dependencies

diff --git a/docs/install.md b/docs/install.md
@@ -130,13 +130,13 @@ Additional environment specific prerequisites are below.
 
 ### Linux
 
-The AudioStream and Microphone pipelines require the [PortAudio](https://people.csail.mit.edu/hubert/pyaudio) system library. The Transcription pipeline requires the [SoundFile](https://github.com/bastibe/python-soundfile#installation) system library.
+The AudioStream and Microphone pipelines require the [PortAudio](https://python-sounddevice.readthedocs.io/en/0.5.0/installation.html) system library. The Transcription pipeline requires the [SoundFile](https://github.com/bastibe/python-soundfile#installation) system library.
 
 ### macOS
 
 Older versions of Faiss have a runtime dependency on `libomp` for macOS. Run `brew install libomp` in this case.
 
-The AudioStream and Microphone pipelines require the [PortAudio](https://people.csail.mit.edu/hubert/pyaudio) system library.
+The AudioStream and Microphone pipelines require the [PortAudio](https://python-sounddevice.readthedocs.io/en/0.5.0/installation.html) system library. Run `brew install portaudio`.
 
 ### Windows
 

diff --git a/docs/pipeline/audio/texttospeech.md b/docs/pipeline/audio/texttospeech.md
@@ -15,6 +15,16 @@ from txtai.pipeline import TextToSpeech
 # Create and run pipeline
 tts = TextToSpeech()
 tts("Say something here")
+
+# Stream audio - incrementally generates snippets of audio
+yield from tts(
+  "Say something here. And say something else",
+  streaming=True
+)
+
+# Generate audio using a speaker id
+tts = TextToSpeech("neuml/vctk-vits-onnx")
+tts("Say something here", speaker=42)
 ```
 
 See the link below for a more detailed example.
@@ -27,6 +37,7 @@ This pipeline is backed by ONNX models from the Hugging Face Hub. The following
 
 - [ljspeech-jets-onnx](https://huggingface.co/NeuML/ljspeech-jets-onnx)
 - [ljspeech-vits-onnx](https://huggingface.co/NeuML/ljspeech-vits-onnx)
+- [vctk-vits-onnx](https://huggingface.co/NeuML/vctk-vits-onnx)
 
 ## Configuration-driven example
 

diff --git a/setup.py b/setup.py
@@ -56,9 +56,7 @@
 extras["pipeline-audio"] = [
     "onnx>=1.11.0",
     "onnxruntime>=1.11.0",
-    "pyaudio>=0.2.14",
     "scipy>=1.4.1",
-    "speechrecognition>=3.10.4",
     "sounddevice>=0.5.0",
     "soundfile>=0.10.3.post1",
     "ttstokenizer>=1.0.0",

diff --git a/src/python/txtai/pipeline/audio/audiostream.py b/src/python/txtai/pipeline/audio/audiostream.py
@@ -12,7 +12,7 @@
     import sounddevice as sd
 
     SOUNDDEVICE = True
-except ImportError:
+except (ImportError, OSError):
     SOUNDDEVICE = False
 
 from ..base import Pipeline
@@ -35,7 +35,7 @@ def __init__(self, rate=22050):
         """
 
         if not SOUNDDEVICE:
-            raise ImportError('AudioStream pipeline is not available - install "pipeline" extra to enable')
+            raise ImportError("SoundDevice library not installed or portaudio library not found")
 
         # Sampler rate
         self.rate = rate
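
Note on the `audiostream.py` hunk above: `sounddevice` raises `OSError` at import time when the PortAudio shared library cannot be found, so catching only `ImportError` would let that failure escape the module-level guard. The following is a minimal, standalone sketch of the same guarded-import pattern, not the txtai implementation. The names `optional_import` and `AudioStreamSketch`, and the `importer` and `available` parameters, are hypothetical and exist only to make the behavior easy to exercise.

```python
import importlib


def optional_import(name, importer=importlib.import_module):
    """
    Attempts to import an optional module.

    Returns (module, True) on success and (None, False) when the import
    raises ImportError (package missing) or OSError (for example, a
    required system library could not be loaded).
    """

    try:
        return importer(name), True
    except (ImportError, OSError):
        return None, False


# Mirrors the SOUNDDEVICE flag in the diff; False when the package or PortAudio is absent
sd, SOUNDDEVICE = optional_import("sounddevice")


class AudioStreamSketch:
    """
    Hypothetical stand-in for a pipeline that depends on the optional library.
    """

    def __init__(self, rate=22050, available=SOUNDDEVICE):
        # Fail at construction time with a message that covers both failure causes
        if not available:
            raise ImportError("SoundDevice library not installed or portaudio library not found")

        # Sample rate
        self.rate = rate
```

With this shape, a missing Python package and a missing system library both lead to the same clear error when the pipeline is constructed, rather than an unhandled exception when the module is first imported.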