feat: Add Camb.ai TTS plugin #4442
Merged
19 commits
b42239a
feat: Add Camb.ai TTS plugin for LiveKit Agents
eRuaro d91f98e
chore: remove example files from camb plugin
eRuaro fcef0df
feat(camb): make chunk_size configurable for streaming
eRuaro 07e6f3d
feat(camb): update speech models to mars-flash, mars-pro, mars-instruct
eRuaro ead8a61
docs(camb): update model names to mars-flash, mars-pro, mars-instruct
eRuaro df4cdcd
chore(camb): update default voice ID to 147320
eRuaro f4128ab
refactor(camb): migrate TTS to use official camb-sdk
eRuaro 95931f6
fix(camb): fixed type errors
eRuaro ba328c3
docs(camb): remove speed param from README
eRuaro 9b00e41
refactor(camb): simplify list_voices to return dicts, remove unused t…
eRuaro 4749186
fix: use 48kHz sample rate and pass to SDK output config
eRuaro d9d4d5f
fix: use model-specific sample rates (mars-flash/instruct 22.05kHz, m…
eRuaro 6bbed13
docs: update sample rates
eRuaro a9a5bd9
updated PR based on CodeRabbit changes
eRuaro f26b31c
fix(camb): proper httpx client cleanup without SDK context manager
eRuaro 4f26b54
switched from sdk to http
eRuaro b40b851
always require api key - for now
eRuaro ada232d
added camb to pyproject.toml
eRuaro 780f803
Merge branch 'main' into feature/camb-ai-tts-plugin
davidzhao
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,239 @@ | ||
| # Camb.ai Plugin for LiveKit Agents | ||
|
|
||
| Text-to-Speech plugin for [Camb.ai](https://camb.ai) TTS API, powered by MARS technology. | ||
|
|
||
| ## Features | ||
|
|
||
| - High-quality neural text-to-speech with MARS series models | ||
| - Multiple model variants (mars-flash, mars-pro) | ||
| - Enhanced pronunciation for names and places | ||
| - Support for 140+ languages | ||
| - Real-time HTTP streaming | ||
| - Pre-built voice library | ||
|
|
||
| ## Installation | ||
|
|
||
| ```bash | ||
| pip install livekit-plugins-camb | ||
| ``` | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| You'll need a Camb.ai API key. Set it as an environment variable: | ||
|
|
||
| ```bash | ||
| export CAMB_API_KEY=your_api_key_here | ||
| ``` | ||
|
|
||
| If you don't have a key yet, create one in [Camb.ai Studio](https://studio.camb.ai/public/onboarding). | ||
|
|
||
| ## Quick Start | ||
|
|
||
| ```python | ||
| import asyncio | ||
| from livekit.plugins.camb import TTS | ||
|
|
||
| async def main(): | ||
|     # Initialize TTS (uses CAMB_API_KEY env var) | ||
|     tts = TTS() | ||
|
|
||
|     # Synthesize speech | ||
|     stream = tts.synthesize("Hello from Camb.ai!") | ||
|     audio_frame = await stream.collect() | ||
|
|
||
|     # Save to file | ||
|     with open("output.wav", "wb") as f: | ||
|         f.write(audio_frame.to_wav_bytes()) | ||
|
|
||
| asyncio.run(main()) | ||
| ``` | ||
|
|
||
| ## List Available Voices | ||
|
|
||
| ```python | ||
| import asyncio | ||
| from livekit.plugins.camb import list_voices | ||
|
|
||
| async def main(): | ||
|     voices = await list_voices() | ||
|     for voice in voices: | ||
|         print(f"{voice['name']} ({voice['id']}): {voice['gender']}, {voice['language']}") | ||
|
|
||
| asyncio.run(main()) | ||
| ``` | ||
|
|
||
| ## Select a Specific Voice | ||
|
|
||
| ```python | ||
| tts = TTS(voice_id=147320) | ||
| stream = tts.synthesize("Using a specific voice!") | ||
| ``` | ||
|
|
||
| ## Model Selection | ||
|
|
||
| Camb.ai offers multiple MARS models for different use cases: | ||
|
|
||
| ```python | ||
| # Faster inference, 22050 Hz (default) | ||
| tts = TTS(model="mars-flash") | ||
|
|
||
| # Higher quality, 48000 Hz | ||
| tts = TTS(model="mars-pro") | ||
| ``` | ||
|
|
||
| ## Advanced Configuration | ||
|
|
||
| ```python | ||
| tts = TTS( | ||
|     api_key="your-api-key",  # Or use CAMB_API_KEY env var | ||
|     voice_id=147320,  # Voice ID from list-voices | ||
|     language="en-us",  # BCP-47 locale | ||
|     model="mars-pro",  # MARS model variant | ||
|     output_format="pcm_s16le",  # Audio format | ||
|     enhance_named_entities=True,  # Better pronunciation for names/places | ||
| ) | ||
| ``` | ||
|
|
||
| ## Usage with LiveKit Agents | ||
|
|
||
| ```python | ||
| from livekit import agents, rtc | ||
| from livekit.plugins.camb import TTS | ||
|
|
||
| async def entrypoint(ctx: agents.JobContext): | ||
|     # Connect to room | ||
|     await ctx.connect() | ||
|
|
||
|     # Initialize TTS | ||
|     tts = TTS(language="en-us") | ||
|
|
||
|     # Synthesize and publish | ||
|     stream = tts.synthesize("Hello from LiveKit with Camb.ai!") | ||
|     audio_frame = await stream.collect() | ||
|
|
||
|     # Publish to room (AudioSource and LocalAudioTrack live in livekit.rtc) | ||
|     source = rtc.AudioSource(tts.sample_rate, tts.num_channels) | ||
|     track = rtc.LocalAudioTrack.create_audio_track("tts", source) | ||
|     await ctx.room.local_participant.publish_track(track) | ||
|     await source.capture_frame(audio_frame) | ||
| ``` | ||
|
|
||
| ## Configuration Options | ||
|
|
||
| ### TTS Constructor Parameters | ||
|
|
||
| - **api_key** (str | None): Camb.ai API key (falls back to the `CAMB_API_KEY` environment variable) | ||
| - **voice_id** (int): Voice ID to use (default: 147320) | ||
| - **language** (str): BCP-47 locale (default: "en-us") | ||
| - **model** (SpeechModel): MARS model variant (default: "mars-flash") | ||
| - **output_format** (OutputFormat): Audio format (default: "pcm_s16le") | ||
| - **enhance_named_entities** (bool): Enhanced pronunciation (default: False) | ||
| - **sample_rate** (int | None): Audio sample rate (auto-detected from model if None) | ||
| - **base_url** (str): API base URL | ||
| - **http_session** (httpx.AsyncClient | None): Reusable HTTP session | ||
|
|
||
| ### Available Models | ||
|
|
||
| - **mars-flash**: Faster inference, 22050 Hz (default) | ||
| - **mars-pro**: Higher quality synthesis, 48000 Hz | ||
|
|
||
| ### Output Formats | ||
|
|
||
| - **pcm_s16le**: 16-bit PCM (recommended for streaming) | ||
| - **pcm_s32le**: 32-bit PCM (highest quality) | ||
| - **wav**: WAV with headers | ||
| - **flac**: Lossless compression | ||
| - **adts**: ADTS streaming format | ||
|
|
||
| ## API Reference | ||
|
|
||
| ### TTS Class | ||
|
|
||
| Main text-to-speech interface. | ||
|
|
||
| **Methods:** | ||
| - `synthesize(text: str) -> ChunkedStream`: Synthesize text to speech | ||
| - `update_options(**kwargs)`: Update voice settings dynamically | ||
| - `aclose()`: Clean up resources | ||
|
|
||
| **Properties:** | ||
| - `model` (str): Current MARS model name | ||
| - `provider` (str): Provider name ("Camb.ai") | ||
| - `sample_rate` (int): Audio sample rate (22050 or 48000 Hz depending on model) | ||
| - `num_channels` (int): Number of audio channels (1) | ||
|
|
||
| ### list_voices Function | ||
|
|
||
| ```python | ||
| async def list_voices( | ||
|     *,  # both arguments are keyword-only | ||
|     api_key: str | None = None, | ||
|     base_url: str = "https://client.camb.ai/apis", | ||
| ) -> list[dict] | ||
| ``` | ||
|
|
||
| Returns a list of voice dicts with the keys `id`, `name`, `gender`, `age`, and `language`. | ||
|
|
||
| ## Multi-Language Support | ||
|
|
||
| Camb.ai supports 140+ languages. Specify using BCP-47 locales: | ||
|
|
||
| ```python | ||
| # French | ||
| tts = TTS(language="fr-fr", voice_id=...) | ||
|
|
||
| # Spanish | ||
| tts = TTS(language="es-es", voice_id=...) | ||
|
|
||
| # Japanese | ||
| tts = TTS(language="ja-jp", voice_id=...) | ||
| ``` | ||
|
|
||
| ## Dynamic Options | ||
|
|
||
| Update TTS settings without recreating the instance: | ||
|
|
||
| ```python | ||
| tts = TTS() | ||
|
|
||
| # Change voice | ||
| tts.update_options(voice_id=12345) | ||
|
|
||
| # Change model | ||
| tts.update_options(model="mars-pro") | ||
| ``` | ||
|
|
||
| ## Error Handling | ||
|
|
||
| The plugin handles errors according to LiveKit conventions: | ||
|
|
||
| ```python | ||
| from livekit.agents import APIStatusError, APIConnectionError, APITimeoutError | ||
|
|
||
| try: | ||
|     stream = tts.synthesize("Hello!") | ||
|     audio = await stream.collect() | ||
| except APIStatusError as e: | ||
|     print(f"API error: {e.status_code} - {e.message}") | ||
| except APITimeoutError as e:  # subclass of APIConnectionError, so catch it first | ||
|     print(f"Request timed out: {e}") | ||
| except APIConnectionError as e: | ||
|     print(f"Connection error: {e}") | ||
| ``` | ||
|
|
||
| ## Future Features | ||
|
|
||
| Coming soon: | ||
| - GCP Vertex AI integration | ||
| - Voice cloning via custom voice creation | ||
| - Voice generation from text descriptions | ||
| - WebSocket streaming for real-time applications | ||
|
|
||
| ## Links | ||
|
|
||
| - [Camb.ai Documentation](https://docs.camb.ai/) | ||
| - [LiveKit Agents Documentation](https://docs.livekit.io/agents/) | ||
| - [GitHub Repository](https://github.com/livekit/agents) | ||
|
|
||
| ## License | ||
|
|
||
| Apache License 2.0 | ||
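
The README above says `sample_rate` is auto-detected from the model when it is left as `None`. A minimal sketch of how such a lookup could work, using only the rates quoted in the README (22050 Hz for mars-flash, 48000 Hz for mars-pro). The names `MODEL_SAMPLE_RATES`, `resolve_sample_rate`, and `pcm_s16le_duration` are hypothetical and are not taken from the plugin source:

```python
from __future__ import annotations

# Hypothetical lookup table; the rates are the ones quoted in the README.
MODEL_SAMPLE_RATES: dict[str, int] = {
    "mars-flash": 22050,
    "mars-pro": 48000,
}


def resolve_sample_rate(model: str, sample_rate: int | None = None) -> int:
    """Return the explicit sample rate if given, else the model's default."""
    if sample_rate is not None:
        return sample_rate
    try:
        return MODEL_SAMPLE_RATES[model]
    except KeyError:
        raise ValueError(f"unknown model: {model!r}") from None


def pcm_s16le_duration(num_bytes: int, sample_rate: int, num_channels: int = 1) -> float:
    """Duration in seconds of a raw 16-bit PCM buffer (2 bytes per sample)."""
    bytes_per_frame = 2 * num_channels
    return num_bytes / (bytes_per_frame * sample_rate)
```

An explicit `sample_rate` takes precedence over the model default, which matches the "auto-detected from model if None" wording; how the plugin itself resolves the value may differ.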
97 changes: 97 additions & 0 deletions
livekit-plugins/livekit-plugins-camb/livekit/plugins/camb/__init__.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,97 @@ | ||
| # Copyright 2023 LiveKit, Inc. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| import os | ||
|
|
||
| import aiohttp | ||
|
|
||
| from livekit.agents import APIStatusError, Plugin | ||
|
|
||
| from .log import logger | ||
| from .tts import API_BASE_URL, API_KEY_HEADER, TTS | ||
| from .version import __version__ | ||
|
|
||
| # Gender mapping from API integer to string | ||
| GENDER_MAP = {0: "Not Specified", 1: "Male", 2: "Female", 9: "Not Applicable"} | ||
|
|
||
|
|
||
| async def list_voices( | ||
|     *, | ||
|     api_key: str | None = None, | ||
|     base_url: str = API_BASE_URL, | ||
| ) -> list[dict]: | ||
|     """ | ||
|     List available voices from Camb.ai. | ||
|
|
||
|     Args: | ||
|         api_key: Camb.ai API key (or use CAMB_API_KEY env var). | ||
|         base_url: API base URL. | ||
|
|
||
|     Returns: | ||
|         List of voice dicts with id, name, gender, age, language. | ||
|
|
||
|     Raises: | ||
|         ValueError: If no API key provided. | ||
|         APIStatusError: If API request fails. | ||
|     """ | ||
|     api_key = api_key or os.environ.get("CAMB_API_KEY") | ||
|     if not api_key: | ||
|         raise ValueError("api_key required (or set CAMB_API_KEY environment variable)") | ||
|
|
||
|     async with aiohttp.ClientSession() as session: | ||
|         async with session.get( | ||
|             f"{base_url}/list-voices", | ||
|             headers={API_KEY_HEADER: api_key}, | ||
|         ) as resp: | ||
|             if resp.status != 200: | ||
|                 content = await resp.text() | ||
|                 raise APIStatusError( | ||
|                     f"Failed to list voices: {content}", | ||
|                     status_code=resp.status, | ||
|                 ) | ||
|
|
||
|             voice_list = await resp.json() | ||
|             voices = [] | ||
|
|
||
|             for voice in voice_list: | ||
|                 voice_id = voice.get("id") | ||
|                 if voice_id is None: | ||
|                     continue | ||
|
|
||
|                 gender_int = voice.get("gender") | ||
|                 gender = GENDER_MAP.get(gender_int) if gender_int is not None else None | ||
|
|
||
|                 voices.append( | ||
|                     { | ||
|                         "id": voice_id, | ||
|                         "name": voice.get("voice_name", ""), | ||
|                         "gender": gender, | ||
|                         "age": voice.get("age"), | ||
|                         "language": voice.get("language"), | ||
|                     } | ||
|                 ) | ||
|
|
||
|             return voices | ||
|
|
||
|
|
||
| class CambPlugin(Plugin): | ||
|     def __init__(self) -> None: | ||
|         super().__init__(__name__, __version__, __package__, logger) | ||
|
|
||
|
|
||
| Plugin.register_plugin(CambPlugin()) | ||
|
|
||
| __all__ = ["TTS", "list_voices", "__version__"] |
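
The mapping step inside `list_voices` can be exercised without a network call if it is pulled out into a pure function. The sketch below mirrors the loop in the diff above. The helper name `normalize_voices` and the sample payload are hypothetical; only the field names (`id`, `voice_name`, `gender`, `age`, `language`) and `GENDER_MAP` come from the diff:

```python
from __future__ import annotations

from typing import Any

# Same mapping as in the diff above.
GENDER_MAP = {0: "Not Specified", 1: "Male", 2: "Female", 9: "Not Applicable"}


def normalize_voices(voice_list: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Convert raw API voice entries into the dict shape list_voices returns."""
    voices = []
    for voice in voice_list:
        voice_id = voice.get("id")
        if voice_id is None:
            # Entries without an ID cannot be selected, so they are skipped.
            continue

        gender_int = voice.get("gender")
        # Unknown or missing gender codes map to None.
        gender = GENDER_MAP.get(gender_int) if gender_int is not None else None

        voices.append(
            {
                "id": voice_id,
                "name": voice.get("voice_name", ""),
                "gender": gender,
                "age": voice.get("age"),
                "language": voice.get("language"),
            }
        )
    return voices


# Hypothetical payload, for illustration only.
sample = [
    {"id": 1, "voice_name": "Example", "gender": 2, "age": 30, "language": 1},
    {"voice_name": "No ID", "gender": 1},
    {"id": 2, "gender": 7},
]
result = normalize_voices(sample)
```

With this payload, the entry lacking an `id` is dropped, and the unrecognized gender code `7` yields `None` rather than raising.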
17 changes: 17 additions & 0 deletions
livekit-plugins/livekit-plugins-camb/livekit/plugins/camb/log.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| # Copyright 2023 LiveKit, Inc. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| import logging | ||
|
|
||
| logger = logging.getLogger("livekit.plugins.camb") |
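
Because the plugin logs through a named logger, an application can raise its verbosity without touching other loggers. A minimal sketch using only the standard library; the logger name comes from `log.py` above, while the handler and format string are arbitrary choices for illustration:

```python
import logging

# Logger name taken from log.py in the diff above.
camb_logger = logging.getLogger("livekit.plugins.camb")
camb_logger.setLevel(logging.DEBUG)

# Attach a stream handler so debug records are actually emitted somewhere.
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
camb_logger.addHandler(handler)

camb_logger.debug("debug logging enabled for the Camb.ai plugin")
```

Whether the plugin emits any debug-level records is not shown in this diff, so this only guarantees that such records would be visible if they exist.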