This is a small demo showing how to build an RTVI and Pipecat voice-to-voice AI bot. Small in two senses: I spent about an hour hacking this together, and it's super-simple code that doesn't use a front-end JS framework, doesn't have any UI to speak of, and uses a Pipecat pipeline that I just pulled from the examples/foundational/ directory in the Pipecat repo and lightly customized.
The tech stack is:
- Deepgram speech-to-text
- Anthropic Claude 3.5 Sonnet LLM
- Cartesia voice
- Daily transport
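Here's a rough sketch of how those four services wire together into a Pipecat pipeline. It's modeled on the examples/foundational scripts, so the import paths, constructor arguments, and environment variable names are assumptions and may not match this repo's pipeline.py exactly:

```python
# Sketch of a Deepgram -> Claude -> Cartesia pipeline over a Daily transport.
# Based on Pipecat's examples/foundational scripts; import paths and
# constructor arguments may differ from this repo's pipeline.py.
import asyncio
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
    LLMAssistantResponseAggregator,
    LLMUserResponseAggregator,
)
from pipecat.services.anthropic import AnthropicLLMService
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer


async def main():
    # Daily carries WebRTC audio between the browser client and the bot.
    transport = DailyTransport(
        os.environ["DAILY_ROOM_URL"],  # assumed variable name
        None,
        "Voice bot",
        DailyParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )

    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
    llm = AnthropicLLMService(
        api_key=os.environ["ANTHROPIC_API_KEY"],
        model="claude-3-5-sonnet-20240620",
    )
    tts = CartesiaTTSService(
        api_key=os.environ["CARTESIA_API_KEY"],
        voice_id=os.environ["CARTESIA_VOICE_ID"],  # assumed variable name
    )

    messages = [{"role": "system", "content": "Talk like a pirate."}]

    # Audio in -> STT -> LLM -> TTS -> audio out, with the aggregators
    # keeping the running conversation context.
    pipeline = Pipeline([
        transport.input(),
        stt,
        LLMUserResponseAggregator(messages),
        llm,
        tts,
        transport.output(),
        LLMAssistantResponseAggregator(messages),
    ])

    task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
    await PipelineRunner().run(task)


if __name__ == "__main__":
    asyncio.run(main())
```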
You'll need to open two terminal windows and a web browser.
In the first terminal, start the web client:

```
cd simple-web
npm i
npm start
```
In the second terminal, set up the bot's Python environment and install dependencies:

```
cd bot
python3.10 -m venv venv
source venv/bin/activate
pip install 'pipecat-ai[daily, anthropic, cartesia, openai, silero]'
pip install python-dotenv
```
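Since the bot installs python-dotenv, it presumably reads its API keys from a .env file in the bot/ directory. The variable names below are my guesses; check pipeline.py for the names it actually loads:

```
# Assumed variable names (check pipeline.py for the real ones)
DEEPGRAM_API_KEY=...
ANTHROPIC_API_KEY=...
CARTESIA_API_KEY=...
DAILY_API_KEY=...
```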
In your bot terminal, run:

```
python pipeline.py
```
Now open the URL that `npm start` printed out in a web browser. Click "start". In the bot terminal you should see DEBUG log lines printing out, showing the bot reacting to the client joining, and then generating LLM inference and TTS output. The web client should play out the audio.
The default prompt is just a simple "talk like a pirate" message.
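If you want to change that, the prompt presumably lives in config.py (the file that pipeline.py imports by default). A hypothetical sketch of what a system message might look like; the real config.py may structure this differently:

```python
# Hypothetical config.py-style prompt; the real file may differ.
messages = [
    {
        "role": "system",
        "content": (
            "You are a helpful voice assistant in a WebRTC call. "
            "Talk like a pirate. Keep responses short, since they will be "
            "converted to speech and read aloud."
        ),
    },
]
```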
To try tool calling and vision:

- Uncomment `enableCam: true` in the client, here.
- Swap `tool_config.py` into `pipeline.py` in place of `config.py`, here (there's a sketch of an Anthropic-style tool definition after this list).
- Restart the pipeline and rejoin from the web client.
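For context, Anthropic tools are described as JSON-schema dicts that the LLM service sends along with each inference request. A hypothetical example of the shape tool_config.py might use; the actual tool definitions in this repo may be different:

```python
# Hypothetical tool definition in Anthropic's tool-use format; the tools
# defined in this repo's tool_config.py may be different.
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'San Francisco'",
                },
            },
            "required": ["city"],
        },
    },
]
```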
This demo is just a single bot that you start manually, waiting for a connection from a single client. It's easy to deploy this bot to the cloud; it works exactly the same in the cloud as it does locally. But for an always-on service you'd also need to build a little load-balancer/bot-runner back-end.
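A hedged sketch of what the smallest possible bot-runner could look like: an HTTP endpoint that creates a Daily room via Daily's REST API and spawns one pipeline process per client. The /start route, the DAILY_ROOM_URL variable, and the assumption that pipeline.py reads its room URL from the environment are all mine, not this repo's:

```python
# Minimal bot-runner sketch, not part of this repo. Assumes DAILY_API_KEY is
# set and that pipeline.py can be pointed at a room via an environment
# variable (here a hypothetical DAILY_ROOM_URL).
import os
import subprocess

import requests
from fastapi import FastAPI

app = FastAPI()


@app.post("/start")
def start_bot():
    # Create a fresh Daily room for this session.
    resp = requests.post(
        "https://api.daily.co/v1/rooms",
        headers={"Authorization": f"Bearer {os.environ['DAILY_API_KEY']}"},
        json={},
        timeout=10,
    )
    resp.raise_for_status()
    room_url = resp.json()["url"]

    # Spawn one bot process per client/room. A real bot-runner would also
    # track these processes and cap how many run per machine.
    subprocess.Popen(
        ["python", "pipeline.py"],
        env={**os.environ, "DAILY_ROOM_URL": room_url},
    )

    # The web client joins the same room the bot just joined.
    return {"room_url": room_url}
```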
Here's an example showing a more complete RTVI service deployment to Modal.
Daily also offers a "hosted Pipecat/RTVI" service, Daily Bots, which you are welcome to check out if you're interested in getting up and running with voice agents on infrastructure someone else maintains for you.