This repository showcases how to easily create a real-time voice assistant using Twilio and OpenAI. Whether you're new to AI voice assistants or looking for a scalable solution, this project will guide you through every step.
The assistant leverages OpenAI's Retrieval-Augmented Generation (RAG) and integrates with Twilio for seamless voice communication.
- Real-time Voice Interaction: Talk to your assistant and get instant, AI-generated responses.
- Easy Integration: Quickly connect Twilio with OpenAI's Realtime API.
- Scalable: Use this project as a base to create more sophisticated assistants.
- Customizable: Tweak responses, voice prompts, and more to suit your needs.
- JSON-based Database: Simulate orders, product inventories, and shipping statuses with a simple JSON database.
- Dummy Functions: This version connects to dummy functions, designed for simulation and testing. You are encouraged to modify and extend these functions to connect them to your actual services or documentation. This flexibility allows you to integrate real-world systems like order management, inventory checks, and shipping status updates.
The Realtime API enables you to build low-latency, multi-modal conversational experiences. It supports both text and audio as input and output, while also enabling tool calling, making it versatile for real-time, interactive applications.
- Native speech-to-speech: The API operates without text as an intermediary, reducing latency and delivering more nuanced, natural output.
- Natural, steerable voices: The models can produce natural inflections, including features like laughing and whispering, while adhering to specific tonal directions.
- Simultaneous multimodal output: While text can be useful for moderation or logging, the audio is faster-than-realtime, ensuring stable playback.
This API is websocket-based, marking the first time OpenAI has published an API capable of sending and receiving audio in real time. It's designed to provide developers with a seamless way to build conversational applications that require instant responses.
- Beta Stage: The Realtime API is currently in beta and does not offer client-side authentication. For security, audio must be relayed to a server to authenticate securely.
- Network Sensitivity: Real-time audio experiences can be affected by network conditions, especially when delivering audio reliably to a server. This makes production-scale use challenging in client-side or telephony applications where network conditions may vary.
For production use, especially in environments where network reliability is unpredictable, it is recommended to evaluate purpose-built third-party solutions or integrate with trusted partners, as listed by OpenAI.
dummy_db/
: Contains JSON files simulating a database of orders, products, and shipping statuses.helpers/
: Utility functions for reading files, handling Twilio interactions, and voice prompts.routers/
: API routes for the app, such as streaming voice responses.services/
: Contains the OpenAI interaction logic.tools/
: Specific tools for checking stock, shipping status, and more..env
: Store your environment variables (API keys, secrets, etc.).app.py
: The main application file.requirements.txt
: Project dependencies.
Before you begin, ensure you have the following:
- A Twilio account (Sign up here).
- An OpenAI API key (Get it here).
- Python 3.x installed on your machine.
- Set up a virtual environment for the project.
-
Create a Twilio Phone Number:
- Log into your Twilio account.
- Navigate to Phone Numbers from the console dashboard.
- Click on Buy a Number and choose a number with voice capabilities.
- Once purchased, configure this number to route incoming calls to your application.
-
Modify the Webhook Endpoint:
- Go to the Phone Numbers section in your Twilio console.
- Select the number you just purchased.
- Scroll down to the Voice & Fax section.
- In the A Call Comes In field, set the Webhook URL to the endpoint that connects to your service. For example: https://your-domain.com/stream/incoming-call
- This URL should point to your FastAPI application, specifically the
/stream/incoming-call
route that will handle the incoming Twilio calls.
-
Set up the WEBSOCKET_URL:
- In your
.env
file, you need to define theWEBSOCKET_URL
. This is the URL where Twilio will establish a WebSocket connection to stream the voice call to your service. - Example: WEBSOCKET_URL=your-domain.com
- Make sure this URL is publicly accessible.
- In your
If you want to run the service locally, you will need to create a public endpoint using port forwarding. Here’s how you can do this in Visual Studio:
-
Set up the port forwarding in Visual Studio:
- Open your project in Visual Studio.
- Go to Project > Properties > Debug.
- In the Web Server Settings, make sure the App URL is set to the port defined in your
.env
file (default is 5000).
-
Create a public port forward:
- To make your local FastAPI service publicly accessible, you'll need to use a tool like ngrok or Visual Studio's Port Forwarding feature.
- You can follow this guide on Youtube to achieve this: Visual Studio Port Forwarding.
-
Example
.env
configuration for local development
VOICE = 'echo'
OPENAI_API_KEY = 'your_openai_key'
WEBSOCKET_URL= 'your_ngrok_url'
PORT=5000
-
Clone this repository:
git clone https://github.com/ericrisco/twilio-realtime-openai-rag cd repo-name
-
Install the dependencies:
pip install -r requirements.txt
-
Create a
.env
file in the root directory and add your API keys:VOICE = 'echo' OPENAI_API_KEY = 'your_openai_key' WEBSOCKET_URL= 'your_ngrok_url' PORT=5000
-
Run the app:
python app.py
-
Twilio Voice Setup: To connect your Twilio account to this project:
- Create a Twilio phone number.
- Configure the webhook to point to your app's
/stream/incoming-call
route. - Set up a Twilio TwiML app to handle incoming voice calls.
-
Making a Call: Call your Twilio number, and the voice assistant will interact with you using OpenAI's AI model, responding in real-time.
-
Sample Commands: You can ask questions like:
- "What's the status of my order?"
- "Do you have this Whey Protein in stock?"
Endpoint | Description | Method |
---|---|---|
/stream/incoming-call |
Handles incoming calls from Twilio | GET/POST |
/stream/websocket |
Establishes a WebSocket connection for streaming | WebSocket |
The main entry point for the FastAPI application, handling the configuration of routers and loading environment variables. It utilizes Uvicorn to serve the app on a specified port.
Handles incoming calls and WebSocket connections for real-time media streaming between Twilio and OpenAI. The connection to OpenAI is established via WebSocket for real-time processing.
read_json_file.py
: Reads JSON files asynchronously from thedummy_db
folder to simulate databases.twilio.py
: Manages the response for Twilio, directing it to connect via WebSocket to the OpenAI model.voice_system_prompt.py
: Contains the system prompt for OpenAI, defining the persona and role of the assistant (in this case, a friendly assistant named "CBum" working at MuscleBoost).
openai_functions.py
: Handles interactions with OpenAI, including generating a welcome message, updating the session with the assistant's configuration, and sending responses in audio format.
execute_tool.py
: Maps and executes the appropriate tool based on the user's request, such as searching for a product or checking an order's shipping status.process_order_tool.py
: Processes the status of a given order by ID.search_product_tool.py
: Searches for a product by name in the simulated product catalog.check_shipping_status_tool.py
: Checks the shipping status of an order based on a tracking number.check_stock_tool.py
: Verifies the stock availability of a specific product in the catalog.
Contributions are welcome! Feel free to open an issue or submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.