Twilio Real-time OpenAI Voice Assistant

🚀 Build a Real-time Voice Assistant with Twilio and OpenAI

This repository showcases how to easily create a real-time voice assistant using Twilio and OpenAI. Whether you're new to AI voice assistants or looking for a scalable solution, this project will guide you through every step.

The assistant combines OpenAI's Realtime API with a Retrieval-Augmented Generation (RAG) pattern, looking up orders, products, and shipping data through tool calls, and integrates with Twilio for voice communication.

✨ Features

Real-time Voice Interaction: Talk to your assistant and get instant, AI-generated responses.
Easy Integration: Quickly connect Twilio with OpenAI's Realtime API.
Scalable: Use this project as a base to create more sophisticated assistants.
Customizable: Tweak responses, voice prompts, and more to suit your needs.
JSON-based Database: Simulate orders, product inventories, and shipping statuses with a simple JSON database.
Dummy Functions: This version connects to dummy functions, designed for simulation and testing. You are encouraged to modify and extend these functions to connect them to your actual services or documentation. This flexibility allows you to integrate real-world systems like order management, inventory checks, and shipping status updates.

💬 What is the OpenAI Realtime API?

The Realtime API enables you to build low-latency, multi-modal conversational experiences. It supports both text and audio as input and output, while also enabling tool calling, making it versatile for real-time, interactive applications.

Key Benefits of the Realtime API

Native speech-to-speech: The API operates without text as an intermediary, reducing latency and delivering more nuanced, natural output.
Natural, steerable voices: The models can produce natural inflections, including features like laughing and whispering, while adhering to specific tonal directions.
Simultaneous multimodal output: While text can be useful for moderation or logging, the audio is faster-than-realtime, ensuring stable playback.

This API is websocket-based, marking the first time OpenAI has published an API capable of sending and receiving audio in real time. It's designed to provide developers with a seamless way to build conversational applications that require instant responses.
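As a rough illustration of what "websocket-based" means in practice, the sketch below builds the URL and headers a server would use to open the connection. The endpoint, the `OpenAI-Beta: realtime=v1` header, and the default model name reflect the beta as publicly documented at launch and are assumptions here; `build_realtime_request` is a hypothetical helper, not a function from this repository. Check the current OpenAI documentation before relying on any of these values.

```python
import os
from urllib.parse import urlencode

# Assumed endpoint for the Realtime API beta; verify against OpenAI's docs.
REALTIME_BASE_URL = "wss://api.openai.com/v1/realtime"


def build_realtime_request(
    api_key: str, model: str = "gpt-4o-realtime-preview"
) -> tuple[str, dict[str, str]]:
    """Return the WebSocket URL and headers for a server-side connection.

    The model name and the beta header are assumptions based on the
    public beta, not values taken from this repository.
    """
    url = f"{REALTIME_BASE_URL}?{urlencode({'model': model})}"
    headers = {
        "Authorization": f"Bearer {api_key}",  # key stays on the server
        "OpenAI-Beta": "realtime=v1",
    }
    return url, headers


if __name__ == "__main__":
    url, headers = build_realtime_request(
        os.environ.get("OPENAI_API_KEY", "your_openai_key")
    )
    print(url)
```

Because the API key travels in a header, the connection has to be opened from your server rather than from the caller's device, which is why the audio is relayed through the application.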

Important Considerations

Beta Stage: The Realtime API is currently in beta and does not offer client-side authentication. For security, audio must be relayed to a server to authenticate securely.
Network Sensitivity: Real-time audio experiences can be affected by network conditions, especially when delivering audio reliably to a server. This makes production-scale use challenging in client-side or telephony applications where network conditions may vary.

For production use, especially in environments where network reliability is unpredictable, it is recommended to evaluate purpose-built third-party solutions or integrate with trusted partners, as listed by OpenAI.

🛠️ Project Structure

dummy_db/: Contains JSON files simulating a database of orders, products, and shipping statuses.
helpers/: Utility functions for reading files, handling Twilio interactions, and voice prompts.
routers/: API routes for the app, such as streaming voice responses.
services/: Contains the OpenAI interaction logic.
tools/: Specific tools for checking stock, shipping status, and more.
.env: Store your environment variables (API keys, secrets, etc.).
app.py: The main application file.
requirements.txt: Project dependencies.

📋 Prerequisites

Before you begin, ensure you have the following:

A Twilio account.
An OpenAI API key.
Python 3.x installed on your machine.
A virtual environment set up for the project.

Setting up Twilio

Create a Twilio Phone Number:
- Log into your Twilio account.
- Navigate to Phone Numbers from the console dashboard.
- Click on Buy a Number and choose a number with voice capabilities.
- Once purchased, configure this number to route incoming calls to your application.
Modify the Webhook Endpoint:
- Go to the Phone Numbers section in your Twilio console.
- Select the number you just purchased.
- Scroll down to the Voice & Fax section.
- In the A Call Comes In field, set the Webhook URL to the endpoint that connects to your service. For example: https://your-domain.com/stream/incoming-call
- This URL should point to your FastAPI application, specifically the /stream/incoming-call route that will handle the incoming Twilio calls.
Set up the WEBSOCKET_URL:
- In your .env file, you need to define the WEBSOCKET_URL. This is the URL where Twilio will establish a WebSocket connection to stream the voice call to your service.
- Example: WEBSOCKET_URL=your-domain.com
- Make sure this URL is publicly accessible.
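To make the webhook and WEBSOCKET_URL steps more concrete, here is a minimal sketch of the TwiML the /stream/incoming-call route could return so that Twilio opens a media stream to the app. It uses only the standard library; `build_stream_twiml` is a hypothetical helper, and the sketch assumes WEBSOCKET_URL holds a bare host name (as in the example above) and that the stream is served at /stream/websocket. The repository's own helper may build this differently.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(websocket_host: str, path: str = "/stream/websocket") -> str:
    """Build TwiML that tells Twilio to stream call audio over a WebSocket.

    Assumes `websocket_host` is a bare host such as "your-domain.com",
    so the wss:// scheme is added here.
    """
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=f"wss://{websocket_host}{path}")
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(build_stream_twiml("your-domain.com"))
```

Twilio requires a secure WebSocket (wss) for media streams, so the host must serve TLS; a tunnel such as ngrok provides that for local development.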

Running Locally

If you want to run the service locally, you will need to create a public endpoint using port forwarding. Here’s how you can do this in Visual Studio:

Set up the port forwarding in Visual Studio:
- Open your project in Visual Studio.
- Go to Project > Properties > Debug.
- In the Web Server Settings, make sure the App URL is set to the port defined in your .env file (default is 5000).
Create a public port forward:
- To make your local FastAPI service publicly accessible, you'll need to use a tool like ngrok or Visual Studio's Port Forwarding feature.
- A YouTube guide titled "Visual Studio Port Forwarding" walks through this setup.
Example .env configuration for local development

VOICE='echo'
OPENAI_API_KEY='your_openai_key'
WEBSOCKET_URL='your_ngrok_url'
PORT=5000
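The following sketch shows one way the application could read these four variables at startup. It assumes the .env file has already been loaded into the process environment (for example by a dotenv loader); `Settings` and `load_settings` are hypothetical names, and the defaults mirror the example values above rather than anything confirmed in the repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    voice: str
    openai_api_key: str
    websocket_url: str
    port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from the environment, failing early if keys are missing."""
    env = os.environ if env is None else env
    missing = [key for key in ("OPENAI_API_KEY", "WEBSOCKET_URL") if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(
        voice=env.get("VOICE", "echo"),    # default taken from the example .env
        openai_api_key=env["OPENAI_API_KEY"],
        websocket_url=env["WEBSOCKET_URL"],
        port=int(env.get("PORT", "5000")),  # default port from the example .env
    )
```

Failing at startup when a key is missing is usually easier to debug than a dropped call later on.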

🛠️ Installation

Clone this repository:

git clone https://github.com/ericrisco/twilio-realtime-openai-rag
cd twilio-realtime-openai-rag

Install the dependencies:
```
pip install -r requirements.txt
```

Create a .env file in the root directory and add your API keys:

 VOICE='echo'
 OPENAI_API_KEY='your_openai_key'
 WEBSOCKET_URL='your_ngrok_url'
 PORT=5000

Run the app:
```
python app.py
```

🔧 Usage

Twilio Voice Setup: To connect your Twilio account to this project:
- Create a Twilio phone number.
- Configure the webhook to point to your app's /stream/incoming-call route.
- Set up a Twilio TwiML app to handle incoming voice calls.
Making a Call: Call your Twilio number, and the voice assistant will answer and respond in real time using OpenAI's Realtime API.
Sample Commands: You can ask questions like:
- "What's the status of my order?"
- "Do you have this Whey Protein in stock?"

📂 API Documentation

Endpoints

| Endpoint | Description | Method |
| --- | --- | --- |
| `/stream/incoming-call` | Handles incoming calls from Twilio | GET/POST |
| `/stream/websocket` | Establishes a WebSocket connection for streaming | WebSocket |

🧩 Key Components

1. `app.py`

The main entry point for the FastAPI application, handling the configuration of routers and loading environment variables. It utilizes Uvicorn to serve the app on a specified port.

2. `stream.py`

Handles incoming calls and WebSocket connections for real-time media streaming between Twilio and OpenAI. The connection to OpenAI is established via WebSocket for real-time processing.
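The relay between the two WebSockets can be pictured as a pair of translation functions. The sketch below is an assumption about how such a bridge might look, not the code in `stream.py`: the function names are hypothetical, and the event shapes (`media` events from Twilio Media Streams, `input_audio_buffer.append` and `response.audio.delta` from the Realtime API beta) should be checked against the current Twilio and OpenAI documentation. It also assumes the OpenAI session is configured for G.711 u-law audio, so the base64 payload can pass through without transcoding.

```python
import json
from typing import Optional


def twilio_to_openai(message: str) -> Optional[dict]:
    """Convert a Twilio Media Streams message into a Realtime API event.

    Only "media" events carry audio; other events (such as "start" or
    "stop") return None here and would be handled elsewhere.
    """
    data = json.loads(message)
    if data.get("event") == "media":
        return {
            "type": "input_audio_buffer.append",
            "audio": data["media"]["payload"],  # base64 audio, passed through
        }
    return None


def openai_to_twilio(event: dict, stream_sid: str) -> Optional[dict]:
    """Convert a Realtime API audio delta into a Twilio media message."""
    if event.get("type") == "response.audio.delta" and event.get("delta"):
        return {
            "event": "media",
            "streamSid": stream_sid,  # identifies the call's stream to Twilio
            "media": {"payload": event["delta"]},
        }
    return None
```

In a running service, each function would sit in its own loop: one reading from the Twilio socket and writing to OpenAI, the other doing the reverse.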

3. Helpers

read_json_file.py: Reads JSON files asynchronously from the dummy_db folder to simulate databases.
twilio.py: Manages the response for Twilio, directing it to connect via WebSocket to the OpenAI model.
voice_system_prompt.py: Contains the system prompt for OpenAI, defining the persona and role of the assistant (in this case, a friendly assistant named "CBum" working at MuscleBoost).
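As an illustration of the asynchronous JSON helper described above, here is a minimal standard-library sketch. It is an assumption about the approach, not the repository's implementation: it offloads the blocking file read to a worker thread with `asyncio.to_thread` (Python 3.9+) so the event loop stays free to handle call audio.

```python
import asyncio
import json
from pathlib import Path
from typing import Any, Union


async def read_json_file(path: Union[str, Path]) -> Any:
    """Read and parse a JSON file without blocking the event loop."""

    def _load() -> Any:
        with Path(path).open(encoding="utf-8") as handle:
            return json.load(handle)

    # File I/O is blocking, so run it in a worker thread.
    return await asyncio.to_thread(_load)
```

A tool would call it as `await read_json_file(...)` with a path inside dummy_db/; the exact file names are defined by the repository.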

4. Services

openai_functions.py: Handles interactions with OpenAI, including generating a welcome message, updating the session with the assistant's configuration, and sending responses in audio format.
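Updating the session amounts to sending a single JSON event over the OpenAI WebSocket. The sketch below builds such a `session.update` payload. The field names follow the Realtime API beta as publicly documented and are assumptions to verify; `build_session_update` and the `check_stock` tool definition are hypothetical examples rather than the repository's actual configuration.

```python
import json
from typing import Optional


def build_session_update(
    instructions: str, voice: str = "echo", tools: Optional[list] = None
) -> dict:
    """Build a session.update event for a Twilio phone call.

    G.711 u-law is the audio format Twilio Media Streams uses, so both
    directions are set to it to avoid transcoding.
    """
    return {
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "instructions": instructions,  # the system prompt / persona
            "voice": voice,
            "input_audio_format": "g711_ulaw",
            "output_audio_format": "g711_ulaw",
            "turn_detection": {"type": "server_vad"},  # server detects end of speech
            "tools": tools or [],
            "tool_choice": "auto",
        },
    }


# Hypothetical tool definition, for illustration only.
CHECK_STOCK_TOOL = {
    "type": "function",
    "name": "check_stock",
    "description": "Check whether a product is in stock.",
    "parameters": {
        "type": "object",
        "properties": {"product_name": {"type": "string"}},
        "required": ["product_name"],
    },
}

if __name__ == "__main__":
    event = build_session_update("You are a friendly assistant.", tools=[CHECK_STOCK_TOOL])
    print(json.dumps(event, indent=2))
```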

5. Tools

execute_tool.py: Maps and executes the appropriate tool based on the user's request, such as searching for a product or checking an order's shipping status.
process_order_tool.py: Processes the status of a given order by ID.
search_product_tool.py: Searches for a product by name in the simulated product catalog.
check_shipping_status_tool.py: Checks the shipping status of an order based on a tracking number.
check_stock_tool.py: Verifies the stock availability of a specific product in the catalog.
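The tool layer can be sketched as a name-to-function map plus a dispatcher. Everything below is illustrative: the in-memory `PRODUCTS` list stands in for the JSON files in dummy_db/ and its values are invented, the function bodies are assumptions, and the `function_call_output` event shape follows the Realtime API beta and should be verified against the current documentation.

```python
import json

# Invented sample data standing in for the dummy_db JSON files.
PRODUCTS = [
    {"name": "Whey Protein", "stock": 12},
    {"name": "Creatine", "stock": 0},
]


def check_stock(product_name: str) -> dict:
    """Look up a product by name (case-insensitive) and report availability."""
    wanted = product_name.strip().lower()
    for product in PRODUCTS:
        if product["name"].lower() == wanted:
            return {
                "product": product["name"],
                "in_stock": product["stock"] > 0,
                "units": product["stock"],
            }
    return {"product": product_name, "in_stock": False, "units": 0}


# Maps the tool name requested by the model to a Python callable.
TOOLS = {"check_stock": check_stock}


def execute_tool(name: str, arguments_json: str) -> str:
    """Run the named tool and return its result as a JSON string."""
    handler = TOOLS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        arguments = json.loads(arguments_json or "{}")
        result = handler(**arguments)
    except (json.JSONDecodeError, TypeError) as exc:
        return json.dumps({"error": f"bad arguments for {name}: {exc}"})
    return json.dumps(result)


def build_function_output_event(call_id: str, output: str) -> dict:
    """Wrap a tool result so it can be sent back to the Realtime API."""
    return {
        "type": "conversation.item.create",
        "item": {"type": "function_call_output", "call_id": call_id, "output": output},
    }
```

To connect a real service, replace the body of a function such as `check_stock` with a call to your own inventory system and keep the returned dictionary small, since the model reads it aloud.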

🤝 Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.
