From 1ccf15773d3b11a649080be5105a2fe44f20e09b Mon Sep 17 00:00:00 2001 From: Tvrtko Sternak Date: Tue, 31 Dec 2024 10:51:31 +0100 Subject: [PATCH 01/11] Init blogpost --- .../index.mdx | 204 ++++++++++++++++++ 1 file changed, 204 insertions(+) create mode 100644 website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx diff --git a/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx b/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx new file mode 100644 index 0000000000..032654782c --- /dev/null +++ b/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx @@ -0,0 +1,204 @@ +--- +title: Simplifying Real-Time Voice Interactions with the Websocket Audio Adapter +authors: + - marklysze + - sternakt + - davorrunje + - davorinrusevljan +tags: [Realtime API, Voice Agents, Swarm Teams, Twilio, AI Tools] + +--- + +
+Authors:
+- Mark Sze, Software Engineer at AG2.ai
+- Tvrtko Sternak, Machine Learning Engineer at Airt
+- Davor Runje, CTO at Airt
+- Davorin Ruševljan, Developer
+ +**TL;DR:** +- **No More Twilio Setup**: Skip the complexity of configuring accounts, forwarding numbers, and managing telephony platforms. +- **Introducing WebsocketAudioAdapter**: Stream audio directly from your browser using WebSockets. +- **Simplified Development**: Connect to real-time agents quickly and effortlessly with minimal setup. + +# **Realtime over WebSockets** + +In our previous blog post, we introduced a way to interact with the Realtime Agent using Twilio. While effective, this approach required a setup-intensive process involving Twilio integration, account configuration, number forwarding, and other complexities. Today, we're excited to introduce the `WebsocketAudioAdapter`, a streamlined approach to real-time audio streaming directly via a web browser. + +This post explores the features, benefits, and implementation of the `WebsocketAudioAdapter`, showing how it transforms the way we connect with real-time agents. + +--- + +## **Why We Built the WebsocketAudioAdapter** +### **Challenges with Existing Solutions** +Traditional methods like Twilio provide a robust telephony infrastructure, but they come with challenges: +- **Complex Setup**: Configuring Twilio accounts, verifying numbers, and setting up forwarding can be time-consuming. +- **Platform Dependency**: These solutions require developers to rely on external APIs, which may add latency or costs. +- **Browser Limitations**: For teams building web-first applications, integrating with a telephony platform can feel redundant. + +### **Our Solution** +The `WebsocketAudioAdapter` eliminates these challenges by allowing direct audio streaming over WebSockets. It integrates seamlessly with modern web technologies, enabling real-time voice interactions without external telephony platforms. + +--- + +## **How It Works** +At its core, the `WebsocketAudioAdapter` leverages WebSockets to handle real-time audio streaming. 
This means your browser becomes the communication bridge, sending audio packets to a server where a conversational AI agent processes them. + +Here’s a quick overview of its components and how they fit together: + +1. **WebSocket Connection**: + - The adapter establishes a WebSocket connection between the client (browser) and the server. + - Audio packets are streamed in real time through this connection. + +2. **Integration with FastAPI**: + - Using Python's FastAPI framework, developers can easily set up endpoints for handling WebSocket traffic. + +3. **Powered by Realtime Agents**: + - The audio adapter integrates with an AI-powered `RealtimeAgent`, allowing the bot to process audio inputs and respond intelligently. + +--- + +## **Key Features** +### **1. Simplified Setup** +Unlike traditional methods, the WebsocketAudioAdapter requires no phone numbers, no telephony configuration, and no external accounts. It's a plug-and-play solution. + +### **2. Real-Time Performance** +By streaming audio over WebSockets, the adapter ensures low latency, making conversations feel natural and seamless. + +### **3. Browser-Based** +Everything happens within the user's browser, meaning no additional software is required. This makes it ideal for web applications. + +### **4. Flexible Integration** +Whether you're building a chatbot, a voice assistant, or an interactive application, the adapter can integrate easily with existing frameworks and AI systems. + +--- + +## **Example: Build a Voice-Enabled Weather Bot** +Let’s walk through a practical example where we use the `WebsocketAudioAdapter` to create a voice-enabled weather bot. + +### **Step 1: Set Up the Server** +We use FastAPI to define the routes for handling WebSocket connections and serving the user interface. 
+ +```python +from fastapi import FastAPI, Request, WebSocket +from fastapi.responses import HTMLResponse, JSONResponse +from fastapi.staticfiles import StaticFiles +from fastapi.templating import Jinja2Templates +import uvicorn +from pathlib import Path + +app = FastAPI() +PORT = 5050 +notebook_path = Path.cwd() + +app.mount("/static", StaticFiles(directory=notebook_path / "static"), name="static") +templates = Jinja2Templates(directory=notebook_path / "templates") + +@app.get("/", response_class=JSONResponse) +async def index_page(): + return {"message": "Websocket Audio Stream Server is running!"} + +@app.get("/start-chat/", response_class=HTMLResponse) +async def start_chat(request: Request): + return templates.TemplateResponse("chat.html", {"request": request, "port": PORT}) +``` + +### **Step 2: Define the WebSocket Endpoint** +Next, we create the `/media-stream` WebSocket route. This is where the real-time magic happens. + +```python +@app.websocket("/media-stream") +async def handle_media_stream(websocket: WebSocket): + await websocket.accept() + audio_adapter = WebsocketAudioAdapter(websocket) + realtime_agent = RealtimeAgent( + name="Weather Bot", + system_message="Hi! I can tell you the weather. Ask me anything.", + audio_adapter=audio_adapter + ) + + @realtime_agent.register_realtime_function(name="get_weather", description="Provides weather information.") + def get_weather(location: str) -> str: + return "The weather is sunny." if location.lower() != "seattle" else "The weather is cloudy." + + await realtime_agent.run() +``` + +### **Step 3: Serve the Interface** +A simple HTML interface (`chat.html`) allows users to interact with the bot. It connects to the `/media-stream` endpoint and streams audio. + +```html + +``` + +### **Step 4: Run the Server** +Start the server using Uvicorn: +```bash +uvicorn app:app --host 0.0.0.0 --port 5050 +``` + +## **Benefits in Action** +- **Quick Prototyping**: Spin up a real-time voice application in minutes.
+- **Cost Efficiency**: Eliminate third-party telephony costs. +- **User-Friendly**: Runs in the browser, making it accessible to anyone with a microphone. + +## **Conclusion** +The `WebsocketAudioAdapter` marks a shift toward simpler, more accessible real-time audio solutions. By bypassing traditional telephony systems, it empowers developers to build and deploy voice applications faster and more efficiently. Whether you're creating an AI assistant, a voice-enabled app, or an experimental project, this adapter is your go-to tool for real-time audio streaming. + +Try it out and bring your voice-enabled ideas to life! From 276eee7a963d169c383450e9064931683a183ec2 Mon Sep 17 00:00:00 2001 From: Tvrtko Sternak Date: Tue, 31 Dec 2024 11:05:15 +0100 Subject: [PATCH 02/11] Fix notebook --- notebook/agentchat_realtime_websocket.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/notebook/agentchat_realtime_websocket.ipynb b/notebook/agentchat_realtime_websocket.ipynb index d97320401c..58dd9ddad8 100644 --- a/notebook/agentchat_realtime_websocket.ipynb +++ b/notebook/agentchat_realtime_websocket.ipynb @@ -81,7 +81,7 @@ "from fastapi.templating import Jinja2Templates\n", "\n", "import autogen\n", - "from autogen.agentchat.realtime_agent import RealtimeAgent, WebsocketAudioAdapter" + "from autogen.agentchat.realtime_agent import RealtimeAgent, WebSocketAudioAdapter" ] }, { @@ -261,7 +261,7 @@ "\n", " logger = getLogger(\"uvicorn.error\")\n", "\n", - " audio_adapter = WebsocketAudioAdapter(websocket, logger=logger)\n", + " audio_adapter = WebSocketAudioAdapter(websocket, logger=logger)\n", " realtime_agent = RealtimeAgent(\n", " name=\"Weather Bot\",\n", " system_message=\"Hello there! I am an AI voice assistant powered by Autogen and the OpenAI Realtime API. You can ask me about weather, jokes, or anything you can imagine. 
Start by saying 'How can I help you'?\",\n", From 286f041667d8b592ccd88c30f0e97294be163882 Mon Sep 17 00:00:00 2001 From: Tvrtko Sternak Date: Tue, 31 Dec 2024 13:10:07 +0100 Subject: [PATCH 03/11] Write websocket blogpost --- .../img/websocket_communication_diagram.png | 3 + .../index.mdx | 78 ++++++++----------- website/mint.json | 1 + 3 files changed, 38 insertions(+), 44 deletions(-) create mode 100644 website/blog/2024-12-31-RealtimeAgent-over-websocket/img/websocket_communication_diagram.png diff --git a/website/blog/2024-12-31-RealtimeAgent-over-websocket/img/websocket_communication_diagram.png b/website/blog/2024-12-31-RealtimeAgent-over-websocket/img/websocket_communication_diagram.png new file mode 100644 index 0000000000..99c3e482c6 --- /dev/null +++ b/website/blog/2024-12-31-RealtimeAgent-over-websocket/img/websocket_communication_diagram.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1fc93941357add8d4c6db1b4675f9a04dda6bd43394b8b591e073b33d97297e6 +size 152437 diff --git a/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx b/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx index 032654782c..8aea710e2c 100644 --- a/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx +++ b/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx @@ -5,7 +5,7 @@ authors: - sternakt - davorrunje - davorinrusevljan -tags: [Realtime API, Voice Agents, Swarm Teams, Twilio, AI Tools] +tags: [Realtime API, Voice Agents, AI Tools] --- @@ -59,33 +59,31 @@ tags: [Realtime API, Voice Agents, Swarm Teams, Twilio, AI Tools] +![Realtime agent communication over websocket](img/websocket_communication_diagram.png) + **TL;DR:** - **No More Twilio Setup**: Skip the complexity of configuring accounts, forwarding numbers, and managing telephony platforms. -- **Introducing WebsocketAudioAdapter**: Stream audio directly from your browser using WebSockets. 
+- **Introducing [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter)**: Stream audio directly from your browser using WebSockets. - **Simplified Development**: Connect to real-time agents quickly and effortlessly with minimal setup. # **Realtime over WebSockets** -In our previous blog post, we introduced a way to interact with the Realtime Agent using Twilio. While effective, this approach required a setup-intensive process involving Twilio integration, account configuration, number forwarding, and other complexities. Today, we're excited to introduce the `WebsocketAudioAdapter`, a streamlined approach to real-time audio streaming directly via a web browser. - -This post explores the features, benefits, and implementation of the `WebsocketAudioAdapter`, showing how it transforms the way we connect with real-time agents. +In our [previous blog post](/blog/2024-12-20-RealtimeAgent/index), we introduced a way to interact with the [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent) using [**`TwilioAudioAdapter`**](/docs/reference/agentchat/realtime_agent/twilio_audio_adapter#twilioaudioadapter). While effective, this approach required a setup-intensive process involving [Twilio](https://www.twilio.com/) integration, account configuration, number forwarding, and other complexities. Today, we're excited to introduce the WebsocketAudioAdapter, a streamlined approach to real-time audio streaming directly via a web browser. ---- +This post explores the features, benefits, and implementation of the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter), showing how it transforms the way we connect with real-time agents. 
-## **Why We Built the WebsocketAudioAdapter** +## **Why We Built the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter)** ### **Challenges with Existing Solutions** -Traditional methods like Twilio provide a robust telephony infrastructure, but they come with challenges: +The previously introduced [**`TwilioAudioAdapter`**](/docs/reference/agentchat/realtime_agent/twilio_audio_adapter#twilioaudioadapter) provides a robust way to connect to your [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent), but it comes with challenges: - **Complex Setup**: Configuring Twilio accounts, verifying numbers, and setting up forwarding can be time-consuming. -- **Platform Dependency**: These solutions require developers to rely on external APIs, which may add latency or costs. +- **Platform Dependency**: This solution requires developers to rely on an external API, which adds latency and costs. - **Browser Limitations**: For teams building web-first applications, integrating with a telephony platform can feel redundant. ### **Our Solution** -The `WebsocketAudioAdapter` eliminates these challenges by allowing direct audio streaming over WebSockets. It integrates seamlessly with modern web technologies, enabling real-time voice interactions without external telephony platforms. - ---- +The [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) eliminates these challenges by allowing direct audio streaming over WebSockets. It integrates seamlessly with modern web technologies, enabling real-time voice interactions without external telephony platforms. ## **How It Works** -At its core, the `WebsocketAudioAdapter` leverages WebSockets to handle real-time audio streaming. This means your browser becomes the communication bridge, sending audio packets to a server where a conversational AI agent processes them.
+At its core, the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) leverages WebSockets to handle real-time audio streaming. This means your browser becomes the communication bridge, sending audio packets to a server where a [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent) agent processes them. Here’s a quick overview of its components and how they fit together: @@ -94,16 +92,14 @@ Here’s a quick overview of its components and how they fit together: - Audio packets are streamed in real time through this connection. 2. **Integration with FastAPI**: - - Using Python's FastAPI framework, developers can easily set up endpoints for handling WebSocket traffic. + - Using Python's [FastAPI](https://fastapi.tiangolo.com/) framework, developers can easily set up endpoints for handling WebSocket traffic. 3. **Powered by Realtime Agents**: - - The audio adapter integrates with an AI-powered `RealtimeAgent`, allowing the bot to process audio inputs and respond intelligently. - ---- + - The audio adapter integrates with an AI-powered [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent), allowing the agent to process audio inputs and respond intelligently. ## **Key Features** ### **1. Simplified Setup** -Unlike traditional methods, the WebsocketAudioAdapter requires no phone numbers, no telephony configuration, and no external accounts. It's a plug-and-play solution. +Unlike [**`TwilioAudioAdapter`**](/docs/reference/agentchat/realtime_agent/twilio_audio_adapter#twilioaudioadapter), the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) requires no phone numbers, no telephony configuration, and no external accounts. It's a plug-and-play solution. ### **2. 
Real-Time Performance** By streaming audio over WebSockets, the adapter ensures low latency, making conversations feel natural and seamless. @@ -114,13 +110,15 @@ Everything happens within the user's browser, meaning no additional software is ### **4. Flexible Integration** Whether you're building a chatbot, a voice assistant, or an interactive application, the adapter can integrate easily with existing frameworks and AI systems. ---- - ## **Example: Build a Voice-Enabled Weather Bot** -Let’s walk through a practical example where we use the `WebsocketAudioAdapter` to create a voice-enabled weather bot. +Let’s walk through a practical example where we use the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) to create a voice-enabled weather bot. +The full example is in [this notebook](/notebooks/agentchat_realtime_websocket), you can run it locally by cloning the AG2 repo and executing the notebook. ### **Step 1: Set Up the Server** -We use FastAPI to define the routes for handling WebSocket connections and serving the user interface. +We use [FastAPI](https://fastapi.tiangolo.com/) to serve the chat interface and handle WebSocket connections. A key part is configuring the server to load and render HTML templates dynamically for the user interface. + +- **Template Loading**: Use `Jinja2Templates` to load `chat.html` from the `templates` directory. The template is dynamically rendered with variables like the server's `port`. +- **Static Files**: Serve assets (e.g., JavaScript, CSS) from the `static` directory. ```python from fastapi import FastAPI, WebSocket @@ -167,38 +165,30 @@ async def handle_media_stream(websocket: WebSocket): await realtime_agent.run() ``` -### **Step 3: Serve the Interface** -A simple HTML interface (`chat.html`) allows users to interact with the bot. It connects to the `/media-stream` endpoint and streams audio. 
- -```html - -``` - -### **Step 4: Run the Server** +### **Step 3: Run the Server** Start the server using Uvicorn: ```bash uvicorn app:app --host 0.0.0.0 --port 5050 ``` +After you start the server, you should see your application running in the logs: + +```bash +INFO: Started server process [64425] +INFO: Waiting for application startup. +INFO: Application startup complete. +INFO: Uvicorn running on http://0.0.0.0:5050 (Press CTRL+C to quit) +``` + +### Ready to Chat? 🚀 +Now you can simply open [**localhost:5050/start-chat**](http://localhost:5050/start-chat) in your browser, and dive into an interactive conversation with the RealtimeAgent! 🎤✨ + ## **Benefits in Action** - **Quick Prototyping**: Spin up a real-time voice application in minutes. - **Cost Efficiency**: Eliminate third-party telephony costs. - **User-Friendly**: Runs in the browser, making it accessible to anyone with a microphone. ## **Conclusion** -The `WebsocketAudioAdapter` marks a shift toward simpler, more accessible real-time audio solutions. By bypassing traditional telephony systems, it empowers developers to build and deploy voice applications faster and more efficiently. Whether you're creating an AI assistant, a voice-enabled app, or an experimental project, this adapter is your go-to tool for real-time audio streaming. +The [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) marks a shift toward simpler, more accessible real-time audio solutions. It empowers developers to build and deploy voice applications faster and more efficiently. Whether you're creating an AI assistant, a voice-enabled app, or an experimental project, this adapter is your go-to tool for real-time audio streaming. Try it out and bring your voice-enabled ideas to life!
diff --git a/website/mint.json b/website/mint.json index 665f3caaa4..9605ba4d94 100644 --- a/website/mint.json +++ b/website/mint.json @@ -498,6 +498,7 @@ { "group": "Recent posts", "pages": [ + "blog/2024-12-31-RealtimeAgent-over-websocket/index", "blog/2024-12-20-RealtimeAgent/index", "blog/2024-12-20-Tools-interoperability/index", "blog/2024-12-20-Reasoning-Update/index", From 01d4440f7a6b41b0fcb562f1bd98821c6c20268e Mon Sep 17 00:00:00 2001 From: Tvrtko Sternak Date: Thu, 2 Jan 2025 10:59:50 +0100 Subject: [PATCH 04/11] Fix websocket notebook issues --- notebook/agentchat_realtime_websocket.ipynb | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/notebook/agentchat_realtime_websocket.ipynb b/notebook/agentchat_realtime_websocket.ipynb index 58dd9ddad8..b30bc0c71d 100644 --- a/notebook/agentchat_realtime_websocket.ipynb +++ b/notebook/agentchat_realtime_websocket.ipynb @@ -31,7 +31,7 @@ "\n", "To use the realtime agent we will connect it to a local websocket trough the browser.\n", "\n", - "We have prepared a `WebsocketAudioAdapter` to enable you to connect your realtime agent to a websocket service.\n", + "We have prepared a `WebSocketAudioAdapter` to enable you to connect your realtime agent to a websocket service.\n", "\n", "To be able to run this notebook, you will need to install ag2, fastapi and uvicorn.\n", "````{=mdx}\n", @@ -51,7 +51,7 @@ "metadata": {}, "outputs": [], "source": [ - "!pip install \"ag2\" \"fastapi>=0.115.0,<1\" \"uvicorn>=0.30.6,<1\"" + "!pip install \"ag2\" \"fastapi>=0.115.0,<1\" \"uvicorn>=0.30.6,<1\" \"jinja2\"" ] }, { @@ -171,7 +171,7 @@ "\n", "1. **Define Port**: Sets the `PORT` variable to `5050`, which will be used for the server.\n", "2. **Initialize FastAPI App**: Creates a `FastAPI` instance named `app`, which serves as the main application.\n", - "3. **Define Root Endpoint**: Adds a `GET` endpoint at the root URL (`/`). 
When accessed, it returns a JSON response with the message `\"Websocket Audio Stream Server is running!\"`.\n", + "3. **Define Root Endpoint**: Adds a `GET` endpoint at the root URL (`/`). When accessed, it returns a JSON response with the message `\"WebSocket Audio Stream Server is running!\"`.\n", "\n", "This sets up a basic FastAPI server and provides a simple health-check endpoint to confirm that the server is operational." ] @@ -188,7 +188,7 @@ "\n", "@app.get(\"/\", response_class=JSONResponse)\n", "async def index_page():\n", - " return {\"message\": \"Websocket Audio Stream Server is running!\"}" + " return {\"message\": \"WebSocket Audio Stream Server is running!\"}" ] }, { @@ -237,7 +237,7 @@ "1. **Set Up the WebSocket Endpoint**: Define the `/media-stream` WebSocket route to handle audio streaming.\n", "2. **Accept WebSocket Connections**: Accept incoming WebSocket connections from clients.\n", "3. **Initialize Logger**: Retrieve a logger instance for logging purposes.\n", - "4. **Configure Audio Adapter**: Instantiate a `WebsocketAudioAdapter`, connecting the WebSocket to handle audio streaming with logging.\n", + "4. **Configure Audio Adapter**: Instantiate a `WebSocketAudioAdapter`, connecting the WebSocket to handle audio streaming with logging.\n", "5. 
**Set Up Realtime Agent**: Create a `RealtimeAgent` with the following:\n", " - **Name**: `Weather Bot`.\n", " - **System Message**: Introduces the AI assistant and its capabilities.\n", @@ -303,7 +303,7 @@ ] }, "kernelspec": { - "display_name": ".venv-3.9", + "display_name": ".venv", "language": "python", "name": "python3" }, @@ -317,7 +317,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.20" + "version": "3.10.16" } }, "nbformat": 4, From 138ba21c483a3a3e9fd74afb5ecd903ce7f50572 Mon Sep 17 00:00:00 2001 From: Tvrtko Sternak Date: Fri, 3 Jan 2025 13:48:05 +0100 Subject: [PATCH 05/11] WIP: blogpost --- notebook/agentchat_realtime_websocket.ipynb | 2 +- .../index.mdx | 24 +++++++++---------- website/mint.json | 5 +++- 3 files changed, 17 insertions(+), 14 deletions(-) diff --git a/notebook/agentchat_realtime_websocket.ipynb b/notebook/agentchat_realtime_websocket.ipynb index defeb80012..bcd9affd0a 100644 --- a/notebook/agentchat_realtime_websocket.ipynb +++ b/notebook/agentchat_realtime_websocket.ipynb @@ -65,7 +65,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 2, "metadata": {}, "outputs": [], "source": [ diff --git a/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx b/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx index 8aea710e2c..1e07da61ef 100644 --- a/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx +++ b/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx @@ -1,5 +1,5 @@ --- -title: Simplifying Real-Time Voice Interactions with the Websocket Audio Adapter +title: Real-Time Voice Interactions with the WebSocket Audio Adapter authors: - marklysze - sternakt @@ -62,37 +62,37 @@ tags: [Realtime API, Voice Agents, AI Tools] ![Realtime agent communication over websocket](img/websocket_communication_diagram.png) **TL;DR:** -- **No More Twilio Setup**: Skip the complexity of configuring accounts, forwarding numbers, and managing 
telephony platforms. -- **Introducing [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter)**: Stream audio directly from your browser using WebSockets. +- **Demo implementation**: Implement a website using websockets and communicate using voice with the [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent) +- **Introducing [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter)**: Stream audio directly from your browser using [WebSockets](https://fastapi.tiangolo.com/advanced/websockets/). - **Simplified Development**: Connect to real-time agents quickly and effortlessly with minimal setup. # **Realtime over WebSockets** -In our [previous blog post](/blog/2024-12-20-RealtimeAgent/index), we introduced a way to interact with the [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent) using [**`TwilioAudioAdapter`**](/docs/reference/agentchat/realtime_agent/twilio_audio_adapter#twilioaudioadapter). While effective, this approach required a setup-intensive process involving [Twilio](https://www.twilio.com/) integration, account configuration, number forwarding, and other complexities. Today, we're excited to introduce the WebsocketAudioAdapter, a streamlined approach to real-time audio streaming directly via a web browser. +In our [previous blog post](/blog/2024-12-20-RealtimeAgent/index), we introduced a way to interact with the [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent) using [**`TwilioAudioAdapter`**](/docs/reference/agentchat/realtime_agent/twilio_audio_adapter#twilioaudioadapter). While effective, this approach required a setup-intensive process involving [Twilio](https://www.twilio.com/) integration, account configuration, number forwarding, and other complexities. 
Today, we're excited to introduce the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter), a streamlined approach to real-time audio streaming directly via a web browser. This post explores the features, benefits, and implementation of the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter), showing how it transforms the way we connect with real-time agents. ## **Why We Built the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter)** ### **Challenges with Existing Solutions** The previously introduced [**`TwilioAudioAdapter`**](/docs/reference/agentchat/realtime_agent/twilio_audio_adapter#twilioaudioadapter) provides a robust way to connect to your [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent), but it comes with challenges: +- **Browser Limitations**: For teams building web-first applications, integrating with a telephony platform can feel redundant. - **Complex Setup**: Configuring Twilio accounts, verifying numbers, and setting up forwarding can be time-consuming. - **Platform Dependency**: This solution requires developers to rely on an external API, which adds latency and costs. -- **Browser Limitations**: For teams building web-first applications, integrating with a telephony platform can feel redundant. ### **Our Solution** -The [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) eliminates these challenges by allowing direct audio streaming over WebSockets. It integrates seamlessly with modern web technologies, enabling real-time voice interactions without external telephony platforms.
+The [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) eliminates these challenges by allowing direct audio streaming over [WebSockets](https://fastapi.tiangolo.com/advanced/websockets/). It integrates seamlessly with modern web technologies, enabling real-time voice interactions without external telephony platforms. ## **How It Works** -At its core, the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) leverages WebSockets to handle real-time audio streaming. This means your browser becomes the communication bridge, sending audio packets to a server where a [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent) agent processes them. +At its core, the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) leverages [WebSockets](https://fastapi.tiangolo.com/advanced/websockets/) to handle real-time audio streaming. This means your browser becomes the communication bridge, sending audio packets to a server where a [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent) agent processes them. Here’s a quick overview of its components and how they fit together: 1. **WebSocket Connection**: - - The adapter establishes a WebSocket connection between the client (browser) and the server. + - The adapter establishes a [WebSockets](https://fastapi.tiangolo.com/advanced/websockets/) connection between the client (browser) and the server. - Audio packets are streamed in real time through this connection. 2. **Integration with FastAPI**: - - Using Python's [FastAPI](https://fastapi.tiangolo.com/) framework, developers can easily set up endpoints for handling WebSocket traffic. 
+ - Using Python's [FastAPI](https://fastapi.tiangolo.com/) framework, developers can easily set up endpoints for handling [WebSockets](https://fastapi.tiangolo.com/advanced/websockets/) traffic. 3. **Powered by Realtime Agents**: - The audio adapter integrates with an AI-powered [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent), allowing the agent to process audio inputs and respond intelligently. @@ -102,7 +102,7 @@ Here’s a quick overview of its components and how they fit together: Unlike [**`TwilioAudioAdapter`**](/docs/reference/agentchat/realtime_agent/twilio_audio_adapter#twilioaudioadapter), the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) requires no phone numbers, no telephony configuration, and no external accounts. It's a plug-and-play solution. ### **2. Real-Time Performance** -By streaming audio over WebSockets, the adapter ensures low latency, making conversations feel natural and seamless. +By streaming audio over [WebSockets](https://fastapi.tiangolo.com/advanced/websockets/), the adapter ensures low latency, making conversations feel natural and seamless. ### **3. Browser-Based** Everything happens within the user's browser, meaning no additional software is required. This makes it ideal for web applications. @@ -137,7 +137,7 @@ templates = Jinja2Templates(directory=notebook_path / "templates") @app.get("/", response_class=JSONResponse) async def index_page(): - return {"message": "Websocket Audio Stream Server is running!"} + return {"message": "WebSocket Audio Stream Server is running!"} @app.get("/start-chat/", response_class=HTMLResponse) async def start_chat(request: Request): @@ -151,7 +151,7 @@ Next, we create the `/media-stream` WebSocket route.
This is where the real-time @app.websocket("/media-stream") async def handle_media_stream(websocket: WebSocket): await websocket.accept() - audio_adapter = WebsocketAudioAdapter(websocket) + audio_adapter = WebSocketAudioAdapter(websocket) realtime_agent = RealtimeAgent( name="Weather Bot", system_message="Hi! I can tell you the weather. Ask me anything.", diff --git a/website/mint.json b/website/mint.json index d1b277083f..454f51ddc5 100644 --- a/website/mint.json +++ b/website/mint.json @@ -303,13 +303,16 @@ { "group": "agentchat.realtime_agent", "pages": [ + "docs/reference/agentchat/realtime_agent/client", "docs/reference/agentchat/realtime_agent/function_observer", "docs/reference/agentchat/realtime_agent/oai_realtime_client", "docs/reference/agentchat/realtime_agent/realtime_agent", "docs/reference/agentchat/realtime_agent/realtime_client", "docs/reference/agentchat/realtime_agent/realtime_observer", "docs/reference/agentchat/realtime_agent/twilio_audio_adapter", - "docs/reference/agentchat/realtime_agent/websocket_audio_adapter" + "docs/reference/agentchat/realtime_agent/twilio_observer", + "docs/reference/agentchat/realtime_agent/websocket_audio_adapter", + "docs/reference/agentchat/realtime_agent/websocket_observer" ] }, "docs/reference/agentchat/agent", From f4df0d42cee2a98517ee7b7da850b1e7cd5e709a Mon Sep 17 00:00:00 2001 From: Tvrtko Sternak Date: Fri, 3 Jan 2025 15:54:57 +0100 Subject: [PATCH 06/11] Reference repository in blogpost --- .../index.mdx | 192 ++++++++++++++---- 1 file changed, 152 insertions(+), 40 deletions(-) diff --git a/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx b/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx index 1e07da61ef..98df874837 100644 --- a/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx +++ b/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx @@ -112,76 +112,188 @@ Whether you're building a chatbot, a voice assistant, or an interactive applicat ## **Example: 
Build a Voice-Enabled Weather Bot** Let’s walk through a practical example where we use the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) to create a voice-enabled weather bot. -The full example is in [this notebook](/notebooks/agentchat_realtime_websocket), you can run it locally by cloning the AG2 repo and executing the notebook. +The full example is in [this repository](https://github.com/sternakt/RealtimeAgent-WebSocketAudioAdapter/tree/main). -### **Step 1: Set Up the Server** +To run the demo example, follow these steps: + +### **1. Clone the Repository** +```bash +git clone https://github.com/sternakt/RealtimeAgent-WebSocketAudioAdapter.git +cd RealtimeAgent-WebSocketAudioAdapter +``` + +### **2. Set Up Environment Variables** +Create a `.env` file based on the provided `.env.example`: +```bash +cp .env.example .env +``` +In the .env file, update the `OPENAI_API_KEY` to your OpenAI API key. + +### (Optional) Create and use a virtual environment + +To reduce cluttering your global Python environment on your machine, you can create a virtual environment. On your command line, enter: + +``` +python3 -m venv env +source env/bin/activate +``` + +### **3. Install Dependencies** +Install the required Python packages using `pip`: +```bash +pip install -r requirements.txt +``` + +### **4. Start the Server** +Run the application with Uvicorn: +```bash +uvicorn realtime_over_websockets.main:app --port 5050 +``` + +After you start the server you should see your application running in the logs: + +```bash +INFO: Started server process [64425] +INFO: Waiting for application startup. +INFO: Application startup complete. +INFO: Uvicorn running on http://0.0.0.0:5050 (Press CTRL+C to quit) +``` + +### Ready to Chat? 
🚀 +Now you can simply open [**localhost:5050/start-chat**](http://localhost:5050/start-chat) in your browser, and dive into an interactive conversation with the [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent)! 🎤✨ + + +## Code review +Let’s dive in and break down how this example works—from setting up the server to handling real-time audio streaming with [WebSockets](https://fastapi.tiangolo.com/advanced/websockets/). + +### **Set Up the FastAPI app** We use [FastAPI](https://fastapi.tiangolo.com/) to serve the chat interface and handle WebSocket connections. A key part is configuring the server to load and render HTML templates dynamically for the user interface. - **Template Loading**: Use `Jinja2Templates` to load `chat.html` from the `templates` directory. The template is dynamically rendered with variables like the server's `port`. - **Static Files**: Serve assets (e.g., JavaScript, CSS) from the `static` directory. ```python -from fastapi import FastAPI, WebSocket -from fastapi.responses import HTMLResponse, JSONResponse -from fastapi.staticfiles import StaticFiles -from fastapi.templating import Jinja2Templates -import uvicorn -from pathlib import Path - app = FastAPI() -PORT = 5050 -notebook_path = Path.cwd() -app.mount("/static", StaticFiles(directory=notebook_path / "static"), name="static") -templates = Jinja2Templates(directory=notebook_path / "templates") @app.get("/", response_class=JSONResponse) -async def index_page(): +async def index_page() -> dict[str, str]: return {"message": "WebSocket Audio Stream Server is running!"} + +website_files_path = Path(__file__).parent / "website_files" + +app.mount( + "/static", StaticFiles(directory=website_files_path / "static"), name="static" +) + +templates = Jinja2Templates(directory=website_files_path / "templates") + + @app.get("/start-chat/", response_class=HTMLResponse) -async def start_chat(request): - return templates.TemplateResponse("chat.html", {"request": request, 
"port": PORT}) +async def start_chat(request: Request) -> HTMLResponse: + """Endpoint to return the HTML page for audio chat.""" + port = request.url.port + return templates.TemplateResponse("chat.html", {"request": request, "port": port}) ``` -### **Step 2: Define the WebSocket Endpoint** -Next, we create the `/media-stream` WebSocket route. This is where the real-time magic happens. +### Defining the WebSocket Endpoint + +The `/media-stream` WebSocket route is where real-time audio interaction is processed and streamed to the AI assistant. Let’s break it down step-by-step: + +1. **Accept the WebSocket Connection** + The WebSocket connection is established when a client connects to `/media-stream`. Using `await websocket.accept()`, we ensure the connection is live and ready for communication. ```python @app.websocket("/media-stream") -async def handle_media_stream(websocket: WebSocket): +async def handle_media_stream(websocket: WebSocket) -> None: + """Handle WebSocket connections providing audio stream and OpenAI.""" await websocket.accept() - audio_adapter = WebSocketAudioAdapter(websocket) +``` + +2. **Initialize Logging** + A logger instance (`getLogger("uvicorn.error")`) is set up to monitor and debug the server's activities, helping track events during the connection and interaction process. + +```python + logger = getLogger("uvicorn.error") +``` +3. **Set Up the `WebSocketAudioAdapter`** + The [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) bridges the client’s audio stream with the [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent). It streams audio data over [WebSockets](https://fastapi.tiangolo.com/advanced/websockets/) in real time, ensuring seamless communication between the browser and the agent. + +```python + audio_adapter = WebSocketAudioAdapter(websocket, logger=logger) +``` + +4. 
**Configure the Realtime Agent** + The `RealtimeAgent` is the AI assistant driving the interaction. Key parameters include: + - **Name**: The agent identity, here called `"Weather Bot"`. + - **System Message**: System message for the agent. + - **Language Model Configuration**: Defined by `realtime_llm_config` for LLM settings. + - **Audio Adapter**: Connects the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) for handling audio. + - **Logger**: Logs the agent's activities for better observability. + +```python realtime_agent = RealtimeAgent( name="Weather Bot", - system_message="Hi! I can tell you the weather. Ask me anything.", - audio_adapter=audio_adapter + system_message="Hello there! I am an AI voice assistant powered by Autogen and the OpenAI Realtime API. You can ask me about weather, jokes, or anything you can imagine. Start by saying 'How can I help you'?", + llm_config=realtime_llm_config, + audio_adapter=audio_adapter, + logger=logger, ) +``` - @realtime_agent.register_realtime_function(name="get_weather", description="Provides weather information.") - def get_weather(location: str) -> str: - return "The weather is sunny." if location.lower() != "seattle" else "The weather is cloudy." +5. **Define a Custom Realtime Function** + The `get_weather` function is registered as a realtime callable function. When the user asks about the weather, the agent can call the function to get an accurate weather report and respond based on the provided information: + - Returns `"The weather is cloudy."` for `"Seattle"`. + - Returns `"The weather is sunny."` for other locations. - await realtime_agent.run() +```python + @realtime_agent.register_realtime_function( # type: ignore [misc] + name="get_weather", description="Get the current weather" + ) + def get_weather(location: Annotated[str, "city"]) -> str: + return ( + "The weather is cloudy." + if location == "Seattle" + else "The weather is sunny." 
+ ) ``` -### **Step 3: Run the Server** -Start the server using Uvicorn: -```bash -uvicorn app:app --host 0.0.0.0 --port 5050 -``` +6. **Run the Realtime Agent** + The `await realtime_agent.run()` method starts the agent, handling incoming audio streams, processing user queries, and responding in real time. -After you start the server you should your application running in the logs: +Here is the full code for the `/media-stream` endpoint: -```bash -INFO: Started server process [64425] -INFO: Waiting for application startup. -INFO: Application startup complete. -INFO: Uvicorn running on http://0.0.0.0:5050 (Press CTRL+C to quit) -``` +```python +@app.websocket("/media-stream") +async def handle_media_stream(websocket: WebSocket) -> None: + """Handle WebSocket connections providing audio stream and OpenAI.""" + await websocket.accept() -### Ready to Chat? 🚀 -Now you can simply open [**localhost:5050/start-chat**](http://localhost:5050/start-chat) in your browser, and dive into an interactive conversation with the RealtimeAgent! 🎤✨ + logger = getLogger("uvicorn.error") + + audio_adapter = WebSocketAudioAdapter(websocket, logger=logger) + + realtime_agent = RealtimeAgent( + name="Weather Bot", + system_message="Hello there! I am an AI voice assistant powered by Autogen and the OpenAI Realtime API. You can ask me about weather, jokes, or anything you can imagine. Start by saying 'How can I help you'?", + llm_config=realtime_llm_config, + audio_adapter=audio_adapter, + logger=logger, + ) + + @realtime_agent.register_realtime_function( # type: ignore [misc] + name="get_weather", description="Get the current weather" + ) + def get_weather(location: Annotated[str, "city"]) -> str: + return ( + "The weather is cloudy." + if location == "Seattle" + else "The weather is sunny." + ) + + await realtime_agent.run() +``` ## **Benefits in Action** - **Quick Prototyping**: Spin up a real-time voice application in minutes. 
From 4c2120e5ae0293c40e9122d93bb5cfd6b764d758 Mon Sep 17 00:00:00 2001
From: Tvrtko Sternak
Date: Fri, 3 Jan 2025 15:57:13 +0100
Subject: [PATCH 07/11] Polish

---
 website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx b/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx
index 98df874837..8c821a9788 100644
--- a/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx
+++ b/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx
@@ -112,7 +112,7 @@ Whether you're building a chatbot, a voice assistant, or an interactive applicat
 ## **Example: Build a Voice-Enabled Weather Bot**
 Let’s walk through a practical example where we use the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) to create a voice-enabled weather bot.
 
-The full example is in [this repository](https://github.com/sternakt/RealtimeAgent-WebSocketAudioAdapter/tree/main).
+You can find the full example [here](https://github.com/sternakt/RealtimeAgent-WebSocketAudioAdapter/tree/main).
 
 To run the demo example, follow these steps:
 

From 607e7116f490dea67de9982b31afce481afe9e14 Mon Sep 17 00:00:00 2001
From: Tvrtko Sternak
Date: Fri, 3 Jan 2025 16:44:05 +0100
Subject: [PATCH 08/11] Polish blogpost

---
 .../blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx b/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx
index 8c821a9788..127d504e5e 100644
--- a/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx
+++ b/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx
@@ -123,11 +123,11 @@ cd RealtimeAgent-WebSocketAudioAdapter
 ```
 
 ### **2. Set Up Environment Variables**
-Create a `.env` file based on the provided `.env.example`:
+Create a `OAI_CONFIG_LIST` file based on the provided `OAI_CONFIG_LIST_sample`:
 ```bash
-cp .env.example .env
+cp OAI_CONFIG_LIST_sample OAI_CONFIG_LIST
 ```
-In the .env file, update the `OPENAI_API_KEY` to your OpenAI API key.
+In the OAI_CONFIG_LIST file, update the `api_key` to your OpenAI API key.
 
 ### (Optional) Create and use a virtual environment
 

From 5e31097c2354ebe1b277f81e49a904fa75fca100 Mon Sep 17 00:00:00 2001
From: Tvrtko Sternak
Date: Wed, 8 Jan 2025 08:12:23 +0100
Subject: [PATCH 09/11] Update date for realtime websocket blogpost

---
 .../img/websocket_communication_diagram.png | 0
 .../index.mdx | 6 +++---
 2 files changed, 3 insertions(+), 3 deletions(-)
 rename website/blog/{2024-12-31-RealtimeAgent-over-websocket => 2025-01-08-RealtimeAgent-over-websocket}/img/websocket_communication_diagram.png (100%)
 rename website/blog/{2024-12-31-RealtimeAgent-over-websocket => 2025-01-08-RealtimeAgent-over-websocket}/index.mdx (98%)

diff --git a/website/blog/2024-12-31-RealtimeAgent-over-websocket/img/websocket_communication_diagram.png b/website/blog/2025-01-08-RealtimeAgent-over-websocket/img/websocket_communication_diagram.png
similarity index 100%
rename from website/blog/2024-12-31-RealtimeAgent-over-websocket/img/websocket_communication_diagram.png
rename to website/blog/2025-01-08-RealtimeAgent-over-websocket/img/websocket_communication_diagram.png
diff --git a/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx b/website/blog/2025-01-08-RealtimeAgent-over-websocket/index.mdx
similarity index 98%
rename from website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx
rename to website/blog/2025-01-08-RealtimeAgent-over-websocket/index.mdx
index 127d504e5e..fe07e64dc6 100644
--- a/website/blog/2024-12-31-RealtimeAgent-over-websocket/index.mdx
+++ b/website/blog/2025-01-08-RealtimeAgent-over-websocket/index.mdx
@@ -112,14 +112,14 @@ Whether you're building a chatbot, a voice assistant, or an interactive applicat
 ## **Example: Build a Voice-Enabled Weather Bot**
 Let’s walk through a practical example where we use the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) to create a voice-enabled weather bot.
 
-You can find the full example [here](https://github.com/sternakt/RealtimeAgent-WebSocketAudioAdapter/tree/main).
+You can find the full example [here](https://github.com/ag2ai/realtime-agent-over-websockets/tree/main).
 
 To run the demo example, follow these steps:
 
 ### **1. Clone the Repository**
 ```bash
-git clone https://github.com/sternakt/RealtimeAgent-WebSocketAudioAdapter.git
-cd RealtimeAgent-WebSocketAudioAdapter
+git clone https://github.com/ag2ai/realtime-agent-over-websockets.git
+cd realtime-agent-over-websockets
 ```
 
 ### **2. Set Up Environment Variables**

From 075f086d63c362673e405001b14b4df4fcbf257b Mon Sep 17 00:00:00 2001
From: Tvrtko Sternak
Date: Wed, 8 Jan 2025 08:18:30 +0100
Subject: [PATCH 10/11] Fix mint.json after folder renaming

---
 website/mint.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/mint.json b/website/mint.json
index 664700348b..c54861d607 100644
--- a/website/mint.json
+++ b/website/mint.json
@@ -507,7 +507,7 @@
         {
           "group": "Recent posts",
           "pages": [
-            "blog/2024-12-31-RealtimeAgent-over-websocket/index",
+            "blog/2025-01-08-RealtimeAgent-over-websocket/index",
             "blog/2024-12-20-RealtimeAgent/index",
             "blog/2024-12-20-Tools-interoperability/index",
             "blog/2024-12-20-Reasoning-Update/index",

From ab1316a32d7d18e1df1d07b94a707dc5cc302cc5 Mon Sep 17 00:00:00 2001
From: Tvrtko Sternak
Date: Wed, 8 Jan 2025 10:08:21 +0100
Subject: [PATCH 11/11] Polishing

---
 .../blog/2025-01-08-RealtimeAgent-over-websocket/index.mdx | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/website/blog/2025-01-08-RealtimeAgent-over-websocket/index.mdx b/website/blog/2025-01-08-RealtimeAgent-over-websocket/index.mdx
index fe07e64dc6..6885e53379 100644
--- a/website/blog/2025-01-08-RealtimeAgent-over-websocket/index.mdx
+++ b/website/blog/2025-01-08-RealtimeAgent-over-websocket/index.mdx
@@ -162,6 +162,11 @@ INFO:     Uvicorn running on http://0.0.0.0:5050 (Press CTRL+C to quit)
 ### Ready to Chat? 🚀
 Now you can simply open [**localhost:5050/start-chat**](http://localhost:5050/start-chat) in your browser, and dive into an interactive conversation with the [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent)! 🎤✨
 
+To get started, simply speak into your microphone and ask a question. For example, you can say:
+
+**"What's the weather like in Seattle?"**
+
+This initial question will activate the agent, and it will respond, showcasing its ability to understand and interact with you in real time.
 
 ## Code review
 Let’s dive in and break down how this example works—from setting up the server to handling real-time audio streaming with [WebSockets](https://fastapi.tiangolo.com/advanced/websockets/).
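
A note on the `get_weather` handler touched in patch 06: the removed version compared `location.lower()` against `"seattle"`, while the added version tests `location == "Seattle"` exactly. If the model passes `"seattle"` or a padded string as the argument, the exact match would presumably fall through to the sunny branch, which matters now that patch 11 suggests "What's the weather like in Seattle?" as the opening question. Below is a minimal standalone sketch of a normalized comparison. It assumes the function is exercised outside the agent, so the `register_realtime_function` decorator is omitted, and `normalize_city` is a hypothetical helper that is not part of AG2.

```python
from typing import Annotated


def normalize_city(location: str) -> str:
    """Hypothetical helper: trim surrounding whitespace and ignore case."""
    return location.strip().casefold()


def get_weather(location: Annotated[str, "city"]) -> str:
    # Same canned answers as the blog post, matched case-insensitively.
    return (
        "The weather is cloudy."
        if normalize_city(location) == "seattle"
        else "The weather is sunny."
    )


if __name__ == "__main__":
    # Print the answer for a few spellings of the argument.
    for city in ("Seattle", " seattle ", "Zagreb"):
        print(f"{city!r}: {get_weather(city)}")
```

Whether this is worth adopting depends on how the Realtime API fills in the `location` argument in practice; the exact-match version in the patch is fine if the model reliably returns `"Seattle"`.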