ag2ai · davorrunje · Jan 8, 2025 · Dec 31, 2024 · Dec 31, 2024 · Dec 31, 2024
diff --git a/notebook/agentchat_realtime_websocket.ipynb b/notebook/agentchat_realtime_websocket.ipynb
@@ -51,7 +51,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "!pip install \"ag2\" \"fastapi>=0.115.0,<1\" \"uvicorn>=0.30.6,<1\""
+    "!pip install \"ag2\" \"fastapi>=0.115.0,<1\" \"uvicorn>=0.30.6,<1\" \"jinja2\""
    ]
   },
   {
@@ -65,7 +65,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 2,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -171,7 +171,7 @@
     "\n",
     "1. **Define Port**: Sets the `PORT` variable to `5050`, which will be used for the server.\n",
     "2. **Initialize FastAPI App**: Creates a `FastAPI` instance named `app`, which serves as the main application.\n",
-    "3. **Define Root Endpoint**: Adds a `GET` endpoint at the root URL (`/`). When accessed, it returns a JSON response with the message `\"Websocket Audio Stream Server is running!\"`.\n",
+    "3. **Define Root Endpoint**: Adds a `GET` endpoint at the root URL (`/`). When accessed, it returns a JSON response with the message `\"WebSocket Audio Stream Server is running!\"`.\n",
     "\n",
     "This sets up a basic FastAPI server and provides a simple health-check endpoint to confirm that the server is operational."
    ]
@@ -189,7 +189,7 @@
     "\n",
     "@app.get(\"/\", response_class=JSONResponse)\n",
     "async def index_page():\n",
-    "    return {\"message\": \"Websocket Audio Stream Server is running!\"}"
+    "    return {\"message\": \"WebSocket Audio Stream Server is running!\"}"
    ]
   },
   {
@@ -310,7 +310,7 @@
    ]
   },
   "kernelspec": {
-   "display_name": ".venv-3.9",
+   "display_name": ".venv",
    "language": "python",
    "name": "python3"
   },
@@ -324,7 +324,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.20"
+   "version": "3.10.16"
   }
  },
  "nbformat": 4,

diff --git a/...2025-01-08-RealtimeAgent-over-websocket/img/websocket_communication_diagram.png b/...2025-01-08-RealtimeAgent-over-websocket/img/websocket_communication_diagram.png
diff --git a/website/blog/2025-01-08-RealtimeAgent-over-websocket/index.mdx b/website/blog/2025-01-08-RealtimeAgent-over-websocket/index.mdx
@@ -0,0 +1,311 @@
+---
+title: Real-Time Voice Interactions with the WebSocket Audio Adapter
+authors:
+  - marklysze
+  - sternakt
+  - davorrunje
+  - davorinrusevljan
+tags: [Realtime API, Voice Agents, AI Tools]
+
+---
+
+<div class="blog-authors">
+   <p class="authors">Authors:</p>
+   <CardGroup cols={2}>
+      <Card href="https://github.com/marklysze">
+         <div class="col card">
+         <div class="img-placeholder">
+            <img noZoom src="https://github.com/marklysze.png" />
+         </div>
+         <div>
+            <p class="name">Mark Sze</p>
+            <p>Software Engineer at AG2.ai</p>
+         </div>
+      </div>
+      </Card>
+      <Card href="https://github.com/sternakt">
+         <div class="col card">
+         <div class="img-placeholder">
+            <img noZoom src="https://github.com/sternakt.png" />
+         </div>
+         <div>
+            <p class="name">Tvrtko Sternak</p>
+            <p>Machine Learning Engineer at Airt</p>
+         </div>
+      </div>
+      </Card>
+      <Card href="https://github.com/davorrunje">
+         <div class="col card">
+         <div class="img-placeholder">
+            <img noZoom src="https://github.com/davorrunje.png" />
+         </div>
+         <div>
+            <p class="name">Davor Runje</p>
+            <p>CTO at Airt</p>
+         </div>
+      </div>
+      </Card>
+      <Card href="https://github.com/davorinrusevljan">
+         <div class="col card">
+         <div class="img-placeholder">
+            <img noZoom src="https://github.com/davorinrusevljan.png" />
+         </div>
+         <div>
+            <p class="name">Davorin Ruševljan</p>
+            <p>Developer</p>
+         </div>
+      </div>
+      </Card>
+   </CardGroup>
+</div>
+
+![Realtime agent communication over websocket](img/websocket_communication_diagram.png)
+
+**TL;DR:**
+- **Demo implementation**: Implement a website using websockets and communicate using voice with the [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent)
+- **Introducing [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter)**: Stream audio directly from your browser using [WebSockets](https://fastapi.tiangolo.com/advanced/websockets/).
+- **Simplified Development**: Connect to real-time agents quickly and effortlessly with minimal setup.
+
+# **Realtime over WebSockets**
+
+In our [previous blog post](/blog/2024-12-20-RealtimeAgent/index), we introduced a way to interact with the [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent) using [**`TwilioAudioAdapter`**](/docs/reference/agentchat/realtime_agent/twilio_audio_adapter#twilioaudioadapter). While effective, this approach required a setup-intensive process involving [Twilio](https://www.twilio.com/) integration, account configuration, number forwarding, and other complexities. Today, we're excited to introduce the[**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter), a streamlined approach to real-time audio streaming directly via a web browser.
+
+This post explores the features, benefits, and implementation of the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter), showing how it transforms the way we connect with real-time agents.
+
+## **Why We Built the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter)**
+### **Challenges with Existing Solutions**
+Previously introduced [**`TwilioAudioAdapter`**](/docs/reference/agentchat/realtime_agent/twilio_audio_adapter#twilioaudioadapter) provides a robust way to cennect to your [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent), it comes with challenges:
+- **Browser Limitations**: For teams building web-first applications, integrating with a telephony platform can feel redundant.
+- **Complex Setup**: Configuring Twilio accounts, verifying numbers, and setting up forwarding can be time-consuming.
+- **Platform Dependency**: This solution requires developers to rely on external API, which adds latency and costs.
+
+### **Our Solution**
+The [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) eliminates these challenges by allowing direct audio streaming over [WebSockets](https://fastapi.tiangolo.com/advanced/websockets/). It integrates seamlessly with modern web technologies, enabling real-time voice interactions without external telephony platforms.
+
+## **How It Works**
+At its core, the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) leverages [WebSockets](https://fastapi.tiangolo.com/advanced/websockets/) to handle real-time audio streaming. This means your browser becomes the communication bridge, sending audio packets to a server where a [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent) agent processes them.
+
+Here’s a quick overview of its components and how they fit together:
+
+1. **WebSocket Connection**:
+   - The adapter establishes a [WebSockets](https://fastapi.tiangolo.com/advanced/websockets/) connection between the client (browser) and the server.
+   - Audio packets are streamed in real time through this connection.
+
+2. **Integration with FastAPI**:
+   - Using Python's [FastAPI](https://fastapi.tiangolo.com/) framework, developers can easily set up endpoints for handling  [WebSockets](https://fastapi.tiangolo.com/advanced/websockets/) traffic.
+
+3. **Powered by Realtime Agents**:
+   - The audio adapter integrates with an AI-powered [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent), allowing the agent to process audio inputs and respond intelligently.
+
+## **Key Features**
+### **1. Simplified Setup**
+Unlike [**`TwilioAudioAdapter`**](/docs/reference/agentchat/realtime_agent/twilio_audio_adapter#twilioaudioadapter), the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) requires no phone numbers, no telephony configuration, and no external accounts. It's a plug-and-play solution.
+
+### **2. Real-Time Performance**
+By streaming audio over [WebSockets](https://fastapi.tiangolo.com/advanced/websockets/), the adapter ensures low latency, making conversations feel natural and seamless.
+
+### **3. Browser-Based**
+Everything happens within the user's browser, meaning no additional software is required. This makes it ideal for web applications.
+
+### **4. Flexible Integration**
+Whether you're building a chatbot, a voice assistant, or an interactive application, the adapter can integrate easily with existing frameworks and AI systems.
+
+## **Example: Build a Voice-Enabled Weather Bot**
+Let’s walk through a practical example where we use the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) to create a voice-enabled weather bot.
+You can find the full example [here](https://github.com/ag2ai/realtime-agent-over-websockets/tree/main).
+
+To run the demo example, follow these steps:
+
+### **1. Clone the Repository**
+```bash
+git clone https://github.com/ag2ai/realtime-agent-over-websockets.git
+cd realtime-agent-over-websockets
+```
+
+### **2. Set Up Environment Variables**
+Create a `OAI_CONFIG_LIST` file based on the provided `OAI_CONFIG_LIST_sample`:
+```bash
+cp OAI_CONFIG_LIST_sample OAI_CONFIG_LIST
+```
+In the OAI_CONFIG_LIST file, update the `api_key` to your OpenAI API key.
+
+### (Optional) Create and use a virtual environment
+
+To reduce cluttering your global Python environment on your machine, you can create a virtual environment. On your command line, enter:
+
+```
+python3 -m venv env
+source env/bin/activate
+```
+
+### **3. Install Dependencies**
+Install the required Python packages using `pip`:
+```bash
+pip install -r requirements.txt
+```
+
+### **4. Start the Server**
+Run the application with Uvicorn:
+```bash
+uvicorn realtime_over_websockets.main:app --port 5050
+```
+
+After you start the server you should see your application running in the logs:
+
+```bash
+INFO:     Started server process [64425]
+INFO:     Waiting for application startup.
+INFO:     Application startup complete.
+INFO:     Uvicorn running on http://0.0.0.0:5050 (Press CTRL+C to quit)
+```
+
+### Ready to Chat? 🚀
+Now you can simply open [**localhost:5050/start-chat**](http://localhost:5050/start-chat) in your browser, and dive into an interactive conversation with the [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent)! 🎤✨
+
+To get started, simply speak into your microphone and ask a question. For example, you can say:
+
+**"What's the weather like in Seattle?"**
+
+This initial question will activate the agent, and it will respond, showcasing its ability to understand and interact with you in real time.
+
+## Code review
+Let’s dive in and break down how this example works—from setting up the server to handling real-time audio streaming with [WebSockets](https://fastapi.tiangolo.com/advanced/websockets/).
+
+### **Set Up the FastAPI app**
+We use [FastAPI](https://fastapi.tiangolo.com/) to serve the chat interface and handle WebSocket connections. A key part is configuring the server to load and render HTML templates dynamically for the user interface.
+
+- **Template Loading**: Use `Jinja2Templates` to load `chat.html` from the `templates` directory. The template is dynamically rendered with variables like the server's `port`.
+- **Static Files**: Serve assets (e.g., JavaScript, CSS) from the `static` directory.
+
+```python
+app = FastAPI()
+
+
+@app.get("/", response_class=JSONResponse)
+async def index_page() -> dict[str, str]:
+    return {"message": "WebSocket Audio Stream Server is running!"}
+
+
+website_files_path = Path(__file__).parent / "website_files"
+
+app.mount(
+    "/static", StaticFiles(directory=website_files_path / "static"), name="static"
+)
+
+templates = Jinja2Templates(directory=website_files_path / "templates")
+
+
+@app.get("/start-chat/", response_class=HTMLResponse)
+async def start_chat(request: Request) -> HTMLResponse:
+    """Endpoint to return the HTML page for audio chat."""
+    port = request.url.port
+    return templates.TemplateResponse("chat.html", {"request": request, "port": port})
+```
+
+### Defining the WebSocket Endpoint
+
+The `/media-stream` WebSocket route is where real-time audio interaction is processed and streamed to the AI assistant. Let’s break it down step-by-step:
+
+1. **Accept the WebSocket Connection**
+   The WebSocket connection is established when a client connects to `/media-stream`. Using `await websocket.accept()`, we ensure the connection is live and ready for communication.
+
+```python
+@app.websocket("/media-stream")
+async def handle_media_stream(websocket: WebSocket) -> None:
+    """Handle WebSocket connections providing audio stream and OpenAI."""
+    await websocket.accept()
+```
+
+2. **Initialize Logging**
+   A logger instance (`getLogger("uvicorn.error")`) is set up to monitor and debug the server's activities, helping track events during the connection and interaction process.
+
+```python
+    logger = getLogger("uvicorn.error")
+```
+3. **Set Up the `WebSocketAudioAdapter`**
+   The [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) bridges the client’s audio stream with the [**`RealtimeAgent`**](/docs/reference/agentchat/realtime_agent/realtime_agent). It streams audio data over [WebSockets](https://fastapi.tiangolo.com/advanced/websockets/) in real time, ensuring seamless communication between the browser and the agent.
+
+```python
+    audio_adapter = WebSocketAudioAdapter(websocket, logger=logger)
+```
+
+4. **Configure the Realtime Agent**
+   The `RealtimeAgent` is the AI assistant driving the interaction. Key parameters include:
+   - **Name**: The agent identity, here called `"Weather Bot"`.
+   - **System Message**: System message for the agent.
+   - **Language Model Configuration**: Defined by `realtime_llm_config` for LLM settings.
+   - **Audio Adapter**: Connects the [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) for handling audio.
+   - **Logger**: Logs the agent's activities for better observability.
+
+```python
+    realtime_agent = RealtimeAgent(
+        name="Weather Bot",
+        system_message="Hello there! I am an AI voice assistant powered by Autogen and the OpenAI Realtime API. You can ask me about weather, jokes, or anything you can imagine. Start by saying 'How can I help you'?",
+        llm_config=realtime_llm_config,
+        audio_adapter=audio_adapter,
+        logger=logger,
+    )
+```
+
+5. **Define a Custom Realtime Function**
+   The `get_weather` function is registered as a realtime callable function. When the user asks about the weather, the agent can call the function to get an accurate weather report and respond based on the provided information:
+   - Returns `"The weather is cloudy."` for `"Seattle"`.
+   - Returns `"The weather is sunny."` for other locations.
+
+```python
+    @realtime_agent.register_realtime_function(  # type: ignore [misc]
+        name="get_weather", description="Get the current weather"
+    )
+    def get_weather(location: Annotated[str, "city"]) -> str:
+        return (
+            "The weather is cloudy."
+            if location == "Seattle"
+            else "The weather is sunny."
+        )
+```
+
+6. **Run the Realtime Agent**
+   The `await realtime_agent.run()` method starts the agent, handling incoming audio streams, processing user queries, and responding in real time.
+
+Here is the full code for the `/media-stream` endpoint:
+
+```python
+@app.websocket("/media-stream")
+async def handle_media_stream(websocket: WebSocket) -> None:
+    """Handle WebSocket connections providing audio stream and OpenAI."""
+    await websocket.accept()
+
+    logger = getLogger("uvicorn.error")
+
+    audio_adapter = WebSocketAudioAdapter(websocket, logger=logger)
+
+    realtime_agent = RealtimeAgent(
+        name="Weather Bot",
+        system_message="Hello there! I am an AI voice assistant powered by Autogen and the OpenAI Realtime API. You can ask me about weather, jokes, or anything you can imagine. Start by saying 'How can I help you'?",
+        llm_config=realtime_llm_config,
+        audio_adapter=audio_adapter,
+        logger=logger,
+    )
+
+    @realtime_agent.register_realtime_function(  # type: ignore [misc]
+        name="get_weather", description="Get the current weather"
+    )
+    def get_weather(location: Annotated[str, "city"]) -> str:
+        return (
+            "The weather is cloudy."
+            if location == "Seattle"
+            else "The weather is sunny."
+        )
+
+    await realtime_agent.run()
+```
+
+## **Benefits in Action**
+- **Quick Prototyping**: Spin up a real-time voice application in minutes.
+- **Cost Efficiency**: Eliminate third-party telephony costs.
+- **User-Friendly**: Runs in the browser, making it accessible to anyone with a microphone.
+
+## **Conclusion**
+The [**`WebSocketAudioAdapter`**](/docs/reference/agentchat/realtime_agent/websocket_audio_adapter#websocketaudioadapter) marks a shift toward simpler, more accessible real-time audio solutions. It empowers developers to build and deploy voice applications faster and more efficiently. Whether you're creating an AI assistant, a voice-enabled app, or an experimental project, this adapter is your go-to tool for real-time audio streaming.
+
+Try it out and bring your voice-enabled ideas to life!
diff --git a/website/mint.json b/website/mint.json
@@ -507,6 +507,7 @@
         {
           "group": "Recent posts",
           "pages": [
+            "blog/2025-01-08-RealtimeAgent-over-websocket/index",
             "blog/2024-12-20-RealtimeAgent/index",
             "blog/2024-12-20-Tools-interoperability/index",
             "blog/2024-12-20-Reasoning-Update/index",