Skip to content

Commit 029852f

Browse files
committed
Improve readme
1 parent 93cf015 commit 029852f

File tree

1 file changed

+100
-9
lines changed

1 file changed

+100
-9
lines changed

plugins/baseten/README.md

Lines changed: 100 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,26 +1,117 @@
1-
# Baseten Plugin for Vision Agents
1+
# Qwen3-VL hosted on Baseten
2+
Qwen3-VL is the latest open-source Video Language Model (VLM) from Alibaba. This plugin allows developers to easily run the model hosted on [Baseten](https://www.baseten.co/) with Vision Agents. The model accepts text and video and responds with text vocalised with the TTS service of your choice.
23

3-
LLM integrations for the models hosted on Baseten for Vision Agents framework.
4+
## Features
45

5-
TODO
6+
- **Video understanding**: Automatically buffers and forwards video frames to Baseten-hosted VLM models
7+
- **Streaming responses**: Supports streaming text responses with real-time chunk events
8+
- **Frame buffering**: Configurable frame rate and buffer duration for optimal performance
9+
- **Event-driven**: Emits LLM events (chunks, completion, errors) for integration with other components
610

711
## Installation
812

913
```bash
10-
pip install vision-agents-plugins-baseten
14+
uv add vision-agents[baseten]
1115
```
1216

13-
## Usage
17+
## Quick Start
1418

1519
```python
20+
from vision_agents.core import Agent, User
21+
from vision_agents.plugins import baseten, getstream, deepgram, elevenlabs, vogent
1622

23+
async def create_agent(**kwargs) -> Agent:
24+
# Initialize the Baseten VLM
25+
llm = baseten.VLM(model="qwen3vl")
26+
27+
# Create an agent with video understanding capabilities
28+
agent = Agent(
29+
edge=getstream.Edge(),
30+
agent_user=User(name="Video Assistant", id="agent"),
31+
instructions="You're a helpful video AI assistant. Analyze the video frames and respond to user questions about what you see.",
32+
llm=llm,
33+
stt=deepgram.STT(),
34+
tts=elevenlabs.TTS(),
35+
turn_detection=vogent.TurnDetection(),
36+
processors=[],
37+
)
38+
return agent
39+
40+
async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
41+
await agent.create_user()
42+
call = await agent.create_call(call_type, call_id)
43+
44+
with await agent.join(call):
45+
# The agent will automatically process video frames and respond to user input
46+
await agent.finish()
1747
```
1848

49+
## Configuration
50+
51+
### Environment Variables
52+
53+
- **`BASETEN_API_KEY`**: Your Baseten API key (required)
54+
- **`BASETEN_BASE_URL`**: The base URL for your Baseten API endpoint (required)
55+
56+
### Initialization Parameters
57+
58+
```python
59+
baseten.VLM(
60+
model: str, # Baseten model name (e.g., "qwen3vl")
61+
api_key: Optional[str] = None, # API key (defaults to BASETEN_API_KEY env var)
62+
base_url: Optional[str] = None, # Base URL (defaults to BASETEN_BASE_URL env var)
63+
fps: int = 1, # Frames per second to process (default: 1)
64+
frame_buffer_seconds: int = 10, # Seconds of video to buffer (default: 10)
65+
client: Optional[AsyncOpenAI] = None, # Custom OpenAI client (optional)
66+
)
67+
```
68+
69+
### Parameters
70+
71+
- **`model`**: The name of the Baseten-hosted model to use. Must be a vision-capable model.
72+
- **`api_key`**: Your Baseten API key. If not provided, reads from `BASETEN_API_KEY` environment variable.
73+
- **`base_url`**: The base URL for Baseten API. If not provided, reads from `BASETEN_BASE_URL` environment variable.
74+
- **`fps`**: Number of video frames per second to capture and send to the model. Lower values reduce API costs but may miss fast-moving content. Default is 1 fps.
75+
- **`frame_buffer_seconds`**: How many seconds of video to buffer. Total buffer size = `fps * frame_buffer_seconds`. Default is 10 seconds.
76+
- **`client`**: Optional pre-configured `AsyncOpenAI` client. If provided, `api_key` and `base_url` are ignored.
77+
78+
## How It Works
79+
80+
1. **Video Frame Buffering**: The plugin automatically subscribes to video tracks when the agent joins a call. It buffers frames at the specified FPS for the configured duration.
81+
82+
2. **Frame Processing**: When responding to user input, the plugin:
83+
- Converts buffered video frames to JPEG format
84+
- Resizes frames to 800x600 (maintaining aspect ratio)
85+
- Encodes frames as base64 data URLs
86+
87+
3. **API Request**: Sends the conversation history (including system instructions) along with all buffered frames to the Baseten model.
88+
89+
4. **Streaming Response**: Processes the streaming response and emits events for each chunk and completion.
90+
91+
## Events
92+
93+
The plugin emits the following events:
94+
95+
- **`LLMResponseChunkEvent`**: Emitted for each text chunk in the streaming response
96+
- **`LLMResponseCompletedEvent`**: Emitted when the response stream completes
97+
- **`LLMErrorEvent`**: Emitted if an API request fails
1998

2099
## Requirements
100+
21101
- Python 3.10+
22-
- `openai`
23-
- GetStream SDK
102+
- `openai>=2.5.0`
103+
- `vision-agents` (core framework)
104+
- Baseten API key and base URL
105+
106+
## Notes
107+
108+
- **Frame Rate**: The default FPS of 1 is optimized for VLM use cases. Higher FPS values will increase API costs and latency.
109+
- **Frame Size**: Frames are automatically resized to 800x600 pixels while maintaining aspect ratio to optimize API payload size.
110+
- **Buffer Duration**: The 10-second default buffer provides context for the model while keeping memory usage reasonable.
111+
- **Tool Calling**: Tool/function calling support is not yet implemented (see TODOs in code).
112+
113+
## Troubleshooting
24114

25-
## License
26-
MIT
115+
- **No video processing**: Ensure the agent has joined a call with video tracks available. The plugin automatically subscribes to video when tracks are added.
116+
- **API errors**: Verify your `BASETEN_API_KEY` and `BASETEN_BASE_URL` are set correctly and the model name is valid.
117+
- **High latency**: Consider reducing `fps` or `frame_buffer_seconds` to decrease the number of frames sent per request.

0 commit comments

Comments
 (0)