Skip to content

Commit a9397b3

Browse files
Merge pull request #82 from GetStream/tschellenbach-patch-1
Revise README for improved clarity and updates
2 parents c88734f + 2d80a2a commit a9397b3

File tree

1 file changed

+23
-12
lines changed

1 file changed

+23
-12
lines changed

README.md

Lines changed: 23 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ Build Vision Agents quickly with any model or video provider.
1212
- **Video AI**: Built for real-time video AI. Combine Yolo, Roboflow and others with gemini/openai realtime
1313
- **Low Latency**: Join quickly (500ms) and low audio/video latency (30ms)
1414
- **Open**: Built by Stream, but use any video edge network that you like
15-
- **Native APIs**: Native SDK methods from OpenAI (create response), Gemini (generate) and Claude (create message). So you're never behind on the latest features
15+
- **Native APIs**: Native SDK methods from OpenAI (create response), Gemini (generate) and Claude (create message). So you can always use the latest LLM capabilities.
1616
- **SDKs**: SDKs for React, Android, iOS, Flutter, React, React Native and Unity.
1717

1818
Created by Stream, uses [Stream's edge network](https://getstream.io/video/) for ultra-low latency.
@@ -39,7 +39,7 @@ agent = Agent(
3939
)
4040
```
4141

42-
### Cluely style Invisible Assistant
42+
### Cluely style Invisible Assistant (coming soon)
4343

4444
Apps like Cluely offer realtime coaching via an invisible overlay. This example shows you how you can build your own invisible assistant.
4545
It combines Gemini realtime (to watch your screen and audio), and doesn't broadcast audio (only text). This approach
@@ -59,13 +59,18 @@ agent = Agent(
5959

6060
## Processors
6161

62-
Processors make it easy to combine the video & LLM with additional state. Here are some built-in examples
62+
Processors enable you to provide state and receive/publish video & audio.
63+
Many video AI use case require you to do things like
6364

64-
* YoloPose
65-
* ImageCapture
66-
* BufferedVideoCapture
65+
* Run a smaller AI model next to the LLM (like Yolo or roboflow)
66+
* Make API calls to maintain relevant info/game state
67+
* Modify audio/video, for instance avatars
68+
* Capture audio/video
69+
70+
This is all handled by processors.
6771

6872
## Docs
73+
6974
To get started with Vision Agents, check out our getting started guide at [VisionAgents.ai](https://visionagents.ai).
7075

7176
- Quickstart: [Building a Voice AI app](https://visionagents.ai/introduction/voice-agents)
@@ -95,8 +100,8 @@ Our favorite people & projects to follow for vision AI
95100
## Inspiration
96101

97102
- Livekit Agents: Great syntax, Livekit only
98-
- Pipecat: Flexible, but more verbose. Open, we will add support for Stream
99-
- OpenAI Agents: Focused on openAI only, but we will try to add support
103+
- Pipecat: Flexible, but more verbose.
104+
- OpenAI Agents: Focused on openAI only
100105

101106
## Open Platform
102107
Reach out to nash@getstream.io, and we'll collaborate on getting you added
@@ -119,8 +124,14 @@ We'd like to add support for and are reaching out to:
119124
- Support for MCP and function calling for Gemini and OpenAI
120125
- Support for realtime WebRTC video and voice with GPT Realtime
121126

122-
**0.2 - Next release**
123-
- The Python WebRTC lib we use has some problems. This can cause sudden spikes and latency issues. We'll be pushing fixes for the project
124-
- Hosting guidelines
127+
**Coming Soon**
128+
- The Python WebRTC lib we use has some limitations. Investigating this.
129+
- Hosting & production deploy example
130+
- More built-in Yolo processors: Object detection, person detection, etc
131+
- Roboflow support
132+
- Computer use support
133+
- AI avatar support. Tavus etc
134+
- QWen3 vision support
135+
- Buffered video capture support (enabling AI to capture video when something exciting happens)
136+
- Moondream vision
125137

126-
**Later**

0 commit comments

Comments
 (0)