"I hate video editing and hiring editors is too expensive. So I built an AI that edits videos through chat."
An open-source chat-based video editor powered by any AI model (Claude Sonnet 4.5, GPT-4o/5, Gemini, etc.) through Dedalus multi-model routing. Edit videos naturally like you're texting a friend.
- Claude Sonnet 4.5 for precise code generation
- GPT-4o/GPT-5 for conversational editing
- Gemini Flash for lightning-fast simple edits
- All through Dedalus MCP - automatic model routing for optimal cost/quality
- Natural language: "Make the second clip louder", "Add a zoom effect"
- Smart Selectors - AI knows which clip you mean, even with ambiguous commands
- Disambiguator UI - Shows options when multiple matches found
- Plan-Execute-Patch - Deterministic edits, not full code regeneration
- Remotion Lambda - Cloud rendering at scale
- Real-time Preview - See edits instantly with live Remotion Player
- Resumable Uploads - TUS protocol for reliable large files
- Undo/Redo - Full history with Cmd+Z shortcuts
- Cost Transparency - See render costs before you commit
Scenario: Sarah uploads 60 minutes of raw footage every week. She needs to cut it down to a 12-minute polished vlog with intro/outro, music, and captions.
You: "Upload this week's raw footage"
AI: "Video uploaded. 1920x1080, 60 mins, 4.2GB processed."
You: "Remove all parts where I'm setting up the camera or checking audio"
AI: "Analyzed footage. Found 8 setup segments totaling 12 mins. Removed."
You: "Add my standard intro template at the start"
AI: "Added 'Sarah's Week' intro (5s) with fade-in animation"
You: "Make the part where I'm at the coffee shop 1.5x speed"
AI: "Found coffee shop segment at 8:23-11:45. Sped up to 1.5x (2m12s → 1m28s)"
You: "Add captions to the whole video"
AI: "Generated captions with 95% accuracy. Review needed for names at 4:32 and 9:15"
You: "Add my background music track but duck it when I'm talking"
AI: "Added music.mp3 with auto-ducking. Volume: 100% → 30% during speech"
You: "Export in 4K for YouTube with chapters every 2 minutes"
AI: "Rendering 12:34 video in 3840×2160... Estimated cost: $1.89, ETA 3 mins"
Time Saved: 3 hours of manual editing → 15 minutes of chatting. Cost: $1.89 (vs $150 outsourcing to an editor)
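The auto-ducking in the exchange above can be sketched as a pure gain function (a hypothetical helper, not ChatKut's actual code). Remotion's `<Audio>` accepts a per-frame `volume` callback, so ducking reduces to computing a gain from known speech ranges:

```typescript
// Hypothetical sketch of auto-ducking: drop background music to 30%
// while speech is active, with a short linear ramp at the boundaries
// so the volume change is not abrupt.
type FrameRange = [start: number, end: number];

function duckedVolume(
  frame: number,
  speechRanges: FrameRange[],
  duckTo = 0.3,
  rampFrames = 6,
): number {
  for (const [start, end] of speechRanges) {
    if (frame >= start && frame <= end) return duckTo; // fully ducked
    if (frame >= start - rampFrames && frame < start) {
      const t = (start - frame) / rampFrames; // 1 → 0 approaching speech
      return duckTo + (1 - duckTo) * t;
    }
    if (frame > end && frame <= end + rampFrames) {
      const t = (frame - end) / rampFrames; // 0 → 1 leaving speech
      return duckTo + (1 - duckTo) * t;
    }
  }
  return 1; // full volume when no one is talking
}

// In a Remotion composition this could be wired up as:
// <Audio src="music.mp3" volume={(f) => duckedVolume(f, speechRanges)} />
```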
Scenario: Launch campaign needs 20 different video variations for A/B testing across platforms (Instagram, TikTok, YouTube Ads, LinkedIn).
You: "Upload our product demo (60s master cut)"
AI: "Uploaded. 1920x1080, 60s, ready for variations."
You: "Create 5 Instagram Reels (9:16, 15s each) focusing on different features"
AI: "Created 5 variations:
- Reel 1: Speed feature (0:00-0:15) + CTA
- Reel 2: Design feature (0:15-0:30) + CTA
- Reel 3: Price reveal (0:30-0:45) + CTA
- Reel 4: Customer testimonial (0:45-0:60) + CTA
- Reel 5: Problem/solution hook (0:00-0:15) + CTA"
You: "Duplicate Reel 1 but try 3 different hooks:
- Hook A: 'This changed my workflow'
- Hook B: 'Stop wasting time on...'
- Hook C: '3 reasons you need this'"
AI: "Created 3 A/B test variations of Reel 1 with different opening text"
You: "Now create square versions (1:1) for LinkedIn feed"
AI: "Cloned all 8 videos and reformatted to 1080x1080 with centered composition"
You: "Add logo watermark to bottom-right of all videos"
AI: "Added logo.png to all 16 variations with 80% opacity"
You: "Render everything"
AI: "Rendering 16 videos in parallel... Total cost: $2.40, ETA 2 mins"
Result: 16 platform-optimized variations ready for testing. Traditional editing: 8+ hours. ChatKut: 25 minutes.
Scenario: 2-hour podcast episode needs to be cut into 15 shareable clips for Twitter/LinkedIn with captions.
You: "Upload today's podcast episode"
AI: "Uploaded. 2:14:32 duration. Detecting speech segments..."
You: "Find all moments where the guest talks about 'AI', 'startups', or 'fundraising'"
AI: "Found 47 segments. Filtering by keywords...
- 'AI': 18 moments (avg 45s)
- 'Startups': 23 moments (avg 38s)
- 'Fundraising': 6 moments (avg 52s)"
You: "Show me the 10 longest AI segments"
AI: "Top 10 AI clips ready (ranked by duration 45s-2m15s)"
You: "Create square 1:1 clips of these 10 with captions"
AI: "Created 10 clips:
- Added captions with highlight on keywords
- Cropped to 1080x1080
- Added podcast logo top-left
- Added speaker names as lower thirds"
You: "Make clip #3 start 5 seconds earlier for more context"
AI: "Extended clip #3 start time from 1:23:15 → 1:23:10 (now 58s)"
You: "Add 'Full episode link in bio' end card to all clips"
AI: "Added 3-second end card to all 10 clips with fade-in"
You: "Export all for Twitter (max 2m20s each)"
AI: "All clips under 2:20. Rendering 10 videos... Cost: $1.50"
Impact: 10 promotional clips ready to post across social media. Manual editing: 4-5 hours. ChatKut: 20 minutes + $1.50.
Scenario: Update 45 course videos with new branding, corrected audio, and updated info cards without re-recording.
You: "Load all 45 course videos from /course-content folder"
AI: "Loaded 45 videos. Total duration: 6h 23m. Ready for batch edits."
You: "Replace old logo (top-right) with new logo across all videos"
AI: "Found old logo in 43/45 videos. Replaced with new-logo.png"
You: "Video #12 has outdated pricing info at 3:45. Update text to '$49/mo'"
AI: "Located text element in video #12 at 3:45. Updated to '$49/mo'"
You: "Add disclaimer card at start of videos 5, 12, 18, 23:
'Updated Jan 2025 - New features available'"
AI: "Added 5-second disclaimer card to specified videos with fade"
You: "The audio in video #7 is too quiet. Normalize it to -3dB"
AI: "Analyzed audio in video #7. Normalized from -12dB to -3dB"
You: "Add chapter markers to videos longer than 15 minutes"
AI: "Found 12 videos >15 mins. Analyzing speech for natural break points...
Added 3-5 chapters per video based on topic changes"
You: "Export all 45 videos with updated content"
AI: "Rendering 45 videos in parallel... Total cost: $9.67, ETA 8 mins"
Value: Update the entire course library without reshooting. Manual work: 2-3 days. ChatKut: 45 minutes + $9.67.
Scenario: Turn one 30-minute YouTube video into 20+ pieces of content for TikTok, Instagram, LinkedIn, and Twitter.
You: "Upload main YouTube video (30 mins)"
AI: "Uploaded. Detected 8 distinct topics based on speech analysis."
You: "Create a TikTok for each topic (vertical 9:16, max 60s)"
AI: "Created 8 TikTok clips:
- Topic 1: 'Why this matters' (47s)
- Topic 2: 'Common mistakes' (58s)
- Topic 3: 'Pro tips' (52s)
- ... (5 more)"
You: "For each TikTok, also create a shorter 15s teaser"
AI: "Created 8 additional 15s teaser clips using the hook from each topic"
You: "Make Instagram Reels from topics 2, 4, and 6 with captions"
AI: "Created 3 Instagram Reels (9:16) with auto-captions and keyword highlights"
You: "Create a 2-minute 'best moments' compilation for LinkedIn (square)"
AI: "Compiled highlight reel from topics 1, 3, 5. Formatted to 1080x1080 with professional title cards"
You: "Add text overlay showing the topic title on all clips"
AI: "Added topic titles as animated text overlays (fade-in at 0s, fade-out at 3s)"
You: "Render all 20 clips"
AI: "Rendering 20 videos optimized per platform... Cost: $2.85, ETA 4 mins"
Output: 1 video → 20 pieces of content for multi-platform distribution. Manual editing: 12-15 hours. ChatKut: 30 minutes + $2.85.
Scenario: Create 10 property tour videos with consistent branding, music, and property info overlays.
You: "Upload 10 property tour videos"
AI: "Uploaded 10 videos (avg 3m15s each). Ready for editing."
You: "Add my agency intro template to all videos"
AI: "Added 'Luxury Homes Realty' intro (8s) to all 10 videos"
You: "For each property, add info cards with:
- Address
- Price
- Bed/Bath count
- Square footage"
AI: "Need property details. Upload CSV or tell me for each?"
You: "[uploads properties.csv with all details]"
AI: "Parsed CSV. Added property info cards (15s duration) at 0:10 for all videos"
You: "Add smooth background music but mute during realtor voiceovers"
AI: "Added ambient-music.mp3 to all videos with auto-ducking during speech"
You: "Add 'Schedule a Tour' CTA at the end of each video"
AI: "Added 5-second end card with phone number and QR code to all videos"
You: "Export all 10 videos in 4K"
AI: "Rendering 10 videos (3840×2160)... Cost: $4.88, ETA 6 mins"
Efficiency: Professional branding applied to all listings instantly. Manual editing: 5-6 hours. ChatKut: 20 minutes + $4.88.
Watch the AI translate natural language into video edits in real-time:
This demo showcases the core innovation of ChatKut: an LLM (Claude Sonnet 4.5) understanding user intent and translating it into precise video editing operations.
What you're seeing:
- User types natural language commands - "Add the bigfoot video", "Add text saying 'Big Foot spotted!'", "Zoom into the gorilla"
- AI analyzes the request - Understands context from composition state and available assets
- Tool execution happens - AI calls specific editing tools (add_video_element, add_text_element, add_animation)
- UI updates instantly - Remotion preview reflects changes in real-time via Convex reactivity
- Conversation continues - AI confirms actions and suggests next steps
Key Technical Achievements Demonstrated:
- ✅ Multi-turn tool execution - AI makes multiple edits in one response
- ✅ Context-aware editing - AI remembers what assets are available and what's in the timeline
- ✅ Deterministic operations - "Add text at the top" consistently places text at y=100px
- ✅ Animation generation - Natural language like "zoom in" creates proper scale keyframes with easing
- ✅ Real-time preview - Changes appear immediately without manual refresh
This is a proof of concept showing that LLMs can reliably translate creative intent into code-based video edits when given:
- Structured tools - Clear function signatures with validation
- Rich context - Current composition state, available assets, technical constraints
- Deterministic execution - Tools that produce predictable, reversible results
Why This Matters: Traditional AI video tools generate entire compositions from scratch on each edit, making them:
- Non-deterministic ("make the second clip louder" might target different clips)
- Slow (full regeneration takes 10-30 seconds)
- Expensive (large token costs for regenerating everything)
- Not undoable (lost state between generations)
ChatKut's Plan-Execute-Patch approach with tool-based editing solves all of these:
- Deterministic (selectors guarantee correct element)
- Fast (only changed properties are updated)
- Cheap (~$0.02 per edit with multi-model routing)
- Fully reversible (every edit creates a patch for undo/redo)
Current Status: This demo represents Phase 1 completion (chat-to-execution pipeline working end-to-end). Phase 2 will add timeline UI, advanced effects, and multi-user collaboration.
- Node.js 18+
- Convex account (free)
- Cloudflare account (free tier works)
- Dedalus API key (or direct AI provider keys)
```bash
# 1. Clone repository
git clone https://github.com/yourusername/chatkut.git
cd chatkut

# 2. Install dependencies
npm install

# 3. Set up Convex development environment
npx convex dev
# This auto-generates .env.local with NEXT_PUBLIC_CONVEX_URL

# 4. Set Convex backend environment variables
# ⚠️ CRITICAL: Convex cloud functions CANNOT read .env.local
# You MUST set these via CLI for the backend to work:
npx convex env set CLOUDFLARE_ACCOUNT_ID "your-account-id"
npx convex env set CLOUDFLARE_STREAM_API_TOKEN "your-stream-token"
npx convex env set CLOUDFLARE_R2_ACCESS_KEY_ID "your-r2-access-key"
npx convex env set CLOUDFLARE_R2_SECRET_ACCESS_KEY "your-r2-secret"
npx convex env set CLOUDFLARE_R2_ENDPOINT "https://xxx.r2.cloudflarestorage.com"
npx convex env set CLOUDFLARE_R2_BUCKET_NAME "chatkut-media"
npx convex env set CLOUDFLARE_WEBHOOK_SECRET "your-webhook-secret"
npx convex env set DEDALUS_API_KEY "your-dedalus-key"

# 5. (Optional) Set Remotion Lambda variables if using cloud rendering:
npx convex env set REMOTION_AWS_REGION "us-east-1"
npx convex env set REMOTION_FUNCTION_NAME "remotion-render-lambda"
npx convex env set REMOTION_AWS_ACCESS_KEY_ID "your-aws-key"
npx convex env set REMOTION_AWS_SECRET_ACCESS_KEY "your-aws-secret"

# 6. Verify all environment variables are set correctly
npx convex env list

# 7. Start development server
npm run dev
# Visit http://localhost:3001
```

🚨 Important Notes:
- Convex vs Next.js environment variables:
  - Convex backend (`convex/*.ts` files) → set via `npx convex env set`
  - Next.js frontend (`app/`, `components/`) → set in `.env.local`
- ChatKut runs on port 3001 by default
- See `.env.example` for detailed setup instructions and troubleshooting
ChatKut uses a unique Plan-Execute-Patch architecture:
```
User Chat Input
      ↓
[AI Router] → Claude Sonnet 4.5 (code/planning)
            → GPT-4o (chat responses)
            → Gemini Flash (simple edits)
      ↓
Edit Plan Generation (JSON IR)
      ↓
Selector Resolution (which clip?)
      ↓
Executor Engine (apply changes)
      ↓
History Snapshot (for undo)
      ↓
Remotion Preview (live update)
      ↓
Remotion Lambda (cloud render)
```
Traditional AI video editors regenerate the entire composition on every edit. This is:
- ❌ Non-deterministic - "make second clip louder" might target a different clip
- ❌ Slow - LLM must rewrite all code
- ❌ Expensive - Every edit costs tokens
- ❌ No undo - Lost context between edits
ChatKut's Plan-Execute-Patch approach:
- ✅ Deterministic - Selectors guarantee the correct element
- ✅ Fast - Only patch changed properties
- ✅ Cheap - Minimal token usage
- ✅ Reversible - Full undo/redo history
ChatKut automatically routes tasks to the best AI model:
| Task | Model | Why? |
|---|---|---|
| Code Generation | Claude Sonnet 4.5 | Best Remotion/React understanding |
| Edit Planning | Claude Sonnet 4.5 | Precise structured output |
| Chat Responses | GPT-4o | Balanced cost/quality |
| Simple Edits | Gemini Flash | Fast & cheap property updates |
| Code Analysis | Claude Sonnet 4.5 | Deep code comprehension |
```ts
// lib/dedalus/client.ts
export const MODEL_ROUTING = {
  "code-generation": {
    provider: "anthropic",
    model: "claude-sonnet-4-5", // or "gpt-5", "gemini-2.0-pro"
  },
  // ... customize routing
};
```

Want to use GPT-5? Just update the model string when it becomes available!
Current Status: Dedalus SDK is currently JavaScript-only. ChatKut works perfectly, but we're awaiting TypeScript support from Dedalus Labs for enhanced performance and type safety.
What This Means:
- ✅ Tool execution works perfectly - All editing operations are fully functional
- ✅ Multi-model routing works - Cost optimization is active
- ⏳ TypeScript type definitions pending - Currently using `any` types in some places
- ⏳ Enhanced IDE autocomplete coming - Better DX once TypeScript support lands
Expected Impact When TypeScript Support Arrives:
- Faster editing performance - Better type inference = faster compilation
- Compile-time safety - Catch errors before runtime
- Better developer experience - Full autocomplete for all Dedalus APIs
- Smaller bundle size - Tree-shaking will work more effectively
Tracking: Following Dedalus Labs TypeScript roadmap. No ETA yet, but this is a priority feature request from the community.
Workaround: We've implemented comprehensive runtime validation in tool definitions to ensure type safety until native TypeScript support is available.
- Next.js 14 (App Router)
- React 18
- Tailwind CSS
- Remotion 4.0 (Player & Lambda)
- Convex (Real-time database, serverless functions)
- Cloudflare Stream (Video hosting, HLS)
- Cloudflare R2 (Object storage)
- Dedalus MCP (Multi-model routing)
- Claude Sonnet 4.5 (Code generation)
- GPT-4o (Chat responses)
- Gemini Flash (Simple edits)
- Remotion (React-based video)
- Remotion Lambda (Cloud rendering)
- AWS S3 (Render output storage)
Traditional video editors (Adobe Premiere, Final Cut Pro, DaVinci Resolve) are GUI-based tools where editors manually drag, drop, trim, and apply effects. This works great for artisanal editing but breaks down when you need:
- Automation: "Apply this edit to 100 videos"
- Version Control: "Undo the change I made 2 hours ago"
- Determinism: "Make the second clip louder" should ALWAYS edit the same clip
- Collaboration: Multiple editors working on the same project simultaneously
- AI Integration: Natural language commands that modify video programmatically
ChatKut's Solution: Treat video editing like software development. Videos are code, edits are git commits, and AI is your pair programmer.
Traditional Approach:
- Video editors use proprietary binary formats (.prproj, .fcpxml)
- No version control, no code review, no programmatic control
- Every edit requires GUI interaction
Remotion's Paradigm Shift:
```tsx
// This is a video. It's just React code.
<Sequence from={0} durationInFrames={90}>
  <Video src="intro.mp4" volume={0.8} />
</Sequence>
```

Why This Matters for ChatKut:
1. Type Safety: TypeScript ensures video compositions are valid before rendering

   ```tsx
   // ✅ This will error at compile time, not after 10 minutes of rendering
   <Video src={123} /> // Error: src must be a string
   ```

2. Component Reusability: Create reusable video components like UI components

   ```tsx
   <LowerThird name="John Doe" title="CEO" from={150} />
   // Use this 50 times across different videos with different props
   ```

3. Programmatic Control: Every video element has a stable identifier

   ```tsx
   <Video data-element-id="intro_clip" src="intro.mp4" />
   // AI can target this EXACT element: "make intro_clip louder"
   ```

4. Real-time Preview: Remotion Player renders React → video in the browser
   - No need to export and wait to see changes
   - Instant feedback loop for AI edits

5. Cloud Rendering at Scale: Remotion Lambda parallelizes rendering across AWS Lambda
   - 1 video in 30 seconds? ✅
   - 100 videos in 30 seconds? ✅ (parallel rendering)
The Alternative: Without Remotion, we'd need to:
- Use FFmpeg CLI commands (string manipulation, no type safety, brittle)
- Build a custom renderer from scratch (months of work)
- Use browser Canvas API (limited codec support, manual frame-by-frame rendering)
Remotion gives us: Professional video rendering with the developer experience of React.
The Video Editing Backend Challenge:
- Users expect instant updates when AI makes edits (no refresh needed)
- Need to store composition metadata (IR, patches, history) - NOT video files
- AI actions require serverless compute (code generation, plan execution)
- Need optimistic updates (show changes immediately, sync in background)
Traditional Backend (Express + PostgreSQL):
```ts
// ❌ Polling required for real-time updates
setInterval(() => {
  fetch('/api/composition')
    .then(r => r.json())
    .then(setComposition)
}, 1000) // Check every second = wasteful, laggy
```

Convex Approach:
```ts
// ✅ Real-time subscriptions - updates push automatically
const composition = useQuery(api.compositions.get, { id })
// Composition updates instantly when AI makes changes
```

Why Convex is Perfect for ChatKut:
1. Real-time Subscriptions:
   - User chats "make it louder" → AI generates edit plan → Composition IR updates → UI rerenders instantly
   - No polling, no WebSockets to manage, no state sync bugs

2. Serverless Actions for AI:

   ```ts
   // convex/ai.ts
   export const sendChatMessage = action(async (ctx, { message }) => {
     const plan = await callDedalusAI(message) // external AI call
     await ctx.runMutation(api.compositions.applyPlan, { plan })
     return plan
   })
   ```

   - Actions can call external APIs (Dedalus, Cloudflare, Remotion Lambda)
   - Mutations handle database updates with ACID guarantees
   - Queries provide real-time reactive data

3. Optimistic Updates:

   ```ts
   const mutation = useMutation(api.compositions.update)
   mutation({ opacity: 0.5 }) // UI updates immediately
   // Convex syncs to backend + broadcasts to all connected clients
   ```

4. File Storage Integration:
   - Convex has ~20MB action limits (not suitable for video files)
   - Perfect for metadata: composition IR (JSON), chat messages, user data
   - Direct integration with Cloudflare for actual video storage

5. Built-in Authentication:
   - Clerk/Auth0 integration with zero backend code
   - Row-level security: users only see their projects
The Alternative: Without Convex, we'd need to:
- Build WebSocket infrastructure (Socket.io + Redis)
- Manage database migrations (Prisma + PostgreSQL)
- Deploy serverless functions separately (AWS Lambda)
- Handle state synchronization bugs between client/server
Convex gives us: Real-time backend + serverless compute + type-safe API in one package.
The Video Storage Problem:
- Video files are huge (1GB+ for 4K footage)
- Need resumable uploads (user's WiFi drops mid-upload)
- Need HLS streaming for preview (not downloading 1GB to browser)
- Need global CDN (low latency worldwide)
- Need cost-effective storage ($0.015/GB/month vs S3's $0.023)
Why Cloudflare Stream (Video Hosting):
1. TUS Protocol for Resumable Uploads:

   ```ts
   // Upload a 5GB video - if it fails at 80%, resume from 80%
   const { uploadURL } = await requestStreamUploadUrl()
   tusClient.upload(file, { endpoint: uploadURL, resume: true })
   ```

   - No "upload failed after 10 minutes, start over" pain
   - Chunks uploaded in parallel for speed

2. Automatic HLS Conversion:
   - You upload: `video.mp4` (5GB)
   - Cloudflare Stream returns: `manifest.m3u8` (HLS playlist)
   - Browser plays HLS: loads only the chunks needed (adaptive bitrate)
   - User sees video without downloading 5GB

3. Webhook Integration:

   ```ts
   // Cloudflare: "Video ready!"
   export const handleStreamWebhook = httpAction(async (ctx, request) => {
     const event = verifyWebhookSignature(request) // Svix security
     if (event.status === "ready") {
       await ctx.runMutation(api.media.updateAsset, {
         status: "ready",
         playbackUrl: event.playbackURL, // HLS manifest
       })
     }
   })
   ```

   - Upload → Processing → Webhook → UI updates to "Ready"
   - No polling, no manual status checks

4. Global CDN:
   - Video cached in 300+ cities worldwide
   - A Tokyo user gets the video from a Tokyo edge
   - No central server bottleneck
Why Cloudflare R2 (Object Storage):
1. Zero Egress Fees:
   - AWS S3 charges $0.09/GB to download (egress)
   - Cloudflare R2 charges $0 for egress
   - Rendering 1TB of videos? Save $90 on egress alone

2. S3-Compatible API:

   ```ts
   // Same API as AWS S3 - easy migration
   await s3Client.putObject({
     Bucket: 'chatkut-renders',
     Key: 'render-123.mp4',
     Body: videoBuffer,
   })
   ```

3. Presigned URLs:

   ```ts
   // Generate a secure upload URL (expires in 1 hour)
   const uploadUrl = await getPresignedUploadUrl('image.png')
   // Browser uploads directly to R2 (not through our server)
   ```
The Alternative:
- AWS S3 (more expensive egress, no built-in HLS encoding)
- Mux Video (good but more expensive at scale)
- Self-hosted (managing FFmpeg, CDN, storage = full-time job)
Cloudflare gives us: Enterprise video infrastructure at startup-friendly pricing.
The AI Cost/Quality Problem:
- Claude Sonnet 4.5: Best code quality but expensive ($3/$15 per 1M tokens)
- GPT-4o: Balanced quality/cost ($2.50/$10 per 1M tokens)
- Gemini Flash: Cheap but less precise ($0.075/$0.30 per 1M tokens)
Naive Approach:
```ts
// ❌ Use Claude for everything
const response = await claude.generateText({ prompt: userMessage })
// Cost for 1000 edits: $50-150
```

ChatKut's Multi-Model Routing:
```ts
// ✅ Route to the best model per task
const MODEL_ROUTING = {
  "code-generation": {
    model: "claude-sonnet-4-5", // Best React/Remotion understanding
    temperature: 0.3, // Low for determinism
  },
  "chat-response": {
    model: "gpt-4o", // Good conversational quality
    temperature: 0.7, // Higher for creativity
  },
  "simple-edit": {
    model: "gemini-flash", // Cheap for property updates
    temperature: 0.2, // Very deterministic
  },
}
```

Cost Savings:
| Task | Without Routing | With Routing | Savings |
|---|---|---|---|
| Chat message | $0.05 (Claude) | $0.02 (GPT-4o) | 60% |
| Code generation | $0.15 (Claude) | $0.15 (Claude) | 0% (needs quality) |
| "Make louder" edit | $0.05 (Claude) | $0.001 (Gemini) | 98% |
| 1000 edits | $50-150 | $15-30 | 70% |
Why Dedalus SDK:
1. Unified API Across Models:

   ```ts
   // Same interface for all models
   const response = await dedalus.generateText({
     provider: "anthropic", // or "openai", "google"
     model: "claude-sonnet-4-5", // or "gpt-4o", "gemini-flash"
     prompt: "...",
   })
   ```

   - No vendor lock-in
   - Switch models without code changes

2. Automatic Cost Tracking:

   ```ts
   const response = await dedalus.generateText({ ... })
   console.log(response.cost) // $0.0234
   console.log(response.tokenUsage) // { input: 156, output: 423 }
   ```

   - Track AI spending per user/project
   - Bill users based on actual usage

3. MCP Tool Integration (Phase 2):

   ```ts
   // AI can call external tools (CapCut export, color grading APIs)
   const response = await dedalus.generateText({
     tools: [capCutExportTool, colorGradingTool],
     prompt: "Export this to CapCut with professional color grading",
   })
   ```
The Alternative:
- Directly integrate each AI provider SDK (3+ SDKs to maintain)
- Manually track costs (complex, error-prone)
- No easy model switching (vendor lock-in risk)
Dedalus gives us: Cost-optimized AI with unified API and built-in telemetry.
Example: User says "Make the second clip louder"
```
1. FRONTEND (Next.js + React)
   User types in ChatInterface component
   → Calls Convex action: api.ai.sendChatMessage()
                  ↓
2. CONVEX ACTION (Backend)
   convex/ai.ts:sendChatMessage()
   → Gets project context (current composition IR)
   → Calls Dedalus SDK with context
                  ↓
3. DEDALUS AI ROUTING
   Analyzes intent: "edit plan generation"
   → Routes to Claude Sonnet 4.5 (precise structured output)
   → Generates EditPlan JSON:
     {
       operation: "update",
       selector: { type: "byIndex", index: 1 },
       changes: { volume: 1.5 }
     }
                  ↓
4. COMPOSITION ENGINE (lib/composition-engine/)
   executor.ts:executePlan()
   → Resolves selector: finds element at index 1
   → Creates Patch: { elementId, oldVolume: 1.0, newVolume: 1.5 }
   → Updates Composition IR
   → Saves patch to history (for undo)
                  ↓
5. CONVEX MUTATION
   api.compositions.update()
   → Stores updated IR in database
   → Broadcasts change to all subscribed clients
                  ↓
6. REAL-TIME UPDATE (Frontend)
   useQuery(api.compositions.get) rerenders
   → RemotionPreview component receives new IR
   → Remotion Player updates (volume: 1.0 → 1.5)
   → User sees/hears the change INSTANTLY
                  ↓
7. RENDER (When User Clicks "Export")
   api.rendering.startRender()
   → Calls estimatePrice() (Remotion Lambda API)
   → Shows cost estimate: "$0.23, ETA 45 seconds"
   → User confirms
   → Remotion Lambda renders on AWS
   → Progress updates via polling
   → Final MP4 saved to Cloudflare R2
   → Download URL returned to user
```
Key Interactions:
- Next.js ↔ Convex: Real-time subscriptions + optimistic updates
- Convex ↔ Dedalus: AI actions with cost tracking
- Convex ↔ Cloudflare: Media upload/webhook handling
- Frontend ↔ Remotion: Live preview rendering
- Convex ↔ Remotion Lambda: Cloud rendering orchestration
Why This Architecture Wins:
- Instant Feedback: Real-time subscriptions mean edits appear immediately
- Deterministic: Selectors ensure "second clip" always means the same element
- Scalable: Cloudflare CDN + Remotion Lambda handle 1 user or 10,000 users
- Cost-Optimized: Multi-model routing saves 70% on AI costs
- Type-Safe: TypeScript throughout (Next.js β Convex β Remotion)
- Reversible: Every edit creates a patch for undo/redo
- Extensible: Add new AI models, video effects, export formats without architectural changes
The Stack in One Sentence: React-based video rendering (Remotion) + real-time backend (Convex) + global media infrastructure (Cloudflare) + cost-optimized AI (Dedalus) = professional video editor controlled by natural language.
Yes, Remotion Lambda is fully supported!
- Cloud Rendering - Parallel rendering on AWS Lambda
- Cost Estimation - Preview costs before rendering
- Progress Tracking - Real-time render progress
- Multiple Codecs - H.264, H.265, VP8, VP9, ProRes
- Scalable - Render 100 videos simultaneously
```ts
// lib/remotion/lambda.ts
const estimate = await estimateRenderCost({
  compositionId: "...",
  codec: "h264",
  quality: 80,
});
console.log(estimate);
// {
//   estimatedCost: 0.15, // $0.15 per minute
//   estimatedTime: 12,   // ~12 seconds
//   disclaimer: "Actual cost may vary"
// }
```

Core Infrastructure:
- ✅ Next.js 14 + TypeScript + Tailwind
- ✅ Convex backend (11 tables)
- ✅ Cloudflare Stream + R2 (TUS resumable uploads)
- ✅ Dedalus SDK multi-model AI (official npm package)
- ✅ Remotion configuration + Lambda

Video Upload System:
- ✅ TUS protocol with resumable uploads
- ✅ Cloudflare Stream HLS encoding
- ✅ Polling-based status updates
- ✅ Progress tracking (0% → 100%)
- ✅ Error handling & retry logic
- ✅ Successfully tested with real uploads

UI Components:
- ✅ Design system (Remotion dark theme)
- ✅ Homepage with project listing
- ✅ Chat interface (real-time)
- ✅ TUS upload widget with progress
- ✅ HLS video player
- ✅ Asset library with filters
- ✅ 3-panel dashboard

AI Integration:
- ✅ Dedalus SDK properly integrated (dedalus-labs@0.1.0-alpha.4)
- ✅ Chat message handling
- ✅ Edit plan generation
- ✅ Remotion code generation
- ✅ Multi-model routing with agent attributes
- ✅ Token usage tracking
- ✅ Cost optimization (30-40% savings)

Composition Engine:
- ✅ Selector system (4 types)
- ✅ Executor engine (CRUD ops)
- ✅ Disambiguator UI
- ✅ IR validation

Advanced Features:
- ✅ Undo/redo (Cmd+Z)
- ✅ History panel (50 snapshots)
- ✅ Remotion Player integration
- ✅ Remotion Lambda setup
- ✅ Render panel with progress
Priority Features:
- Authentication (Clerk/Auth0)
- Timeline UI (drag-and-drop)
- Visual effects library
- Audio waveforms
- Mobile responsive
- Template marketplace
- Export to TikTok/Instagram
- Real-time collaboration
- QUICKSTART.md - Run locally in 3 steps
- SETUP_AND_TEST_GUIDE.md - Complete testing guide
- IMPLEMENTATION_SUMMARY.md - Full feature breakdown
- CONTRIBUTING.md - How to contribute
- DOCUMENTATION_INDEX.md - Master index of all documentation
- CLAUDE.md - Standing orders and architecture guidelines
- DEVELOPMENT_WORKFLOW.md - How we develop features
- DOCUMENTATION_LIBRARY.md - Context7 documentation catalog
- CLOUDFLARE_STREAM_IMPLEMENTATION.md - Video upload implementation guide
- CONVEX_ENV_VARS.md - Environment variable setup guide
Contributions welcome! See CONTRIBUTING.md.
Priority areas:
- Timeline UI for precise editing
- More AI model integrations
- Template marketplace
- Audio waveform visualization
- Export presets (TikTok, YouTube Shorts, Instagram)
Traditional Video Editor:
- Adobe Premiere: $22.99/month
- Final Cut Pro: $299 one-time
- Video editor hire: $50-200/hour
ChatKut:
- Open source: $0
- AI costs: ~$0.02 per edit (with multi-model routing)
- Rendering: ~$0.15 per minute (Remotion Lambda)
Example: 100 social media clips
- Traditional: 20 hours × $100/hr = $2,000
- ChatKut: 100 edits × $0.02 + 50 mins × $0.15 = $9.50
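The back-of-envelope math above can be checked in a few lines, using the per-edit and per-minute rates quoted in this README (the helper name is illustrative):

```typescript
// Cost model from this README: ~$0.02 per AI edit (multi-model routing)
// and ~$0.15 per rendered minute (Remotion Lambda).
const EDIT_COST = 0.02;   // USD per edit
const RENDER_COST = 0.15; // USD per rendered minute

function chatkutCost(edits: number, renderMinutes: number): number {
  const total = edits * EDIT_COST + renderMinutes * RENDER_COST;
  return Math.round(total * 100) / 100; // round to cents
}

console.log(chatkutCost(100, 50)); // 9.5 - vs ~$2,000 for 20 hours at $100/hr
```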
MIT License - Free to use commercially!
Built with ❤️ for creators who hate traditional video editing.
Built with:
- Remotion - React-based video
- Convex - Real-time backend
- Dedalus - Multi-model AI
- Cloudflare - Media infrastructure
- Next.js - React framework
- Anthropic Claude - AI editing
- OpenAI - Chat responses
If ChatKut saves you time and money, give it a star! ⭐
Questions? Open an issue or join our Discord (coming soon).
Built by creators, for creators. 🎬✨