@waleedlatif1
Collaborator

Summary

  • Added more TTS providers, added STT and video generation models, and fixed search modal keyboard navigation

Type of Change

  • New feature

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel bot commented Nov 21, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Preview | Comments | Updated (UTC)
docs | Ready | Preview | Comment | Nov 21, 2025 10:58pm

@greptile-apps
Contributor

greptile-apps bot commented Nov 21, 2025

Greptile Overview

Greptile Summary

This PR significantly expands Sim's multimedia capabilities by adding support for multiple TTS, STT, and video generation providers, plus fixes a keyboard navigation bug in the search modal.

Key Changes

  • TTS Providers: Added 7 text-to-speech providers (OpenAI, Deepgram, ElevenLabs, Cartesia, Google Cloud, Azure, PlayHT) with a unified proxy route at /api/proxy/tts/unified
  • STT Providers: Added AssemblyAI and Gemini speech-to-text providers with advanced features like sentiment analysis and entity detection
  • Video Generation: Added 5 video generation providers (Runway Gen-4, Google Veo, Luma Ray, MiniMax, Fal.ai) with polling-based job completion
  • Search Modal Fix: Fixed keyboard navigation by tracking visual order of grouped items instead of filtered order
  • Infrastructure: Added comprehensive type definitions, tool configurations, block UI components, and documentation
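
To illustrate the search modal fix described above, here is a minimal sketch of navigating by visual order rather than filtered order. The `SearchItem` shape, the section names, and both helper functions are hypothetical; the actual code in search-modal.tsx is not shown in this summary and may differ (for example, it may wrap around instead of clamping).

```typescript
// Hypothetical item shape; the real search modal types are not shown in this PR summary.
interface SearchItem {
  id: string
  section: string // e.g. 'Blocks' or 'Tools'
  label: string
}

// Flatten the filtered items into the order in which they are rendered,
// section by section. Array.prototype.sort is stable (ES2019), so items keep
// their relative order inside each section.
function toVisualOrder(items: SearchItem[], sectionOrder: string[]): SearchItem[] {
  const rank = (section: string): number => {
    const index = sectionOrder.indexOf(section)
    return index === -1 ? sectionOrder.length : index // unknown sections go last
  }
  return [...items].sort((a, b) => rank(a.section) - rank(b.section))
}

// Move the selection by `delta` rows, clamped to the bounds of the visual list.
function moveSelection(current: number, delta: number, length: number): number {
  if (length === 0) return 0
  return Math.min(length - 1, Math.max(0, current + delta))
}
```

With this approach, `selectedIndex` always indexes into the array returned by `toVisualOrder`, so Arrow Down moves to the row that is visually next, and Enter activates the item the user actually sees highlighted.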

Technical Implementation

The implementation follows a consistent pattern across all new features:

  1. Type definitions in tools/{category}/types.ts with provider-specific parameters
  2. Individual tool configs in tools/{category}/{provider}.ts routing to unified proxies
  3. Unified proxy routes handling multiple providers with proper authentication, validation, and error handling
  4. Block configurations with conditional UI based on provider selection
  5. Comprehensive documentation with examples and best practices
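
The pattern above can be sketched roughly as follows for the TTS case: a provider union type, request validation, and a dispatch table keyed by provider. This is an illustration under stated assumptions, not the code in the PR; the request fields (`provider`, `text`, `apiKey`), the status codes, and the names `validateTtsRequest` and `handleTts` are all hypothetical.

```typescript
// Provider identifiers are assumptions based on the seven providers listed in the summary.
const TTS_PROVIDERS = ['openai', 'deepgram', 'elevenlabs', 'cartesia', 'google', 'azure', 'playht'] as const
type TtsProvider = (typeof TTS_PROVIDERS)[number]

// Hypothetical request shape; the real types in tools/tts/types.ts are not shown here.
interface TtsRequest {
  provider: TtsProvider
  text: string
  apiKey: string
}

type Validation =
  | { ok: true; request: TtsRequest }
  | { ok: false; status: number; error: string }

function isTtsProvider(value: unknown): value is TtsProvider {
  return typeof value === 'string' && (TTS_PROVIDERS as readonly string[]).indexOf(value) !== -1
}

// Validate an untrusted request body before any provider is contacted.
function validateTtsRequest(body: unknown): Validation {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, status: 400, error: 'Request body must be a JSON object' }
  }
  const { provider, text, apiKey } = body as Record<string, unknown>
  if (!isTtsProvider(provider)) {
    return { ok: false, status: 400, error: 'Unsupported provider: ' + String(provider) }
  }
  if (typeof text !== 'string' || text.trim() === '') {
    return { ok: false, status: 400, error: 'text is required' }
  }
  if (typeof apiKey !== 'string' || apiKey === '') {
    return { ok: false, status: 400, error: 'apiKey is required' }
  }
  return { ok: true, request: { provider, text, apiKey } }
}

// One handler per provider; each returns the synthesized audio bytes.
type TtsHandler = (request: TtsRequest) => Promise<Uint8Array>

// Route a validated request to the matching provider handler.
async function handleTts(body: unknown, handlers: Record<TtsProvider, TtsHandler>): Promise<Uint8Array> {
  const result = validateTtsRequest(body)
  if (result.ok === false) throw new Error(result.status + ': ' + result.error)
  return handlers[result.request.provider](result.request)
}
```

Typing the handler table as `Record<TtsProvider, TtsHandler>` means the compiler flags a missing handler whenever a provider is added to `TTS_PROVIDERS`, which is one way to keep a unified route and its type definitions in sync.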

Minor Cleanup

  • Commented out unused Supabase OAuth provider in lib/auth.ts

Confidence Score: 4/5

  • Safe to merge with minor considerations for polling timeout handling and provider API error scenarios
  • Score reflects a well-structured implementation with proper error handling, validation, and type safety. The code follows established patterns and includes comprehensive documentation. Minor deduction for the complexity of the video generation polling logic (10-minute timeout with no cancellation mechanism) and for potential memory pressure from large file buffers
  • Pay close attention to apps/sim/app/api/proxy/video/route.ts for polling timeout behavior and memory usage with large video files
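
For reference, one way to address the polling concern is a bounded polling loop with an optional cancellation check. The sketch below is hypothetical and is not the code in apps/sim/app/api/proxy/video/route.ts: `JobStatus`, `PollOptions`, and `pollJob` are illustrative names, and the `isCancelled` hook is an addition that the review notes the PR does not have. The `sleep` function is injected so the loop can be tested without real delays.

```typescript
// Hypothetical job status shape; real provider responses differ per provider.
type JobStatus =
  | { state: 'pending' }
  | { state: 'succeeded'; videoUrl: string }
  | { state: 'failed'; error: string }

interface PollOptions {
  intervalMs: number // e.g. 5000 for the 5s interval in the sequence diagram
  timeoutMs: number // e.g. 600000 for the 10-minute cap
  sleep: (ms: number) => Promise<void>
  isCancelled?: () => boolean // assumption: not present in the PR
}

// Poll until the job succeeds, fails, is cancelled, or the time budget is spent.
// Elapsed time is approximated by summing sleep intervals, not by a wall clock.
async function pollJob(checkStatus: () => Promise<JobStatus>, options: PollOptions): Promise<string> {
  let waitedMs = 0
  while (true) {
    if (options.isCancelled && options.isCancelled()) {
      throw new Error('Video generation cancelled')
    }
    const status = await checkStatus()
    if (status.state === 'succeeded') return status.videoUrl
    if (status.state === 'failed') throw new Error('Video generation failed: ' + status.error)
    if (waitedMs + options.intervalMs > options.timeoutMs) {
      throw new Error('Video generation timed out')
    }
    await options.sleep(options.intervalMs)
    waitedMs += options.intervalMs
  }
}
```

In a real route, `sleep` could be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))` and `isCancelled` could read the `aborted` flag of the incoming request's `AbortSignal`, so a client disconnect stops the loop instead of polling for the full ten minutes.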

Important Files Changed

File Analysis

Filename | Score | Overview
apps/sim/app/workspace/[workspaceId]/w/components/sidebar/components-new/search-modal/search-modal.tsx | 5/5 | Fixed keyboard navigation by tracking visual order of items instead of filtered order, ensuring arrow keys and Enter work correctly with grouped sections
apps/sim/app/api/proxy/tts/unified/route.ts | 4/5 | New unified TTS proxy route supporting 7 providers (OpenAI, Deepgram, ElevenLabs, Cartesia, Google, Azure, PlayHT) with proper error handling and file storage
apps/sim/app/api/proxy/video/route.ts | 4/5 | New video generation proxy supporting 5 providers (Runway, Veo, Luma, MiniMax, Fal.ai) with polling, validation, and execution context support
apps/sim/app/api/proxy/stt/route.ts | 4/5 | Enhanced STT proxy to support AssemblyAI and Gemini providers in addition to existing Whisper, Deepgram, and ElevenLabs
apps/sim/tools/tts/types.ts | 5/5 | Comprehensive type definitions for 7 TTS providers with voice options, models, and format constants
apps/sim/tools/video/types.ts | 5/5 | Type definitions for video generation across 5 providers with provider-specific parameters and job response types
apps/sim/tools/registry.ts | 5/5 | Registered 14 new tools: 2 STT providers, 7 TTS providers, and 5 video generation providers
apps/sim/blocks/blocks/tts.ts | 5/5 | New TTS block configuration with conditional UI for 7 providers, extensive voice/model options, and parameter validation
apps/sim/blocks/blocks/video_generator.ts | 5/5 | New video generation block with provider-specific conditional UI, validation, and execution logic for 5 providers

Sequence Diagram

sequenceDiagram
    participant User
    participant UI as Block/Tool UI
    participant Proxy as API Proxy Route
    participant Provider as External Provider API
    participant Storage as File Storage
    participant DB as Database/Context

    Note over User,DB: TTS Flow
    User->>UI: Configure TTS parameters
    UI->>Proxy: POST /api/proxy/tts/unified
    Note right of Proxy: Validate auth & params
    Proxy->>Provider: Request audio synthesis
    Provider-->>Proxy: Return audio buffer
    Proxy->>Storage: Upload audio file
    Storage-->>Proxy: File URL & metadata
    Proxy->>DB: Store in execution context (if applicable)
    Proxy-->>UI: Return audio URL & file object
    UI-->>User: Display audio player

    Note over User,DB: STT Flow
    User->>UI: Upload audio file
    UI->>Proxy: POST /api/proxy/stt
    Note right of Proxy: Download from storage
    Proxy->>Provider: Send audio for transcription
    Provider-->>Proxy: Return transcript & metadata
    Proxy-->>UI: Return transcript with segments
    UI-->>User: Display transcript text

    Note over User,DB: Video Generation Flow
    User->>UI: Configure video parameters
    UI->>Proxy: POST /api/proxy/video
    Note right of Proxy: Validate duration & aspect ratio
    alt Runway (requires image)
        Proxy->>Storage: Download visual reference
        Storage-->>Proxy: Image buffer
    end
    Proxy->>Provider: Create video generation job
    Provider-->>Proxy: Job ID
    loop Poll every 5s (max 10min)
        Proxy->>Provider: Check job status
        Provider-->>Proxy: Status update
    end
    Provider-->>Proxy: Video URL (on completion)
    Proxy->>Storage: Download & upload video
    Storage-->>Proxy: Final video URL
    Proxy->>DB: Store in execution context (if applicable)
    Proxy-->>UI: Return video URL & metadata
    UI-->>User: Display video player

    Note over User,DB: Search Modal Keyboard Nav
    User->>UI: Press Arrow Down/Up
    Note right of UI: Calculate visual index<br/>from grouped items
    UI->>UI: Update selectedIndex
    UI->>UI: Scroll into view
    User->>UI: Press Enter
    UI->>UI: Navigate to selected item

Contributor

@greptile-apps greptile-apps bot left a comment

39 files reviewed, no comments

@waleedlatif1 merged commit 0a4244b into staging on Nov 22, 2025
9 checks passed
@waleedlatif1 deleted the fix/videogen branch on November 22, 2025 00:56
@waleedlatif1 mentioned this pull request on Nov 22, 2025
10 tasks
MagellaX pushed a commit to MagellaX/sim that referenced this pull request Nov 23, 2025
… fixed search modal keyboard nav (simstudioai#2094)

* feat(tools): added more tts providers, added stt and videogen models, fixed search modal keyboard nav

* fixed icons

* cleaned up

* added falai

* improvement: icons

* fixed build

---------

Co-authored-by: Emir Karabeg <emirkarabeg@berkeley.edu>