feat(tools): added more tts providers, added stt and videogen models, fixed search modal keyboard nav #2094
… fixed search modal keyboard nav
Greptile Overview
Greptile Summary
This PR significantly expands Sim's multimedia capabilities by adding support for multiple TTS, STT, and video generation providers, plus fixes a keyboard navigation bug in the search modal.
Key Changes
Technical Implementation
The implementation follows a consistent pattern across all new features:
Minor Cleanup
Confidence Score: 4/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
participant User
participant UI as Block/Tool UI
participant Proxy as API Proxy Route
participant Provider as External Provider API
participant Storage as File Storage
participant DB as Database/Context
Note over User,DB: TTS Flow
User->>UI: Configure TTS parameters
UI->>Proxy: POST /api/proxy/tts/unified
Note right of Proxy: Validate auth & params
Proxy->>Provider: Request audio synthesis
Provider-->>Proxy: Return audio buffer
Proxy->>Storage: Upload audio file
Storage-->>Proxy: File URL & metadata
Proxy->>DB: Store in execution context (if applicable)
Proxy-->>UI: Return audio URL & file object
UI-->>User: Display audio player
Note over User,DB: STT Flow
User->>UI: Upload audio file
UI->>Proxy: POST /api/proxy/stt
Note right of Proxy: Download from storage
Proxy->>Provider: Send audio for transcription
Provider-->>Proxy: Return transcript & metadata
Proxy-->>UI: Return transcript with segments
UI-->>User: Display transcript text
Note over User,DB: Video Generation Flow
User->>UI: Configure video parameters
UI->>Proxy: POST /api/proxy/video
Note right of Proxy: Validate duration & aspect ratio
alt Runway (requires image)
Proxy->>Storage: Download visual reference
Storage-->>Proxy: Image buffer
end
Proxy->>Provider: Create video generation job
Provider-->>Proxy: Job ID
loop Poll every 5s (max 10min)
Proxy->>Provider: Check job status
Provider-->>Proxy: Status update
end
Provider-->>Proxy: Video URL (on completion)
Proxy->>Storage: Download & upload video
Storage-->>Proxy: Final video URL
Proxy->>DB: Store in execution context (if applicable)
Proxy-->>UI: Return video URL & metadata
UI-->>User: Display video player
Note over User,DB: Search Modal Keyboard Nav
User->>UI: Press Arrow Down/Up
Note right of UI: Calculate visual index<br/>from grouped items
UI->>UI: Update selectedIndex
UI->>UI: Scroll into view
User->>UI: Press Enter
UI->>UI: Navigate to selected item
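The "Calculate visual index from grouped items" step in the diagram above can be illustrated with a minimal sketch. It assumes the modal renders results grouped by section, so the arrow keys must walk the rendered order and not the raw result order. The names `SearchItem`, `toVisualOrder`, and `nextSelectedIndex` are hypothetical and not taken from the PR, and clamping at the ends of the list (as opposed to wrapping around) is also an assumption.

```typescript
// Hypothetical item shape; the real search modal types are not shown on this page.
interface SearchItem {
  id: string
  group: string
}

// Flatten items into the order they are rendered: group by group,
// keeping groups in first-seen order and items in their original order.
function toVisualOrder(items: SearchItem[]): SearchItem[] {
  const groups = new Map<string, SearchItem[]>()
  for (const item of items) {
    const bucket = groups.get(item.group)
    if (bucket) {
      bucket.push(item)
    } else {
      groups.set(item.group, [item])
    }
  }
  const ordered: SearchItem[] = []
  // Map.forEach visits entries in insertion order.
  groups.forEach((bucket) => ordered.push(...bucket))
  return ordered
}

// Move the selection one step and clamp it to the list bounds (assumption: no wrap-around).
function nextSelectedIndex(
  current: number,
  key: 'ArrowDown' | 'ArrowUp',
  count: number
): number {
  if (count === 0) return 0
  const delta = key === 'ArrowDown' ? 1 : -1
  return Math.min(count - 1, Math.max(0, current + delta))
}
```

With results `a` (blocks), `b` (tools), `c` (blocks), the rendered order is `a`, `c`, `b`, so pressing Arrow Down from `a` should select `c`. Indexing into the raw result array would select `b` instead, which is the kind of mismatch a visual-index calculation avoids.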
39 files reviewed, no comments
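For reference, the video generation loop in the sequence diagram ("Poll every 5s (max 10min)") could look roughly like the sketch below. This is not the PR's code: `JobStatus`, `PollOptions`, and `pollVideoJob` are hypothetical names, the status shape is an assumption, and the injectable `sleep` and `now` hooks exist only so the loop can be exercised without real delays.

```typescript
// Hypothetical job status shape; actual provider responses are not shown on this page.
type JobStatus =
  | { state: 'pending' }
  | { state: 'completed'; videoUrl: string }
  | { state: 'failed'; error: string }

interface PollOptions {
  intervalMs?: number // diagram: 5 seconds between checks
  timeoutMs?: number // diagram: give up after 10 minutes
  sleep?: (ms: number) => Promise<void> // injectable for tests
  now?: () => number // injectable clock for tests
}

// Poll a job until it completes, fails, or the time budget is exhausted.
async function pollVideoJob(
  checkStatus: () => Promise<JobStatus>,
  options: PollOptions = {}
): Promise<string> {
  const intervalMs = options.intervalMs ?? 5000
  const timeoutMs = options.timeoutMs ?? 10 * 60 * 1000
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)))
  const now = options.now ?? (() => Date.now())

  const deadline = now() + timeoutMs
  while (true) {
    const status = await checkStatus()
    if (status.state === 'completed') return status.videoUrl
    if (status.state === 'failed') {
      throw new Error(`Video generation failed: ${status.error}`)
    }
    // Stop if waiting one more interval would pass the deadline.
    if (now() + intervalMs > deadline) {
      throw new Error(`Video generation timed out after ${timeoutMs} ms`)
    }
    await sleep(intervalMs)
  }
}
```

A caller would pass a `checkStatus` function that queries the provider for the job ID returned at creation time, then download and re-upload the resulting video URL as the diagram describes.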
… fixed search modal keyboard nav (simstudioai#2094)
* feat(tools): added more tts providers, added stt and videogen models, fixed search modal keyboard nav
* fixed icons
* cleaned up
* added falai
* improvement: icons
* fixed build
---------
Co-authored-by: Emir Karabeg <emirkarabeg@berkeley.edu>
Summary
Type of Change
Testing
Tested manually
Checklist