Description
Overview
A comprehensive transcript editor that enables users to review, edit, and refine transcriptions with speaker identification, voice prints, and memory-based learning. Inspired by Descript's editing experience.
Core Features
Speaker Identification & Segmentation
- Automatic speaker segmentation during transcription
- Smart speaker tagging that handles over-segmentation (e.g., if 2 people are wrongly segmented into 4 speakers, provide an easy way to merge them)
- Voice print support for recognizing recurring speakers across sessions
- Related: #956 (speaker diarization), #2266 (voiceprint)
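A minimal sketch of the merge step, assuming each transcribed word carries a numeric speaker label from diarization. The Word class below is a hypothetical stand-in (the real Word2 fields are not specified here), and merge_speakers is an illustrative name. Merging is just a relabel, so timestamps and text stay intact:

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Word:
    # Hypothetical stand-in for a transcript word; the real Word2 fields may differ.
    text: str
    start_ms: int
    end_ms: int
    speaker: Optional[int]  # diarization label, None if unassigned


def merge_speakers(words: list, merge_map: dict) -> list:
    """Return a copy of `words` with merged speaker labels.

    merge_map maps a wrongly split label to the label it should join,
    e.g. {2: 0, 3: 1}. Chains like {3: 2, 2: 0} are followed to the end.
    """

    def resolve(speaker: Optional[int]) -> Optional[int]:
        seen = set()
        # Follow the chain, stopping if the map contains a cycle.
        while speaker in merge_map and speaker not in seen:
            seen.add(speaker)
            speaker = merge_map[speaker]
        return speaker

    # Text and timestamps are untouched; only the speaker label changes.
    return [replace(w, speaker=resolve(w.speaker)) for w in words]
```

For a two-person call that came back as speakers 0 through 3, merge_speakers(words, {2: 0, 3: 1}) folds the extra labels back into two speakers. Returning a new list instead of mutating in place also keeps the merge easy to undo.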
Descript-like Editing Experience
Speaker Labels
- Click a speaker label to hear a 10-second audio snippet of that speaker (if the recording is saved)
- If no recording is available, the speaker is still selectable but has no audio preview
- Clicking a speaker opens a pop-over with audio playback controls
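One possible way to choose the 10-second snippet, sketched under the assumption that word timestamps are in milliseconds and that the caller only requests a window when a recording exists. snippet_window and the Word class are hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Word:
    # Hypothetical stand-in for a transcript word; the real Word2 fields may differ.
    text: str
    start_ms: int
    end_ms: int
    speaker: Optional[int]


def snippet_window(words: list, speaker: int, max_ms: int = 10_000) -> Optional[tuple]:
    """Pick a (start_ms, end_ms) preview window for `speaker`, or None.

    Uses the speaker's longest uninterrupted turn, clipped to max_ms, so the
    preview is mostly that speaker's voice rather than crosstalk.
    """
    best = None
    turn_start = turn_end = None
    for w in list(words) + [None]:  # trailing None flushes the final turn
        if w is not None and w.speaker == speaker:
            if turn_start is None:
                turn_start = w.start_ms
            turn_end = w.end_ms
            continue
        if turn_start is not None:
            if best is None or turn_end - turn_start > best[1] - best[0]:
                best = (turn_start, turn_end)
            turn_start = turn_end = None
    if best is None:
        return None  # no words for this speaker: selectable, but nothing to preview
    return best[0], min(best[1], best[0] + max_ms)
```

Choosing the longest turn is only a heuristic; the first turn or the highest-confidence turn would be equally reasonable choices.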
Transcript Editing
- Double-click on transcript content to enter edit mode
- Inline editing of transcribed text
- Easy correction of mistranscribed words
- Search and replace, opened with Cmd+F
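A minimal sketch of search and replace over timestamped words (hypothetical Word class and replace_all helper). It handles only single-word queries and keeps each word's timing, so the edited text stays aligned with the audio:

```python
import re
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Word:
    # Hypothetical stand-in for a transcript word; the real Word2 fields may differ.
    text: str
    start_ms: int
    end_ms: int
    speaker: Optional[int]


# Splits a token into leading punctuation, the word itself, and trailing punctuation.
_EDGES = re.compile(r"^(\W*)(.*?)(\W*)$", re.DOTALL)


def replace_all(words: list, query: str, replacement: str) -> tuple:
    """Case-insensitive whole-word replace; returns (new_words, matched_indices).

    Punctuation attached to a token ("Hyperno,") is preserved, and timestamps
    are kept so playback still lines up with the edited text.
    """
    target = query.casefold()
    out, hits = [], []
    for i, w in enumerate(words):
        lead, core, trail = _EDGES.match(w.text).groups()
        if core.casefold() == target:
            out.append(replace(w, text=lead + replacement + trail))
            hits.append(i)
        else:
            out.append(w)
    return out, hits
```

The returned indices could drive match highlighting in the search UI, and each replaced pair is a natural input for the correction tracking described under Memory & Learning System.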
Memory & Learning System
- Track user corrections to mistranscribed words
- When the user changes "Hyperno" (H-Y-P-E-R-N-O) to "Hyprnote" (H-Y-P-R-N-O-T-E), the correction is remembered
- Two implementation approaches:
  - Keyword boosting: boost corrected words in future transcriptions
  - Rule-based system: apply correction rules automatically
- Related: #3474 (memory), #3512 (implement keyword boosting)
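Both approaches can be fed from the same record of corrections. The class below is a hypothetical sketch: it counts each (wrong, right) pair, derives rewrite rules once a pair has been seen min_count times, and exposes the intended spellings as a keyword list. How those keywords reach a speech-to-text engine depends on the engine and is not shown:

```python
from collections import Counter


class CorrectionMemory:
    """Hypothetical in-memory record of user corrections (heard -> intended)."""

    def __init__(self, min_count: int = 1):
        self.counts = Counter()  # (wrong_casefolded, right) -> times corrected
        self.min_count = min_count  # corrections needed before a rule fires

    def record(self, wrong: str, right: str) -> None:
        if wrong != right:
            self.counts[(wrong.casefold(), right)] += 1

    def rules(self) -> dict:
        """Rule-based approach: each wrong form maps to its most frequent fix."""
        best = {}
        for (wrong, right), n in self.counts.items():
            if n >= self.min_count and n > best.get(wrong, (None, 0))[1]:
                best[wrong] = (right, n)
        return {wrong: right for wrong, (right, _) in best.items()}

    def apply(self, tokens: list) -> list:
        """Rewrite bare tokens (no attached punctuation) using the learned rules."""
        rules = self.rules()
        return [rules.get(t.casefold(), t) for t in tokens]

    def boost_keywords(self) -> list:
        """Keyword-boosting approach: intended spellings to hint to the STT engine."""
        return sorted({right for (_, right), n in self.counts.items() if n >= self.min_count})
```

Raising min_count trades recall for safety: a single accidental edit will not start rewriting every future transcript.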
This creates an experience where "Hyprnote evolves together with you" - learning your vocabulary, jargon, and preferences over time.
Technical Context
- Transcript data is stored as Word2 objects with timestamps
- Integration with ListenerActor for real-time updates
- Audio playback requires coordination with stored recordings
- Memory system needs persistent storage for corrections
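The issue only says corrections need persistent storage. One hypothetical option is a small JSON file written through a temporary file, so a crash mid-save cannot leave a corrupted history behind. The file layout and function names below are assumptions:

```python
import json
import os
import tempfile
from pathlib import Path


def save_corrections(path, corrections: list) -> None:
    """Write corrections to a JSON file without leaving a half-written file behind."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then swap it in with os.replace
    # (an atomic rename on POSIX filesystems).
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump({"version": 1, "corrections": corrections}, f,
                      ensure_ascii=False, indent=2)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def load_corrections(path) -> list:
    """Return stored corrections, or an empty list on first run."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return json.load(f).get("corrections", [])
```

The "version" field leaves room to migrate the format later, for example when correction history gains per-session context.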
Acceptance Criteria
Speaker Identification
- Automatic speaker segmentation works during transcription
- Users can merge incorrectly segmented speakers
- Voice prints can be saved and recognized across sessions
- Speaker labels are clearly displayed
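Recognizing speakers across sessions is commonly done by comparing speaker embeddings. This sketch assumes some speaker-embedding model (not specified in the issue) produces a fixed-length vector per speaker, and matches it against stored voice prints by cosine similarity. identify and its 0.75 threshold are placeholders:

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(embedding: list, voiceprints: dict, threshold: float = 0.75):
    """Return the enrolled speaker whose stored prints best match, or None.

    voiceprints maps a name to one or more embeddings saved in earlier
    sessions. The threshold is a placeholder to tune for the chosen model.
    """
    best_name, best_score = None, -1.0
    for name, prints in voiceprints.items():
        score = max((cosine(embedding, p) for p in prints), default=-1.0)
        if score > best_score:
            best_name, best_score = name, score
    # Below the threshold, treat the voice as a new, unknown speaker.
    return best_name if best_score >= threshold else None
```

Keeping several prints per person (for example, recorded on different microphones) and taking the best match tends to be more forgiving than a single averaged print.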
Editing Experience
- Click speaker label → 10-second audio snippet plays (if recording exists)
- Double-click transcript → inline edit mode
- Cmd+F opens search with replace functionality
- Changes save automatically
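Saving on every keystroke can be wasteful, so autosave is often debounced. A minimal sketch, assuming the app supplies a save callback and drives tick() from a timer (Autosaver is a hypothetical name; timestamps are passed in explicitly to keep the logic deterministic):

```python
from typing import Any, Callable, Optional


class Autosaver:
    """Debounced autosave: save only after `delay_ms` with no further edits."""

    def __init__(self, save: Callable[[Any], None], delay_ms: int = 1000):
        self.save = save
        self.delay_ms = delay_ms
        self.pending: Optional[Any] = None
        self.last_edit_ms = 0

    def on_edit(self, doc: Any, now_ms: int) -> None:
        # Each edit replaces the pending snapshot and restarts the quiet period.
        self.pending = doc
        self.last_edit_ms = now_ms

    def tick(self, now_ms: int) -> bool:
        """Save if edits have been quiet for delay_ms; return True if saved."""
        if self.pending is not None and now_ms - self.last_edit_ms >= self.delay_ms:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        """Save immediately, e.g. when the editor closes."""
        if self.pending is not None:
            self.save(self.pending)
            self.pending = None
```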
Memory System
- Corrections are tracked and stored
- Corrected words are boosted in future transcriptions
- Rule-based corrections apply automatically
- User can view/manage their correction history
Related Issues
- #956: speaker diarization
- #2266: voiceprint
- #3474: memory
- #3512: implement keyword boosting
- #1430: Make (double)-clicking on transcript switch to edit mode
Labels
product/desktop, area/ui, area/backend
Metadata
Status
Backlog