
transcript editor #2547

@ComputelessComputer

Description


Overview

A comprehensive transcript editor that enables users to review, edit, and refine transcriptions with speaker identification, voice prints, and memory-based learning. Inspired by Descript's editing experience.

Core Features

Speaker Identification & Segmentation

  • Automatic speaker segmentation during transcription
  • Smart speaker tagging that handles over-segmentation (e.g., if 2 people are wrongly segmented into 4 speakers, provide an easy way to merge them)
  • Voice print support for recognizing recurring speakers across sessions
  • Related: speaker diarization #956, voiceprint #2266
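
A minimal sketch of the merge step for over-segmented speakers, assuming segments carry a diarization label and millisecond timestamps. The `Segment` type, its field names, and `merge_speakers` are hypothetical and are not taken from the Hyprnote codebase.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    # Hypothetical shape; the real transcript model may differ.
    speaker: str   # diarization label, e.g. "S1"
    start_ms: int
    end_ms: int
    text: str


def merge_speakers(segments, merge_into):
    """Relabel speakers using merge_into, then join adjacent segments
    that now belong to the same speaker."""
    merged = []
    for seg in segments:
        seg = replace(seg, speaker=merge_into.get(seg.speaker, seg.speaker))
        if merged and merged[-1].speaker == seg.speaker:
            prev = merged[-1]
            merged[-1] = replace(
                prev, end_ms=seg.end_ms, text=f"{prev.text} {seg.text}"
            )
        else:
            merged.append(seg)
    return merged


# Two people wrongly split into four labels.
segments = [
    Segment("S1", 0, 1000, "Hi there."),
    Segment("S3", 1000, 2000, "How are you?"),
    Segment("S2", 2000, 3000, "Good, thanks."),
    Segment("S4", 3000, 4000, "And you?"),
]
result = merge_speakers(segments, {"S3": "S1", "S4": "S2"})
```

Keeping the merge as a label mapping rather than rewriting segments in place would let the UI undo a merge by dropping the mapping entry.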

Descript-like Editing Experience

Speaker Labels

  • Click a speaker label to hear a 10-second audio snippet (if a recording is saved)
  • If no recording is available, the speaker label is still selectable, but there is no audio preview
  • Clicking a speaker label opens a pop-over with audio playback controls
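
One way to pick the 10-second preview range is sketched below. It assumes the snippet starts at the speaker's segment and is shifted back when the segment sits near the end of the recording; that policy, the function name, and the parameters are all assumptions, since the issue does not say which 10 seconds to play.

```python
from typing import Optional, Tuple

SNIPPET_MS = 10_000  # 10-second preview, as described above


def snippet_window(
    segment_start_ms: int,
    recording_duration_ms: Optional[int],
    length_ms: int = SNIPPET_MS,
) -> Optional[Tuple[int, int]]:
    """Return (start_ms, end_ms) for the preview, or None when there is
    no saved recording and the pop-over should hide playback controls."""
    if recording_duration_ms is None or recording_duration_ms <= 0:
        return None
    # Shift the window back if it would run past the end of the recording.
    start = max(0, min(segment_start_ms, recording_duration_ms - length_ms))
    end = min(recording_duration_ms, start + length_ms)
    return (start, end)
```

Returning `None` instead of raising keeps the "selectable but no preview" case a normal code path for the UI.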

Transcript Editing

  • Double-click on transcript content to enter edit mode
  • Inline editing of transcribed text
  • Easy correction of mistranscribed words
  • Cmd+F / search and replace functionality
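
A rough sketch of replace-all over transcript text, assuming each segment's text is available as a plain string. `replace_all` and its parameters are hypothetical names; the query is treated as a literal string, not a regular expression.

```python
import re
from typing import List, Tuple


def replace_all(
    texts: List[str],
    query: str,
    replacement: str,
    case_sensitive: bool = False,
) -> Tuple[List[str], int]:
    """Replace every literal occurrence of query and report how many
    replacements were made, so the UI can show a count."""
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(re.escape(query), flags)
    total = 0
    updated = []
    for text in texts:
        # A function replacement keeps backslashes in `replacement` literal.
        new_text, count = pattern.subn(lambda _match: replacement, text)
        updated.append(new_text)
        total += count
    return updated, total


new_texts, replaced = replace_all(
    ["We met at hyperno HQ.", "Hyperno ships weekly."], "hyperno", "Hyprnote"
)
```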

Memory & Learning System

  • Track user corrections to mistranscribed words
  • When user changes "Hyperno" (H-Y-P-E-R-N-O) to "Hyprnote" (H-Y-P-R-N-O-T-E), remember this correction
  • Two implementation approaches:
    1. Keyword boosting: Boost corrected words in future transcriptions
    2. Rule-based system: Apply correction rules automatically
  • Related: memory #3474, implement keyword boosting #3512

This creates an experience where "Hyprnote evolves together with you" - learning your vocabulary, jargon, and preferences over time.
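
The two approaches could share one record of user corrections. The sketch below is illustrative only: `CorrectionMemory` and its methods are hypothetical, the rule-based path is a whole-word, case-insensitive text substitution, and the keyword path only produces a list of terms. How that list is passed to a speech-to-text engine depends on the engine and is not shown.

```python
import re
from collections import Counter


class CorrectionMemory:
    def __init__(self):
        self._rules = {}          # lowercased wrong form -> corrected form
        self._counts = Counter()  # corrected form -> times the user chose it

    def record(self, wrong, corrected):
        """Remember that the user changed `wrong` to `corrected`."""
        self._rules[wrong.lower()] = corrected
        self._counts[corrected] += 1

    def apply_rules(self, text):
        """Rule-based approach: rewrite known mistranscriptions."""
        if not self._rules:
            return text
        # Longest first so overlapping rules prefer the longer match.
        forms = sorted(self._rules, key=len, reverse=True)
        pattern = re.compile(
            r"\b(" + "|".join(re.escape(f) for f in forms) + r")\b",
            re.IGNORECASE,
        )
        return pattern.sub(lambda m: self._rules[m.group(0).lower()], text)

    def boost_keywords(self, limit=50):
        """Keyword-boosting approach: most frequently corrected terms."""
        return [word for word, _ in self._counts.most_common(limit)]


memory = CorrectionMemory()
memory.record("Hyperno", "Hyprnote")
fixed = memory.apply_rules("I use hyperno daily, and Hyperno is great.")
keywords = memory.boost_keywords()
```

The word-boundary match is a deliberate limit: a rule for "Hyperno" leaves a longer word such as "Hypernova" untouched.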

Technical Context

  • Transcript data stored as Word2 objects with timestamps
  • Integration with ListenerActor for real-time updates
  • Audio playback requires coordination with stored recordings
  • Memory system needs persistent storage for corrections
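
For the persistent-storage point, a minimal sketch using SQLite through Python's standard `sqlite3` module. The table name, column names, and helper functions are assumptions made for illustration; the issue does not specify a storage backend or schema.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the corrections store. Pass a file path to persist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS corrections (
               wrong_text     TEXT PRIMARY KEY,
               corrected_text TEXT NOT NULL,
               uses           INTEGER NOT NULL DEFAULT 1
           )"""
    )
    return conn


def save_correction(conn, wrong, corrected):
    """Insert a correction, or bump its use count if it already exists."""
    key = wrong.lower()
    cur = conn.execute(
        "UPDATE corrections SET corrected_text = ?, uses = uses + 1 "
        "WHERE wrong_text = ?",
        (corrected, key),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO corrections (wrong_text, corrected_text) VALUES (?, ?)",
            (key, corrected),
        )
    conn.commit()


def list_corrections(conn):
    """Correction history for a view/manage screen."""
    return conn.execute(
        "SELECT wrong_text, corrected_text, uses FROM corrections "
        "ORDER BY wrong_text"
    ).fetchall()


def delete_correction(conn, wrong):
    """Let the user remove a correction they no longer want applied."""
    conn.execute("DELETE FROM corrections WHERE wrong_text = ?", (wrong.lower(),))
    conn.commit()
```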

Acceptance Criteria

Speaker Identification

  • Automatic speaker segmentation works during transcription
  • Users can merge incorrectly segmented speakers
  • Voice prints can be saved and recognized across sessions
  • Speaker labels are clearly displayed

Editing Experience

  • Click speaker label → 10-second audio snippet plays (if recording exists)
  • Double-click transcript → inline edit mode
  • Cmd+F opens search with replace functionality
  • Changes save automatically

Memory System

  • Corrections are tracked and stored
  • Corrected words are boosted in future transcriptions
  • Rule-based corrections apply automatically
  • User can view/manage their correction history


Labels

product/desktop, area/ui, area/backend


Metadata


Projects

Status

Backlog

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
