voice transcription by nornagon-openai · Pull Request #3381 · openai/codex

nornagon-openai · 2025-09-09T19:07:49Z

Adds voice transcription on press-and-hold of spacebar.

Screen.Recording.2025-09-19.at.12.24.02.PM.mov

- Hold Space on empty composer to record; release to transcribe
- Block input and show 'Recording' hint while capturing
- Send audio to OpenAI Whisper (whisper-1) via reqwest multipart
- Resolve API key via codex_login auth (no env var read)
- Insert transcription into composer

Add cpal + hound deps for audio capture + WAV encoding.
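The PR encodes captured audio as WAV with the `hound` crate before uploading it. The following is a rough, std-only sketch of what that encoding produces, not the PR's code: a hypothetical `encode_wav_pcm16_mono` helper that writes the 44-byte RIFF/WAVE header and then the 16-bit mono samples in little-endian order.

```rust
/// Hypothetical helper (the PR uses `hound` instead): encode mono 16-bit PCM
/// samples as an in-memory WAV file with a canonical 44-byte header.
fn encode_wav_pcm16_mono(samples: &[i16], sample_rate: u32) -> Vec<u8> {
    let channels: u16 = 1;
    let bits_per_sample: u16 = 16;
    let block_align: u16 = channels * bits_per_sample / 8;
    let byte_rate: u32 = sample_rate * u32::from(block_align);
    let data_len: u32 = (samples.len() * 2) as u32;

    let mut out = Vec::with_capacity(44 + data_len as usize);
    // RIFF header; the size field excludes the first 8 bytes.
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes());
    out.extend_from_slice(b"WAVE");
    // "fmt " chunk: 16 bytes describing uncompressed PCM.
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes());
    out.extend_from_slice(&1u16.to_le_bytes()); // format tag 1 = PCM
    out.extend_from_slice(&channels.to_le_bytes());
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&bits_per_sample.to_le_bytes());
    // "data" chunk: the samples themselves.
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for s in samples {
        out.extend_from_slice(&s.to_le_bytes());
    }
    out
}
```

In the actual change, `hound` produces these bytes and they are sent to `whisper-1` in a reqwest multipart request; the upload itself is not shown here.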

- Insert atomic textarea element when transcription starts
- Keep textarea fully editable; element moves with edits
- Replace element by id when Whisper result returns; fallback insert at cursor
- Add element id support to TextArea (named elements + replace by id)
- Switch to AppEvent::TranscriptionComplete(id, text)
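A minimal model of the placeholder mechanics described above, under stated assumptions: `TextArea`, `Element`, `insert_element`, `replace_element_by_id` and `replace_or_insert` are hypothetical names for illustration and are not the real codex `TextArea` API. The idea it shows is that the placeholder is tracked by id and its range shifts whenever the user edits text before it, so the Whisper result can still replace it later.

```rust
use std::ops::Range;

/// An atomic span inside the text (e.g. a "[transcribing]" placeholder), addressed by id.
struct Element {
    id: u64,
    range: Range<usize>,
}

/// Hypothetical, simplified textarea (not the real codex `TextArea` API).
/// Offsets are byte offsets, so edits are assumed to land on char boundaries.
struct TextArea {
    text: String,
    cursor: usize,
    elements: Vec<Element>,
    next_id: u64,
}

impl TextArea {
    fn new() -> Self {
        TextArea { text: String::new(), cursor: 0, elements: Vec::new(), next_id: 1 }
    }

    /// Replace `range` (assumed not to overlap an element) and shift everything after it.
    fn splice(&mut self, range: Range<usize>, replacement: &str) {
        let removed = range.end - range.start;
        let added = replacement.len();
        self.text.replace_range(range.clone(), replacement);
        for el in &mut self.elements {
            // Elements after the edit move with it; elements ending before it stay put.
            if el.range.start >= range.end {
                el.range.start = el.range.start + added - removed;
            }
            if el.range.end > range.start {
                el.range.end = el.range.end + added - removed;
            }
        }
        if self.cursor >= range.end {
            self.cursor = self.cursor + added - removed;
        }
    }

    /// Plain typing or pasting at the cursor.
    fn insert_str(&mut self, s: &str) {
        let at = self.cursor;
        self.splice(at..at, s);
    }

    /// Insert a placeholder element at the cursor and return its id.
    fn insert_element(&mut self, label: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        let start = self.cursor;
        self.splice(start..start, label);
        self.elements.push(Element { id, range: start..start + label.len() });
        id
    }

    /// Swap the element for plain text; returns false if the id no longer exists.
    fn replace_element_by_id(&mut self, id: u64, text: &str) -> bool {
        match self.elements.iter().position(|el| el.id == id) {
            Some(idx) => {
                let el = self.elements.remove(idx);
                self.splice(el.range, text);
                true
            }
            None => false,
        }
    }

    /// Fallback: if the placeholder is gone, insert the text at the cursor instead.
    fn replace_or_insert(&mut self, id: u64, text: &str) {
        if !self.replace_element_by_id(id, text) {
            self.insert_str(text);
        }
    }
}
```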

- Add AppEvent::TranscriptionFailed { id, error }
- On error, delete the placeholder element; leave editor state intact
- Fix voice thread to send failure event with correct id
- Keep success path replacing placeholder by id
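One way the voice thread can report either outcome together with the placeholder's id, sketched with `std::thread` and `std::sync::mpsc` as stand-ins (the PR's actual threading and event plumbing may differ). The variant shapes follow the commit messages; `spawn_transcription` is a hypothetical helper.

```rust
use std::sync::mpsc;
use std::thread;

/// Event shapes as named in the commit messages; the id is the placeholder's id.
#[derive(Debug, PartialEq)]
enum AppEvent {
    TranscriptionComplete(u64, String),
    TranscriptionFailed { id: u64, error: String },
}

/// Hypothetical helper: run `transcribe` off the UI thread and always report
/// back with the same `id`, so the UI knows which placeholder to replace or delete.
fn spawn_transcription<F>(id: u64, tx: mpsc::Sender<AppEvent>, transcribe: F) -> thread::JoinHandle<()>
where
    F: FnOnce() -> Result<String, String> + Send + 'static,
{
    thread::spawn(move || {
        let event = match transcribe() {
            Ok(text) => AppEvent::TranscriptionComplete(id, text),
            Err(error) => AppEvent::TranscriptionFailed { id, error },
        };
        // The receiver may already be gone if the UI shut down; ignore that case.
        let _ = tx.send(event);
    })
}
```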

…ng' on release

- Insert named 'recording' element at start of capture
- On stop, change the same element text to 'transcribing' and send audio
- Remove footer 'Recording' hint

- Add TextArea::update_named_element_by_id to preserve element id
- On PageDown release, update existing element text to 'transcribing'
- Final transcription replaces element with plain text; errors delete it
- Route keys while recording; stop on Release or next key

- Use webrtc-vad to detect voiced frames (10ms)
- Aggressive mode + 200ms padding to avoid clipping
- Downmix to mono, resample to supported rates
- Trim leading/trailing silence before upload
- Skip upload and remove placeholder if no speech
- Add webrtc-vad dependency to TUI
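The downmix and resample steps can be sketched as below. These are hypothetical std-only helpers written for illustration; the PR's own `to_mono_i16` and `resample_linear_i16` were removed in a later commit along with VAD trimming, and webrtc-vad itself is not reproduced. Linear interpolation without a low-pass filter is a simplification that can alias when downsampling.

```rust
/// Average each interleaved frame of `channels` samples into one mono sample.
fn downmix_to_mono(interleaved: &[i16], channels: usize) -> Vec<i16> {
    assert!(channels > 0, "channel count must be positive");
    interleaved
        .chunks_exact(channels)
        .map(|frame| {
            // Sum in i32 so several i16 channels cannot overflow.
            let sum: i32 = frame.iter().map(|&s| s as i32).sum();
            (sum / channels as i32) as i16
        })
        .collect()
}

/// Naive linear-interpolation resampler (no anti-aliasing filter).
fn resample_linear(input: &[i16], from_hz: u32, to_hz: u32) -> Vec<i16> {
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    (0..out_len)
        .map(|i| {
            // Position of output sample i on the input timeline.
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = pos - idx as f64;
            let a = input[idx] as f64;
            let b = input[(idx + 1).min(input.len() - 1)] as f64;
            (a + (b - a) * frac).round() as i16
        })
        .collect()
}
```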

Fix push-to-talk voice mode where PageDown release didn't trigger transcription because Release events were filtered at the app layer. Now all key events are forwarded, allowing the composer to stop recording on release and send audio for transcription immediately.
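The bug is easiest to see with a toy dispatcher. `KeyEventKind` below mirrors crossterm's `Press`/`Repeat`/`Release` kinds; `Key`, `Composer` and `dispatch` are hypothetical. With releases filtered out at the app layer, PageDown starts recording but nothing stops it; forwarding every event lets the composer react to the release, while history navigation still fires only on Press/Repeat.

```rust
/// Mirrors crossterm's key event kinds.
#[derive(Clone, Copy, Debug, PartialEq)]
enum KeyEventKind {
    Press,
    Repeat,
    Release,
}

/// Hypothetical subset of keys.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Key {
    PageDown,
    Up,
}

#[derive(Default)]
struct Composer {
    recording: bool,
    transcriptions_started: u32,
    history_moves: u32,
}

impl Composer {
    fn handle_key(&mut self, key: Key, kind: KeyEventKind) {
        match (key, kind) {
            (Key::PageDown, KeyEventKind::Press) if !self.recording => self.recording = true,
            (Key::PageDown, KeyEventKind::Release) if self.recording => {
                self.recording = false;
                self.transcriptions_started += 1; // send audio for transcription here
            }
            // History navigation ignores releases so one keystroke moves once.
            (Key::Up, KeyEventKind::Press | KeyEventKind::Repeat) => self.history_moves += 1,
            _ => {}
        }
    }
}

/// App layer: `forward_releases = false` models the old filter that caused the bug.
fn dispatch(composer: &mut Composer, events: &[(Key, KeyEventKind)], forward_releases: bool) {
    for &(key, kind) in events {
        if kind == KeyEventKind::Release && !forward_releases {
            continue;
        }
        composer.handle_key(key, kind);
    }
}
```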

- Short-clip handling: remove placeholder without transcribing when <1s
- Hold-to-talk: start immediately on empty textarea; skip space + delay
- Disable VAD trimming; always send full clip
- Add live recording meter with adaptive gain and compression
- Animate via new AppEvent::RecordingMeter and in-place updates
- Use atomic peak from audio callback to avoid blocking audio thread
- Normalize audio (peak with headroom) before WAV upload
- History nav: trigger on Press/Repeat only
- Hide cursor while recording
- Meter UI: 12-char sparkline, scrolling left, no label
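A sketch of several of the pieces listed above, under assumptions: the audio callback publishes its peak through an `AtomicU32` holding `f32` bits (for non-negative floats the bit patterns order the same way as the values, so `fetch_max` works without a lock), a 12-character sparkline scrolls left as new levels arrive, clips are peak-normalized with headroom, and clips shorter than one second are dropped. `PeakMeter`, `Sparkline`, `normalize_peak` and `is_too_short` are hypothetical names; the PR's adaptive gain and compression are not modelled.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicU32, Ordering};

/// Peak level shared between the audio callback and the UI, stored as raw f32 bits.
struct PeakMeter(AtomicU32);

impl PeakMeter {
    fn new() -> Self {
        PeakMeter(AtomicU32::new(0)) // 0u32 is the bit pattern of 0.0f32
    }

    /// Audio-callback side: fold this buffer's absolute peak into the shared value.
    fn record(&self, samples: &[f32]) {
        let peak = samples.iter().fold(0.0f32, |m, s| m.max(s.abs()));
        self.0.fetch_max(peak.to_bits(), Ordering::Relaxed);
    }

    /// UI side: read the peak since the last call and reset it.
    fn take(&self) -> f32 {
        f32::from_bits(self.0.swap(0, Ordering::Relaxed))
    }
}

const BARS: [char; 8] = ['▁', '▂', '▃', '▄', '▅', '▆', '▇', '█'];
const METER_WIDTH: usize = 12;

/// Fixed-width sparkline: new levels enter on the right, the oldest scrolls off the left.
struct Sparkline(VecDeque<char>);

impl Sparkline {
    fn new() -> Self {
        Sparkline(VecDeque::with_capacity(METER_WIDTH))
    }

    fn push(&mut self, level: f32) {
        let idx = (level.clamp(0.0, 1.0) * (BARS.len() - 1) as f32).round() as usize;
        if self.0.len() == METER_WIDTH {
            self.0.pop_front();
        }
        self.0.push_back(BARS[idx]);
    }

    fn render(&self) -> String {
        self.0.iter().collect()
    }
}

/// Scale so the loudest sample reaches `target_peak` (a value below 1.0 leaves headroom).
fn normalize_peak(samples: &mut [f32], target_peak: f32) {
    let peak = samples.iter().fold(0.0f32, |m, s| m.max(s.abs()));
    if peak > 0.0 {
        let gain = target_peak / peak;
        for s in samples.iter_mut() {
            *s *= gain;
        }
    }
}

/// Clips under one second are discarded instead of being uploaded.
fn is_too_short(num_samples: usize, sample_rate: u32) -> bool {
    (num_samples as u64) < u64::from(sample_rate)
}
```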

- Remove unused functions (to_mono_i16, resample_linear_i16, detect_voiced_bounds_webrtc)
- Prune unused imports (std::convert::TryFrom, webrtc-vad types)
- Remove webrtc-vad from tui/Cargo.toml
- Delete unused local in recording meter task

No behavior change; voice still records and transcribes full clip. Ran fmt/fix and tests for codex-tui.

- Remove AppEvent::SpaceHoldTimeout and app/chatwidget/bottom_pane handlers
- Manage 500ms hold via tokio::spawn that flips an atomic flag
- Convert to recording on next input event when flag is observed

Behavior: identical in typical terminals; on non-repeat terminals, starts on next key event after timeout.

…repeats

- Drop id from hold state and conversions
- Spawn tokio task that flips atomic flag and schedules a frame
- Process conversion in a new pre_draw_tick called before rendering
- Pass FrameRequester into ChatComposer; update tests accordingly

No AppEvent used for timeout; behavior now independent of key repeat.

…tick

- Remove key-event path for timeout processing; rely on frame scheduled by timer
- Keep local tokio task + atomic flag approach; fewer code paths

All tests pass.
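The timer approach from these commits, sketched with `std::thread` in place of `tokio::spawn` and a plain closure in place of `FrameRequester` (both substitutions are assumptions made for a dependency-free example; `SpaceHold` and this `Composer` are hypothetical). The timer only flips a flag and requests a frame; the state change happens in `pre_draw_tick`, so it does not depend on the terminal sending key repeats.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Pending Space hold: the timer thread sets `expired` when the delay elapses.
struct SpaceHold {
    expired: Arc<AtomicBool>,
}

struct Composer {
    hold: Option<SpaceHold>,
    recording: bool,
}

impl Composer {
    fn on_space_press<F>(&mut self, hold_delay: Duration, request_frame: F)
    where
        F: Fn() + Send + 'static,
    {
        let expired = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&expired);
        thread::spawn(move || {
            thread::sleep(hold_delay);
            flag.store(true, Ordering::Release);
            // Ask the UI loop to draw so pre_draw_tick runs even with no further input.
            request_frame();
        });
        self.hold = Some(SpaceHold { expired });
    }

    /// Released before the timeout: the caller would commit a plain space instead.
    fn on_space_release(&mut self) {
        self.hold = None;
    }

    /// Called before each render; the only place the hold turns into recording.
    fn pre_draw_tick(&mut self) {
        let fired = self
            .hold
            .as_ref()
            .map_or(false, |h| h.expired.load(Ordering::Acquire));
        if fired {
            self.hold = None;
            self.recording = true; // start audio capture here
        }
    }
}
```

If Space is released before the timer fires, the hold is dropped and the later frame request is harmless, because `pre_draw_tick` finds no pending hold.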

- Replace static "transcribing" with animated braille spinner frames via RecordingMeter updates
- Spinner auto-stops after max duration or when placeholder is replaced/removed

All TUI tests pass.
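A pure function is enough to choose spinner frames. The braille frame set, the 80 ms interval and the 60 s cap below are assumptions (the commit does not list them); the spinner stops as soon as the placeholder is gone or the cap is reached.

```rust
use std::time::Duration;

/// A common braille spinner sequence (assumed; not taken from the PR).
const FRAMES: [char; 10] = ['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏'];
const FRAME_INTERVAL: Duration = Duration::from_millis(80);
const MAX_SPIN: Duration = Duration::from_secs(60);

/// Frame to show after `elapsed`, or `None` once the spinner should stop.
fn spinner_frame(elapsed: Duration, placeholder_present: bool) -> Option<char> {
    if !placeholder_present || elapsed >= MAX_SPIN {
        return None;
    }
    let tick = elapsed.as_millis() / FRAME_INTERVAL.as_millis();
    Some(FRAMES[(tick % FRAMES.len() as u128) as usize])
}
```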

- Insert a named element containing a space on Space press
- On release or cancel, replace the element with a plain space
- On timeout, remove the element and begin recording

Keeps behavior while simplifying state (no index math). All tests pass.

- Add stop_recording_and_start_transcription() and call from handle_key_event
- Keeps behavior; improves readability and testability

All TUI tests pass.

- Add start_recording_with_placeholder() and reuse for empty-text space press and hold-timeout
- Keeps behavior; consolidates meter placeholder + spawn logic

All TUI tests pass.

…lean up on drop

- Maintain stop flags for spinner tasks; stop on replace/remove or when update fails
- Implement Drop for ChatComposer to stop spinners and end capture on teardown
- Make RecordingMeter path schedule a frame only when update applied

This avoids runaway spinner tasks across UI changes (e.g., NewSession). All tests pass.
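The stop-flag pattern might look like this. `SpinnerHandle` and this `Composer` are hypothetical, and a `std::thread` loop stands in for the spinner task that would send `RecordingMeter`-style updates; the point is that every task has a flag its owner can set, and `Drop` sets them all on teardown.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Handle to a background spinner task that polls `stop_flag` between frames.
struct SpinnerHandle {
    stop_flag: Arc<AtomicBool>,
    join: Option<thread::JoinHandle<u32>>,
}

impl SpinnerHandle {
    fn spawn(interval: Duration) -> Self {
        let stop_flag = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&stop_flag);
        let join = thread::spawn(move || {
            let mut frames: u32 = 0;
            while !flag.load(Ordering::Acquire) {
                frames += 1; // a real task would push the next spinner frame here
                thread::sleep(interval);
            }
            frames
        });
        SpinnerHandle { stop_flag, join: Some(join) }
    }

    /// Signal the task and wait for it; `None` if it was already stopped.
    fn stop(&mut self) -> Option<u32> {
        self.stop_flag.store(true, Ordering::Release);
        self.join.take().map(|j| j.join().expect("spinner thread panicked"))
    }
}

/// Owner that stops all of its spinners when dropped (e.g. on a new session).
struct Composer {
    spinners: Vec<SpinnerHandle>,
}

impl Drop for Composer {
    fn drop(&mut self) {
        for spinner in &mut self.spinners {
            spinner.stop();
        }
    }
}
```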

…ance and 60s cap

- Remove explicit spinner stop flags and stop calls
- Spinner tasks auto-expire after 60s; UI ignores updates once placeholder is gone
- Keep Drop minimal: stop capture and clear placeholder

All TUI tests pass.

…isappearance and 60s cap" This reverts commit 5461929.

- Add ChatComposer helpers (ta_* wrappers) that auto-sync popups after text changes
- Use wrappers for programmatic edits (placeholders, spinner frames, space-hold element)
- Remove scattered manual sync calls accordingly

All TUI tests pass.

…y paths

- Revert to direct TextArea calls
- Ensure sync_command_popup/sync_file_search_popup are called in event handlers and key paths
- Keep on-space-hold timeout and recording flows consistent

All TUI tests pass.

- Centralize sync in handle_key_event end; for early-return branches, perform sync then return
- Remove ad-hoc syncs added inside match branches now covered by centralized sync

All TUI tests pass.

- Add ChatComposer::sync_popups() to unify command/file popup updates
- Call sync_popups after key events; remove scattered explicit sync calls
- BottomPane now triggers sync_popups after events (key, paste, inserts, pre-draw, history, transcription)
- Keeps behavior consistent and simplifies control flow; tests and snapshots pass
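The centralized pattern, reduced to a toy: every input path mutates the text and then falls through to a single `sync_popups()` call. `Input` and this `Composer` are hypothetical, and the popup triggers (`/` for commands, `@` for file search) are simplified assumptions rather than the TUI's exact rules.

```rust
/// Hypothetical input events that can change the composer text.
enum Input {
    Key(char),
    Backspace,
    Paste(String),
    Transcription(String),
}

struct Composer {
    text: String,
    command_popup: bool,
    file_popup: bool,
}

impl Composer {
    /// Recompute both popups purely from the current text.
    fn sync_popups(&mut self) {
        self.command_popup =
            self.text.starts_with('/') && !self.text.contains(char::is_whitespace);
        self.file_popup = self
            .text
            .rsplit(' ')
            .next()
            .map_or(false, |word| word.starts_with('@'));
    }

    fn handle_input(&mut self, input: Input) {
        match input {
            Input::Key(c) => self.text.push(c),
            Input::Backspace => {
                self.text.pop();
            }
            Input::Paste(s) | Input::Transcription(s) => self.text.push_str(&s),
        }
        // Single sync point instead of ad-hoc calls inside each branch.
        self.sync_popups();
    }
}
```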

nornagon-openai · 2026-02-19T19:24:19Z

@aibrahim-oai For the activation timing, the intention is that if the composer is empty, pressing Space should immediately trigger voice input; if there's text in the composer, there should be a delay. The theory is that " " is an unusual first character to type.

When you talk about increasing the time to activate, do you mean the empty-composer state or the non-empty-composer state?

aibrahim-oai · 2026-02-19T19:26:25Z

I meant the empty composer. I think it's unexpected; it feels a bit buggy.


nornagon-openai added 30 commits August 14, 2025 21:28

tui: remove transcribing placeholder on error

b455e9b


key is pgdn

062e9f0

tui: show in-text placeholder during recording; update to 'transcribi…

7abde77


fix lint

e887b06

Merge remote-tracking branch 'origin/main' into nornagon/voice-mode

1dc9065

fix auth

6e892e9

fix rendering

785b5e1

space hold

2accebe

tui(voice): simplify hold logic by handling timeout only in pre-draw …

e68b934


tui(voice): animate transcribing with braille spinner

f0481fa


tui(voice): extract end-of-recording logic into helper

88f0145


tui(voice): extract start-recording logic into helper

2c7a8eb


Revert "tui(voice): simplify spinner lifecycle; rely on placeholder d…

643d707


tui: ensure popup sync runs for all key paths; remove mid-function syncs

456d786


helpers

b01d34f

tui: isolate voice key handling

b2e75b2

nornagon-openai and others added 11 commits February 19, 2026 11:33

tui: avoid double-processing voice key events

b279365

tui: apply voice-space handling before popup dispatch

2dd4612

tui: fix release-key handling for voice and popups

58ad723

tui: gate linux-only imports for voice handling

77f4a71

core: move voice transcription to under-development stage

a82adf1

Merge branch 'main' into nornagon/voice-mode

84b4256

test(app-server): deflake running thread resume timeouts

13d1728

# Conflicts: # codex-rs/app-server/tests/suite/v2/thread_resume.rs

chore: refresh MODULE.bazel.lock

ff048e0

Revert "test(app-server): deflake running thread resume timeouts"

dbe02f4

Merge branch 'main' into nornagon/voice-mode

293584d

Merge remote-tracking branch 'origin/main' into nornagon/voice-mode

e4c09b4

nornagon-openai force-pushed the nornagon/voice-mode branch from 46a2439 to e4c09b4 on February 20, 2026 17:09

nornagon-openai added 11 commits February 20, 2026 16:02

drop linux_run_main changes, tweak space insertion behavior

d74296a

Fix clippy if-else in voice placeholder replacement

3fc0140

tui: revert status-line merge-conflict changes

b726be7

tui: extract voice meter computation

c39bcdb

Merge remote-tracking branch 'origin/main' into nornagon/voice-mode

4522e01

# Conflicts: # codex-rs/tui/src/bottom_pane/mod.rs

tui: add meter stub for no voice-input builds

a09c89f

Merge remote-tracking branch 'origin/main' into nornagon/voice-mode

1589fd2

# Conflicts: # MODULE.bazel.lock # codex-rs/Cargo.lock # codex-rs/tui/src/chatwidget.rs

Merge remote-tracking branch 'origin/main' into nornagon/voice-mode

b8318a6

build: regenerate lockfiles after merge

129b686

build: pin aws-lc-sys for bazel patch

50b8c4f

Merge remote-tracking branch 'origin/main' into nornagon/voice-mode

b2387c7

# Conflicts: # MODULE.bazel.lock

nornagon-openai enabled auto-merge (squash) February 23, 2026 21:57

nornagon-openai merged commit 855e275 into main Feb 23, 2026
37 of 39 checks passed

nornagon-openai deleted the nornagon/voice-mode branch February 23, 2026 22:15

github-actions bot locked and limited conversation to collaborators Feb 23, 2026

nornagon-openai merged 151 commits into main from nornagon/voice-mode

6 participants
