
Feature Proposal: Unified Native Voice Input Architecture (Local-First) #18067

@fayerman-source

Context & Problem

There is significant community interest in voice input for Gemini CLI, but that interest is currently fragmented across multiple discussions.

Current Gap: The ecosystem lacks a lightweight, privacy-first, and native solution. Users currently have to choose between complex MCP setups, cloud-based dependencies (OpenAI API), or external wrapper scripts that break the CLI's native UX.

The Proposal

I propose adding a native Voice Input hook directly into the core CLI (packages/cli).

Architecture:

  1. Local-First: Uses standard system binaries (sox on macOS, arecord on Linux) for capture.
  2. Privacy: Transcription is handled by a local whisper binary (configurable path), ensuring no audio leaves the user's machine.
  3. Integrated UX:
    • Toggles via Alt+V, Ctrl+Q, or /voice slash command.
    • Visual status indicator (🎤 Recording...) directly in the InputPrompt header.
    • Inserts text at cursor position (maintaining editability).
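
A minimal sketch of the capture and insertion steps above, not the code from the branch. The helper names (`captureCommand`, `insertAtCursor`) and the 16 kHz mono 16-bit WAV format are assumptions for illustration. Only the choice of `sox` on macOS and `arecord` on Linux comes from the proposal. The whisper invocation is left out because its flags depend on which local binary is configured.

```typescript
// Hypothetical helpers; the branch's real API is not shown in this issue.
type CaptureCommand = { cmd: string; args: string[] };

// Build the recorder invocation for a platform string such as process.platform.
function captureCommand(platform: string, outPath: string): CaptureCommand {
  switch (platform) {
    case "darwin":
      // sox: -d reads from the default input device; the format options
      // that follow apply to the output file (assumed 16 kHz mono 16-bit).
      return { cmd: "sox", args: ["-d", "-r", "16000", "-c", "1", "-b", "16", outPath] };
    case "linux":
      // arecord: signed 16-bit little-endian samples, 16 kHz, one channel.
      return { cmd: "arecord", args: ["-f", "S16_LE", "-r", "16000", "-c", "1", outPath] };
    default:
      throw new Error(`voice capture is not supported on ${platform}`);
  }
}

// Insert transcribed text at the cursor and move the cursor past it,
// so the rest of the prompt stays editable. Offsets are UTF-16 code units.
function insertAtCursor(
  buffer: string,
  cursor: number,
  text: string,
): { buffer: string; cursor: number } {
  const pos = Math.max(0, Math.min(cursor, buffer.length));
  return {
    buffer: buffer.slice(0, pos) + text + buffer.slice(pos),
    cursor: pos + text.length,
  };
}
```

The returned `cmd` and `args` could be passed to Node's `child_process.spawn(cmd, args)`. Keeping the command builder pure makes the per-platform behaviour easy to unit test without a microphone.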

Implementation Status

I have fully implemented and verified this architecture.
My branch includes:

  • useVoiceInput React hook for process management.
  • VoiceContext for global state.
  • ✅ Unit tests for recording logic and shortcut bindings.
  • ✅ Optimized low-latency polling for transcription.
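
The branch is not attached to this issue, so the following is only a guess at the kind of state a hook like `useVoiceInput` might manage. The state names, event names, and `voiceReducer` itself are hypothetical. Only the toggle behaviour and the `🎤 Recording...` label are taken from the proposal.

```typescript
// Hypothetical state machine; not the branch's actual types.
type VoiceState =
  | { status: "idle" }
  | { status: "recording" }
  | { status: "transcribing" }
  | { status: "error"; message: string };

type VoiceEvent =
  | { type: "toggle" } // Alt+V, Ctrl+Q, or /voice
  | { type: "transcribed" } // transcription finished and text was inserted
  | { type: "failed"; message: string };

function voiceReducer(state: VoiceState, event: VoiceEvent): VoiceState {
  if (event.type === "toggle") {
    // A second toggle stops the recorder and hands the audio to transcription.
    if (state.status === "recording") return { status: "transcribing" };
    // Ignore toggles while transcription is still running.
    if (state.status === "transcribing") return state;
    // From idle or error, a toggle starts a new recording.
    return { status: "recording" };
  }
  if (event.type === "transcribed") {
    return state.status === "transcribing" ? { status: "idle" } : state;
  }
  // Remaining case: the recorder or transcriber process failed.
  return { status: "error", message: event.message };
}

// Text for the InputPrompt header; empty when nothing is being recorded.
function statusLabel(state: VoiceState): string {
  return state.status === "recording" ? "🎤 Recording..." : "";
}
```

Modelling the flow as a pure reducer keeps process management (spawning and killing the recorder) separate from UI state, which is one way the shortcut bindings could be unit tested without real audio.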

Request

I believe this implementation solves the core requirement of #13798 while adhering to the privacy and performance standards of the project.

Is the team open to a PR for this native integration? I am ready to push the branch immediately.

Metadata

Labels

  • area/core: Issues related to User Interface, OS Support, Core Functionality
  • help wanted: We will accept PRs from all issues marked as "help wanted". Thanks for your support!
  • priority/p2: Important but can be addressed in a future release.
  • type/feature
