Feature Proposal: Unified Native Voice Input Architecture (Local-First)

### Context & Problem
There is significant community interest in Voice Input for Gemini CLI, currently fragmented across multiple discussions:
*   **#13798:** Core feature request for STT/TTS (currently suggests "wrapper scripts").
*   **#1982:** General "Voice Mode" discussion (leans towards external MCP servers).
*   **#16461:** Requests for voice loops and complex backend overhauls.

**Current Gap:** The ecosystem lacks a **lightweight, privacy-first, native** solution. Users currently have to choose between complex MCP setups, cloud-based dependencies (e.g., the OpenAI API), and external wrapper scripts that break the CLI's native UX.

### The Proposal
I propose adding a **native Voice Input hook** directly into the core CLI (`packages/cli`).

**Architecture:**
1.  **Local-First:** Uses common command-line recorders (`sox` on macOS, `arecord` on Linux) for capture.
2.  **Privacy:** Transcription is handled by a local `whisper` binary (configurable path), ensuring no audio leaves the user's machine.
3.  **Integrated UX:** 
    *   Toggled via `Alt+V`, `Ctrl+Q`, or the `/voice` slash command.
    *   Visual status indicator (`🎤 Recording...`) directly in the `InputPrompt` header.
    *   Inserts text at cursor position (maintaining editability).
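
As a rough illustration of the capture and insertion steps above, here is a minimal sketch. The names `buildCaptureCommand` and `insertAtCursor`, the `CaptureCommand` type, and the 16 kHz mono 16-bit WAV format are assumptions made for this example; they are not taken from the branch and the actual hook may be structured differently.

```typescript
// Hypothetical helper types and names, for illustration only.
interface CaptureCommand {
  cmd: string;
  args: string[];
}

// Pick a recorder by platform, as the proposal describes
// (sox on macOS, arecord on Linux).
// Assumption: 16 kHz, mono, 16-bit WAV output.
function buildCaptureCommand(platform: string, outPath: string): CaptureCommand {
  switch (platform) {
    case 'darwin':
      // `-d` tells sox to read from the default audio input device.
      return { cmd: 'sox', args: ['-d', '-r', '16000', '-c', '1', '-b', '16', outPath] };
    case 'linux':
      return {
        cmd: 'arecord',
        args: ['-f', 'S16_LE', '-r', '16000', '-c', '1', '-t', 'wav', outPath],
      };
    default:
      throw new Error(`Voice capture is not supported on platform: ${platform}`);
  }
}

// Insert the transcript at the cursor so the surrounding text stays editable.
// The cursor is clamped to the buffer bounds and moved past the inserted text.
function insertAtCursor(
  buffer: string,
  cursor: number,
  transcript: string,
): { text: string; cursor: number } {
  const pos = Math.max(0, Math.min(cursor, buffer.length));
  const text = buffer.slice(0, pos) + transcript + buffer.slice(pos);
  return { text, cursor: pos + transcript.length };
}
```

Taking the platform as a parameter rather than reading it inside the function keeps both helpers pure, so they can be unit tested without spawning a recorder process.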

### Implementation Status
**I have fully implemented and verified this architecture.** 
My branch includes:
*   ✅ `useVoiceInput` React hook for process management.
*   ✅ `VoiceContext` for global state.
*   ✅ Unit tests for recording logic and shortcut bindings.
*   ✅ Optimized low-latency polling for transcription.
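
For readers who want a picture of the state a hook like `useVoiceInput` has to manage, the sketch below models it as a small state machine. The state names, event names, and the functions `nextVoiceState` and `statusLabel` are hypothetical and chosen for this example; only the `🎤 Recording...` label comes from the proposal.

```typescript
// Hypothetical states and events; the branch may model this differently.
type VoiceState = 'idle' | 'recording' | 'transcribing';
type VoiceEvent = 'toggle' | 'transcribed' | 'error';

// Pure transition function: a toggle starts recording, a second toggle
// stops recording and hands the audio to transcription.
function nextVoiceState(state: VoiceState, event: VoiceEvent): VoiceState {
  switch (event) {
    case 'toggle':
      if (state === 'idle') return 'recording';
      if (state === 'recording') return 'transcribing';
      return state; // ignore toggles while a transcription is in flight
    case 'transcribed':
      return state === 'transcribing' ? 'idle' : state;
    case 'error':
      return 'idle'; // any failure resets to idle
  }
}

// Text for the prompt header. Only the recording label is from the proposal;
// the transcribing label is an assumption.
function statusLabel(state: VoiceState): string {
  if (state === 'recording') return '🎤 Recording...';
  if (state === 'transcribing') return 'Transcribing...';
  return '';
}
```

Keeping the transitions in a pure function means the shortcut bindings and the `/voice` command can all dispatch the same `toggle` event, and the logic can be tested without a terminal or a microphone.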

### Request
I believe this implementation solves the core requirement of #13798 while adhering to the privacy and performance standards of the project.

**Is the team open to a PR for this native integration?** I am ready to push the branch immediately.

Feature Proposal: Unified Native Voice Input Architecture (Local-First) #18067
