epic: audio input support for agent commands #520

## Overview

Add support for sending commands to the agent via audio — both recorded audio files (Telegram voice messages, Discord/Slack file uploads) and real-time audio streaming where supported by provider APIs.

## Motivation

Voice input is a natural interaction mode for AI agents. Telegram, Discord, and Slack all support voice/audio messages natively, but Zeph currently ignores all non-text content. Adding audio support unlocks hands-free interaction and expands accessibility.

## Architecture

Audio input follows a two-stage pipeline:

```
Audio Source → Transcription (STT) → Text → Agent Loop (unchanged)
```

### Transcription backends (pluggable):
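1. **OpenAI Whisper API**: cloud, high accuracy, 25 MB file limit
2. **Local Whisper** (candle/whisper.cpp): offline, no API costs, GPU optional
3. **Native multimodal**: models that accept audio input directly, such as GPT-4o audio-capable models, skip the separate STT step (native audio input for Claude should be confirmed against Anthropic's current API documentation)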
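
To make the "pluggable" idea concrete, here is a minimal sketch of how a backend could be picked per message. Everything in it is an assumption for illustration: the `SttBackend` enum, the `choose_backend` function, and the selection order are hypothetical and not part of Zeph. The only fact taken from the list above is the 25 MB limit, and treating it as 25 MiB is itself a guess.

```rust
/// Upload limit for the hosted Whisper API, per the backend list.
/// Assumption: "25 MB" is treated as 25 MiB here; check the provider docs.
pub const WHISPER_API_MAX_BYTES: u64 = 25 * 1024 * 1024;

/// The three backend families from the list (type name is hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SttBackend {
    OpenAiWhisper,
    LocalWhisper,
    NativeMultimodal,
}

/// Hypothetical selection policy, not the project's actual logic:
/// 1. if the active model accepts audio directly, skip STT entirely;
/// 2. otherwise use the hosted API when the file fits under the limit;
/// 3. otherwise fall back to local Whisper if that feature is available.
/// Returns `None` when no backend can handle the input.
pub fn choose_backend(
    audio_bytes: u64,
    model_accepts_audio: bool,
    local_available: bool,
) -> Option<SttBackend> {
    if model_accepts_audio {
        return Some(SttBackend::NativeMultimodal);
    }
    if audio_bytes <= WHISPER_API_MAX_BYTES {
        return Some(SttBackend::OpenAiWhisper);
    }
    if local_available {
        Some(SttBackend::LocalWhisper)
    } else {
        None
    }
}
```

A real policy would more likely be driven by configuration (for example, always preferring the local backend when offline operation matters) than by a fixed order like this one.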

### Key design decisions:
- Audio is transcribed to text **before** entering the agent loop — minimal changes to core
- `ChannelMessage` gains an `attachments: Vec<Attachment>` field
- A new `SpeechToText` trait abstracts transcription backends
- Channel adapters extract audio from platform-specific message types
- Transcription happens at the channel boundary, not in the agent
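
The decisions above could look roughly like the following sketch. Only the names `ChannelMessage`, `attachments: Vec<Attachment>`, and `SpeechToText` come from this issue. The field layout of `Attachment`, the `SttError` type, the `transcribe` signature, the `transcribe_at_boundary` helper, and the `FixedTranscript` stand-in backend are all hypothetical. The trait is synchronous here to keep the example dependency-free; a real implementation would probably be async.

```rust
use std::fmt;

/// Hypothetical attachment shape; the issue only names the type.
#[derive(Debug, Clone)]
pub struct Attachment {
    pub mime_type: String,
    pub data: Vec<u8>,
}

/// Simplified message: existing text plus the new `attachments` field.
#[derive(Debug, Clone, Default)]
pub struct ChannelMessage {
    pub text: String,
    pub attachments: Vec<Attachment>,
}

/// Hypothetical error type for transcription failures.
#[derive(Debug)]
pub struct SttError(pub String);

impl fmt::Display for SttError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "transcription failed: {}", self.0)
    }
}

impl std::error::Error for SttError {}

/// Abstraction over transcription backends (signature is an assumption).
pub trait SpeechToText {
    fn transcribe(&self, audio: &[u8], mime_type: &str) -> Result<String, SttError>;
}

/// Runs at the channel boundary: audio attachments are transcribed and
/// appended to `text`; non-audio attachments are passed through untouched.
/// The agent loop then only ever sees text.
pub fn transcribe_at_boundary(
    mut msg: ChannelMessage,
    stt: &dyn SpeechToText,
) -> Result<ChannelMessage, SttError> {
    let (audio, other): (Vec<Attachment>, Vec<Attachment>) =
        std::mem::take(&mut msg.attachments)
            .into_iter()
            .partition(|a| a.mime_type.starts_with("audio/"));

    for attachment in &audio {
        let transcript = stt.transcribe(&attachment.data, &attachment.mime_type)?;
        if !msg.text.is_empty() {
            msg.text.push('\n');
        }
        msg.text.push_str(&transcript);
    }

    msg.attachments = other;
    Ok(msg)
}

/// Stand-in backend that returns a fixed transcript, for tests and demos.
pub struct FixedTranscript(pub &'static str);

impl SpeechToText for FixedTranscript {
    fn transcribe(&self, _audio: &[u8], _mime_type: &str) -> Result<String, SttError> {
        Ok(self.0.to_string())
    }
}
```

With this shape, a channel adapter would build a `ChannelMessage` from the platform payload, call `transcribe_at_boundary` with whichever backend is configured, and hand the result to the agent loop unchanged.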

## Sub-issues

- [x] #521 Extend `ChannelMessage` and `MessagePart` with multimodal attachment support
- [x] #522 Add `SpeechToText` trait and OpenAI Whisper backend
- [x] #523 Add local Whisper backend via candle (feature-gated)
- [x] #524 Telegram: handle voice messages and audio files
- [x] #525 Discord: handle voice messages and audio attachments
- [x] #526 Slack: handle audio file uploads
- [x] #527 CLI: audio file path as input argument
- [ ] #528 Streaming audio input support (real-time STT)
- [x] #529 Configuration and documentation

## Non-goals (v1)
- Text-to-speech (TTS) output
- Voice call support (phone, WebRTC)
- Video input

## References
- [OpenAI Whisper API](https://platform.openai.com/docs/guides/speech-to-text)
- [OpenAI Realtime API](https://platform.openai.com/docs/guides/realtime)
- [Claude audio support](https://docs.anthropic.com/en/docs/build-with-claude/audio)
- [Telegram voice messages](https://core.telegram.org/bots/api#voice)
- [candle-whisper](https://github.com/huggingface/candle/tree/main/candle-examples/examples/whisper)
