audio: Telegram voice message and audio file handling #526

@bug-ops

Parent: #520
Depends on: #521, #522

Context

Telegram natively supports voice messages (OGG/Opus) and audio file attachments. teloxide already provides the `Message::voice()` and `Message::audio()` accessors, but the Telegram adapter currently ignores them.

Changes

`crates/zeph-channels/src/telegram.rs`

  • Check `msg.voice()` and `msg.audio()` in addition to `msg.text()`
  • Download audio via Telegram Bot API `getFile` + HTTP fetch
  • Convert to `Attachment::Audio` and attach to `ChannelMessage`
  • If text is also present (caption), include both
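As a rough illustration of the `getFile` + HTTP fetch step, the sketch below builds the two Bot API URLs involved: one that resolves a `file_id` to a `file_path`, and one that serves the file bytes. The function names `get_file_url` and `download_url` are hypothetical, and no network call is made. In practice the adapter would more likely go through teloxide's own request and download helpers rather than building URLs by hand.

```rust
/// Base URL of the public Telegram Bot API.
const API_BASE: &str = "https://api.telegram.org";

/// URL of the `getFile` method, which resolves a `file_id` to a `file_path`.
/// Hypothetical helper: assumes `file_id` needs no extra percent-encoding.
pub fn get_file_url(token: &str, file_id: &str) -> String {
    format!("{}/bot{}/getFile?file_id={}", API_BASE, token, file_id)
}

/// URL from which the file bytes can be fetched once `file_path` is known.
/// Hypothetical helper: `file_path` is the value returned by `getFile`.
pub fn download_url(token: &str, file_path: &str) -> String {
    format!("{}/file/bot{}/{}", API_BASE, token, file_path)
}
```

Both URLs embed the bot token, so they should be kept out of logs and error messages.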

Audio flow

  1. User sends voice message in Telegram
  2. Adapter detects `msg.voice()`, gets `file_id`
  3. Downloads OGG/Opus file via Bot API
  4. Creates `ChannelMessage { text: "", attachments: [Audio { data, mime: "audio/ogg" }] }`
  5. Agent loop transcribes via configured STT backend
  6. Transcribed text enters normal agent pipeline
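Steps 4 to 6 could be sketched roughly as follows. The shapes of `ChannelMessage` and `Attachment` are guessed from the pseudo-struct in step 4; `SttBackend`, `AudioError`, `FixedStt`, and `resolve_text` are hypothetical names introduced only for this illustration, and the real agent loop is probably async.

```rust
use std::fmt;

/// Assumed shape of the attachment type, based on step 4 above.
#[derive(Debug, Clone, PartialEq)]
pub enum Attachment {
    Audio { data: Vec<u8>, mime: String },
}

/// Assumed shape of the channel message, based on step 4 above.
#[derive(Debug, Clone, PartialEq)]
pub struct ChannelMessage {
    pub text: String,
    pub attachments: Vec<Attachment>,
}

/// Hypothetical abstraction over the configured STT backend.
pub trait SttBackend {
    fn transcribe(&self, data: &[u8], mime: &str) -> Result<String, String>;
}

/// Stub backend that always returns the same transcript (illustration only).
pub struct FixedStt(pub &'static str);

impl SttBackend for FixedStt {
    fn transcribe(&self, _data: &[u8], _mime: &str) -> Result<String, String> {
        Ok(self.0.to_string())
    }
}

#[derive(Debug, PartialEq)]
pub enum AudioError {
    SttNotConfigured,
    Transcription(String),
}

impl fmt::Display for AudioError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AudioError::SttNotConfigured => {
                write!(f, "Voice messages are not supported: no STT backend is configured.")
            }
            AudioError::Transcription(e) => write!(f, "Could not transcribe audio: {}", e),
        }
    }
}

/// Steps 5 and 6: produce the text that enters the normal agent pipeline.
/// Caption text (if any) comes first, followed by one transcript per attachment.
pub fn resolve_text(
    msg: &ChannelMessage,
    stt: Option<&dyn SttBackend>,
) -> Result<String, AudioError> {
    let mut parts: Vec<String> = Vec::new();
    if !msg.text.trim().is_empty() {
        parts.push(msg.text.trim().to_string());
    }
    for attachment in &msg.attachments {
        let Attachment::Audio { data, mime } = attachment;
        // Fail with a user-presentable error when no backend is configured.
        let backend = stt.ok_or(AudioError::SttNotConfigured)?;
        let transcript = backend
            .transcribe(data, mime)
            .map_err(AudioError::Transcription)?;
        if !transcript.trim().is_empty() {
            parts.push(transcript.trim().to_string());
        }
    }
    Ok(parts.join("\n"))
}
```

Joining caption and transcript with a newline is one possible convention; the issue does not specify how the two should be combined.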

Size limits

  • Telegram Bot API `getFile` downloads: up to 20 MB per file, which caps both voice messages and audio attachments
  • Apply a configurable maximum duration (default: 120 s) to prevent abuse
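A minimal sketch of how these limits might be checked before downloading, using the size and duration that Telegram reports in the message metadata. `AudioLimits`, `LimitError`, and `check_limits` are hypothetical names, and treating 20 MB as 20 × 1024 × 1024 bytes is an assumption.

```rust
use std::time::Duration;

/// Hypothetical limits configuration; defaults follow the values in this issue.
pub struct AudioLimits {
    pub max_bytes: u64,
    pub max_duration: Duration,
}

impl Default for AudioLimits {
    fn default() -> Self {
        Self {
            max_bytes: 20 * 1024 * 1024, // assumption: 20 MB counted as MiB
            max_duration: Duration::from_secs(120),
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum LimitError {
    TooLarge { size: u64, max: u64 },
    TooLong { duration: Duration, max: Duration },
}

/// Checks the metadata against the limits. `size_bytes` is optional because
/// the file size may be missing; in that case only the duration is checked.
pub fn check_limits(
    size_bytes: Option<u64>,
    duration: Duration,
    limits: &AudioLimits,
) -> Result<(), LimitError> {
    if let Some(size) = size_bytes {
        if size > limits.max_bytes {
            return Err(LimitError::TooLarge { size, max: limits.max_bytes });
        }
    }
    if duration > limits.max_duration {
        return Err(LimitError::TooLong { duration, max: limits.max_duration });
    }
    Ok(())
}
```

Checking metadata first avoids downloading a file that would be rejected anyway; a byte cap on the download itself would still be needed when the size is not reported.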

Acceptance criteria

  • Voice messages transcribed and processed as text commands
  • Audio file attachments (.mp3, .ogg, .wav) supported
  • Caption text combined with transcription
  • File size/duration limits enforced
  • Graceful error message if STT is not configured
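For the audio file attachment criterion, one possible fallback when no MIME type is available is to infer it from the file name. The sketch below covers only the three extensions listed above; `audio_mime_for` is a hypothetical helper, and `audio/wav` is a commonly used (not the only) MIME type for WAV files.

```rust
use std::path::Path;

/// Maps a file name to an audio MIME type for the supported extensions.
/// Returns `None` for anything else, so the caller can reject the file.
pub fn audio_mime_for(file_name: &str) -> Option<&'static str> {
    let ext = Path::new(file_name)
        .extension()?
        .to_str()?
        .to_ascii_lowercase();
    match ext.as_str() {
        "mp3" => Some("audio/mpeg"),
        "ogg" => Some("audio/ogg"),
        "wav" => Some("audio/wav"),
        _ => None,
    }
}
```

For example, `audio_mime_for("Memo.MP3")` yields `Some("audio/mpeg")`, while a file without a recognized extension yields `None`.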

Labels

channels, enhancement