1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -30,6 +30,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
- Audio input support: `Attachment`/`AttachmentKind` types, `SpeechToText` trait, OpenAI Whisper backend behind `stt` feature flag (#520, #521, #522)
- Telegram voice and audio message handling with automatic file download (#524)
- STT bootstrap wiring: `WhisperProvider` created from `[llm.stt]` config behind `stt` feature (#529)
- Slack audio file upload handling with host validation and size limits (#525)

## [0.10.0] - 2026-02-18

2 changes: 1 addition & 1 deletion README.md
@@ -236,7 +236,7 @@ Skills **evolve**: failure detection triggers self-reflection, and the agent gen
|----------|-------------|
| **MCP** | Connect external tool servers (stdio + HTTP) with SSRF protection |
| **A2A** | Agent-to-agent communication via JSON-RPC 2.0 with SSE streaming |
-| **Audio input** | Speech-to-text via OpenAI Whisper (25 MB limit); Telegram voice messages transcribed automatically |
+| **Audio input** | Speech-to-text via OpenAI Whisper (25 MB limit); Telegram and Slack audio files transcribed automatically |
| **Channels** | CLI, Telegram (text + voice), Discord, Slack, TUI — all with streaming support |
| **Gateway** | HTTP webhook ingestion with bearer auth and rate limiting |
| **Native tool_use** | Structured tool calling via Claude/OpenAI APIs; text fallback for local models |
2 changes: 1 addition & 1 deletion crates/zeph-channels/README.md
@@ -13,7 +13,7 @@ Implements I/O channel adapters that connect the agent to different frontends. S
| `cli` | `CliChannel` — interactive terminal I/O |
| `telegram` | Telegram adapter via teloxide with streaming; voice/audio message detection and file download |
| `discord` | Discord adapter (optional feature) |
-| `slack` | Slack adapter (optional feature) |
+| `slack` | Slack adapter (optional feature); audio file detection and download with Bearer auth |
| `any` | `AnyChannel` — enum dispatch over all channels |
| `markdown` | Markdown rendering helpers |
| `error` | `ChannelError` — unified error type |
32 changes: 32 additions & 0 deletions crates/zeph-channels/src/slack/api.rs
@@ -4,6 +4,7 @@ use serde::{Deserialize, Serialize};
use serde_json::Value;

const SLACK_API: &str = "https://slack.com/api";
const MAX_AUDIO_BYTES: usize = 25 * 1024 * 1024;

pub struct SlackApi {
client: reqwest::Client,
@@ -131,4 +132,35 @@ impl SlackApi {
        }
        Ok(())
    }

    /// Download a file from Slack using the bot token for authorization.
    ///
    /// # Errors
    ///
    /// Returns an error if the HTTP request fails or the response status is not success.
    pub async fn download_file(
        &self,
        url: &str,
    ) -> Result<Vec<u8>, Box<dyn std::error::Error + Send + Sync>> {
        let host = reqwest::Url::parse(url)
            .ok()
            .and_then(|u| u.host_str().map(String::from));
        if !host.is_some_and(|h| h.ends_with(".slack.com")) {
            return Err(format!("refusing to send token to non-slack host: {url}").into());
        }

        let resp = self.client.get(url).bearer_auth(&self.token).send().await?;
        if !resp.status().is_success() {
            return Err(format!("slack file download failed: {}", resp.status()).into());
        }
        let bytes = resp.bytes().await?;
        if bytes.len() > MAX_AUDIO_BYTES {
            return Err(format!(
                "slack file too large: {} bytes (max {MAX_AUDIO_BYTES})",
                bytes.len()
            )
            .into());
        }
        Ok(bytes.to_vec())
    }
}
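
The two guards in `download_file` (host suffix check and size limit) can be looked at in isolation. The following is a minimal sketch using only the standard library, not the code from this PR: `is_slack_host` and `check_size` are hypothetical helper names introduced for illustration, and the host is assumed to have already been extracted from the URL.

```rust
/// Upper bound on a downloaded file, mirroring the constant in the diff (25 MiB).
const MAX_AUDIO_BYTES: usize = 25 * 1024 * 1024;

/// Hypothetical helper: true only for hosts that end with ".slack.com".
/// The leading dot in the suffix rejects look-alikes such as "evilslack.com",
/// and it also rejects the bare apex "slack.com". A missing host is rejected.
fn is_slack_host(host: Option<&str>) -> bool {
    host.is_some_and(|h| h.ends_with(".slack.com"))
}

/// Hypothetical helper: checks a payload length against the limit and
/// formats the error the same way the diff does.
fn check_size(len: usize) -> Result<(), String> {
    if len > MAX_AUDIO_BYTES {
        return Err(format!(
            "slack file too large: {len} bytes (max {MAX_AUDIO_BYTES})"
        ));
    }
    Ok(())
}
```

With these definitions, `is_slack_host(Some("files.slack.com"))` is accepted while `Some("files.slack.com.evil.example")` and `None` are refused. Note that the check in the diff looks only at the host, not at the URL scheme, and that the size limit is applied after the response body has been read.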
112 changes: 111 additions & 1 deletion crates/zeph-channels/src/slack/events.rs
@@ -14,11 +14,19 @@

type HmacSha256 = Hmac<Sha256>;

#[derive(Clone)]
pub struct SlackFile {
    pub url: String,
    pub filename: Option<String>,
    pub mimetype: String,
}

#[derive(Clone)]
pub struct IncomingMessage {
    pub channel_id: String,
    pub text: String,
    pub user_id: String,
    pub files: Vec<SlackFile>,
}

#[derive(Clone)]
@@ -94,7 +102,9 @@
let subtype = event.get("subtype").and_then(|v| v.as_str());
let event_type = event.get("type").and_then(|v| v.as_str());

- if event_type == Some("message") && subtype.is_none() {
+ if event_type == Some("message")
+     && (subtype.is_none() || subtype == Some("file_share"))
+ {
let user = event.get("user").and_then(|v| v.as_str()).unwrap_or("");
let channel = event.get("channel").and_then(|v| v.as_str()).unwrap_or("");
let text = event.get("text").and_then(|v| v.as_str()).unwrap_or("");
@@ -116,12 +126,15 @@
return Ok(String::new());
}

let files = parse_audio_files(event);

let _ = state
.tx
.send(IncomingMessage {
channel_id: channel.to_owned(),
text: text.to_owned(),
user_id: user.to_owned(),
files,
})
.await;
}
@@ -132,6 +145,32 @@
}
}

fn is_audio_mime(mime: &str) -> bool {
    mime.starts_with("audio/") || mime == "video/webm"
}

fn parse_audio_files(event: &Value) -> Vec<SlackFile> {
    event
        .get("files")
        .and_then(|v| v.as_array())
        .map(|arr| {
            arr.iter()
                .filter_map(|f| {
                    let mime = f.get("mimetype")?.as_str()?;
                    if !is_audio_mime(mime) {
                        return None;
                    }
                    Some(SlackFile {
                        url: f.get("url_private_download")?.as_str()?.to_owned(),
                        filename: f.get("name").and_then(|v| v.as_str()).map(String::from),
                        mimetype: mime.to_owned(),
                    })
                })
                .collect()
        })
        .unwrap_or_default()
}

pub(crate) fn verify_signature(
signing_secret: &str,
headers: &HeaderMap,
@@ -332,6 +371,7 @@
assert_eq!(msg.user_id, "U123");
assert_eq!(msg.channel_id, "C456");
assert_eq!(msg.text, "hi");
assert!(msg.files.is_empty());
}

#[tokio::test]
@@ -437,4 +477,74 @@
        let _ = handle_event(State(state), headers, body.to_owned()).await;
        assert!(rx.try_recv().is_err());
    }

    #[tokio::test]
    async fn handle_event_file_share_with_audio() {
        let (tx, mut rx) = mpsc::channel(16);
        let state = EventState {
            signing_secret: "secret".into(),
            tx,
            bot_user_id: String::new(),
            allowed_user_ids: vec![],
            allowed_channel_ids: vec![],
        };

        let body = r#"{"type":"event_callback","event":{"type":"message","subtype":"file_share","user":"U1","channel":"C1","text":"","files":[{"name":"voice.webm","mimetype":"audio/webm","url_private_download":"https://files.slack.com/voice.webm"}]}}"#;
        let timestamp = current_timestamp();
        let sig = compute_signature("secret", &timestamp, body);

        let mut headers = HeaderMap::new();
        headers.insert(
            "X-Slack-Request-Timestamp",
            HeaderValue::from_str(&timestamp).unwrap(),
        );
        headers.insert("X-Slack-Signature", HeaderValue::from_str(&sig).unwrap());

        let result = handle_event(State(state), headers, body.to_owned()).await;
        assert!(result.is_ok());

        let msg = rx.try_recv().unwrap();
        assert_eq!(msg.files.len(), 1);
        assert_eq!(msg.files[0].mimetype, "audio/webm");
        assert_eq!(msg.files[0].filename.as_deref(), Some("voice.webm"));
    }

    #[test]
    fn parse_audio_files_filters_non_audio() {
        let event: Value = serde_json::from_str(
            r#"{"files":[
                {"name":"img.png","mimetype":"image/png","url_private_download":"https://x/img"},
                {"name":"voice.ogg","mimetype":"audio/ogg","url_private_download":"https://x/voice"}
            ]}"#,
        )
        .unwrap();
        let files = parse_audio_files(&event);
        assert_eq!(files.len(), 1);
        assert_eq!(files[0].mimetype, "audio/ogg");
    }

    #[test]
    fn parse_audio_files_accepts_video_webm() {
        let event: Value = serde_json::from_str(
            r#"{"files":[{"name":"v.webm","mimetype":"video/webm","url_private_download":"https://x/v"}]}"#,
        )
        .unwrap();
        let files = parse_audio_files(&event);
        assert_eq!(files.len(), 1);
    }

    #[test]
    fn parse_audio_files_empty_when_no_files() {
        let event: Value = serde_json::from_str(r#"{"text":"hi"}"#).unwrap();
        assert!(parse_audio_files(&event).is_empty());
    }

    #[test]
    fn is_audio_mime_cases() {
        assert!(is_audio_mime("audio/webm"));
        assert!(is_audio_mime("audio/mpeg"));
        assert!(is_audio_mime("video/webm"));
        assert!(!is_audio_mime("video/mp4"));
        assert!(!is_audio_mime("image/png"));
    }
}
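
The event gate and the MIME filter in `events.rs` can be sketched without `serde_json`. The version below is an illustration under assumptions, not the PR's code: `FileEntry`, `accepts_event`, and `audio_urls` are hypothetical names, and the JSON fields are modelled as already-extracted optional strings.

```rust
/// Hypothetical stand-in for one entry of the event's `files` array.
struct FileEntry<'a> {
    mimetype: Option<&'a str>,
    url_private_download: Option<&'a str>,
}

/// Mirrors the condition in the diff: plain messages and `file_share`
/// messages are handled, every other subtype is ignored.
fn accepts_event(event_type: Option<&str>, subtype: Option<&str>) -> bool {
    event_type == Some("message") && (subtype.is_none() || subtype == Some("file_share"))
}

/// Same predicate as in the diff: any `audio/*` type, plus `video/webm`.
fn is_audio_mime(mime: &str) -> bool {
    mime.starts_with("audio/") || mime == "video/webm"
}

/// Keeps the download URL of each audio entry. Entries with a non-audio
/// MIME type, or with either field missing, are dropped.
fn audio_urls<'a>(files: &[FileEntry<'a>]) -> Vec<&'a str> {
    files
        .iter()
        .filter_map(|f| {
            let mime = f.mimetype?;
            if !is_audio_mime(mime) {
                return None;
            }
            f.url_private_download
        })
        .collect()
}
```

As in `parse_audio_files`, an entry that lacks `url_private_download` is skipped silently rather than treated as an error, so a message with a mix of images and audio yields only the audio entries.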
21 changes: 19 additions & 2 deletions crates/zeph-channels/src/slack/mod.rs
@@ -6,7 +6,7 @@ pub mod events;
use std::time::{Duration, Instant};

use tokio::sync::mpsc;
-use zeph_core::channel::{Channel, ChannelError, ChannelMessage};
+use zeph_core::channel::{Attachment, AttachmentKind, Channel, ChannelError, ChannelMessage};

use self::events::IncomingMessage;

@@ -138,9 +138,25 @@ impl Channel for SlackChannel {
        self.last_edit = None;
        self.message_ts = None;

        let mut attachments = Vec::new();
        for file in &incoming.files {
            match self.api.download_file(&file.url).await {
                Ok(data) => {
                    attachments.push(Attachment {
                        kind: AttachmentKind::Audio,
                        data,
                        filename: file.filename.clone(),
                    });
                }
                Err(e) => {
                    tracing::warn!("failed to download slack audio file: {e}");
                }
            }
        }

        Ok(Some(ChannelMessage {
            text: incoming.text,
-            attachments: vec![],
+            attachments,
        }))
    }

@@ -251,6 +267,7 @@ mod tests {
channel_id: "C123".into(),
text: "hello".into(),
user_id: "U1".into(),
files: vec![],
})
.unwrap();
let msg = ch.try_recv().unwrap();
10 changes: 10 additions & 0 deletions docs/src/guide/audio-input.md
@@ -43,6 +43,16 @@ Download failures (network errors, expired file links) are logged at `warn` leve

Bootstrap wiring is automatic: when `[llm.stt]` is present in the config and the `stt` feature is enabled, `main.rs` creates a `WhisperProvider` and injects it into the agent via `with_stt()`. No additional setup is needed beyond the configuration shown above.

## Slack Audio Files

The Slack channel automatically detects audio file uploads and voice messages in incoming events. When a message contains files with audio MIME types (`audio/*`) or `video/webm` (commonly used for voice recordings), the adapter downloads the file and wraps it as an `Attachment` with `AttachmentKind::Audio`. The attachment then follows the standard transcription pipeline.

Files are downloaded via `url_private_download` using Bearer token authentication with the bot token. For security, the adapter validates that the download URL host ends with `.slack.com` before making the request. Files larger than 25 MB are discarded after download and are not passed on for transcription.

Download failures (network errors, host validation rejection, oversized files) are logged at `warn` level and gracefully skipped — the message is delivered without an attachment.

To enable Slack audio transcription, ensure both the `slack` and `stt` features are active and `[llm.stt]` is configured. Add the `files:read` OAuth scope to your Slack app so the bot can access uploaded files.
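
The skip-on-failure behaviour described above can be sketched as follows. This is a simplified illustration, not the adapter's code: `Attachment` and `PendingFile` are reduced stand-ins for the real types, `collect_attachments` is a hypothetical name, the download step is passed in as a closure so the example needs no network access, and `eprintln!` stands in for the `warn`-level log.

```rust
/// Reduced stand-in for the attachment type (the real one also carries a kind).
#[derive(Debug, PartialEq)]
struct Attachment {
    data: Vec<u8>,
    filename: Option<String>,
}

/// Hypothetical stand-in for a detected audio file awaiting download.
struct PendingFile {
    url: String,
    filename: Option<String>,
}

/// Tries every file in order. A failed download is reported and skipped,
/// so the message is still delivered with whatever attachments succeeded.
fn collect_attachments<F>(files: &[PendingFile], mut download: F) -> Vec<Attachment>
where
    F: FnMut(&str) -> Result<Vec<u8>, String>,
{
    let mut attachments = Vec::new();
    for file in files {
        match download(file.url.as_str()) {
            Ok(data) => attachments.push(Attachment {
                data,
                filename: file.filename.clone(),
            }),
            // Stand-in for the warn-level log; the loop simply continues.
            Err(e) => eprintln!("failed to download slack audio file: {e}"),
        }
    }
    attachments
}
```

One consequence of this design is that a message whose only file fails to download arrives as plain text with an empty attachment list, which matches the behaviour stated in the paragraph on download failures.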

## Limitations

- **25 MB file size limit** — audio files exceeding this are rejected before upload.
2 changes: 1 addition & 1 deletion docs/src/guide/channels.md
@@ -256,7 +256,7 @@ When the queue is full (10 messages), new input is silently dropped until space

## Attachments

-`ChannelMessage` supports an optional `attachments` field carrying `Attachment` values with typed `AttachmentKind` variants (Audio, Image, Video, File). When the `stt` feature is enabled, audio attachments are automatically transcribed before entering the agent loop. The Telegram channel automatically downloads voice and audio messages and delivers them as attachments. See [Audio Input](audio-input.md) for details.
+`ChannelMessage` supports an optional `attachments` field carrying `Attachment` values with typed `AttachmentKind` variants (Audio, Image, Video, File). When the `stt` feature is enabled, audio attachments are automatically transcribed before entering the agent loop. The Telegram channel automatically downloads voice and audio messages and delivers them as attachments. The Slack channel detects audio file uploads and voice messages (`audio/*`, `video/webm`), downloads them via `url_private_download` with host validation (`.slack.com` only) and a 25 MB size limit, and delivers them as audio attachments. See [Audio Input](audio-input.md) for details.

## Channel Selection Logic
