8 changes: 7 additions & 1 deletion CHANGELOG.md
@@ -7,6 +7,11 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased]

### Added
- Per-tool inline filter stats in CLI chat: `[shell] cargo test (342 lines -> 28 lines, 91.8% filtered)` (#449)
- Filter metrics in TUI Resources panel: confidence distribution, command hit rate, token savings (#448)
- Periodic 250ms tick in TUI event loop for real-time metrics refresh (#447)
- Output filter architecture improvements (M26.1): `CommandMatcher` enum, `FilterConfidence`, `FilterPipeline`, `SecurityPatterns`, per-filter TOML config (#452)
- Token savings tracking and metrics for output filtering (#445)
- Smart tool output filtering: command-aware filters that compress tool output before context insertion
- `OutputFilter` trait and `OutputFilterRegistry` with first-match-wins dispatch
- `sanitize_output()` ANSI escape and progress bar stripping (runs on all tool output)
@@ -37,7 +42,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
- Extract bootstrap logic from main.rs into `zeph-core::bootstrap::AppBuilder` (#393): main.rs reduced from 2313 to 978 lines
- `SecurityConfig` and `TimeoutConfig` gain `Clone + Copy`
- `AnyChannel` moved from main.rs to zeph-channels crate
- Default features reduced to minimal set (qdrant, self-learning, vault-age, compatible, index)
- Remove 8 lightweight feature gates, make always-on: openai, compatible, orchestrator, router, self-learning, qdrant, vault-age, mcp (#438)
- Default features reduced to minimal set (empty after M26)
- Skill matcher concurrency reduced from 50 to 20
- `String::with_capacity` in context building loops
- CI updated to use `--features full`
47 changes: 20 additions & 27 deletions README.md
@@ -15,7 +15,7 @@ Lightweight AI agent that routes tasks across **Ollama, Claude, OpenAI, HuggingF

## Why Zeph

**Token-efficient by design.** Most agent frameworks inject every tool and instruction into every prompt. Zeph embeds skills and MCP tools as vectors (with concurrent embedding via `buffer_unordered`), then selects only the top-K relevant ones per query via cosine similarity. Prompt size stays O(K) -- not O(N) -- regardless of how many capabilities are installed.
**Token-efficient by design.** Most agent frameworks inject every tool and instruction into every prompt. Zeph embeds skills and MCP tools as vectors (with concurrent embedding via `buffer_unordered`), then selects only the top-K relevant ones per query via cosine similarity. Prompt size stays O(K) -- not O(N) -- regardless of how many capabilities are installed. Smart output filtering further reduces token consumption by 70-99% for common tool outputs (test results, git logs, clippy diagnostics, directory listings, log deduplication) -- per-command filter stats are shown inline in CLI chat and aggregated in the TUI dashboard.

**Intelligent context management.** Two-tier context pruning: Tier 1 selectively removes old tool outputs (clearing bodies from memory after persisting to SQLite) before falling back to Tier 2 LLM-based compaction, reducing unnecessary LLM calls. A token-based protection zone preserves recent context from pruning. Parallel context preparation via `try_join!` and optimized byte-length token estimation. Cross-session memory transfers knowledge between conversations with relevance filtering. Proportional budget allocation (8% summaries, 8% semantic recall, 4% cross-session, 30% code context, 50% recent history) keeps conversations efficient. Tool outputs are truncated at 30K chars with optional LLM-based summarization for large outputs. Doom-loop detection breaks runaway tool cycles after 3 identical consecutive outputs, with configurable iteration limits (default 10). ZEPH.md project config discovery walks up the directory tree and injects project-specific context when available. Config hot-reload applies runtime-safe fields (timeouts, security, memory limits) on file change without restart.

@@ -118,7 +118,7 @@ cargo build --release --features tui
| **Skill Trust & Quarantine** | 4-tier trust model (Trusted/Verified/Quarantined/Blocked) with blake3 integrity verification, anomaly detection with automatic blocking, and restricted tool access for untrusted skills | |
| **Prompt Caching** | Automatic prompt caching for Anthropic and OpenAI providers, reducing latency and cost on repeated context | |
| **Graceful Shutdown** | Ctrl-C triggers ordered teardown with MCP server cleanup and pending task draining | |
| **TUI Dashboard** | ratatui terminal UI with tree-sitter syntax highlighting, markdown rendering, deferred model warmup, scrollbar, mouse scroll, thinking blocks, conversation history, splash screen, live metrics, message queueing (max 10, FIFO with Ctrl+K clear) | [TUI](https://bug-ops.github.io/zeph/guide/tui.html) |
| **TUI Dashboard** | ratatui terminal UI with tree-sitter syntax highlighting, markdown rendering, deferred model warmup, scrollbar, mouse scroll, thinking blocks, conversation history, splash screen, live metrics (including filter savings), message queueing (max 10, FIFO with Ctrl+K clear) | [TUI](https://bug-ops.github.io/zeph/guide/tui.html) |
| **Multi-Channel I/O** | CLI, Discord, Slack, Telegram, and TUI with streaming support | [Channels](https://bug-ops.github.io/zeph/guide/channels.html) |
| **Defense-in-Depth** | Shell sandbox with relative path traversal detection, file sandbox, command filter, secret redaction (Google/GitLab patterns), audit log, SSRF protection (agent + MCP), rate limiter TTL eviction, doom-loop detection, skill trust quarantine | [Security](https://bug-ops.github.io/zeph/security.html) |

@@ -155,34 +155,27 @@ Deep dive: [Architecture overview](https://bug-ops.github.io/zeph/architecture/o

## Feature Flags

| Feature | Default | Description |
|---------|---------|-------------|
| `compatible` | On | OpenAI-compatible provider (Together AI, Groq, Fireworks, etc.) |
| `openai` | On | OpenAI provider |
| `qdrant` | On | Qdrant vector search for skills and MCP tools |
| `self-learning` | On | Skill evolution system |
| `vault-age` | On | Age-encrypted secret storage |
| `a2a` | Off | A2A protocol client and server |
| `candle` | Off | Local HuggingFace inference (GGUF) |
| `index` | Off | AST-based code indexing and semantic retrieval |
| `mcp` | Off | MCP client for external tool servers |
| `orchestrator` | Off | Multi-model routing with fallback |
| `router` | Off | Prompt-based model selection via RouterProvider |
| `discord` | Off | Discord bot with Gateway v10 WebSocket |
| `slack` | Off | Slack bot with Events API webhook |
| `gateway` | Off | HTTP gateway for webhook ingestion |
| `daemon` | Off | Daemon supervisor for component lifecycle |
| `scheduler` | Off | Cron-based periodic task scheduler |
| `otel` | Off | OpenTelemetry OTLP export for Prometheus/Grafana |
| `metal` | Off | Metal GPU acceleration (macOS) |
| `tui` | Off | ratatui TUI dashboard with real-time metrics |
| `cuda` | Off | CUDA GPU acceleration (Linux) |
The following features are always compiled in (no flag needed): `openai`, `compatible`, `orchestrator`, `router`, `self-learning`, `qdrant`, `vault-age`, `mcp`.

| Feature | Description |
|---------|-------------|
| `a2a` | A2A protocol client and server |
| `candle` | Local HuggingFace inference (GGUF) |
| `index` | AST-based code indexing and semantic retrieval |
| `discord` | Discord bot with Gateway v10 WebSocket |
| `slack` | Slack bot with Events API webhook |
| `gateway` | HTTP gateway for webhook ingestion |
| `daemon` | Daemon supervisor for component lifecycle |
| `scheduler` | Cron-based periodic task scheduler |
| `otel` | OpenTelemetry OTLP export for Prometheus/Grafana |
| `metal` | Metal GPU acceleration (macOS) |
| `tui` | ratatui TUI dashboard with real-time metrics |
| `cuda` | CUDA GPU acceleration (Linux) |

```bash
cargo build --release # default features only
cargo build --release --features full # all non-platform features
cargo build --release # default build (all always-on features included)
cargo build --release --features full # all optional features
cargo build --release --features metal # macOS Metal GPU
cargo build --release --no-default-features # minimal binary (Ollama + Claude only)
cargo build --release --features tui # with TUI dashboard
```

7 changes: 7 additions & 0 deletions crates/zeph-core/src/agent/streaming.rs
@@ -313,6 +313,13 @@ impl<C: Channel, T: ToolExecutor> Agent<C, T> {
let display = self.maybe_redact(&formatted_output);
self.channel.send(&display).await?;

if let Some(ref fs) = output.filter_stats
&& fs.filtered_lines < fs.raw_lines
{
let stats_line = fs.format_inline(&output.tool_name);
self.channel.send(&stats_line).await?;
}

self.push_message(Message::from_parts(
Role::User,
vec![MessagePart::ToolOutput {
33 changes: 33 additions & 0 deletions crates/zeph-tools/src/executor.rs
@@ -13,6 +13,8 @@ pub struct ToolCall {
pub struct FilterStats {
pub raw_chars: usize,
pub filtered_chars: usize,
pub raw_lines: usize,
pub filtered_lines: usize,
pub confidence: Option<crate::FilterConfidence>,
}

@@ -30,6 +32,16 @@ impl FilterStats {
pub fn estimated_tokens_saved(&self) -> usize {
self.raw_chars.saturating_sub(self.filtered_chars) / 4
}

#[must_use]
pub fn format_inline(&self, tool_name: &str) -> String {
format!(
"[{tool_name}] {} lines -> {} lines, {:.1}% filtered",
self.raw_lines,
self.filtered_lines,
self.savings_pct()
)
}
}

/// Structured result from tool execution.
@@ -85,6 +97,7 @@ pub enum ToolEvent {
command: String,
output: String,
success: bool,
filter_stats: Option<FilterStats>,
},
}

@@ -293,4 +306,24 @@ mod tests {
};
assert_eq!(fs.estimated_tokens_saved(), 200); // (1000 - 200) / 4
}

#[test]
fn filter_stats_format_inline() {
let fs = FilterStats {
raw_chars: 1000,
filtered_chars: 200,
raw_lines: 342,
filtered_lines: 28,
..Default::default()
};
let line = fs.format_inline("shell");
assert_eq!(line, "[shell] 342 lines -> 28 lines, 80.0% filtered");
}

#[test]
fn filter_stats_format_inline_zero() {
let fs = FilterStats::default();
let line = fs.format_inline("bash");
assert_eq!(line, "[bash] 0 lines -> 0 lines, 0.0% filtered");
}
}
33 changes: 33 additions & 0 deletions crates/zeph-tools/src/filter/mod.rs
@@ -38,6 +38,8 @@ pub struct FilterResult {
pub output: String,
pub raw_chars: usize,
pub filtered_chars: usize,
pub raw_lines: usize,
pub filtered_lines: usize,
pub confidence: FilterConfidence,
}

@@ -131,6 +133,8 @@ impl<'a> FilterPipeline<'a> {
FilterResult {
raw_chars: initial_len,
filtered_chars: current.len(),
raw_lines: count_lines(output),
filtered_lines: count_lines(&current),
output: current,
confidence: worst,
}
@@ -531,9 +535,15 @@ pub fn sanitize_output(raw: &str) -> String {
result
}

fn count_lines(s: &str) -> usize {
if s.is_empty() { 0 } else { s.lines().count() }
}

fn make_result(raw: &str, output: String, confidence: FilterConfidence) -> FilterResult {
let filtered_chars = output.len();
FilterResult {
raw_lines: count_lines(raw),
filtered_lines: count_lines(&output),
output,
raw_chars: raw.len(),
filtered_chars,
@@ -577,6 +587,8 @@ mod tests {
output: String::new(),
raw_chars: 1000,
filtered_chars: 200,
raw_lines: 0,
filtered_lines: 0,
confidence: FilterConfidence::Full,
};
assert!((r.savings_pct() - 80.0).abs() < 0.01);
@@ -588,11 +600,30 @@
output: String::new(),
raw_chars: 0,
filtered_chars: 0,
raw_lines: 0,
filtered_lines: 0,
confidence: FilterConfidence::Full,
};
assert!((r.savings_pct()).abs() < 0.01);
}

#[test]
fn count_lines_helper() {
assert_eq!(count_lines(""), 0);
assert_eq!(count_lines("one"), 1);
assert_eq!(count_lines("one\ntwo\nthree"), 3);
assert_eq!(count_lines("trailing\n"), 1);
}

#[test]
fn make_result_counts_lines() {
let raw = "line1\nline2\nline3\nline4\nline5";
let filtered = "line1\nline3".to_owned();
let r = make_result(raw, filtered, FilterConfidence::Full);
assert_eq!(r.raw_lines, 5);
assert_eq!(r.filtered_lines, 2);
}

#[test]
fn registry_disabled_returns_none() {
let r = OutputFilterRegistry::new(false);
@@ -751,6 +782,8 @@ extra_patterns = ["TODO: security review"]
output: "short".into(),
raw_chars: 100,
filtered_chars: 5,
raw_lines: 10,
filtered_lines: 1,
confidence: FilterConfidence::Full,
};
m.record(&r);
30 changes: 21 additions & 9 deletions crates/zeph-tools/src/shell.rs
@@ -207,16 +207,8 @@ impl ShellExecutor {
};
self.log_audit(block, result, duration_ms).await;

if let Some(ref tx) = self.tool_event_tx {
let _ = tx.send(ToolEvent::Completed {
tool_name: "bash".to_owned(),
command: (*block).to_owned(),
output: out.clone(),
success: !out.contains("[error]"),
});
}

let sanitized = sanitize_output(&out);
let mut per_block_stats: Option<FilterStats> = None;
let filtered = if let Some(ref registry) = self.output_filter_registry {
match registry.apply(block, &sanitized, exit_code) {
Some(fr) => {
@@ -227,21 +219,41 @@
savings_pct = fr.savings_pct(),
"output filter applied"
);
let block_fs = FilterStats {
raw_chars: fr.raw_chars,
filtered_chars: fr.filtered_chars,
raw_lines: fr.raw_lines,
filtered_lines: fr.filtered_lines,
confidence: Some(fr.confidence),
};
let stats =
cumulative_filter_stats.get_or_insert_with(FilterStats::default);
stats.raw_chars += fr.raw_chars;
stats.filtered_chars += fr.filtered_chars;
stats.raw_lines += fr.raw_lines;
stats.filtered_lines += fr.filtered_lines;
stats.confidence = Some(match (stats.confidence, fr.confidence) {
(Some(prev), cur) => crate::filter::worse_confidence(prev, cur),
(None, cur) => cur,
});
per_block_stats = Some(block_fs);
fr.output
}
None => sanitized,
}
} else {
sanitized
};

if let Some(ref tx) = self.tool_event_tx {
let _ = tx.send(ToolEvent::Completed {
tool_name: "bash".to_owned(),
command: (*block).to_owned(),
output: out.clone(),
success: !out.contains("[error]"),
filter_stats: per_block_stats,
});
}
outputs.push(format!("$ {block}\n{filtered}"));
}

24 changes: 24 additions & 0 deletions docs/src/architecture/token-efficiency.md
@@ -55,6 +55,30 @@ MCP tools follow the same pipeline:

Prompt size stays constant as you add more capabilities. The only cost of more skills is a slightly larger embedding index in Qdrant or memory.

### Output Filter Pipeline

Tool output is compressed before it enters the LLM context. A command-aware filter pipeline matches each shell command against a set of built-in filters (test runner output, Clippy diagnostics, git log/diff, directory listings, log deduplication) and strips noise while preserving signal. The pipeline runs synchronously inside the tool executor, so the LLM never sees raw output.
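
The sketch below shows how a caller might drive this pipeline. It is illustrative only: it uses the `sanitize_output`, `OutputFilterRegistry::new`, and `registry.apply` names from `zeph-tools`, but the exact module paths, the type of the exit-code argument, and the error handling are assumptions rather than the real call sites.

```rust
// Illustrative sketch -- module paths and the exit-code type are assumptions.
use zeph_tools::filter::{sanitize_output, OutputFilterRegistry};

fn filter_tool_output(command: &str, raw: &str, exit_code: i32) -> String {
    // ANSI escapes and progress-bar noise are stripped from every output first.
    let sanitized = sanitize_output(raw);

    // Command-aware filters only run when the registry is enabled.
    let registry = OutputFilterRegistry::new(true);
    match registry.apply(command, &sanitized, exit_code) {
        Some(fr) => {
            // FilterResult carries the before/after line and char counts used for stats.
            eprintln!(
                "[shell] {} lines -> {} lines, {:.1}% filtered",
                fr.raw_lines, fr.filtered_lines, fr.savings_pct()
            );
            fr.output
        }
        // No filter matched this command: keep the sanitized output unchanged.
        None => sanitized,
    }
}
```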

Typical savings by command type:

| Command | Raw lines | Filtered lines | Savings |
|---------|-----------|----------------|---------|
| `cargo test` (100 passing, 2 failing) | ~340 | ~30 | ~91% |
| `cargo clippy` (many warnings) | ~200 | ~50 | ~75% |
| `git log --oneline -50` | 50 | 20 | 60% |

After each filtered execution, CLI mode prints a one-line stats summary and TUI mode accumulates the savings in the Resources panel. See [Tool System — Output Filter Pipeline](../guide/tools.md#output-filter-pipeline) for configuration details.

### Token Savings Tracking

`MetricsSnapshot` tracks cumulative filter metrics across the session:

- `filter_raw_tokens` / `filter_saved_tokens` — estimated token volume before filtering and estimated tokens saved by filtering
- `filter_total_commands` / `filter_filtered_commands` — hit rate denominator/numerator
- `filter_confidence_full/partial/fallback` — distribution of filter confidence levels

These feed into the [TUI filter metrics display](../guide/tui.md#filter-metrics) and are emitted as `tracing::debug!` every 50 commands.
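
As a rough illustration of how these counters turn into the percentages shown in the TUI, the struct below mirrors the field names listed above; the struct itself and both helpers are illustrative, not the actual `MetricsSnapshot` API.

```rust
// Illustrative only: field names come from the list above; the struct
// and these helpers are not part of the real MetricsSnapshot API.
struct FilterMetricsView {
    filter_raw_tokens: u64,
    filter_saved_tokens: u64,
    filter_total_commands: u64,
    filter_filtered_commands: u64,
}

impl FilterMetricsView {
    /// Share of commands where at least one filter matched (hit rate).
    fn hit_rate_pct(&self) -> f64 {
        if self.filter_total_commands == 0 {
            return 0.0;
        }
        self.filter_filtered_commands as f64 / self.filter_total_commands as f64 * 100.0
    }

    /// Share of pre-filter token volume that was saved.
    fn savings_pct(&self) -> f64 {
        if self.filter_raw_tokens == 0 {
            return 0.0;
        }
        self.filter_saved_tokens as f64 / self.filter_raw_tokens as f64 * 100.0
    }
}
```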

### Two-Tier Context Pruning

Long conversations accumulate tool outputs that consume significant context space. Zeph uses a two-tier strategy: Tier 1 selectively prunes old tool outputs (cheap, no LLM call), and Tier 2 falls back to full LLM compaction only when Tier 1 is insufficient. See [Context Engineering](../guide/context.md) for details.