diff --git a/CHANGELOG.md b/CHANGELOG.md index e87dc700..79e3237e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,11 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). ## [Unreleased] ### Added +- Per-tool inline filter stats in CLI chat: `[shell] 342 lines -> 28 lines, 91.8% filtered` (#449) +- Filter metrics in TUI Resources panel: confidence distribution, command hit rate, token savings (#448) +- Periodic 250ms tick in TUI event loop for real-time metrics refresh (#447) +- Output filter architecture improvements (M26.1): `CommandMatcher` enum, `FilterConfidence`, `FilterPipeline`, `SecurityPatterns`, per-filter TOML config (#452) +- Token savings tracking and metrics for output filtering (#445) - Smart tool output filtering: command-aware filters that compress tool output before context insertion - `OutputFilter` trait and `OutputFilterRegistry` with first-match-wins dispatch - `sanitize_output()` ANSI escape and progress bar stripping (runs on all tool output) @@ -37,7 +42,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). 
- Extract bootstrap logic from main.rs into `zeph-core::bootstrap::AppBuilder` (#393): main.rs reduced from 2313 to 978 lines - `SecurityConfig` and `TimeoutConfig` gain `Clone + Copy` - `AnyChannel` moved from main.rs to zeph-channels crate -- Default features reduced to minimal set (qdrant, self-learning, vault-age, compatible, index) +- Remove 8 lightweight feature gates; these features are now always on: openai, compatible, orchestrator, router, self-learning, qdrant, vault-age, mcp (#438) +- Default features reduced to minimal set (empty after M26) - Skill matcher concurrency reduced from 50 to 20 - `String::with_capacity` in context building loops - CI updated to use `--features full` diff --git a/README.md b/README.md index 07d75367..a79ff7ab 100644 --- a/README.md +++ b/README.md @@ -15,7 +15,7 @@ Lightweight AI agent that routes tasks across **Ollama, Claude, OpenAI, HuggingF ## Why Zeph -**Token-efficient by design.** Most agent frameworks inject every tool and instruction into every prompt. Zeph embeds skills and MCP tools as vectors (with concurrent embedding via `buffer_unordered`), then selects only the top-K relevant ones per query via cosine similarity. Prompt size stays O(K) -- not O(N) -- regardless of how many capabilities are installed. +**Token-efficient by design.** Most agent frameworks inject every tool and instruction into every prompt. Zeph embeds skills and MCP tools as vectors (with concurrent embedding via `buffer_unordered`), then selects only the top-K relevant ones per query via cosine similarity. Prompt size stays O(K) -- not O(N) -- regardless of how many capabilities are installed. Smart output filtering further reduces token consumption by 70-99% for common tool outputs (test results, git logs, clippy diagnostics, directory listings, repetitive log output). Per-command filter stats are shown inline in CLI chat and aggregated in the TUI dashboard. 
**Intelligent context management.** Two-tier context pruning: Tier 1 selectively removes old tool outputs (clearing bodies from memory after persisting to SQLite) before falling back to Tier 2 LLM-based compaction, reducing unnecessary LLM calls. A token-based protection zone preserves recent context from pruning. Parallel context preparation via `try_join!` and optimized byte-length token estimation. Cross-session memory transfers knowledge between conversations with relevance filtering. Proportional budget allocation (8% summaries, 8% semantic recall, 4% cross-session, 30% code context, 50% recent history) keeps conversations efficient. Tool outputs are truncated at 30K chars with optional LLM-based summarization for large outputs. Doom-loop detection breaks runaway tool cycles after 3 identical consecutive outputs, with configurable iteration limits (default 10). ZEPH.md project config discovery walks up the directory tree and injects project-specific context when available. Config hot-reload applies runtime-safe fields (timeouts, security, memory limits) on file change without restart. 
@@ -118,7 +118,7 @@ cargo build --release --features tui | **Skill Trust & Quarantine** | 4-tier trust model (Trusted/Verified/Quarantined/Blocked) with blake3 integrity verification, anomaly detection with automatic blocking, and restricted tool access for untrusted skills | | | **Prompt Caching** | Automatic prompt caching for Anthropic and OpenAI providers, reducing latency and cost on repeated context | | | **Graceful Shutdown** | Ctrl-C triggers ordered teardown with MCP server cleanup and pending task draining | | -| **TUI Dashboard** | ratatui terminal UI with tree-sitter syntax highlighting, markdown rendering, deferred model warmup, scrollbar, mouse scroll, thinking blocks, conversation history, splash screen, live metrics, message queueing (max 10, FIFO with Ctrl+K clear) | [TUI](https://bug-ops.github.io/zeph/guide/tui.html) | +| **TUI Dashboard** | ratatui terminal UI with tree-sitter syntax highlighting, markdown rendering, deferred model warmup, scrollbar, mouse scroll, thinking blocks, conversation history, splash screen, live metrics (including filter savings), message queueing (max 10, FIFO with Ctrl+K clear) | [TUI](https://bug-ops.github.io/zeph/guide/tui.html) | | **Multi-Channel I/O** | CLI, Discord, Slack, Telegram, and TUI with streaming support | [Channels](https://bug-ops.github.io/zeph/guide/channels.html) | | **Defense-in-Depth** | Shell sandbox with relative path traversal detection, file sandbox, command filter, secret redaction (Google/GitLab patterns), audit log, SSRF protection (agent + MCP), rate limiter TTL eviction, doom-loop detection, skill trust quarantine | [Security](https://bug-ops.github.io/zeph/security.html) | @@ -155,34 +155,27 @@ Deep dive: [Architecture overview](https://bug-ops.github.io/zeph/architecture/o ## Feature Flags -| Feature | Default | Description | -|---------|---------|-------------| -| `compatible` | On | OpenAI-compatible provider (Together AI, Groq, Fireworks, etc.) 
| -| `openai` | On | OpenAI provider | -| `qdrant` | On | Qdrant vector search for skills and MCP tools | -| `self-learning` | On | Skill evolution system | -| `vault-age` | On | Age-encrypted secret storage | -| `a2a` | Off | A2A protocol client and server | -| `candle` | Off | Local HuggingFace inference (GGUF) | -| `index` | Off | AST-based code indexing and semantic retrieval | -| `mcp` | Off | MCP client for external tool servers | -| `orchestrator` | Off | Multi-model routing with fallback | -| `router` | Off | Prompt-based model selection via RouterProvider | -| `discord` | Off | Discord bot with Gateway v10 WebSocket | -| `slack` | Off | Slack bot with Events API webhook | -| `gateway` | Off | HTTP gateway for webhook ingestion | -| `daemon` | Off | Daemon supervisor for component lifecycle | -| `scheduler` | Off | Cron-based periodic task scheduler | -| `otel` | Off | OpenTelemetry OTLP export for Prometheus/Grafana | -| `metal` | Off | Metal GPU acceleration (macOS) | -| `tui` | Off | ratatui TUI dashboard with real-time metrics | -| `cuda` | Off | CUDA GPU acceleration (Linux) | +The following features are always compiled in (no flag needed): `openai`, `compatible`, `orchestrator`, `router`, `self-learning`, `qdrant`, `vault-age`, `mcp`. 
+ +| Feature | Description | +|---------|-------------| +| `a2a` | A2A protocol client and server | +| `candle` | Local HuggingFace inference (GGUF) | +| `index` | AST-based code indexing and semantic retrieval | +| `discord` | Discord bot with Gateway v10 WebSocket | +| `slack` | Slack bot with Events API webhook | +| `gateway` | HTTP gateway for webhook ingestion | +| `daemon` | Daemon supervisor for component lifecycle | +| `scheduler` | Cron-based periodic task scheduler | +| `otel` | OpenTelemetry OTLP export for Prometheus/Grafana | +| `metal` | Metal GPU acceleration (macOS) | +| `tui` | ratatui TUI dashboard with real-time metrics | +| `cuda` | CUDA GPU acceleration (Linux) | ```bash -cargo build --release # default features only -cargo build --release --features full # all non-platform features +cargo build --release # default build (all always-on features included) +cargo build --release --features full # all optional features except metal, cuda, otel cargo build --release --features metal # macOS Metal GPU -cargo build --release --no-default-features # minimal binary (Ollama + Claude only) cargo build --release --features tui # with TUI dashboard ``` diff --git a/crates/zeph-core/src/agent/streaming.rs b/crates/zeph-core/src/agent/streaming.rs index c0a35394..abfca456 100644 --- a/crates/zeph-core/src/agent/streaming.rs +++ b/crates/zeph-core/src/agent/streaming.rs @@ -313,6 +313,13 @@ impl Agent { let display = self.maybe_redact(&formatted_output); self.channel.send(&display).await?; + if let Some(ref fs) = output.filter_stats + && fs.filtered_lines < fs.raw_lines + { + let stats_line = fs.format_inline(&output.tool_name); + self.channel.send(&stats_line).await?; + } + self.push_message(Message::from_parts( Role::User, vec![MessagePart::ToolOutput { diff --git a/crates/zeph-tools/src/executor.rs b/crates/zeph-tools/src/executor.rs index afcda1d3..f38ad0b9 100644 --- a/crates/zeph-tools/src/executor.rs +++ b/crates/zeph-tools/src/executor.rs @@ -13,6 +13,8 @@ pub struct ToolCall
{ pub struct FilterStats { pub raw_chars: usize, pub filtered_chars: usize, + pub raw_lines: usize, + pub filtered_lines: usize, pub confidence: Option<FilterConfidence>, } @@ -30,6 +32,16 @@ impl FilterStats { pub fn estimated_tokens_saved(&self) -> usize { self.raw_chars.saturating_sub(self.filtered_chars) / 4 } + + #[must_use] + pub fn format_inline(&self, tool_name: &str) -> String { + format!( + "[{tool_name}] {} lines -> {} lines, {:.1}% filtered", + self.raw_lines, + self.filtered_lines, + self.savings_pct() + ) + } } /// Structured result from tool execution. @@ -85,6 +97,7 @@ pub enum ToolEvent { command: String, output: String, success: bool, + filter_stats: Option<FilterStats>, }, } @@ -293,4 +306,24 @@ mod tests { }; assert_eq!(fs.estimated_tokens_saved(), 200); // (1000 - 200) / 4 } + + #[test] + fn filter_stats_format_inline() { + let fs = FilterStats { + raw_chars: 1000, + filtered_chars: 200, + raw_lines: 342, + filtered_lines: 28, + ..Default::default() + }; + let line = fs.format_inline("shell"); + assert_eq!(line, "[shell] 342 lines -> 28 lines, 80.0% filtered"); + } + + #[test] + fn filter_stats_format_inline_zero() { + let fs = FilterStats::default(); + let line = fs.format_inline("bash"); + assert_eq!(line, "[bash] 0 lines -> 0 lines, 0.0% filtered"); + } } diff --git a/crates/zeph-tools/src/filter/mod.rs b/crates/zeph-tools/src/filter/mod.rs index e3bb5256..1db617e4 100644 --- a/crates/zeph-tools/src/filter/mod.rs +++ b/crates/zeph-tools/src/filter/mod.rs @@ -38,6 +38,8 @@ pub struct FilterResult { pub output: String, pub raw_chars: usize, pub filtered_chars: usize, + pub raw_lines: usize, + pub filtered_lines: usize, pub confidence: FilterConfidence, } @@ -131,6 +133,8 @@ impl<'a> FilterPipeline<'a> { FilterResult { raw_chars: initial_len, filtered_chars: current.len(), + raw_lines: count_lines(output), + filtered_lines: count_lines(&current), output: current, confidence: worst, } @@ -531,9 +535,15 @@ pub fn sanitize_output(raw: &str) -> String { result } +fn count_lines(s:
&str) -> usize { + if s.is_empty() { 0 } else { s.lines().count() } +} + fn make_result(raw: &str, output: String, confidence: FilterConfidence) -> FilterResult { let filtered_chars = output.len(); FilterResult { + raw_lines: count_lines(raw), + filtered_lines: count_lines(&output), output, raw_chars: raw.len(), filtered_chars, @@ -577,6 +587,8 @@ mod tests { output: String::new(), raw_chars: 1000, filtered_chars: 200, + raw_lines: 0, + filtered_lines: 0, confidence: FilterConfidence::Full, }; assert!((r.savings_pct() - 80.0).abs() < 0.01); @@ -588,11 +600,30 @@ mod tests { output: String::new(), raw_chars: 0, filtered_chars: 0, + raw_lines: 0, + filtered_lines: 0, confidence: FilterConfidence::Full, }; assert!((r.savings_pct()).abs() < 0.01); } + #[test] + fn count_lines_helper() { + assert_eq!(count_lines(""), 0); + assert_eq!(count_lines("one"), 1); + assert_eq!(count_lines("one\ntwo\nthree"), 3); + assert_eq!(count_lines("trailing\n"), 1); + } + + #[test] + fn make_result_counts_lines() { + let raw = "line1\nline2\nline3\nline4\nline5"; + let filtered = "line1\nline3".to_owned(); + let r = make_result(raw, filtered, FilterConfidence::Full); + assert_eq!(r.raw_lines, 5); + assert_eq!(r.filtered_lines, 2); + } + #[test] fn registry_disabled_returns_none() { let r = OutputFilterRegistry::new(false); @@ -751,6 +782,8 @@ extra_patterns = ["TODO: security review"] output: "short".into(), raw_chars: 100, filtered_chars: 5, + raw_lines: 10, + filtered_lines: 1, confidence: FilterConfidence::Full, }; m.record(&r); diff --git a/crates/zeph-tools/src/shell.rs b/crates/zeph-tools/src/shell.rs index da5321fd..a812bd8c 100644 --- a/crates/zeph-tools/src/shell.rs +++ b/crates/zeph-tools/src/shell.rs @@ -207,16 +207,8 @@ impl ShellExecutor { }; self.log_audit(block, result, duration_ms).await; - if let Some(ref tx) = self.tool_event_tx { - let _ = tx.send(ToolEvent::Completed { - tool_name: "bash".to_owned(), - command: (*block).to_owned(), - output: out.clone(), - success: 
!out.contains("[error]"), - }); - } - let sanitized = sanitize_output(&out); + let mut per_block_stats: Option<FilterStats> = None; let filtered = if let Some(ref registry) = self.output_filter_registry { match registry.apply(block, &sanitized, exit_code) { Some(fr) => { @@ -227,14 +219,24 @@ impl ShellExecutor { savings_pct = fr.savings_pct(), "output filter applied" ); + let block_fs = FilterStats { + raw_chars: fr.raw_chars, + filtered_chars: fr.filtered_chars, + raw_lines: fr.raw_lines, + filtered_lines: fr.filtered_lines, + confidence: Some(fr.confidence), + }; let stats = cumulative_filter_stats.get_or_insert_with(FilterStats::default); stats.raw_chars += fr.raw_chars; stats.filtered_chars += fr.filtered_chars; + stats.raw_lines += fr.raw_lines; + stats.filtered_lines += fr.filtered_lines; stats.confidence = Some(match (stats.confidence, fr.confidence) { (Some(prev), cur) => crate::filter::worse_confidence(prev, cur), (None, cur) => cur, }); + per_block_stats = Some(block_fs); fr.output } None => sanitized, @@ -242,6 +244,16 @@ impl ShellExecutor { } else { sanitized }; + + if let Some(ref tx) = self.tool_event_tx { + let _ = tx.send(ToolEvent::Completed { + tool_name: "bash".to_owned(), + command: (*block).to_owned(), + output: out.clone(), + success: !out.contains("[error]"), + filter_stats: per_block_stats, + }); + } outputs.push(format!("$ {block}\n{filtered}")); } diff --git a/docs/src/architecture/token-efficiency.md b/docs/src/architecture/token-efficiency.md index 8a79b90a..0b573226 100644 --- a/docs/src/architecture/token-efficiency.md +++ b/docs/src/architecture/token-efficiency.md @@ -55,6 +55,30 @@ MCP tools follow the same pipeline: Prompt size stays constant as you add more capabilities. The only cost of more skills is a slightly larger embedding index in Qdrant or memory. +### Output Filter Pipeline + +Tool output is compressed before it enters the LLM context.
A command-aware filter pipeline matches each shell command against a set of built-in filters (test runner output, Clippy diagnostics, git log/diff, directory listings, log deduplication) and strips noise while preserving signal. The pipeline runs synchronously inside the tool executor, so the LLM sees only the filtered output; commands that match no filter pass through with `sanitize_output` cleanup only. + +Typical savings by command type: + +| Command | Raw lines | Filtered lines | Savings | +|---------|-----------|----------------|---------| +| `cargo test` (100 passing, 2 failing) | ~340 | ~30 | ~91% | +| `cargo clippy` (many warnings) | ~200 | ~50 | ~75% | +| `git log --oneline -50` | 50 | 20 | 60% | + +After each execution in which the filter removed lines, CLI mode prints a one-line stats summary and TUI mode accumulates the savings in the Resources panel. See [Tool System — Output Filter Pipeline](../guide/tools.md#output-filter-pipeline) for configuration details. + +### Token Savings Tracking + +`MetricsSnapshot` tracks cumulative filter metrics across the session: + +- `filter_raw_tokens` / `filter_saved_tokens`: estimated token volume before filtering, and tokens saved by filtering +- `filter_total_commands` / `filter_filtered_commands`: hit-rate denominator and numerator +- `filter_confidence_full/partial/fallback`: distribution of filter confidence levels + +These feed into the [TUI filter metrics display](../guide/tui.md#filter-metrics) and are emitted as `tracing::debug!` every 50 commands. ### Two-Tier Context Pruning Long conversations accumulate tool outputs that consume significant context space. Zeph uses a two-tier strategy: Tier 1 selectively prunes old tool outputs (cheap, no LLM call), and Tier 2 falls back to full LLM compaction only when Tier 1 is insufficient. See [Context Engineering](../guide/context.md) for details. 
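The inline stats line mixes two bases: the line counts come from `count_lines`, while the percentage comes from `savings_pct`, which the unit tests in this diff show to be character-based (1000 -> 200 chars prints `80.0%` next to `342 lines -> 28 lines`). The standalone sketch below mirrors the `FilterStats` fields from this diff to make that visible. It is not Zeph's code: the body of `savings_pct` is an assumption inferred from the tests, and `accumulate` is a hypothetical helper modeled on the per-block summing in `shell.rs`.

```rust
// Standalone sketch, not Zeph's actual code. Field names and the
// `format_inline` format string follow this diff; `savings_pct` is an
// assumed char-based formula, and `accumulate` is a hypothetical helper.

#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct FilterStats {
    raw_chars: usize,
    filtered_chars: usize,
    raw_lines: usize,
    filtered_lines: usize,
}

impl FilterStats {
    /// Assumed: percentage of characters removed (0.0 when there was no input).
    fn savings_pct(&self) -> f64 {
        if self.raw_chars == 0 {
            return 0.0;
        }
        self.raw_chars.saturating_sub(self.filtered_chars) as f64 / self.raw_chars as f64 * 100.0
    }

    /// Same `chars / 4` heuristic as `estimated_tokens_saved` in the diff.
    fn estimated_tokens_saved(&self) -> usize {
        self.raw_chars.saturating_sub(self.filtered_chars) / 4
    }

    /// Same format string as `format_inline` in the diff.
    fn format_inline(&self, tool_name: &str) -> String {
        format!(
            "[{tool_name}] {} lines -> {} lines, {:.1}% filtered",
            self.raw_lines,
            self.filtered_lines,
            self.savings_pct()
        )
    }

    /// Hypothetical helper: sums per-block stats the way `shell.rs` builds
    /// `cumulative_filter_stats` (confidence tracking omitted here).
    fn accumulate(&mut self, other: &FilterStats) {
        self.raw_chars += other.raw_chars;
        self.filtered_chars += other.filtered_chars;
        self.raw_lines += other.raw_lines;
        self.filtered_lines += other.filtered_lines;
    }
}

/// Mirrors `count_lines` from the diff: empty input is 0 lines and a
/// trailing newline does not add an extra line.
fn count_lines(s: &str) -> usize {
    if s.is_empty() { 0 } else { s.lines().count() }
}

fn main() {
    // Char-based percentage printed next to line counts.
    let fs = FilterStats { raw_chars: 1000, filtered_chars: 200, raw_lines: 342, filtered_lines: 28 };
    println!("{}", fs.format_inline("shell"));
    println!("~{} tokens saved", fs.estimated_tokens_saved());

    // Stats computed from real strings for a second shell block.
    let raw = "line1\nline2\nline3\nline4\nline5";
    let filtered = "line1\nline3";
    let block = FilterStats {
        raw_chars: raw.len(),
        filtered_chars: filtered.len(),
        raw_lines: count_lines(raw),
        filtered_lines: count_lines(filtered),
    };

    // Cumulative stats across both blocks.
    let mut total = FilterStats::default();
    total.accumulate(&fs);
    total.accumulate(&block);
    println!("{}", total.format_inline("bash"));
}
```

The first line prints `80.0%` even though 28 of 342 lines is a 91.8% line reduction, which is why the docs describe the percentage as a character (token) measure rather than a line ratio.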
diff --git a/docs/src/feature-flags.md b/docs/src/feature-flags.md index 037b5b88..3f9c17f1 100644 --- a/docs/src/feature-flags.md +++ b/docs/src/feature-flags.md @@ -1,42 +1,49 @@ # Feature Flags -Zeph uses Cargo feature flags to control optional functionality. Default features cover common use cases; platform-specific and experimental features are opt-in. - -| Feature | Default | Description | -|---------|---------|-------------| -| `compatible` | Enabled | `CompatibleProvider` for OpenAI-compatible third-party APIs | -| `openai` | Enabled | OpenAI-compatible provider (GPT, Together, Groq, Fireworks, etc.) | -| `qdrant` | Enabled | Qdrant-backed vector storage for skill matching (`zeph-skills`) and MCP tool registry (`zeph-mcp`) | -| `self-learning` | Enabled | Skill evolution via failure detection, self-reflection, and LLM-generated improvements | -| `vault-age` | Enabled | Age-encrypted vault backend for file-based secret storage ([age](https://age-encryption.org/)) | -| `a2a` | Disabled | [A2A protocol](https://github.com/a2aproject/A2A) client and server for agent-to-agent communication | -| `candle` | Disabled | Local HuggingFace model inference via [candle](https://github.com/huggingface/candle) (GGUF quantized models) | -| `index` | Disabled | AST-based code indexing and semantic retrieval via tree-sitter ([guide](guide/code-indexing.md)) | -| `mcp` | Disabled | MCP client for external tool servers via stdio/HTTP transport | -| `orchestrator` | Disabled | Multi-model routing with task-based classification and fallback chains | -| `router` | Disabled | `RouterProvider` for chaining multiple providers with fallback | -| `discord` | Disabled | Discord channel adapter with Gateway v10 WebSocket and slash commands ([guide](guide/channels.md#discord-channel)) | -| `slack` | Disabled | Slack channel adapter with Events API webhook and HMAC-SHA256 verification ([guide](guide/channels.md#slack-channel)) | -| `otel` | Disabled | OpenTelemetry tracing export via 
OTLP/gRPC ([guide](guide/observability.md)) | -| `gateway` | Disabled | HTTP gateway for webhook ingestion with bearer auth and rate limiting ([guide](guide/gateway.md)) | -| `daemon` | Disabled | Daemon supervisor with component lifecycle, PID file, and health monitoring ([guide](guide/daemon.md)) | -| `scheduler` | Disabled | Cron-based periodic task scheduler with SQLite persistence ([guide](guide/scheduler.md)) | -| `tui` | Disabled | ratatui-based TUI dashboard with real-time agent metrics | -| `metal` | Disabled | Metal GPU acceleration for candle on macOS (implies `candle`) | -| `cuda` | Disabled | CUDA GPU acceleration for candle on Linux (implies `candle`) | +Zeph uses Cargo feature flags to control optional functionality. As of M26, eight previously optional features are now always-on and compiled into every build. The remaining optional features are explicitly opt-in. + +## Always-On (compiled unconditionally) + +| Feature | Description | +|---------|-------------| +| `openai` | OpenAI-compatible provider (GPT, Together, Groq, Fireworks, etc.) 
| +| `compatible` | `CompatibleProvider` for OpenAI-compatible third-party APIs | +| `orchestrator` | Multi-model routing with task-based classification and fallback chains | +| `router` | `RouterProvider` for chaining multiple providers with fallback | +| `self-learning` | Skill evolution via failure detection, self-reflection, and LLM-generated improvements | +| `qdrant` | Qdrant-backed vector storage for skill matching and MCP tool registry | +| `vault-age` | Age-encrypted vault backend for file-based secret storage ([age](https://age-encryption.org/)) | +| `mcp` | MCP client for external tool servers via stdio/HTTP transport | + +## Optional Features + +| Feature | Description | +|---------|-------------| +| `tui` | ratatui-based TUI dashboard with real-time agent metrics | +| `candle` | Local HuggingFace model inference via [candle](https://github.com/huggingface/candle) (GGUF quantized models) | +| `metal` | Metal GPU acceleration for candle on macOS (implies `candle`) | +| `cuda` | CUDA GPU acceleration for candle on Linux (implies `candle`) | +| `discord` | Discord channel adapter with Gateway v10 WebSocket and slash commands ([guide](guide/channels.md#discord-channel)) | +| `slack` | Slack channel adapter with Events API webhook and HMAC-SHA256 verification ([guide](guide/channels.md#slack-channel)) | +| `a2a` | [A2A protocol](https://github.com/a2aproject/A2A) client and server for agent-to-agent communication | +| `index` | AST-based code indexing and semantic retrieval via tree-sitter ([guide](guide/code-indexing.md)) | +| `gateway` | HTTP gateway for webhook ingestion with bearer auth and rate limiting ([guide](guide/gateway.md)) | +| `daemon` | Daemon supervisor with component lifecycle, PID file, and health monitoring ([guide](guide/daemon.md)) | +| `scheduler` | Cron-based periodic task scheduler with SQLite persistence ([guide](guide/scheduler.md)) | +| `otel` | OpenTelemetry tracing export via OTLP/gRPC ([guide](guide/observability.md)) | +| 
`mock` | Mock providers and channels for testing | ## Build Examples ```bash -cargo build --release # all default features -cargo build --release --features metal # macOS with Metal GPU -cargo build --release --features cuda # Linux with NVIDIA GPU -cargo build --release --features tui # with TUI dashboard -cargo build --release --features discord # with Discord bot -cargo build --release --features slack # with Slack bot +cargo build --release # default build (always-on features included) +cargo build --release --features metal # macOS with Metal GPU +cargo build --release --features cuda # Linux with NVIDIA GPU +cargo build --release --features tui # with TUI dashboard +cargo build --release --features discord # with Discord bot +cargo build --release --features slack # with Slack bot cargo build --release --features gateway,daemon,scheduler # with infrastructure components cargo build --release --features full # all optional features -cargo build --release --no-default-features # minimal binary ``` The `full` feature enables every optional feature except `metal`, `cuda`, and `otel`. diff --git a/docs/src/guide/tools.md b/docs/src/guide/tools.md index c7903b80..ce0f9d02 100644 --- a/docs/src/guide/tools.md +++ b/docs/src/guide/tools.md @@ -111,6 +111,80 @@ Tool output exceeding 30 000 characters is truncated (head + tail split) before Stale overflow files older than 24 hours are cleaned up automatically on startup. +## Output Filter Pipeline + +Before tool output reaches the LLM context, it passes through a command-aware filter pipeline that strips noise and reduces token consumption. Filters are matched by command pattern and composed in sequence. 
+ +### Built-in Filters + +| Filter | Matches | What it removes | +|--------|---------|----------------| +| `TestOutputFilter` | `cargo test`, `cargo nextest`, `pytest`, `go test` | Passing test lines, verbose output; keeps failures and summary | +| `ClippyFilter` | `cargo clippy` | Duplicate diagnostic paths, redundant `help:` lines | +| `GitFilter` | `git log`, `git diff` | Limits log entries (default: 20), diff line count (default: 500) | +| `DirListingFilter` | `ls`, `find`, `tree` | Collapses redundant whitespace and deduplicates paths | +| `LogDedupFilter` | any command with repetitive log output | Deduplicates consecutive identical lines | + +Before filter matching, `sanitize_output` runs on all tool output, whether or not a filter matches: it strips ANSI escape sequences and carriage-return progress bars and collapses consecutive blank lines. + +### Security Pass + +After filtering, a security scan runs over the **raw** (pre-filter) output. If credential-shaped patterns are found (API keys, tokens, passwords), a warning is appended to the filtered output so the LLM is aware without exposing the value. Additional regex patterns can be configured via `[tools.filters.security] extra_patterns`. + +### FilterConfidence + +Each filter reports a confidence level: + +| Level | Meaning | +|-------|---------| +| `Full` | Filter is certain it handled this output correctly | +| `Partial` | Heuristic match; some content may have been over-filtered | +| `Fallback` | Pattern matched but output structure was unexpected | + +When multiple filters compose in a pipeline, the worst confidence across stages is propagated. Confidence distribution is tracked in [TUI filter metrics](tui.md#filter-metrics). + +### Inline Filter Stats (CLI) + +In CLI mode, after each filtered tool execution, a one-line summary is printed to the conversation: + +``` +[shell] 342 lines -> 28 lines, 91.8% filtered +``` + +The summary appears only when the filter removed at least one line. 
The line counts show how much output was dropped, while the percentage is computed from character counts (the same basis as the `chars / 4` token estimate), so it can differ from the line ratio. Use it to verify that the filter is working and to estimate token savings without opening the TUI.
+ +### Configuration + +```toml +[tools.filters] +enabled = true # Master switch (default: true) + +[tools.filters.test] +enabled = true +max_failures = 10 # Max failing tests to show (default: 10) +truncate_stack_trace = 50 # Stack trace line limit (default: 50) + +[tools.filters.git] +enabled = true +max_log_entries = 20 # Max git log entries (default: 20) +max_diff_lines = 500 # Max diff lines (default: 500) + +[tools.filters.clippy] +enabled = true + +[tools.filters.dir_listing] +enabled = true + +[tools.filters.log_dedup] +enabled = true + +[tools.filters.security] +enabled = true +extra_patterns = [] # Additional regex patterns to flag as credentials +``` + +Each filter section has its own `enabled` flag, so individual filters can be disabled without affecting the others. ## Configuration ```toml diff --git a/docs/src/guide/tui.md b/docs/src/guide/tui.md index 3bc78612..63b565a1 100644 --- a/docs/src/guide/tui.md +++ b/docs/src/guide/tui.md @@ -150,13 +150,13 @@ The TUI adapts to terminal width: ## Live Metrics -The TUI dashboard displays real-time metrics collected from the agent loop via `tokio::sync::watch` channel: +The TUI dashboard displays real-time metrics collected from the agent loop via a `tokio::sync::watch` channel. A periodic 250 ms tick in the event loop triggers a redraw, and the latest metrics snapshot is read from the watch receiver before each frame, so the display refreshes even without user input.
| Panel | Metrics | |-------|---------| | **Skills** | Active/total skill count, matched skill names per query | | **Memory** | SQLite message count, conversation ID, Qdrant status, embeddings generated, summaries count, tool output prunes | -| **Resources** | Prompt/completion/total tokens, API calls, last LLM latency (ms), provider and model name | +| **Resources** | Prompt/completion/total tokens, API calls, last LLM latency (ms), provider and model name, prompt cache read/write tokens, filter stats | Metrics are updated at key instrumentation points in the agent loop: - After each LLM call (api_calls, latency, prompt tokens) @@ -164,9 +164,30 @@ Metrics are updated at key instrumentation points in the agent loop: - After skill matching (active skills, total skills) - After message persistence (sqlite message count) - After summarization (summaries count) +- After each tool execution to which a filter was applied (filter metrics) Token counts use a `chars/4` estimation (sufficient for dashboard display). +### Filter Metrics + +Once at least one command has been filtered, the Resources panel shows: + +``` +Filter: 8/10 commands (80% hit rate) +Filter saved: 1240 tok (72%) +Confidence: F/6 P/2 B/0 +``` + +| Field | Meaning | +|-------|---------| +| `N/M commands` | Filtered / total commands through the pipeline | +| `hit rate` | Percentage of commands where output was actually reduced | +| `saved tokens` | Cumulative estimated tokens saved (`chars_saved / 4`) | +| `%` | Token savings as a percentage of raw token volume | +| `F/P/B` | Confidence distribution: Full (`F`) / Partial (`P`) / Fallback (`B`) counts | + +The filter section is hidden while `filter_applications` is 0, that is, until the first command has been filtered. ## Deferred Model Warmup When running with Ollama (or an orchestrator with Ollama sub-providers), model warmup is deferred until after the TUI interface renders.
This means: diff --git a/src/main.rs b/src/main.rs index 1829d8e3..c74c7fce 100644 --- a/src/main.rs +++ b/src/main.rs @@ -501,6 +501,7 @@ async fn forward_tool_events_to_tui( command, output, success, + .. } => zeph_tui::AgentEvent::ToolOutput { tool_name, command,
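The Resources-panel block documented for the TUI can be derived from the session counters listed under Token Savings Tracking. The sketch below shows one way to do that. It is an illustration under stated assumptions, not Zeph's implementation: `FilterMetrics` is a hypothetical stand-in for the filter fields of `MetricsSnapshot` (field names taken from the docs), `pct` truncates instead of rounding, and the section is gated on `filter_filtered_commands`, which is assumed to correspond to `filter_applications`.

```rust
// Hypothetical sketch: `FilterMetrics`, `pct`, and `render_filter_lines` are
// illustrative names, not Zeph's API. Field names follow the docs in this diff.

#[derive(Debug, Default)]
struct FilterMetrics {
    filter_raw_tokens: u64,
    filter_saved_tokens: u64,
    filter_total_commands: u64,
    filter_filtered_commands: u64,
    filter_confidence_full: u64,
    filter_confidence_partial: u64,
    filter_confidence_fallback: u64,
}

/// Integer percentage, truncated toward zero; 0 when `whole` is 0.
fn pct(part: u64, whole: u64) -> u64 {
    if whole == 0 { 0 } else { part * 100 / whole }
}

/// Builds the Resources-panel lines. Returns nothing until at least one
/// command has been filtered (assumed equivalent of `filter_applications > 0`).
fn render_filter_lines(m: &FilterMetrics) -> Vec<String> {
    if m.filter_filtered_commands == 0 {
        return Vec::new();
    }
    vec![
        format!(
            "Filter: {}/{} commands ({}% hit rate)",
            m.filter_filtered_commands,
            m.filter_total_commands,
            pct(m.filter_filtered_commands, m.filter_total_commands)
        ),
        format!(
            "Filter saved: {} tok ({}%)",
            m.filter_saved_tokens,
            pct(m.filter_saved_tokens, m.filter_raw_tokens)
        ),
        format!(
            "Confidence: F/{} P/{} B/{}",
            m.filter_confidence_full, m.filter_confidence_partial, m.filter_confidence_fallback
        ),
    ]
}

fn main() {
    // Counter values chosen so the output matches the example in tui.md.
    let m = FilterMetrics {
        filter_raw_tokens: 1722,
        filter_saved_tokens: 1240,
        filter_total_commands: 10,
        filter_filtered_commands: 8,
        filter_confidence_full: 6,
        filter_confidence_partial: 2,
        filter_confidence_fallback: 0,
    };
    for line in render_filter_lines(&m) {
        println!("{line}");
    }

    // Nothing filtered yet: the section stays hidden.
    println!("hidden: {}", render_filter_lines(&FilterMetrics::default()).is_empty());
}
```

Keeping rendering as a pure function of the snapshot fits the watch-channel design: each 250 ms redraw can rebuild these lines from whatever snapshot the receiver currently holds.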