8 changes: 7 additions & 1 deletion CHANGELOG.md
@@ -7,6 +7,11 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased]

### Added
- Per-tool inline filter stats in CLI chat: `[shell] cargo test (342 lines -> 28 lines, 91.8% filtered)` (#449)
- Filter metrics in TUI Resources panel: confidence distribution, command hit rate, token savings (#448)
- Periodic 250ms tick in TUI event loop for real-time metrics refresh (#447)
- Output filter architecture improvements (M26.1): `CommandMatcher` enum, `FilterConfidence`, `FilterPipeline`, `SecurityPatterns`, per-filter TOML config (#452)
- Token savings tracking and metrics for output filtering (#445)
- Smart tool output filtering: command-aware filters that compress tool output before context insertion
- `OutputFilter` trait and `OutputFilterRegistry` with first-match-wins dispatch
- `sanitize_output()` ANSI escape and progress bar stripping (runs on all tool output)
@@ -37,7 +42,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
- Extract bootstrap logic from main.rs into `zeph-core::bootstrap::AppBuilder` (#393): main.rs reduced from 2313 to 978 lines
- `SecurityConfig` and `TimeoutConfig` gain `Clone + Copy`
- `AnyChannel` moved from main.rs to zeph-channels crate
- Default features reduced to minimal set (qdrant, self-learning, vault-age, compatible, index)
- Remove 8 lightweight feature gates, make always-on: openai, compatible, orchestrator, router, self-learning, qdrant, vault-age, mcp (#438)
- Default features reduced to minimal set (empty after M26)
- Skill matcher concurrency reduced from 50 to 20
- `String::with_capacity` in context building loops
- CI updated to use `--features full`
47 changes: 20 additions & 27 deletions README.md
@@ -15,7 +15,7 @@ Lightweight AI agent that routes tasks across **Ollama, Claude, OpenAI, HuggingF

## Why Zeph

**Token-efficient by design.** Most agent frameworks inject every tool and instruction into every prompt. Zeph embeds skills and MCP tools as vectors (with concurrent embedding via `buffer_unordered`), then selects only the top-K relevant ones per query via cosine similarity. Prompt size stays O(K) -- not O(N) -- regardless of how many capabilities are installed.
**Token-efficient by design.** Most agent frameworks inject every tool and instruction into every prompt. Zeph embeds skills and MCP tools as vectors (with concurrent embedding via `buffer_unordered`), then selects only the top-K relevant ones per query via cosine similarity. Prompt size stays O(K) -- not O(N) -- regardless of how many capabilities are installed. Smart output filtering further reduces token consumption by 70-99% for common tool outputs (test results, git logs, clippy diagnostics, directory listings, log deduplication) -- per-command filter stats are shown inline in CLI chat and aggregated in the TUI dashboard.

**Intelligent context management.** Two-tier context pruning: Tier 1 selectively removes old tool outputs (clearing bodies from memory after persisting to SQLite) before falling back to Tier 2 LLM-based compaction, reducing unnecessary LLM calls. A token-based protection zone preserves recent context from pruning. Parallel context preparation via `try_join!` and optimized byte-length token estimation. Cross-session memory transfers knowledge between conversations with relevance filtering. Proportional budget allocation (8% summaries, 8% semantic recall, 4% cross-session, 30% code context, 50% recent history) keeps conversations efficient. Tool outputs are truncated at 30K chars with optional LLM-based summarization for large outputs. Doom-loop detection breaks runaway tool cycles after 3 identical consecutive outputs, with configurable iteration limits (default 10). ZEPH.md project config discovery walks up the directory tree and injects project-specific context when available. Config hot-reload applies runtime-safe fields (timeouts, security, memory limits) on file change without restart.

@@ -118,7 +118,7 @@ cargo build --release --features tui
| **Skill Trust & Quarantine** | 4-tier trust model (Trusted/Verified/Quarantined/Blocked) with blake3 integrity verification, anomaly detection with automatic blocking, and restricted tool access for untrusted skills | |
| **Prompt Caching** | Automatic prompt caching for Anthropic and OpenAI providers, reducing latency and cost on repeated context | |
| **Graceful Shutdown** | Ctrl-C triggers ordered teardown with MCP server cleanup and pending task draining | |
| **TUI Dashboard** | ratatui terminal UI with tree-sitter syntax highlighting, markdown rendering, deferred model warmup, scrollbar, mouse scroll, thinking blocks, conversation history, splash screen, live metrics, message queueing (max 10, FIFO with Ctrl+K clear) | [TUI](https://bug-ops.github.io/zeph/guide/tui.html) |
| **TUI Dashboard** | ratatui terminal UI with tree-sitter syntax highlighting, markdown rendering, deferred model warmup, scrollbar, mouse scroll, thinking blocks, conversation history, splash screen, live metrics (including filter savings), message queueing (max 10, FIFO with Ctrl+K clear) | [TUI](https://bug-ops.github.io/zeph/guide/tui.html) |
| **Multi-Channel I/O** | CLI, Discord, Slack, Telegram, and TUI with streaming support | [Channels](https://bug-ops.github.io/zeph/guide/channels.html) |
| **Defense-in-Depth** | Shell sandbox with relative path traversal detection, file sandbox, command filter, secret redaction (Google/GitLab patterns), audit log, SSRF protection (agent + MCP), rate limiter TTL eviction, doom-loop detection, skill trust quarantine | [Security](https://bug-ops.github.io/zeph/security.html) |

@@ -155,34 +155,27 @@ Deep dive: [Architecture overview](https://bug-ops.github.io/zeph/architecture/o

## Feature Flags

| Feature | Default | Description |
|---------|---------|-------------|
| `compatible` | On | OpenAI-compatible provider (Together AI, Groq, Fireworks, etc.) |
| `openai` | On | OpenAI provider |
| `qdrant` | On | Qdrant vector search for skills and MCP tools |
| `self-learning` | On | Skill evolution system |
| `vault-age` | On | Age-encrypted secret storage |
| `a2a` | Off | A2A protocol client and server |
| `candle` | Off | Local HuggingFace inference (GGUF) |
| `index` | Off | AST-based code indexing and semantic retrieval |
| `mcp` | Off | MCP client for external tool servers |
| `orchestrator` | Off | Multi-model routing with fallback |
| `router` | Off | Prompt-based model selection via RouterProvider |
| `discord` | Off | Discord bot with Gateway v10 WebSocket |
| `slack` | Off | Slack bot with Events API webhook |
| `gateway` | Off | HTTP gateway for webhook ingestion |
| `daemon` | Off | Daemon supervisor for component lifecycle |
| `scheduler` | Off | Cron-based periodic task scheduler |
| `otel` | Off | OpenTelemetry OTLP export for Prometheus/Grafana |
| `metal` | Off | Metal GPU acceleration (macOS) |
| `tui` | Off | ratatui TUI dashboard with real-time metrics |
| `cuda` | Off | CUDA GPU acceleration (Linux) |
The following features are always compiled in (no flag needed): `openai`, `compatible`, `orchestrator`, `router`, `self-learning`, `qdrant`, `vault-age`, `mcp`.

| Feature | Description |
|---------|-------------|
| `a2a` | A2A protocol client and server |
| `candle` | Local HuggingFace inference (GGUF) |
| `index` | AST-based code indexing and semantic retrieval |
| `discord` | Discord bot with Gateway v10 WebSocket |
| `slack` | Slack bot with Events API webhook |
| `gateway` | HTTP gateway for webhook ingestion |
| `daemon` | Daemon supervisor for component lifecycle |
| `scheduler` | Cron-based periodic task scheduler |
| `otel` | OpenTelemetry OTLP export for Prometheus/Grafana |
| `metal` | Metal GPU acceleration (macOS) |
| `tui` | ratatui TUI dashboard with real-time metrics |
| `cuda` | CUDA GPU acceleration (Linux) |

```bash
cargo build --release # default features only
cargo build --release --features full # all non-platform features
cargo build --release # default build (all always-on features included)
cargo build --release --features full # all optional features
cargo build --release --features metal # macOS Metal GPU
cargo build --release --no-default-features # minimal binary (Ollama + Claude only)
cargo build --release --features tui # with TUI dashboard
```

7 changes: 7 additions & 0 deletions crates/zeph-core/src/agent/streaming.rs
@@ -313,6 +313,13 @@ impl<C: Channel, T: ToolExecutor> Agent<C, T> {
let display = self.maybe_redact(&formatted_output);
self.channel.send(&display).await?;

if let Some(ref fs) = output.filter_stats
&& fs.filtered_lines < fs.raw_lines
{
let stats_line = fs.format_inline(&output.tool_name);
self.channel.send(&stats_line).await?;
}

self.push_message(Message::from_parts(
Role::User,
vec![MessagePart::ToolOutput {
33 changes: 33 additions & 0 deletions crates/zeph-tools/src/executor.rs
@@ -13,6 +13,8 @@ pub struct ToolCall {
pub struct FilterStats {
pub raw_chars: usize,
pub filtered_chars: usize,
pub raw_lines: usize,
pub filtered_lines: usize,
pub confidence: Option<crate::FilterConfidence>,
}

@@ -30,6 +32,16 @@ impl FilterStats {
pub fn estimated_tokens_saved(&self) -> usize {
self.raw_chars.saturating_sub(self.filtered_chars) / 4
}

#[must_use]
pub fn format_inline(&self, tool_name: &str) -> String {
format!(
"[{tool_name}] {} lines -> {} lines, {:.1}% filtered",
self.raw_lines,
self.filtered_lines,
self.savings_pct()
)
}
}

/// Structured result from tool execution.
@@ -85,6 +97,7 @@ pub enum ToolEvent {
command: String,
output: String,
success: bool,
filter_stats: Option<FilterStats>,
},
}

@@ -293,4 +306,24 @@ mod tests {
};
assert_eq!(fs.estimated_tokens_saved(), 200); // (1000 - 200) / 4
}

#[test]
fn filter_stats_format_inline() {
let fs = FilterStats {
raw_chars: 1000,
filtered_chars: 200,
raw_lines: 342,
filtered_lines: 28,
..Default::default()
};
let line = fs.format_inline("shell");
assert_eq!(line, "[shell] 342 lines -> 28 lines, 80.0% filtered");
}

#[test]
fn filter_stats_format_inline_zero() {
let fs = FilterStats::default();
let line = fs.format_inline("bash");
assert_eq!(line, "[bash] 0 lines -> 0 lines, 0.0% filtered");
}
}
33 changes: 33 additions & 0 deletions crates/zeph-tools/src/filter/mod.rs
@@ -38,6 +38,8 @@ pub struct FilterResult {
pub output: String,
pub raw_chars: usize,
pub filtered_chars: usize,
pub raw_lines: usize,
pub filtered_lines: usize,
pub confidence: FilterConfidence,
}

@@ -131,6 +133,8 @@ impl<'a> FilterPipeline<'a> {
FilterResult {
raw_chars: initial_len,
filtered_chars: current.len(),
raw_lines: count_lines(output),
filtered_lines: count_lines(&current),
output: current,
confidence: worst,
}
@@ -531,9 +535,15 @@ pub fn sanitize_output(raw: &str) -> String {
result
}

fn count_lines(s: &str) -> usize {
if s.is_empty() { 0 } else { s.lines().count() }
}

fn make_result(raw: &str, output: String, confidence: FilterConfidence) -> FilterResult {
let filtered_chars = output.len();
FilterResult {
raw_lines: count_lines(raw),
filtered_lines: count_lines(&output),
output,
raw_chars: raw.len(),
filtered_chars,
@@ -577,6 +587,8 @@ mod tests {
output: String::new(),
raw_chars: 1000,
filtered_chars: 200,
raw_lines: 0,
filtered_lines: 0,
confidence: FilterConfidence::Full,
};
assert!((r.savings_pct() - 80.0).abs() < 0.01);
@@ -588,11 +600,30 @@
output: String::new(),
raw_chars: 0,
filtered_chars: 0,
raw_lines: 0,
filtered_lines: 0,
confidence: FilterConfidence::Full,
};
assert!((r.savings_pct()).abs() < 0.01);
}

#[test]
fn count_lines_helper() {
assert_eq!(count_lines(""), 0);
assert_eq!(count_lines("one"), 1);
assert_eq!(count_lines("one\ntwo\nthree"), 3);
assert_eq!(count_lines("trailing\n"), 1);
}

#[test]
fn make_result_counts_lines() {
let raw = "line1\nline2\nline3\nline4\nline5";
let filtered = "line1\nline3".to_owned();
let r = make_result(raw, filtered, FilterConfidence::Full);
assert_eq!(r.raw_lines, 5);
assert_eq!(r.filtered_lines, 2);
}

#[test]
fn registry_disabled_returns_none() {
let r = OutputFilterRegistry::new(false);
@@ -751,6 +782,8 @@ extra_patterns = ["TODO: security review"]
output: "short".into(),
raw_chars: 100,
filtered_chars: 5,
raw_lines: 10,
filtered_lines: 1,
confidence: FilterConfidence::Full,
};
m.record(&r);
30 changes: 21 additions & 9 deletions crates/zeph-tools/src/shell.rs
@@ -207,16 +207,8 @@ impl ShellExecutor {
};
self.log_audit(block, result, duration_ms).await;

if let Some(ref tx) = self.tool_event_tx {
let _ = tx.send(ToolEvent::Completed {
tool_name: "bash".to_owned(),
command: (*block).to_owned(),
output: out.clone(),
success: !out.contains("[error]"),
});
}

let sanitized = sanitize_output(&out);
let mut per_block_stats: Option<FilterStats> = None;
let filtered = if let Some(ref registry) = self.output_filter_registry {
match registry.apply(block, &sanitized, exit_code) {
Some(fr) => {
@@ -227,21 +219,41 @@
savings_pct = fr.savings_pct(),
"output filter applied"
);
let block_fs = FilterStats {
raw_chars: fr.raw_chars,
filtered_chars: fr.filtered_chars,
raw_lines: fr.raw_lines,
filtered_lines: fr.filtered_lines,
confidence: Some(fr.confidence),
};
let stats =
cumulative_filter_stats.get_or_insert_with(FilterStats::default);
stats.raw_chars += fr.raw_chars;
stats.filtered_chars += fr.filtered_chars;
stats.raw_lines += fr.raw_lines;
stats.filtered_lines += fr.filtered_lines;
stats.confidence = Some(match (stats.confidence, fr.confidence) {
(Some(prev), cur) => crate::filter::worse_confidence(prev, cur),
(None, cur) => cur,
});
per_block_stats = Some(block_fs);
fr.output
}
None => sanitized,
}
} else {
sanitized
};

if let Some(ref tx) = self.tool_event_tx {
let _ = tx.send(ToolEvent::Completed {
tool_name: "bash".to_owned(),
command: (*block).to_owned(),
output: out.clone(),
success: !out.contains("[error]"),
filter_stats: per_block_stats,
});
}
outputs.push(format!("$ {block}\n{filtered}"));
}

24 changes: 24 additions & 0 deletions docs/src/architecture/token-efficiency.md
@@ -55,6 +55,30 @@ MCP tools follow the same pipeline:

Prompt size stays constant as you add more capabilities. The only cost of more skills is a slightly larger embedding index in Qdrant or memory.

### Output Filter Pipeline

Tool output is compressed before it enters the LLM context. A command-aware filter pipeline matches each shell command against a set of built-in filters (test runner output, Clippy diagnostics, git log/diff, directory listings, log deduplication) and strips noise while preserving signal. The pipeline runs synchronously inside the tool executor, so the LLM never sees raw output.
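
The sketch below shows how a caller might drive this pipeline. It is illustrative only: it uses the `sanitize_output`, `OutputFilterRegistry::new`, and `registry.apply` names from `zeph-tools`, but the exact module paths, the type of the exit-code argument, and the error handling are assumptions rather than the real call sites.

```rust
// Illustrative sketch -- module paths and the exit-code type are assumptions.
use zeph_tools::filter::{sanitize_output, OutputFilterRegistry};

fn filter_tool_output(command: &str, raw: &str, exit_code: i32) -> String {
    // ANSI escapes and progress-bar noise are stripped from every output first.
    let sanitized = sanitize_output(raw);

    // Command-aware filters only run when the registry is enabled.
    let registry = OutputFilterRegistry::new(true);
    match registry.apply(command, &sanitized, exit_code) {
        Some(fr) => {
            // FilterResult carries the before/after line and char counts used for stats.
            eprintln!(
                "[shell] {} lines -> {} lines, {:.1}% filtered",
                fr.raw_lines, fr.filtered_lines, fr.savings_pct()
            );
            fr.output
        }
        // No filter matched this command: keep the sanitized output unchanged.
        None => sanitized,
    }
}
```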

Typical savings by command type:

| Command | Raw lines | Filtered lines | Savings |
|---------|-----------|----------------|---------|
| `cargo test` (100 passing, 2 failing) | ~340 | ~30 | ~91% |
| `cargo clippy` (many warnings) | ~200 | ~50 | ~75% |
| `git log --oneline -50` | 50 | 20 | 60% |

After each filtered execution, CLI mode prints a one-line stats summary and TUI mode accumulates the savings in the Resources panel. See [Tool System — Output Filter Pipeline](../guide/tools.md#output-filter-pipeline) for configuration details.

### Token Savings Tracking

`MetricsSnapshot` tracks cumulative filter metrics across the session:

- `filter_raw_tokens` / `filter_saved_tokens` — estimated token volume before filtering and estimated tokens saved by filtering
- `filter_total_commands` / `filter_filtered_commands` — hit rate denominator/numerator
- `filter_confidence_full/partial/fallback` — distribution of filter confidence levels

These feed into the [TUI filter metrics display](../guide/tui.md#filter-metrics) and are emitted as `tracing::debug!` every 50 commands.
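
As a rough illustration of how these counters turn into the percentages shown in the TUI, the struct below mirrors the field names listed above; the struct itself and both helpers are illustrative, not the actual `MetricsSnapshot` API.

```rust
// Illustrative only: field names come from the list above; the struct
// and these helpers are not part of the real MetricsSnapshot API.
struct FilterMetricsView {
    filter_raw_tokens: u64,
    filter_saved_tokens: u64,
    filter_total_commands: u64,
    filter_filtered_commands: u64,
}

impl FilterMetricsView {
    /// Share of commands where at least one filter matched (hit rate).
    fn hit_rate_pct(&self) -> f64 {
        if self.filter_total_commands == 0 {
            return 0.0;
        }
        self.filter_filtered_commands as f64 / self.filter_total_commands as f64 * 100.0
    }

    /// Share of pre-filter token volume that was saved.
    fn savings_pct(&self) -> f64 {
        if self.filter_raw_tokens == 0 {
            return 0.0;
        }
        self.filter_saved_tokens as f64 / self.filter_raw_tokens as f64 * 100.0
    }
}
```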

### Two-Tier Context Pruning

Long conversations accumulate tool outputs that consume significant context space. Zeph uses a two-tier strategy: Tier 1 selectively prunes old tool outputs (cheap, no LLM call), and Tier 2 falls back to full LLM compaction only when Tier 1 is insufficient. See [Context Engineering](../guide/context.md) for details.