diff --git a/CHANGELOG.md b/CHANGELOG.md
index e87dc700..79e3237e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,11 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 ## [Unreleased]
 
 ### Added
+- Per-tool inline filter stats in CLI chat: `[shell] 342 lines -> 28 lines, 91.8% filtered` (#449)
+- Filter metrics in TUI Resources panel: confidence distribution, command hit rate, token savings (#448)
+- Periodic 250ms tick in TUI event loop for real-time metrics refresh (#447)
+- Output filter architecture improvements (M26.1): `CommandMatcher` enum, `FilterConfidence`, `FilterPipeline`, `SecurityPatterns`, per-filter TOML config (#452)
+- Token savings tracking and metrics for output filtering (#445)
 - Smart tool output filtering: command-aware filters that compress tool output before context insertion
 - `OutputFilter` trait and `OutputFilterRegistry` with first-match-wins dispatch
 - `sanitize_output()` ANSI escape and progress bar stripping (runs on all tool output)
@@ -37,7 +42,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 - Extract bootstrap logic from main.rs into `zeph-core::bootstrap::AppBuilder` (#393): main.rs reduced from 2313 to 978 lines
 - `SecurityConfig` and `TimeoutConfig` gain `Clone + Copy`
 - `AnyChannel` moved from main.rs to zeph-channels crate
-- Default features reduced to minimal set (qdrant, self-learning, vault-age, compatible, index)
+- Remove 8 lightweight feature gates, make always-on: openai, compatible, orchestrator, router, self-learning, qdrant, vault-age, mcp (#438)
+- Default features reduced to minimal set (empty after M26)
 - Skill matcher concurrency reduced from 50 to 20
 - `String::with_capacity` in context building loops
 - CI updated to use `--features full`
diff --git a/README.md b/README.md
index 07d75367..a79ff7ab 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ Lightweight AI agent that routes tasks across **Ollama, Claude, OpenAI, HuggingF
 
 ## Why Zeph
 
-**Token-efficient by design.** Most agent frameworks inject every tool and instruction into every prompt. Zeph embeds skills and MCP tools as vectors (with concurrent embedding via `buffer_unordered`), then selects only the top-K relevant ones per query via cosine similarity. Prompt size stays O(K) -- not O(N) -- regardless of how many capabilities are installed.
+**Token-efficient by design.** Most agent frameworks inject every tool and instruction into every prompt. Zeph embeds skills and MCP tools as vectors (with concurrent embedding via `buffer_unordered`), then selects only the top-K relevant ones per query via cosine similarity. Prompt size stays O(K) -- not O(N) -- regardless of how many capabilities are installed. Smart output filtering further reduces token consumption by 70-99% for common tool outputs (test results, git logs, clippy diagnostics, directory listings, repetitive logs). Per-command filter stats are shown inline in CLI chat and aggregated in the TUI dashboard.
 
 **Intelligent context management.** Two-tier context pruning: Tier 1 selectively removes old tool outputs (clearing bodies from memory after persisting to SQLite) before falling back to Tier 2 LLM-based compaction, reducing unnecessary LLM calls. A token-based protection zone preserves recent context from pruning. Parallel context preparation via `try_join!` and optimized byte-length token estimation. Cross-session memory transfers knowledge between conversations with relevance filtering. Proportional budget allocation (8% summaries, 8% semantic recall, 4% cross-session, 30% code context, 50% recent history) keeps conversations efficient. Tool outputs are truncated at 30K chars with optional LLM-based summarization for large outputs. Doom-loop detection breaks runaway tool cycles after 3 identical consecutive outputs, with configurable iteration limits (default 10). ZEPH.md project config discovery walks up the directory tree and injects project-specific context when available. Config hot-reload applies runtime-safe fields (timeouts, security, memory limits) on file change without restart.
 
@@ -118,7 +118,7 @@ cargo build --release --features tui
 | **Skill Trust & Quarantine** | 4-tier trust model (Trusted/Verified/Quarantined/Blocked) with blake3 integrity verification, anomaly detection with automatic blocking, and restricted tool access for untrusted skills | |
 | **Prompt Caching** | Automatic prompt caching for Anthropic and OpenAI providers, reducing latency and cost on repeated context | |
 | **Graceful Shutdown** | Ctrl-C triggers ordered teardown with MCP server cleanup and pending task draining | |
-| **TUI Dashboard** | ratatui terminal UI with tree-sitter syntax highlighting, markdown rendering, deferred model warmup, scrollbar, mouse scroll, thinking blocks, conversation history, splash screen, live metrics, message queueing (max 10, FIFO with Ctrl+K clear) | [TUI](https://bug-ops.github.io/zeph/guide/tui.html) |
+| **TUI Dashboard** | ratatui terminal UI with tree-sitter syntax highlighting, markdown rendering, deferred model warmup, scrollbar, mouse scroll, thinking blocks, conversation history, splash screen, live metrics (including filter savings), message queueing (max 10, FIFO with Ctrl+K clear) | [TUI](https://bug-ops.github.io/zeph/guide/tui.html) |
 | **Multi-Channel I/O** | CLI, Discord, Slack, Telegram, and TUI with streaming support | [Channels](https://bug-ops.github.io/zeph/guide/channels.html) |
 | **Defense-in-Depth** | Shell sandbox with relative path traversal detection, file sandbox, command filter, secret redaction (Google/GitLab patterns), audit log, SSRF protection (agent + MCP), rate limiter TTL eviction, doom-loop detection, skill trust quarantine | [Security](https://bug-ops.github.io/zeph/security.html) |
 
@@ -155,34 +155,27 @@ Deep dive: [Architecture overview](https://bug-ops.github.io/zeph/architecture/o
 
 ## Feature Flags
 
-| Feature | Default | Description |
-|---------|---------|-------------|
-| `compatible` | On | OpenAI-compatible provider (Together AI, Groq, Fireworks, etc.) |
-| `openai` | On | OpenAI provider |
-| `qdrant` | On | Qdrant vector search for skills and MCP tools |
-| `self-learning` | On | Skill evolution system |
-| `vault-age` | On | Age-encrypted secret storage |
-| `a2a` | Off | A2A protocol client and server |
-| `candle` | Off | Local HuggingFace inference (GGUF) |
-| `index` | Off | AST-based code indexing and semantic retrieval |
-| `mcp` | Off | MCP client for external tool servers |
-| `orchestrator` | Off | Multi-model routing with fallback |
-| `router` | Off | Prompt-based model selection via RouterProvider |
-| `discord` | Off | Discord bot with Gateway v10 WebSocket |
-| `slack` | Off | Slack bot with Events API webhook |
-| `gateway` | Off | HTTP gateway for webhook ingestion |
-| `daemon` | Off | Daemon supervisor for component lifecycle |
-| `scheduler` | Off | Cron-based periodic task scheduler |
-| `otel` | Off | OpenTelemetry OTLP export for Prometheus/Grafana |
-| `metal` | Off | Metal GPU acceleration (macOS) |
-| `tui` | Off | ratatui TUI dashboard with real-time metrics |
-| `cuda` | Off | CUDA GPU acceleration (Linux) |
+The following features are always compiled in (no flag needed): `openai`, `compatible`, `orchestrator`, `router`, `self-learning`, `qdrant`, `vault-age`, `mcp`.
+
+| Feature | Description |
+|---------|-------------|
+| `a2a` | A2A protocol client and server |
+| `candle` | Local HuggingFace inference (GGUF) |
+| `index` | AST-based code indexing and semantic retrieval |
+| `discord` | Discord bot with Gateway v10 WebSocket |
+| `slack` | Slack bot with Events API webhook |
+| `gateway` | HTTP gateway for webhook ingestion |
+| `daemon` | Daemon supervisor for component lifecycle |
+| `scheduler` | Cron-based periodic task scheduler |
+| `otel` | OpenTelemetry OTLP export for Prometheus/Grafana |
+| `metal` | Metal GPU acceleration (macOS) |
+| `tui` | ratatui TUI dashboard with real-time metrics |
+| `cuda` | CUDA GPU acceleration (Linux) |
 
 ```bash
-cargo build --release                        # default features only
-cargo build --release --features full        # all non-platform features
+cargo build --release                        # default build (all always-on features included)
+cargo build --release --features full        # all optional features except metal, cuda, otel
 cargo build --release --features metal       # macOS Metal GPU
-cargo build --release --no-default-features  # minimal binary (Ollama + Claude only)
 cargo build --release --features tui         # with TUI dashboard
 ```
 
diff --git a/crates/zeph-core/src/agent/streaming.rs b/crates/zeph-core/src/agent/streaming.rs
index c0a35394..abfca456 100644
--- a/crates/zeph-core/src/agent/streaming.rs
+++ b/crates/zeph-core/src/agent/streaming.rs
@@ -313,6 +313,13 @@ impl<C: Channel, T: ToolExecutor> Agent<C, T> {
                 let display = self.maybe_redact(&formatted_output);
                 self.channel.send(&display).await?;
 
+                if let Some(ref fs) = output.filter_stats
+                    && fs.filtered_lines < fs.raw_lines
+                {
+                    let stats_line = fs.format_inline(&output.tool_name);
+                    self.channel.send(&stats_line).await?;
+                }
+
                 self.push_message(Message::from_parts(
                     Role::User,
                     vec![MessagePart::ToolOutput {
diff --git a/crates/zeph-tools/src/executor.rs b/crates/zeph-tools/src/executor.rs
index afcda1d3..f38ad0b9 100644
--- a/crates/zeph-tools/src/executor.rs
+++ b/crates/zeph-tools/src/executor.rs
@@ -13,6 +13,8 @@ pub struct ToolCall {
 pub struct FilterStats {
     pub raw_chars: usize,
     pub filtered_chars: usize,
+    pub raw_lines: usize,
+    pub filtered_lines: usize,
     pub confidence: Option<crate::FilterConfidence>,
 }
 
@@ -30,6 +32,16 @@ impl FilterStats {
     pub fn estimated_tokens_saved(&self) -> usize {
         self.raw_chars.saturating_sub(self.filtered_chars) / 4
     }
+
+    #[must_use]
+    pub fn format_inline(&self, tool_name: &str) -> String {
+        format!(
+            "[{tool_name}] {} lines -> {} lines, {:.1}% filtered",
+            self.raw_lines,
+            self.filtered_lines,
+            self.savings_pct()
+        )
+    }
 }
 
 /// Structured result from tool execution.
@@ -85,6 +97,7 @@ pub enum ToolEvent {
         command: String,
         output: String,
         success: bool,
+        filter_stats: Option<FilterStats>,
     },
 }
 
@@ -293,4 +306,24 @@ mod tests {
         };
         assert_eq!(fs.estimated_tokens_saved(), 200); // (1000 - 200) / 4
     }
+
+    #[test]
+    fn filter_stats_format_inline() {
+        let fs = FilterStats {
+            raw_chars: 1000,
+            filtered_chars: 200,
+            raw_lines: 342,
+            filtered_lines: 28,
+            ..Default::default()
+        };
+        let line = fs.format_inline("shell");
+        assert_eq!(line, "[shell] 342 lines -> 28 lines, 80.0% filtered");
+    }
+
+    #[test]
+    fn filter_stats_format_inline_zero() {
+        let fs = FilterStats::default();
+        let line = fs.format_inline("bash");
+        assert_eq!(line, "[bash] 0 lines -> 0 lines, 0.0% filtered");
+    }
 }
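Review note on `format_inline`: the two line counts and the percentage come from different measures. The test above expects `80.0%` for 342 -> 28 lines, which matches the character counts (1000 -> 200) rather than the line ratio. A minimal sketch of both calculations follows, assuming `savings_pct()` is character-based (its body is not part of this diff). `Stats`, `pct_removed`, `char_pct`, and `line_pct` are hypothetical names used only for illustration.

```rust
/// Hypothetical stand-in for `FilterStats`, reduced to the four counters.
#[derive(Default)]
struct Stats {
    raw_chars: usize,
    filtered_chars: usize,
    raw_lines: usize,
    filtered_lines: usize,
}

/// Percentage removed, returning 0 when there was no input at all.
fn pct_removed(raw: usize, filtered: usize) -> f64 {
    if raw == 0 {
        return 0.0;
    }
    raw.saturating_sub(filtered) as f64 / raw as f64 * 100.0
}

impl Stats {
    /// Character-based savings (what the test above appears to exercise).
    fn char_pct(&self) -> f64 {
        pct_removed(self.raw_chars, self.filtered_chars)
    }

    /// Line-based savings (what the changelog and docs example suggests).
    fn line_pct(&self) -> f64 {
        pct_removed(self.raw_lines, self.filtered_lines)
    }
}

fn main() {
    let s = Stats {
        raw_chars: 1000,
        filtered_chars: 200,
        raw_lines: 342,
        filtered_lines: 28,
    };
    println!("chars: {:.1}%  lines: {:.1}%", s.char_pct(), s.line_pct());
}
```

If the inline summary is meant to describe the line counts it prints, the line-based figure may be the less surprising one to show; otherwise the docs example could use a character-based percentage.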
diff --git a/crates/zeph-tools/src/filter/mod.rs b/crates/zeph-tools/src/filter/mod.rs
index e3bb5256..1db617e4 100644
--- a/crates/zeph-tools/src/filter/mod.rs
+++ b/crates/zeph-tools/src/filter/mod.rs
@@ -38,6 +38,8 @@ pub struct FilterResult {
     pub output: String,
     pub raw_chars: usize,
     pub filtered_chars: usize,
+    pub raw_lines: usize,
+    pub filtered_lines: usize,
     pub confidence: FilterConfidence,
 }
 
@@ -131,6 +133,8 @@ impl<'a> FilterPipeline<'a> {
         FilterResult {
             raw_chars: initial_len,
             filtered_chars: current.len(),
+            raw_lines: count_lines(output),
+            filtered_lines: count_lines(&current),
             output: current,
             confidence: worst,
         }
@@ -531,9 +535,15 @@ pub fn sanitize_output(raw: &str) -> String {
     result
 }
 
+fn count_lines(s: &str) -> usize {
+    if s.is_empty() { 0 } else { s.lines().count() }
+}
+
 fn make_result(raw: &str, output: String, confidence: FilterConfidence) -> FilterResult {
     let filtered_chars = output.len();
     FilterResult {
+        raw_lines: count_lines(raw),
+        filtered_lines: count_lines(&output),
         output,
         raw_chars: raw.len(),
         filtered_chars,
@@ -577,6 +587,8 @@ mod tests {
             output: String::new(),
             raw_chars: 1000,
             filtered_chars: 200,
+            raw_lines: 0,
+            filtered_lines: 0,
             confidence: FilterConfidence::Full,
         };
         assert!((r.savings_pct() - 80.0).abs() < 0.01);
@@ -588,11 +600,30 @@ mod tests {
             output: String::new(),
             raw_chars: 0,
             filtered_chars: 0,
+            raw_lines: 0,
+            filtered_lines: 0,
             confidence: FilterConfidence::Full,
         };
         assert!((r.savings_pct()).abs() < 0.01);
     }
 
+    #[test]
+    fn count_lines_helper() {
+        assert_eq!(count_lines(""), 0);
+        assert_eq!(count_lines("one"), 1);
+        assert_eq!(count_lines("one\ntwo\nthree"), 3);
+        assert_eq!(count_lines("trailing\n"), 1);
+    }
+
+    #[test]
+    fn make_result_counts_lines() {
+        let raw = "line1\nline2\nline3\nline4\nline5";
+        let filtered = "line1\nline3".to_owned();
+        let r = make_result(raw, filtered, FilterConfidence::Full);
+        assert_eq!(r.raw_lines, 5);
+        assert_eq!(r.filtered_lines, 2);
+    }
+
     #[test]
     fn registry_disabled_returns_none() {
         let r = OutputFilterRegistry::new(false);
@@ -751,6 +782,8 @@ extra_patterns = ["TODO: security review"]
             output: "short".into(),
             raw_chars: 100,
             filtered_chars: 5,
+            raw_lines: 10,
+            filtered_lines: 1,
             confidence: FilterConfidence::Full,
         };
         m.record(&r);
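Review note on `count_lines`: the helper inherits the semantics of `str::lines`, which may matter when comparing `raw_lines` with `filtered_lines`. The sketch below copies the helper body from this diff and checks a few edge cases beyond those in `count_lines_helper`; the extra inputs are illustrative only.

```rust
/// Same body as the `count_lines` helper added in this diff.
fn count_lines(s: &str) -> usize {
    if s.is_empty() { 0 } else { s.lines().count() }
}

fn main() {
    // A lone newline is one (empty) line; a trailing newline adds none.
    assert_eq!(count_lines("\n"), 1);
    assert_eq!(count_lines("a\nb\n"), 2);
    // Blank lines in the middle are counted.
    assert_eq!(count_lines("a\n\nb"), 3);
    // `str::lines` treats "\r\n" as a single line ending.
    assert_eq!(count_lines("a\r\nb"), 2);
    // The explicit empty check is equivalent to the plain iterator count.
    assert_eq!("".lines().count(), 0);
    println!("all edge cases hold");
}
```

Because blank lines count, a filter that only collapses blank lines still lowers `filtered_lines` and therefore triggers the inline stats line.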
diff --git a/crates/zeph-tools/src/shell.rs b/crates/zeph-tools/src/shell.rs
index da5321fd..a812bd8c 100644
--- a/crates/zeph-tools/src/shell.rs
+++ b/crates/zeph-tools/src/shell.rs
@@ -207,16 +207,8 @@ impl ShellExecutor {
             };
             self.log_audit(block, result, duration_ms).await;
 
-            if let Some(ref tx) = self.tool_event_tx {
-                let _ = tx.send(ToolEvent::Completed {
-                    tool_name: "bash".to_owned(),
-                    command: (*block).to_owned(),
-                    output: out.clone(),
-                    success: !out.contains("[error]"),
-                });
-            }
-
             let sanitized = sanitize_output(&out);
+            let mut per_block_stats: Option<FilterStats> = None;
             let filtered = if let Some(ref registry) = self.output_filter_registry {
                 match registry.apply(block, &sanitized, exit_code) {
                     Some(fr) => {
@@ -227,14 +219,24 @@ impl ShellExecutor {
                             savings_pct = fr.savings_pct(),
                             "output filter applied"
                         );
+                        let block_fs = FilterStats {
+                            raw_chars: fr.raw_chars,
+                            filtered_chars: fr.filtered_chars,
+                            raw_lines: fr.raw_lines,
+                            filtered_lines: fr.filtered_lines,
+                            confidence: Some(fr.confidence),
+                        };
                         let stats =
                             cumulative_filter_stats.get_or_insert_with(FilterStats::default);
                         stats.raw_chars += fr.raw_chars;
                         stats.filtered_chars += fr.filtered_chars;
+                        stats.raw_lines += fr.raw_lines;
+                        stats.filtered_lines += fr.filtered_lines;
                         stats.confidence = Some(match (stats.confidence, fr.confidence) {
                             (Some(prev), cur) => crate::filter::worse_confidence(prev, cur),
                             (None, cur) => cur,
                         });
+                        per_block_stats = Some(block_fs);
                         fr.output
                     }
                     None => sanitized,
@@ -242,6 +244,16 @@ impl ShellExecutor {
             } else {
                 sanitized
             };
+
+            if let Some(ref tx) = self.tool_event_tx {
+                let _ = tx.send(ToolEvent::Completed {
+                    tool_name: "bash".to_owned(),
+                    command: (*block).to_owned(),
+                    output: out.clone(),
+                    success: !out.contains("[error]"),
+                    filter_stats: per_block_stats,
+                });
+            }
             outputs.push(format!("$ {block}\n{filtered}"));
         }
 
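Review note on the accumulation above: per-block stats are reported through `ToolEvent::Completed`, while the cumulative stats sum the counters and keep the worst confidence seen so far. A minimal sketch of that fold follows. It assumes confidence levels order as Full < Partial < Fallback and that `worse_confidence` picks the larger one (its body is not shown in this diff); `Confidence`, `Totals`, and `add` are hypothetical names.

```rust
/// Hypothetical confidence levels, declared from best to worst so that
/// the derived ordering makes "worse" the same as "greater".
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Confidence {
    Full,
    Partial,
    Fallback,
}

/// Hypothetical cumulative counters across all blocks of one tool call.
#[derive(Default, Debug)]
struct Totals {
    raw_lines: usize,
    filtered_lines: usize,
    confidence: Option<Confidence>,
}

impl Totals {
    /// Folds one block's result into the running totals.
    fn add(&mut self, raw: usize, filtered: usize, c: Confidence) {
        self.raw_lines += raw;
        self.filtered_lines += filtered;
        // The first block sets the confidence; later blocks can only lower it.
        self.confidence = Some(match self.confidence {
            Some(prev) => prev.max(c),
            None => c,
        });
    }
}

fn main() {
    let mut total = Totals::default();
    total.add(342, 28, Confidence::Full);
    total.add(50, 20, Confidence::Partial);
    total.add(10, 10, Confidence::Full);
    println!("{total:?}");
}
```

Keeping the confidence as an `Option` distinguishes "no block was filtered" from "every block was filtered with full confidence".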
diff --git a/docs/src/architecture/token-efficiency.md b/docs/src/architecture/token-efficiency.md
index 8a79b90a..0b573226 100644
--- a/docs/src/architecture/token-efficiency.md
+++ b/docs/src/architecture/token-efficiency.md
@@ -55,6 +55,30 @@ MCP tools follow the same pipeline:
 
 Prompt size stays constant as you add more capabilities. The only cost of more skills is a slightly larger embedding index in Qdrant or memory.
 
+### Output Filter Pipeline
+
+Tool output is compressed before it enters the LLM context. A command-aware filter pipeline matches each shell command against a set of built-in filters (test runner output, Clippy diagnostics, git log/diff, directory listings, log deduplication) and strips noise while preserving signal. The pipeline runs synchronously inside the tool executor, so the LLM never sees raw output.
+
+Typical savings by command type:
+
+| Command | Raw lines | Filtered lines | Savings |
+|---------|-----------|----------------|---------|
+| `cargo test` (100 passing, 2 failing) | ~340 | ~30 | ~91% |
+| `cargo clippy` (many warnings) | ~200 | ~50 | ~75% |
+| `git log --oneline -50` | 50 | 20 | 60% |
+
+After each filtered execution, CLI mode prints a one-line stats summary and TUI mode accumulates the savings in the Resources panel. See [Tool System — Output Filter Pipeline](../guide/tools.md#output-filter-pipeline) for configuration details.
+
+### Token Savings Tracking
+
+`MetricsSnapshot` tracks cumulative filter metrics across the session:
+
+- `filter_raw_tokens` / `filter_saved_tokens`: estimated token volume before filtering and the tokens removed by filtering
+- `filter_total_commands` / `filter_filtered_commands`: hit rate denominator and numerator
+- `filter_confidence_full/partial/fallback`: distribution of filter confidence levels
+
+These feed into the [TUI filter metrics display](../guide/tui.md#filter-metrics) and are emitted as `tracing::debug!` every 50 commands.
+
 ### Two-Tier Context Pruning
 
 Long conversations accumulate tool outputs that consume significant context space. Zeph uses a two-tier strategy: Tier 1 selectively prunes old tool outputs (cheap, no LLM call), and Tier 2 falls back to full LLM compaction only when Tier 1 is insufficient. See [Context Engineering](../guide/context.md) for details.
diff --git a/docs/src/feature-flags.md b/docs/src/feature-flags.md
index 037b5b88..3f9c17f1 100644
--- a/docs/src/feature-flags.md
+++ b/docs/src/feature-flags.md
@@ -1,42 +1,49 @@
 # Feature Flags
 
-Zeph uses Cargo feature flags to control optional functionality. Default features cover common use cases; platform-specific and experimental features are opt-in.
-
-| Feature | Default | Description |
-|---------|---------|-------------|
-| `compatible` | Enabled | `CompatibleProvider` for OpenAI-compatible third-party APIs |
-| `openai` | Enabled | OpenAI-compatible provider (GPT, Together, Groq, Fireworks, etc.) |
-| `qdrant` | Enabled | Qdrant-backed vector storage for skill matching (`zeph-skills`) and MCP tool registry (`zeph-mcp`) |
-| `self-learning` | Enabled | Skill evolution via failure detection, self-reflection, and LLM-generated improvements |
-| `vault-age` | Enabled | Age-encrypted vault backend for file-based secret storage ([age](https://age-encryption.org/)) |
-| `a2a` | Disabled | [A2A protocol](https://github.com/a2aproject/A2A) client and server for agent-to-agent communication |
-| `candle` | Disabled | Local HuggingFace model inference via [candle](https://github.com/huggingface/candle) (GGUF quantized models) |
-| `index` | Disabled | AST-based code indexing and semantic retrieval via tree-sitter ([guide](guide/code-indexing.md)) |
-| `mcp` | Disabled | MCP client for external tool servers via stdio/HTTP transport |
-| `orchestrator` | Disabled | Multi-model routing with task-based classification and fallback chains |
-| `router` | Disabled | `RouterProvider` for chaining multiple providers with fallback |
-| `discord` | Disabled | Discord channel adapter with Gateway v10 WebSocket and slash commands ([guide](guide/channels.md#discord-channel)) |
-| `slack` | Disabled | Slack channel adapter with Events API webhook and HMAC-SHA256 verification ([guide](guide/channels.md#slack-channel)) |
-| `otel` | Disabled | OpenTelemetry tracing export via OTLP/gRPC ([guide](guide/observability.md)) |
-| `gateway` | Disabled | HTTP gateway for webhook ingestion with bearer auth and rate limiting ([guide](guide/gateway.md)) |
-| `daemon` | Disabled | Daemon supervisor with component lifecycle, PID file, and health monitoring ([guide](guide/daemon.md)) |
-| `scheduler` | Disabled | Cron-based periodic task scheduler with SQLite persistence ([guide](guide/scheduler.md)) |
-| `tui` | Disabled | ratatui-based TUI dashboard with real-time agent metrics |
-| `metal` | Disabled | Metal GPU acceleration for candle on macOS (implies `candle`) |
-| `cuda` | Disabled | CUDA GPU acceleration for candle on Linux (implies `candle`) |
+Zeph uses Cargo feature flags to control optional functionality. As of M26, eight previously optional features are now always-on and compiled into every build. The remaining optional features are explicitly opt-in.
+
+## Always-On (compiled unconditionally)
+
+| Feature | Description |
+|---------|-------------|
+| `openai` | OpenAI-compatible provider (GPT, Together, Groq, Fireworks, etc.) |
+| `compatible` | `CompatibleProvider` for OpenAI-compatible third-party APIs |
+| `orchestrator` | Multi-model routing with task-based classification and fallback chains |
+| `router` | `RouterProvider` for chaining multiple providers with fallback |
+| `self-learning` | Skill evolution via failure detection, self-reflection, and LLM-generated improvements |
+| `qdrant` | Qdrant-backed vector storage for skill matching and MCP tool registry |
+| `vault-age` | Age-encrypted vault backend for file-based secret storage ([age](https://age-encryption.org/)) |
+| `mcp` | MCP client for external tool servers via stdio/HTTP transport |
+
+## Optional Features
+
+| Feature | Description |
+|---------|-------------|
+| `tui` | ratatui-based TUI dashboard with real-time agent metrics |
+| `candle` | Local HuggingFace model inference via [candle](https://github.com/huggingface/candle) (GGUF quantized models) |
+| `metal` | Metal GPU acceleration for candle on macOS (implies `candle`) |
+| `cuda` | CUDA GPU acceleration for candle on Linux (implies `candle`) |
+| `discord` | Discord channel adapter with Gateway v10 WebSocket and slash commands ([guide](guide/channels.md#discord-channel)) |
+| `slack` | Slack channel adapter with Events API webhook and HMAC-SHA256 verification ([guide](guide/channels.md#slack-channel)) |
+| `a2a` | [A2A protocol](https://github.com/a2aproject/A2A) client and server for agent-to-agent communication |
+| `index` | AST-based code indexing and semantic retrieval via tree-sitter ([guide](guide/code-indexing.md)) |
+| `gateway` | HTTP gateway for webhook ingestion with bearer auth and rate limiting ([guide](guide/gateway.md)) |
+| `daemon` | Daemon supervisor with component lifecycle, PID file, and health monitoring ([guide](guide/daemon.md)) |
+| `scheduler` | Cron-based periodic task scheduler with SQLite persistence ([guide](guide/scheduler.md)) |
+| `otel` | OpenTelemetry tracing export via OTLP/gRPC ([guide](guide/observability.md)) |
+| `mock` | Mock providers and channels for testing |
 
 ## Build Examples
 
 ```bash
-cargo build --release                                     # all default features
-cargo build --release --features metal                    # macOS with Metal GPU
-cargo build --release --features cuda                     # Linux with NVIDIA GPU
-cargo build --release --features tui                      # with TUI dashboard
-cargo build --release --features discord                    # with Discord bot
-cargo build --release --features slack                      # with Slack bot
+cargo build --release                                      # default build (always-on features included)
+cargo build --release --features metal                     # macOS with Metal GPU
+cargo build --release --features cuda                      # Linux with NVIDIA GPU
+cargo build --release --features tui                       # with TUI dashboard
+cargo build --release --features discord                   # with Discord bot
+cargo build --release --features slack                     # with Slack bot
 cargo build --release --features gateway,daemon,scheduler  # with infrastructure components
 cargo build --release --features full                      # all optional features
-cargo build --release --no-default-features               # minimal binary
 ```
 
 The `full` feature enables every optional feature except `metal`, `cuda`, and `otel`.
diff --git a/docs/src/guide/tools.md b/docs/src/guide/tools.md
index c7903b80..ce0f9d02 100644
--- a/docs/src/guide/tools.md
+++ b/docs/src/guide/tools.md
@@ -111,6 +111,80 @@ Tool output exceeding 30 000 characters is truncated (head + tail split) before
 
 Stale overflow files older than 24 hours are cleaned up automatically on startup.
 
+## Output Filter Pipeline
+
+Before tool output reaches the LLM context, it passes through a command-aware filter pipeline that strips noise and reduces token consumption. Filters are matched by command pattern and composed in sequence.
+
+### Built-in Filters
+
+| Filter | Matches | What it removes |
+|--------|---------|----------------|
+| `TestOutputFilter` | `cargo test`, `cargo nextest`, `pytest`, `go test` | Passing test lines, verbose output; keeps failures and summary |
+| `ClippyFilter` | `cargo clippy` | Duplicate diagnostic paths, redundant `help:` lines |
+| `GitFilter` | `git log`, `git diff` | Limits log entries (default: 20), diff line count (default: 500) |
+| `DirListingFilter` | `ls`, `find`, `tree` | Collapses redundant whitespace and deduplicates paths |
+| `LogDedupFilter` | any command with repetitive log output | Deduplicates consecutive identical lines |
+
+Independently of filter matching, `sanitize_output` runs on all tool output first: it strips ANSI escape sequences and carriage-return progress bars and collapses consecutive blank lines.
+
+### Security Pass
+
+After filtering, a security scan runs over the **raw** (pre-filter) output. If credential-shaped patterns are found (API keys, tokens, passwords), a warning is appended to the filtered output so the LLM is aware without exposing the value. Additional regex patterns can be configured via `[tools.filters.security] extra_patterns`.
+
+### FilterConfidence
+
+Each filter reports a confidence level:
+
+| Level | Meaning |
+|-------|---------|
+| `Full` | Filter is certain it handled this output correctly |
+| `Partial` | Heuristic match; some content may have been over-filtered |
+| `Fallback` | Pattern matched but output structure was unexpected |
+
+When multiple filters compose in a pipeline, the worst confidence across stages is propagated. Confidence distribution is tracked in [TUI filter metrics](tui.md#filter-metrics).
+
+### Inline Filter Stats (CLI)
+
+In CLI mode, after each filtered tool execution a one-line summary is printed to the conversation:
+
+```
+[shell] 342 lines -> 28 lines, 91.8% filtered
+```
+
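+This appears only when lines were actually removed. The percentage comes from `savings_pct()`, which is character-based, so it can differ from the ratio of the two line counts. The summary lets you verify the filter is working and estimate token savings without opening the TUI.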
+
+### Configuration
+
+```toml
+[tools.filters]
+enabled = true            # Master switch (default: true)
+
+[tools.filters.test]
+enabled = true
+max_failures = 10         # Max failing tests to show (default: 10)
+truncate_stack_trace = 50 # Stack trace line limit (default: 50)
+
+[tools.filters.git]
+enabled = true
+max_log_entries = 20      # Max git log entries (default: 20)
+max_diff_lines = 500      # Max diff lines (default: 500)
+
+[tools.filters.clippy]
+enabled = true
+
+[tools.filters.dir_listing]
+enabled = true
+
+[tools.filters.log_dedup]
+enabled = true
+
+[tools.filters.security]
+enabled = true
+extra_patterns = []       # Additional regex patterns to flag as credentials
+```
+
+Individual filters can be disabled without affecting others.
+
 ## Configuration
 
 ```toml
diff --git a/docs/src/guide/tui.md b/docs/src/guide/tui.md
index 3bc78612..63b565a1 100644
--- a/docs/src/guide/tui.md
+++ b/docs/src/guide/tui.md
@@ -150,13 +150,13 @@ The TUI adapts to terminal width:
 
 ## Live Metrics
 
-The TUI dashboard displays real-time metrics collected from the agent loop via `tokio::sync::watch` channel:
+The TUI dashboard displays real-time metrics collected from the agent loop via a `tokio::sync::watch` channel. The event loop ticks every 250 ms and the render loop polls the watch receiver before each frame, so the display keeps refreshing even without user input.
 
 | Panel | Metrics |
 |-------|---------|
 | **Skills** | Active/total skill count, matched skill names per query |
 | **Memory** | SQLite message count, conversation ID, Qdrant status, embeddings generated, summaries count, tool output prunes |
-| **Resources** | Prompt/completion/total tokens, API calls, last LLM latency (ms), provider and model name |
+| **Resources** | Prompt/completion/total tokens, API calls, last LLM latency (ms), provider and model name, prompt cache read/write tokens, filter stats |
 
 Metrics are updated at key instrumentation points in the agent loop:
 - After each LLM call (api_calls, latency, prompt tokens)
@@ -164,9 +164,30 @@ Metrics are updated at key instrumentation points in the agent loop:
 - After skill matching (active skills, total skills)
 - After message persistence (sqlite message count)
 - After summarization (summaries count)
+- After each tool execution with filter applied (filter metrics)
 
 Token counts use a `chars/4` estimation (sufficient for dashboard display).
 
+### Filter Metrics
+
+When the output filter pipeline has processed at least one command, the Resources panel shows:
+
+```
+Filter: 8/10 commands (80% hit rate)
+Filter saved: 1240 tok (72%)
+Confidence: F/6 P/2 B/0
+```
+
+| Field | Meaning |
+|-------|---------|
+| `N/M commands` | Filtered / total commands through the pipeline |
+| `hit rate` | Percentage of commands where output was actually reduced |
+| `saved tokens` | Cumulative estimated tokens saved (`chars_saved / 4`) |
+| `%` | Token savings as a fraction of raw token volume |
+| `F/P/B` | Confidence distribution: Full / Partial / Fallback counts |
+
+The filter section is hidden until `filter_applications > 0`, that is, until at least one command has been filtered.
+
 ## Deferred Model Warmup
 
 When running with Ollama (or an orchestrator with Ollama sub-providers), model warmup is deferred until after the TUI interface renders. This means:
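Review note on the Filter Metrics section: the three panel lines can be derived from the counters listed under Token Savings Tracking. The sketch below is one possible rendering, not the TUI code. `FilterMetrics`, `pct`, and `render_filter_lines` are hypothetical; the field names follow the docs in this diff, and the raw token volume of 1722 is an invented value chosen so the output matches the documented example.

```rust
/// Hypothetical subset of `MetricsSnapshot`; field names follow the docs.
#[derive(Default)]
struct FilterMetrics {
    filter_applications: u64,
    filter_total_commands: u64,
    filter_filtered_commands: u64,
    filter_raw_tokens: u64,
    filter_saved_tokens: u64,
    filter_confidence_full: u64,
    filter_confidence_partial: u64,
    filter_confidence_fallback: u64,
}

/// Percentage of `part` in `whole`, returning 0 for an empty denominator.
fn pct(part: u64, whole: u64) -> f64 {
    if whole == 0 { 0.0 } else { part as f64 / whole as f64 * 100.0 }
}

/// Builds the three Resources-panel lines, or `None` while nothing was filtered.
fn render_filter_lines(m: &FilterMetrics) -> Option<[String; 3]> {
    if m.filter_applications == 0 {
        return None;
    }
    Some([
        format!(
            "Filter: {}/{} commands ({:.0}% hit rate)",
            m.filter_filtered_commands,
            m.filter_total_commands,
            pct(m.filter_filtered_commands, m.filter_total_commands)
        ),
        format!(
            "Filter saved: {} tok ({:.0}%)",
            m.filter_saved_tokens,
            pct(m.filter_saved_tokens, m.filter_raw_tokens)
        ),
        format!(
            "Confidence: F/{} P/{} B/{}",
            m.filter_confidence_full, m.filter_confidence_partial, m.filter_confidence_fallback
        ),
    ])
}

fn main() {
    let m = FilterMetrics {
        filter_applications: 8,
        filter_total_commands: 10,
        filter_filtered_commands: 8,
        filter_raw_tokens: 1722, // invented for illustration
        filter_saved_tokens: 1240,
        filter_confidence_full: 6,
        filter_confidence_partial: 2,
        filter_confidence_fallback: 0,
    };
    if let Some(lines) = render_filter_lines(&m) {
        for line in lines {
            println!("{line}");
        }
    }
}
```

Returning `None` instead of empty strings lets the caller skip the whole section, which matches the documented "hidden until something was filtered" behavior.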
diff --git a/src/main.rs b/src/main.rs
index 1829d8e3..c74c7fce 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -501,6 +501,7 @@ async fn forward_tool_events_to_tui(
                 command,
                 output,
                 success,
+                ..
             } => zeph_tui::AgentEvent::ToolOutput {
                 tool_name,
                 command,