diff --git a/CHANGELOG.md b/CHANGELOG.md
index 578342c1..dc7ddab1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,6 +6,11 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
 ## [Unreleased]
 
+### Fixed
+- Filter metrics not appearing in TUI Resources panel when using native tool_use providers (#480)
+- Output filter matchers not matching compound shell commands like `cd /path && cargo test 2>&1 | tail` (#481)
+- Duplicate `ToolEvent::Completed` emission in shell executor before filtering was applied (#480)
+
 ### Added
 - Syntax-highlighted diff view for write/edit tool output in TUI (#451)
 - Diff rendering with green/red backgrounds for added/removed lines
diff --git a/crates/zeph-tools/src/filter/mod.rs b/crates/zeph-tools/src/filter/mod.rs
index 1db617e4..d4abe0c0 100644
--- a/crates/zeph-tools/src/filter/mod.rs
+++ b/crates/zeph-tools/src/filter/mod.rs
@@ -68,6 +68,11 @@ pub enum CommandMatcher {
 impl CommandMatcher {
     #[must_use]
     pub fn matches(&self, command: &str) -> bool {
+        self.matches_single(command)
+            || extract_last_command(command).is_some_and(|last| self.matches_single(last))
+    }
+
+    fn matches_single(&self, command: &str) -> bool {
         match self {
             Self::Exact(s) => command == *s,
             Self::Prefix(s) => command.starts_with(s),
@@ -77,6 +82,29 @@ impl CommandMatcher {
     }
 }
 
+/// Extract the last command segment from compound shell expressions
+/// like `cd /path && cargo test` or `cmd1 ; cmd2`. Strips trailing
+/// redirections and pipes (e.g. `2>&1 | tail -50`).
+fn extract_last_command(command: &str) -> Option<&str> {
+    let last = command
+        .rsplit("&&")
+        .next()
+        .and_then(|segment| segment.rsplit(';').next())?;
+    let last = last.trim();
+    if last == command.trim() {
+        return None;
+    }
+    // Strip trailing pipe chain and redirections: take content before first `|` or `2>`
+    let last = last.split('|').next().unwrap_or(last);
+    let last = last.split("2>").next().unwrap_or(last);
+    let trimmed = last.trim();
+    if trimmed.is_empty() {
+        None
+    } else {
+        Some(trimmed)
+    }
+}
+
 impl std::fmt::Debug for CommandMatcher {
     fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
         match self {
@@ -754,6 +782,38 @@ extra_patterns = ["TODO: security review"]
         assert!(!m.matches("goodbye"));
     }
 
+    #[test]
+    fn command_matcher_compound_cd_and() {
+        let m = CommandMatcher::Prefix("cargo ");
+        assert!(m.matches("cd /some/path && cargo test --workspace --lib"));
+        assert!(m.matches("cd /path && cargo clippy --workspace -- -D warnings 2>&1"));
+    }
+
+    #[test]
+    fn command_matcher_compound_with_pipe() {
+        let m = CommandMatcher::Custom(Box::new(|cmd| cmd.split_whitespace().any(|t| t == "test")));
+        assert!(m.matches("cd /path && cargo test --workspace --lib 2>&1 | tail -80"));
+    }
+
+    #[test]
+    fn command_matcher_compound_no_false_positive() {
+        let m = CommandMatcher::Exact("ls");
+        assert!(!m.matches("cd /path && cargo test"));
+    }
+
+    #[test]
+    fn extract_last_command_basic() {
+        assert_eq!(
+            extract_last_command("cd /path && cargo test --lib"),
+            Some("cargo test --lib")
+        );
+        assert_eq!(
+            extract_last_command("cd /p && cargo clippy 2>&1 | tail -20"),
+            Some("cargo clippy")
+        );
+        assert!(extract_last_command("cargo test").is_none());
+    }
+
     // FilterConfidence derives
     #[test]
     fn filter_confidence_derives() {
diff --git a/docs/src/guide/tools.md b/docs/src/guide/tools.md
index ce0f9d02..c3645a93 100644
--- a/docs/src/guide/tools.md
+++ b/docs/src/guide/tools.md
@@ -115,6 +115,10 @@ Stale overflow files older than 24 hours are cleaned up automatically on startup
 
 Before tool output reaches the LLM context, it passes through a command-aware filter pipeline that strips noise and reduces token consumption. Filters are matched by command pattern and composed in sequence.
 
+### Compound Command Matching
+
+LLMs often generate compound shell expressions like `cd /path && cargo test 2>&1 | tail -80`. Filter matchers automatically extract the last command segment after `&&` or `;` separators and strip trailing pipes and redirections before matching. This means `cd /Users/me/project && cargo clippy --workspace -- -D warnings 2>&1` correctly matches the `ClippyFilter` — no special configuration needed.
+
 ### Built-in Filters
 
 | Filter | Matches | What it removes |
@@ -141,7 +145,7 @@ Each filter reports a confidence level:
 | `Partial` | Heuristic match; some content may have been over-filtered |
 | `Fallback` | Pattern matched but output structure was unexpected |
 
-When multiple filters compose in a pipeline, the worst confidence across stages is propagated. Confidence distribution is tracked in [TUI filter metrics](tui.md#filter-metrics).
+When multiple filters compose in a pipeline, the worst confidence across stages is propagated. Confidence distribution is tracked in the [TUI Resources panel](tui.md#confidence-levels-explained) as `F/P/B` counters.
 
 ### Inline Filter Stats (CLI)
 
diff --git a/docs/src/guide/tui.md b/docs/src/guide/tui.md
index 7a5635cf..cff29381 100644
--- a/docs/src/guide/tui.md
+++ b/docs/src/guide/tui.md
@@ -204,10 +204,24 @@ Confidence: F/6 P/2 B/0
 | `hit rate` | Percentage of commands where output was actually reduced |
 | `saved tokens` | Cumulative estimated tokens saved (`chars_saved / 4`) |
 | `%` | Token savings as a fraction of raw token volume |
-| `F/P/B` | Confidence distribution: Full / Partial / Fallback counts |
+| `F/P/B` | Confidence distribution: Full / Partial / Fallback counts (see below) |
 
 The filter section only appears when `filter_applications > 0` — it is hidden when no commands have been filtered.
 
+#### Confidence Levels Explained
+
+Each filter reports how confident it is in the result. The `Confidence: F/1 P/0 B/3` line shows cumulative counts across all filtered commands:
+
+| Level | Abbreviation | When assigned | What it means for the output |
+|-------|-------------|---------------|------------------------------|
+| **Full** | `F` | Filter recognized the output structure completely (e.g. `cargo test` with standard `test result:` summary) | Output is reliably compressed — no useful information lost |
+| **Partial** | `P` | Filter matched the command but output had unexpected sections mixed in (e.g. warnings interleaved with test results) | Most noise removed, but some relevant content may have been stripped — inspect if results look incomplete |
+| **Fallback** | `B` | Command pattern matched but output structure was unrecognized (e.g. `cargo audit` matched a cargo-prefix filter but has no dedicated handler) | Output returned unchanged or with minimal sanitization only (ANSI stripping, blank line collapse) |
+
+**Example:** `Confidence: F/1 P/0 B/3` means 1 command was filtered with Full confidence (e.g. `cargo test` — 99% savings) and 3 commands fell through to Fallback (e.g. `cargo audit`, `cargo doc`, `cargo tree` — matched the filter pattern but output was passed through as-is).
+
+When multiple filters compose in a [pipeline](tools.md#output-filter-pipeline), the worst confidence across stages is propagated. A `Full` + `Partial` composition yields `Partial`.
+
 ## Deferred Model Warmup
 
 When running with Ollama (or an orchestrator with Ollama sub-providers), model warmup is deferred until after the TUI interface renders. This means: