diff --git a/CHANGELOG.md b/CHANGELOG.md index 13029e3b..b9046407 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,20 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). ## [Unreleased] +### Added +- Declarative TOML-based output filter engine with 9 strategy types: `strip_noise`, `truncate`, `keep_matching`, `strip_annotated`, `test_summary`, `group_by_rule`, `git_status`, `git_diff`, `dedup` +- Embedded `default-filters.toml` with 19 pre-configured rules for CLI tools (cargo, git, docker, npm, pip, make, pytest, go, terraform, kubectl, brew, ls, journalctl) +- `filters_path` option in `FilterConfig` for user-provided filter rules override +- ReDoS protection: RegexBuilder with size_limit, 512-char pattern cap, 1 MiB file size limit +- Dedup strategy with configurable normalization patterns and HashMap pre-allocation +- NormalizeEntry replacement validation (rejects unescaped `$` capture group refs) + +### Changed +- Migrated all 6 hardcoded filters (cargo_build, test_output, clippy, git, dir_listing, log_dedup) into the declarative TOML engine + +### Removed +- `FilterConfig` per-filter config structs (`TestFilterConfig`, `GitFilterConfig`, `ClippyFilterConfig`, `CargoBuildFilterConfig`, `DirListingFilterConfig`, `LogDedupFilterConfig`) — filter params now in TOML strategy fields + ## [0.11.4] - 2026-02-21 ### Added diff --git a/README.md b/README.md index ed56791f..ada0fa50 100644 --- a/README.md +++ b/README.md @@ -133,15 +133,19 @@ When two candidates score within a configurable threshold of each other, structu ### Smart Output Filtering — 70-99% Token Savings -Raw tool output is the #1 context window polluter. A `cargo test` run produces 300+ lines; the model needs 3. 
Zeph applies command-aware filters **before** context injection:
-
-| Filter | What It Does | Typical Savings |
-|--------|-------------|-----------------|
-| **Test** | Cargo test/nextest — failures-only mode | 94-99% |
-| **Git** | Compact status/diff/log/push | 80-99% |
-| **Clippy** | Group warnings by lint rule | 70-90% |
-| **Directory** | Hide noise dirs (target, node_modules, .git) | 60-80% |
-| **Log dedup** | Normalize timestamps/UUIDs, count repeats | 70-85% |
+Raw tool output is the #1 context window polluter. A `cargo test` run produces 300+ lines; the model needs 3. Zeph applies command-aware filters **before** context injection via a unified declarative TOML engine with 9 strategy types:
+
+| Strategy | What It Does | Typical Savings |
+|----------|-------------|-----------------|
+| `test_summary` | Cargo test/nextest/pytest/Go test — failures-only mode | 94-99% |
+| `git_status` / `git_diff` | Compact status and bounded diff/log output | 80-99% |
+| `group_by_rule` | Group Clippy warnings by lint rule | 70-90% |
+| `dedup` | Normalize timestamps/UUIDs, count repeats | 70-85% |
+| `strip_noise` / `keep_matching` | Remove or retain lines by regex pattern | varies |
+| `truncate` | Head+tail window with configurable limits | varies |
+| `strip_annotated` | Drop annotated diagnostic lines (e.g. `help:`) | varies |
+
+19 built-in rules ship embedded, covering Cargo test/nextest, Clippy, git, directory listings, Docker, npm/yarn/pnpm, pip, Make, pytest, Go test, Terraform, kubectl, and Homebrew. Drop a custom `filters.toml` next to your config to add or override rules without code changes.
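A user override file follows the rule schema this PR introduces in `declarative.rs` (`[[rules]]` entries with a `match` table holding exactly one of `exact`/`prefix`/`regex`, and a `strategy` table tagged by `type`). A minimal sketch; the rule name and the `mytool` command/patterns are invented for illustration:

```toml
# Hypothetical user rule: strip download noise from a fictional `mytool` CLI.
[[rules]]
name = "mytool_noise"   # required; identifies the filter
enabled = true          # optional; defaults to true

[rules.match]           # exactly one of: exact, prefix, regex
prefix = "mytool"

[rules.strategy]
type = "strip_noise"    # one of the 9 strategy types
patterns = ['^\s*(Downloading|Fetching) ']
```

Patterns are compiled with the engine's ReDoS guards (512-char pattern cap, bounded compiled size), so overly long or pathological regexes are rejected at load time.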
Per-command stats shown inline, so you see exactly what was saved: diff --git a/crates/zeph-core/src/config/snapshots/zeph_core__config__types__tests__config_default_snapshot.snap b/crates/zeph-core/src/config/snapshots/zeph_core__config__types__tests__config_default_snapshot.snap index da9541e3..71de4589 100644 --- a/crates/zeph-core/src/config/snapshots/zeph_core__config__types__tests__config_default_snapshot.snap +++ b/crates/zeph-core/src/config/snapshots/zeph_core__config__types__tests__config_default_snapshot.snap @@ -83,28 +83,6 @@ destination = "stdout" [tools.filters] enabled = true -[tools.filters.test] -enabled = true -max_failures = 10 -truncate_stack_trace = 50 - -[tools.filters.git] -enabled = true -max_log_entries = 20 -max_diff_lines = 500 - -[tools.filters.clippy] -enabled = true - -[tools.filters.cargo_build] -enabled = true - -[tools.filters.dir_listing] -enabled = true - -[tools.filters.log_dedup] -enabled = true - [tools.filters.security] enabled = true extra_patterns = [] diff --git a/crates/zeph-tools/Cargo.toml b/crates/zeph-tools/Cargo.toml index c4c4584a..921a359b 100644 --- a/crates/zeph-tools/Cargo.toml +++ b/crates/zeph-tools/Cargo.toml @@ -24,6 +24,7 @@ serde = { workspace = true, features = ["derive"] } serde_json.workspace = true thiserror.workspace = true tokio = { workspace = true, features = ["fs", "io-util", "macros", "process", "rt", "sync", "time"] } +toml.workspace = true tokio-util.workspace = true tracing.workspace = true url.workspace = true diff --git a/crates/zeph-tools/README.md b/crates/zeph-tools/README.md index 0d6fcb74..d8b0e955 100644 --- a/crates/zeph-tools/README.md +++ b/crates/zeph-tools/README.md @@ -20,7 +20,7 @@ Defines the `ToolExecutor` trait for sandboxed tool invocation and ships concret | `file` | File operation executor | | `scrape` | Web scraping executor with SSRF protection (post-DNS private IP validation, pinned address client) | | `composite` | `CompositeExecutor` — chains executors with middleware 
| -| `filter` | Output filtering pipeline | +| `filter` | Output filtering pipeline — unified declarative TOML engine with 9 strategy types (`strip_noise`, `truncate`, `keep_matching`, `strip_annotated`, `test_summary`, `group_by_rule`, `git_status`, `git_diff`, `dedup`) and 19 embedded built-in rules; user-configurable via `filters.toml` | | `permissions` | Permission checks for tool invocation | | `audit` | `AuditLogger` — tool execution audit trail | | `registry` | Tool registry and discovery | diff --git a/crates/zeph-tools/src/filter/cargo_build.rs b/crates/zeph-tools/src/filter/cargo_build.rs deleted file mode 100644 index d9e37a4f..00000000 --- a/crates/zeph-tools/src/filter/cargo_build.rs +++ /dev/null @@ -1,289 +0,0 @@ -use std::fmt::Write; -use std::sync::LazyLock; - -use super::{ - CargoBuildFilterConfig, CommandMatcher, FilterConfidence, FilterResult, OutputFilter, - make_result, -}; - -static CARGO_BUILD_MATCHER: LazyLock = LazyLock::new(|| { - CommandMatcher::Custom(Box::new(|cmd| { - let c = cmd.to_lowercase(); - let tokens: Vec<&str> = c.split_whitespace().collect(); - if tokens.first() != Some(&"cargo") { - return false; - } - let dominated = ["test", "nextest", "clippy"]; - !tokens.iter().skip(1).any(|t| dominated.contains(t)) - })) -}); - -const NOISE_PREFIXES: &[&str] = &[ - "Compiling ", - "Downloading ", - "Downloaded ", - "Updating ", - "Fetching ", - "Fresh ", - "Packaging ", - "Verifying ", - "Archiving ", - "Locking ", - "Adding ", - "Removing ", - "Checking ", - "Documenting ", - "Running ", - "Loaded ", - "Blocking ", - "Unpacking ", -]; - -/// Max lines to keep when output has no recognizable noise pattern. -const LONG_OUTPUT_THRESHOLD: usize = 30; -const KEEP_HEAD: usize = 10; -const KEEP_TAIL: usize = 5; - -fn is_noise(line: &str) -> bool { - let trimmed = line.trim_start(); - NOISE_PREFIXES.iter().any(|p| trimmed.starts_with(p)) -} - -/// Check if a line is cargo build/fetch noise (for reuse by other filters). 
-pub fn is_cargo_noise(line: &str) -> bool { - let trimmed = line.trim_start(); - trimmed.starts_with("Finished ") || is_noise(line) -} - -pub struct CargoBuildFilter; - -impl CargoBuildFilter { - #[must_use] - pub fn new(_config: CargoBuildFilterConfig) -> Self { - Self - } -} - -impl OutputFilter for CargoBuildFilter { - fn name(&self) -> &'static str { - "cargo_build" - } - - fn matcher(&self) -> &CommandMatcher { - &CARGO_BUILD_MATCHER - } - - fn filter(&self, _command: &str, raw_output: &str, exit_code: i32) -> FilterResult { - let mut noise_count = 0usize; - let mut kept = Vec::new(); - let mut finished_line: Option<&str> = None; - - for line in raw_output.lines() { - let trimmed = line.trim_start(); - if trimmed.starts_with("Finished ") { - finished_line = Some(trimmed); - noise_count += 1; - } else if is_noise(line) { - noise_count += 1; - } else { - kept.push(line); - } - } - - if noise_count > 0 { - return build_noise_result(raw_output, &kept, finished_line, noise_count); - } - - if exit_code != 0 { - return make_result( - raw_output, - raw_output.to_owned(), - FilterConfidence::Fallback, - ); - } - - // No recognizable noise — apply generic long-output truncation - let lines: Vec<&str> = raw_output.lines().collect(); - if lines.len() > LONG_OUTPUT_THRESHOLD { - return truncate_long(raw_output, &lines); - } - - make_result( - raw_output, - raw_output.to_owned(), - FilterConfidence::Fallback, - ) - } -} - -fn build_noise_result( - raw: &str, - kept: &[&str], - finished_line: Option<&str>, - noise_count: usize, -) -> FilterResult { - let mut output = String::new(); - if let Some(fin) = finished_line { - let _ = writeln!(output, "{fin}"); - } - let _ = writeln!(output, "({noise_count} compile/fetch lines removed)"); - if !kept.is_empty() { - output.push('\n'); - if kept.len() > LONG_OUTPUT_THRESHOLD { - let omitted = kept.len() - KEEP_HEAD - KEEP_TAIL; - for line in &kept[..KEEP_HEAD] { - let _ = writeln!(output, "{line}"); - } - let _ = writeln!(output, 
"\n... ({omitted} lines omitted) ...\n"); - for line in &kept[kept.len() - KEEP_TAIL..] { - let _ = writeln!(output, "{line}"); - } - } else { - for line in kept { - let _ = writeln!(output, "{line}"); - } - } - } - make_result(raw, output.trim_end().to_owned(), FilterConfidence::Full) -} - -fn truncate_long(raw: &str, lines: &[&str]) -> FilterResult { - let total = lines.len(); - let omitted = total - KEEP_HEAD - KEEP_TAIL; - let mut output = String::new(); - for line in &lines[..KEEP_HEAD] { - let _ = writeln!(output, "{line}"); - } - let _ = writeln!(output, "\n... ({omitted} lines omitted) ...\n"); - for line in &lines[total - KEEP_TAIL..] { - let _ = writeln!(output, "{line}"); - } - make_result(raw, output.trim_end().to_owned(), FilterConfidence::Partial) -} - -#[cfg(test)] -mod tests { - use super::*; - - fn make_filter() -> CargoBuildFilter { - CargoBuildFilter::new(CargoBuildFilterConfig::default()) - } - - #[test] - fn matches_cargo_build_commands() { - let f = make_filter(); - assert!(f.matcher().matches("cargo build")); - assert!(f.matcher().matches("cargo build --release")); - assert!(f.matcher().matches("cargo doc --no-deps")); - assert!(f.matcher().matches("cargo +nightly fmt --check")); - assert!(f.matcher().matches("cargo audit")); - assert!(f.matcher().matches("cargo tree --duplicates")); - assert!(f.matcher().matches("cargo bench")); - } - - #[test] - fn skips_test_and_clippy() { - let f = make_filter(); - assert!(!f.matcher().matches("cargo test")); - assert!(!f.matcher().matches("cargo nextest run")); - assert!(!f.matcher().matches("cargo clippy --workspace")); - } - - #[test] - fn filters_compile_noise() { - let f = make_filter(); - let raw = " Compiling serde v1.0.200\n Compiling zeph-core v0.9.9\n Compiling zeph-tools v0.9.9\n Finished `dev` profile [unoptimized + debuginfo] target(s) in 5.32s"; - let result = f.filter("cargo build", raw, 0); - assert_eq!(result.confidence, FilterConfidence::Full); - 
assert!(result.output.contains("Finished")); - assert!(result.output.contains("4 compile/fetch lines removed")); - assert!(!result.output.contains("Compiling")); - } - - #[test] - fn filters_audit_noise() { - let f = make_filter(); - let raw = " Fetching advisory database from `https://github.com/RustSec/advisory-db.git`\n Loaded 920 security advisories (from /Users/rabax/.cargo/advisory-db)\n Updating crates.io index\n0 vulnerabilities found"; - let result = f.filter("cargo audit", raw, 1); - assert_eq!(result.confidence, FilterConfidence::Full); - assert!(result.output.contains("3 compile/fetch lines removed")); - assert!(result.output.contains("0 vulnerabilities found")); - assert!(!result.output.contains("Fetching")); - } - - #[test] - fn truncates_long_tree_output() { - let f = make_filter(); - let mut lines = Vec::new(); - for i in 0..80 { - lines.push(format!("├── dep-{i} v0.1.{i}")); - } - let raw = lines.join("\n"); - let result = f.filter("cargo tree", &raw, 0); - assert_eq!(result.confidence, FilterConfidence::Partial); - assert!(result.output.contains("lines omitted")); - assert!(result.output.contains("dep-0")); - assert!(result.output.contains("dep-79")); - } - - #[test] - fn preserves_full_on_error() { - let f = make_filter(); - let raw = "error[E0308]: mismatched types\n --> src/main.rs:10:5"; - let result = f.filter("cargo build", raw, 1); - assert_eq!(result.output, raw); - assert_eq!(result.confidence, FilterConfidence::Fallback); - } - - #[test] - fn passthrough_short_output() { - let f = make_filter(); - let raw = "some short output\nonly two lines"; - let result = f.filter("cargo build", raw, 0); - assert_eq!(result.output, raw); - assert_eq!(result.confidence, FilterConfidence::Fallback); - } - - #[test] - fn keeps_non_noise_lines() { - let f = make_filter(); - let raw = " Compiling zeph-core v0.9.9\nwarning: unused import\n --> src/lib.rs:5:1\n Finished `dev` profile target(s) in 2.00s"; - let result = f.filter("cargo build", raw, 0); - 
assert!(result.output.contains("warning: unused import")); - assert!(result.output.contains("src/lib.rs:5:1")); - assert!(!result.output.contains("Compiling")); - } - - #[test] - fn cargo_build_filter_snapshot() { - let f = make_filter(); - let raw = "\ - Compiling zeph-core v0.11.0 - Compiling zeph-tools v0.11.0 - Compiling zeph-llm v0.11.0 -warning: unused import: `std::fmt` - --> crates/zeph-core/src/lib.rs:3:5 - | -3 | use std::fmt; - | ^^^^^^^^ - = note: `#[warn(unused_imports)]` on by default - Finished `dev` profile [unoptimized + debuginfo] target(s) in 4.23s"; - let result = f.filter("cargo build", raw, 0); - insta::assert_snapshot!(result.output); - } - - #[test] - fn cargo_build_error_snapshot() { - let f = make_filter(); - let raw = "\ - Compiling zeph-core v0.11.0 -error[E0308]: mismatched types - --> crates/zeph-core/src/lib.rs:10:5 - | -10 | return 42; - | ^^ expected `()`, found integer -error: could not compile `zeph-core` due to 1 previous error"; - let result = f.filter("cargo build", raw, 1); - insta::assert_snapshot!(result.output); - } -} diff --git a/crates/zeph-tools/src/filter/clippy.rs b/crates/zeph-tools/src/filter/clippy.rs deleted file mode 100644 index bb3e464f..00000000 --- a/crates/zeph-tools/src/filter/clippy.rs +++ /dev/null @@ -1,203 +0,0 @@ -use std::collections::BTreeMap; -use std::fmt::Write; -use std::sync::LazyLock; - -use regex::Regex; - -use super::{ - ClippyFilterConfig, CommandMatcher, FilterConfidence, FilterResult, OutputFilter, - cargo_build::is_cargo_noise, make_result, -}; - -static CLIPPY_MATCHER: LazyLock = LazyLock::new(|| { - CommandMatcher::Custom(Box::new(|cmd| { - let c = cmd.to_lowercase(); - let tokens: Vec<&str> = c.split_whitespace().collect(); - tokens.first() == Some(&"cargo") && tokens.iter().skip(1).any(|t| *t == "clippy") - })) -}); - -static LINT_RULE_RE: LazyLock = - LazyLock::new(|| Regex::new(r"#\[warn\(([^)]+)\)\]").unwrap()); - -static LOCATION_RE: LazyLock = LazyLock::new(|| 
Regex::new(r"^\s*-->\s*(.+:\d+)").unwrap()); - -pub struct ClippyFilter; - -impl ClippyFilter { - #[must_use] - pub fn new(_config: ClippyFilterConfig) -> Self { - Self - } -} - -impl OutputFilter for ClippyFilter { - fn name(&self) -> &'static str { - "clippy" - } - - fn matcher(&self) -> &CommandMatcher { - &CLIPPY_MATCHER - } - - fn filter(&self, _command: &str, raw_output: &str, exit_code: i32) -> FilterResult { - let has_error = raw_output.contains("error[") || raw_output.contains("error:"); - if has_error && exit_code != 0 { - return make_result( - raw_output, - raw_output.to_owned(), - FilterConfidence::Fallback, - ); - } - - let mut warnings: BTreeMap> = BTreeMap::new(); - let mut pending_location: Option = None; - - for line in raw_output.lines() { - if let Some(caps) = LOCATION_RE.captures(line) { - pending_location = Some(caps[1].to_owned()); - } - - if let Some(caps) = LINT_RULE_RE.captures(line) { - let rule = caps[1].to_owned(); - if let Some(loc) = pending_location.take() { - warnings.entry(rule).or_default().push(loc); - } - } - } - - if warnings.is_empty() { - let kept: Vec<&str> = raw_output.lines().filter(|l| !is_cargo_noise(l)).collect(); - if kept.len() < raw_output.lines().count() { - let output = kept.join("\n"); - return make_result(raw_output, output, FilterConfidence::Partial); - } - return make_result( - raw_output, - raw_output.to_owned(), - FilterConfidence::Fallback, - ); - } - - let total: usize = warnings.values().map(Vec::len).sum(); - let rules = warnings.len(); - let mut output = String::new(); - - for (rule, locations) in &warnings { - let count = locations.len(); - let label = if count == 1 { "warning" } else { "warnings" }; - let _ = writeln!(output, "{rule} ({count} {label}):"); - for loc in locations { - let _ = writeln!(output, " {loc}"); - } - output.push('\n'); - } - let _ = write!(output, "{total} warnings total ({rules} rules)"); - - make_result(raw_output, output, FilterConfidence::Full) - } -} - -#[cfg(test)] -mod 
tests { - use super::*; - - fn make_filter() -> ClippyFilter { - ClippyFilter::new(ClippyFilterConfig::default()) - } - - #[test] - fn matches_clippy() { - let f = make_filter(); - assert!(f.matcher().matches("cargo clippy --workspace")); - assert!(f.matcher().matches("cargo clippy -- -D warnings")); - assert!(f.matcher().matches("cargo +nightly clippy")); - assert!(!f.matcher().matches("cargo build")); - assert!(!f.matcher().matches("cargo test")); - } - - #[test] - fn filter_groups_warnings() { - let f = make_filter(); - let raw = "\ -warning: needless pass by value - --> src/foo.rs:12:5 - | - = help: ... - = note: `#[warn(clippy::needless_pass_by_value)]` on by default - -warning: needless pass by value - --> src/bar.rs:45:10 - | - = help: ... - = note: `#[warn(clippy::needless_pass_by_value)]` on by default - -warning: unused import - --> src/main.rs:5:1 - | - = note: `#[warn(clippy::unused_imports)]` on by default - -warning: `my-crate` (lib) generated 3 warnings -"; - let result = f.filter("cargo clippy", raw, 0); - assert!( - result - .output - .contains("clippy::needless_pass_by_value (2 warnings):") - ); - assert!(result.output.contains("src/foo.rs:12")); - assert!(result.output.contains("src/bar.rs:45")); - assert!( - result - .output - .contains("clippy::unused_imports (1 warning):") - ); - assert!(result.output.contains("3 warnings total (2 rules)")); - assert_eq!(result.confidence, FilterConfidence::Full); - } - - #[test] - fn filter_error_preserves_full() { - let f = make_filter(); - let raw = "error[E0308]: mismatched types\n --> src/main.rs:10:5\nfull details here"; - let result = f.filter("cargo clippy", raw, 1); - assert_eq!(result.output, raw); - assert_eq!(result.confidence, FilterConfidence::Fallback); - } - - #[test] - fn filter_no_warnings_strips_noise() { - let f = make_filter(); - let raw = "Checking my-crate v0.1.0\n Finished dev [unoptimized] target(s)"; - let result = f.filter("cargo clippy", raw, 0); - assert!(result.output.is_empty()); 
- assert_eq!(result.confidence, FilterConfidence::Partial); - } - - #[test] - fn clippy_grouped_warnings_snapshot() { - let f = make_filter(); - let raw = "\ -warning: needless pass by value - --> src/foo.rs:12:5 - | - = help: use a reference instead - = note: `#[warn(clippy::needless_pass_by_value)]` on by default - -warning: needless pass by value - --> src/bar.rs:45:10 - | - = help: use a reference instead - = note: `#[warn(clippy::needless_pass_by_value)]` on by default - -warning: unused import - --> src/main.rs:5:1 - | - = note: `#[warn(clippy::unused_imports)]` on by default - -warning: `my-crate` (lib) generated 3 warnings -"; - let result = f.filter("cargo clippy", raw, 0); - insta::assert_snapshot!(result.output); - } -} diff --git a/crates/zeph-tools/src/filter/declarative.rs b/crates/zeph-tools/src/filter/declarative.rs new file mode 100644 index 00000000..03e69417 --- /dev/null +++ b/crates/zeph-tools/src/filter/declarative.rs @@ -0,0 +1,2314 @@ +//! Declarative TOML-based output filter engine. +//! +//! Loads filter rules from a TOML file and compiles them into [`OutputFilter`] +//! implementations at startup. 
+ +use std::collections::{BTreeMap, HashMap}; +use std::fmt::Write as _; +use std::path::Path; + +use regex::{Regex, RegexBuilder}; +use serde::Deserialize; + +use super::{ + CommandMatcher, FilterConfidence, FilterResult, OutputFilter, make_result, sanitize_output, +}; + +// --------------------------------------------------------------------------- +// Deserialization types +// --------------------------------------------------------------------------- + +#[derive(Deserialize)] +pub(crate) struct DeclarativeFilterFile { + #[serde(default)] + pub rules: Vec, +} + +#[derive(Deserialize)] +pub(crate) struct RuleConfig { + pub name: String, + #[serde(rename = "match")] + pub match_config: MatchConfig, + pub strategy: StrategyConfig, + #[serde(default = "super::default_true")] + pub enabled: bool, +} + +#[derive(Deserialize)] +#[serde(rename_all = "snake_case")] +pub(crate) struct MatchConfig { + pub exact: Option, + pub prefix: Option, + pub regex: Option, +} + +#[derive(Deserialize)] +pub(crate) struct NormalizeEntry { + pub pattern: String, + pub replacement: String, +} + +fn default_head() -> usize { + 20 +} + +fn default_tail() -> usize { + 20 +} + +fn default_long_threshold() -> usize { + 30 +} + +fn default_keep_head() -> usize { + 10 +} + +fn default_keep_tail() -> usize { + 5 +} + +fn default_max_failures() -> usize { + 10 +} + +fn default_truncate_stack_trace() -> usize { + 50 +} + +fn default_max_diff_lines() -> usize { + 500 +} + +fn default_max_unique() -> usize { + 10_000 +} + +fn default_normalize_patterns() -> Vec { + vec![ + NormalizeEntry { + pattern: r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}([.\d]*)?([Z+-][\d:]*)?".into(), + replacement: "".into(), + }, + NormalizeEntry { + pattern: r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}".into(), + replacement: "".into(), + }, + NormalizeEntry { + pattern: r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}".into(), + replacement: "".into(), + }, + NormalizeEntry { + pattern: r"(?:port|pid|PID)[=: 
]+\d+".into(), + replacement: "".into(), + }, + ] +} + +#[derive(Deserialize)] +#[serde(tag = "type", rename_all = "snake_case")] +pub(crate) enum StrategyConfig { + StripNoise { + patterns: Vec, + }, + Truncate { + max_lines: usize, + #[serde(default = "default_head")] + head: usize, + #[serde(default = "default_tail")] + tail: usize, + }, + KeepMatching { + patterns: Vec, + }, + StripAnnotated { + patterns: Vec, + #[serde(default)] + summary_pattern: Option, + #[serde(default = "default_long_threshold")] + long_output_threshold: usize, + #[serde(default = "default_keep_head")] + keep_head: usize, + #[serde(default = "default_keep_tail")] + keep_tail: usize, + }, + TestSummary { + #[serde(default = "default_max_failures")] + max_failures: usize, + #[serde(default = "default_truncate_stack_trace")] + truncate_stack_trace: usize, + }, + GroupByRule { + location_pattern: String, + rule_pattern: String, + }, + GitStatus {}, + GitDiff { + #[serde(default = "default_max_diff_lines")] + max_diff_lines: usize, + }, + Dedup { + #[serde(default = "default_normalize_patterns")] + normalize_patterns: Vec, + #[serde(default = "default_max_unique")] + max_unique_patterns: usize, + }, +} + +// --------------------------------------------------------------------------- +// Compiled runtime types +// --------------------------------------------------------------------------- + +pub(crate) enum CompiledStrategy { + StripNoise { + patterns: Vec, + }, + Truncate { + max_lines: usize, + head: usize, + tail: usize, + }, + KeepMatching { + patterns: Vec, + }, + StripAnnotated { + patterns: Vec, + summary_pattern: Option, + long_output_threshold: usize, + keep_head: usize, + keep_tail: usize, + }, + TestSummary { + max_failures: usize, + truncate_stack_trace: usize, + }, + GroupByRule { + location_re: Regex, + rule_re: Regex, + }, + GitStatus, + GitDiff { + max_diff_lines: usize, + }, + Dedup { + normalize_patterns: Vec<(Regex, String)>, + max_unique_patterns: usize, + }, +} + 
+pub(crate) struct DeclarativeFilter { + name: &'static str, + matcher: CommandMatcher, + strategy: CompiledStrategy, +} + +impl DeclarativeFilter { + pub fn compile(rule: RuleConfig) -> Result { + let name: &'static str = Box::leak(rule.name.into_boxed_str()); + let matcher = compile_match(&rule.match_config)?; + let strategy = compile_strategy(rule.strategy)?; + Ok(Self { + name, + matcher, + strategy, + }) + } +} + +fn compile_regex(pattern: &str) -> Result { + if pattern.len() > 512 { + return Err(format!("pattern '{pattern}': exceeds 512 character limit")); + } + RegexBuilder::new(pattern) + .size_limit(1 << 20) + .build() + .map_err(|e| format!("pattern '{pattern}': {e}")) +} + +fn compile_match(m: &MatchConfig) -> Result { + if let Some(ref exact) = m.exact { + let s: &'static str = Box::leak(exact.clone().into_boxed_str()); + Ok(CommandMatcher::Exact(s)) + } else if let Some(ref prefix) = m.prefix { + let s: &'static str = Box::leak(prefix.clone().into_boxed_str()); + Ok(CommandMatcher::Prefix(s)) + } else if let Some(ref regex) = m.regex { + if regex.len() > 512 { + return Err("regex pattern exceeds 512 character limit".into()); + } + let re = RegexBuilder::new(regex) + .size_limit(1 << 20) + .build() + .map_err(|e| format!("invalid regex: {e}"))?; + Ok(CommandMatcher::Regex(re)) + } else { + Err("match config must have exactly one of: exact, prefix, regex".into()) + } +} + +fn contains_unescaped_dollar(s: &str) -> bool { + let mut chars = s.chars().peekable(); + while let Some(c) = chars.next() { + if c == '\\' { + chars.next(); // skip escaped char + } else if c == '$' { + return true; + } + } + false +} + +fn compile_patterns(patterns: &[String]) -> Result, String> { + patterns + .iter() + .map(|p| compile_regex(p)) + .collect::, _>>() +} + +fn compile_dedup_entry(e: NormalizeEntry) -> Result<(Regex, String), String> { + if contains_unescaped_dollar(&e.replacement) { + return Err(format!( + "replacement '{}': unescaped '$' is not allowed (use plain text 
like )", + e.replacement + )); + } + compile_regex(&e.pattern).map(|re| (re, e.replacement)) +} + +fn compile_strategy(s: StrategyConfig) -> Result { + match s { + StrategyConfig::StripNoise { patterns } => { + if patterns.is_empty() { + tracing::warn!("rule has empty patterns list"); + return Err("strip_noise rule has empty patterns list".into()); + } + Ok(CompiledStrategy::StripNoise { + patterns: compile_patterns(&patterns)?, + }) + } + StrategyConfig::Truncate { + max_lines, + head, + tail, + } => { + if head + tail > max_lines { + return Err("head + tail must not exceed max_lines".into()); + } + Ok(CompiledStrategy::Truncate { + max_lines, + head, + tail, + }) + } + StrategyConfig::KeepMatching { patterns } => { + if patterns.is_empty() { + tracing::warn!("rule has empty patterns list"); + return Err("keep_matching rule has empty patterns list".into()); + } + Ok(CompiledStrategy::KeepMatching { + patterns: compile_patterns(&patterns)?, + }) + } + StrategyConfig::StripAnnotated { + patterns, + summary_pattern, + long_output_threshold, + keep_head, + keep_tail, + } => { + if patterns.is_empty() { + tracing::warn!("rule has empty patterns list"); + return Err("strip_annotated rule has empty patterns list".into()); + } + let summary_re = summary_pattern.as_deref().map(compile_regex).transpose()?; + Ok(CompiledStrategy::StripAnnotated { + patterns: compile_patterns(&patterns)?, + summary_pattern: summary_re, + long_output_threshold, + keep_head, + keep_tail, + }) + } + StrategyConfig::TestSummary { + max_failures, + truncate_stack_trace, + } => Ok(CompiledStrategy::TestSummary { + max_failures, + truncate_stack_trace, + }), + StrategyConfig::GroupByRule { + location_pattern, + rule_pattern, + } => { + let location_re = compile_regex(&location_pattern)?; + let rule_re = compile_regex(&rule_pattern)?; + Ok(CompiledStrategy::GroupByRule { + location_re, + rule_re, + }) + } + StrategyConfig::GitStatus {} => Ok(CompiledStrategy::GitStatus), + StrategyConfig::GitDiff { 
max_diff_lines } => { + Ok(CompiledStrategy::GitDiff { max_diff_lines }) + } + StrategyConfig::Dedup { + normalize_patterns, + max_unique_patterns, + } => { + let compiled = normalize_patterns + .into_iter() + .map(compile_dedup_entry) + .collect::, _>>()?; + Ok(CompiledStrategy::Dedup { + normalize_patterns: compiled, + max_unique_patterns, + }) + } + } +} + +// --------------------------------------------------------------------------- +// is_cargo_noise helper (used by GroupByRule) +// --------------------------------------------------------------------------- + +const CARGO_NOISE_PREFIXES: &[&str] = &[ + "Compiling ", + "Downloading ", + "Downloaded ", + "Updating ", + "Fetching ", + "Fresh ", + "Packaging ", + "Verifying ", + "Archiving ", + "Locking ", + "Adding ", + "Removing ", + "Checking ", + "Documenting ", + "Running ", + "Loaded ", + "Blocking ", + "Unpacking ", + "Finished ", +]; + +pub(crate) fn is_cargo_noise(line: &str) -> bool { + let trimmed = line.trim_start(); + CARGO_NOISE_PREFIXES.iter().any(|p| trimmed.starts_with(p)) +} + +// --------------------------------------------------------------------------- +// Strategy implementations +// --------------------------------------------------------------------------- + +fn apply_strip_annotated( + raw: &str, + patterns: &[Regex], + summary_pattern: Option<&Regex>, + long_output_threshold: usize, + keep_head: usize, + keep_tail: usize, + exit_code: i32, +) -> FilterResult { + let clean = sanitize_output(raw); + let mut noise_count = 0usize; + let mut kept: Vec<&str> = Vec::new(); + let mut summary_line: Option = None; + + for line in clean.lines() { + if summary_pattern.is_some_and(|sp| sp.is_match(line)) { + summary_line = Some(line.trim_start().to_owned()); + noise_count += 1; + continue; + } + if patterns.iter().any(|p| p.is_match(line)) { + noise_count += 1; + } else { + kept.push(line); + } + } + + if noise_count == 0 { + if exit_code != 0 { + return make_result(raw, raw.to_owned(), 
FilterConfidence::Fallback); + } + let lines: Vec<&str> = clean.lines().collect(); + if lines.len() > long_output_threshold { + return truncate_kept(raw, &lines, keep_head, keep_tail, FilterConfidence::Partial); + } + return make_result(raw, raw.to_owned(), FilterConfidence::Fallback); + } + + let mut output = String::new(); + if let Some(ref fin) = summary_line { + let _ = writeln!(output, "{fin}"); + } + let _ = writeln!(output, "({noise_count} noise lines removed)"); + if !kept.is_empty() { + output.push('\n'); + if kept.len() > long_output_threshold { + let actual_head = keep_head.min(kept.len()); + let actual_tail = keep_tail.min(kept.len().saturating_sub(actual_head)); + let omitted = kept.len() - actual_head - actual_tail; + for line in &kept[..actual_head] { + let _ = writeln!(output, "{line}"); + } + let _ = writeln!(output, "\n... ({omitted} lines omitted) ...\n"); + for line in &kept[kept.len() - actual_tail..] { + let _ = writeln!(output, "{line}"); + } + } else { + for line in &kept { + let _ = writeln!(output, "{line}"); + } + } + } + make_result(raw, output.trim_end().to_owned(), FilterConfidence::Full) +} + +fn truncate_kept( + raw: &str, + lines: &[&str], + keep_head: usize, + keep_tail: usize, + confidence: FilterConfidence, +) -> FilterResult { + let total = lines.len(); + let omitted = total - keep_head - keep_tail; + let mut output = String::new(); + for line in &lines[..keep_head] { + let _ = writeln!(output, "{line}"); + } + let _ = writeln!(output, "\n... ({omitted} lines omitted) ...\n"); + for line in &lines[total - keep_tail..] 
+    {
+        let _ = writeln!(output, "{line}");
+    }
+    make_result(raw, output.trim_end().to_owned(), confidence)
+}
+
+fn apply_test_summary(
+    raw: &str,
+    exit_code: i32,
+    max_failures: usize,
+    truncate_stack_trace: usize,
+) -> FilterResult {
+    let mut passed = 0u64;
+    let mut failed = 0u64;
+    let mut ignored = 0u64;
+    let mut filtered_out = 0u64;
+    let mut failure_blocks: Vec<String> = Vec::new();
+    let mut in_failure_block = false;
+    let mut current_block = String::new();
+    let mut has_summary = false;
+
+    for line in raw.lines() {
+        let trimmed = line.trim();
+
+        if trimmed.starts_with("FAIL [") {
+            failed += 1;
+            continue;
+        }
+        if trimmed.starts_with("PASS [") {
+            passed += 1;
+            continue;
+        }
+
+        if trimmed.starts_with("---- ") && trimmed.ends_with(" stdout ----") {
+            in_failure_block = true;
+            current_block.clear();
+            current_block.push_str(line);
+            current_block.push('\n');
+            continue;
+        }
+
+        if in_failure_block {
+            current_block.push_str(line);
+            current_block.push('\n');
+            if trimmed == "failures:" || trimmed.starts_with("---- ") {
+                failure_blocks.push(current_block.clone());
+                in_failure_block = trimmed.starts_with("---- ");
+                if in_failure_block {
+                    current_block.clear();
+                    current_block.push_str(line);
+                    current_block.push('\n');
+                }
+            }
+            continue;
+        }
+
+        if trimmed == "failures:" && !current_block.is_empty() {
+            failure_blocks.push(current_block.clone());
+            current_block.clear();
+        }
+
+        if trimmed.starts_with("test result:") {
+            has_summary = true;
+            for part in trimmed.split(';') {
+                let part = part.trim();
+                if let Some(n) = extract_count(part, "passed") {
+                    passed += n;
+                } else if let Some(n) = extract_count(part, "failed") {
+                    failed += n;
+                } else if let Some(n) = extract_count(part, "ignored") {
+                    ignored += n;
+                } else if let Some(n) = extract_count(part, "filtered out") {
+                    filtered_out += n;
+                }
+            }
+        }
+
+        if trimmed.contains("tests run:") {
+            has_summary = true;
+        }
+    }
+
+    if in_failure_block && !current_block.is_empty() {
+        failure_blocks.push(current_block);
+    }
+
+    if !has_summary && passed == 0 && failed == 0 {
+        return make_result(raw, raw.to_owned(), FilterConfidence::Fallback);
+    }
+
+    let mut output = String::new();
+
+    if exit_code != 0 && !failure_blocks.is_empty() {
+        output.push_str("FAILURES:\n\n");
+        for block in failure_blocks.iter().take(max_failures) {
+            let lines: Vec<&str> = block.lines().collect();
+            if lines.len() > truncate_stack_trace {
+                for line in &lines[..truncate_stack_trace] {
+                    output.push_str(line);
+                    output.push('\n');
+                }
+                let remaining = lines.len() - truncate_stack_trace;
+                let _ = writeln!(output, "... ({remaining} more lines)");
+            } else {
+                output.push_str(block);
+            }
+            output.push('\n');
+        }
+        if failure_blocks.len() > max_failures {
+            let _ = writeln!(
+                output,
+                "... and {} more failure(s)",
+                failure_blocks.len() - max_failures
+            );
+        }
+    }
+
+    let status = if failed > 0 { "FAILED" } else { "ok" };
+    let _ = write!(
+        output,
+        "test result: {status}. {passed} passed; {failed} failed; \
+         {ignored} ignored; {filtered_out} filtered out"
+    );
+
+    make_result(raw, output, FilterConfidence::Full)
+}
+
+fn extract_count(s: &str, label: &str) -> Option<u64> {
+    let idx = s.find(label)?;
+    let before = s[..idx].trim();
+    let num_str = before.rsplit_once(' ').map_or(before, |(_, n)| n);
+    let num_str = num_str.trim_end_matches('.');
+    let num_str = num_str.rsplit('.').next().unwrap_or(num_str).trim();
+    num_str.parse().ok()
+}
+
+fn apply_group_by_rule(
+    raw: &str,
+    exit_code: i32,
+    location_re: &Regex,
+    rule_re: &Regex,
+) -> FilterResult {
+    let has_error = raw.contains("error[") || raw.contains("error:");
+    if has_error && exit_code != 0 {
+        return make_result(raw, raw.to_owned(), FilterConfidence::Fallback);
+    }
+
+    let mut warnings: BTreeMap<String, Vec<String>> = BTreeMap::new();
+    let mut pending_location: Option<String> = None;
+
+    for line in raw.lines() {
+        if let Some(caps) = location_re.captures(line) {
+            pending_location = Some(caps[1].to_owned());
+        }
+        if let Some(caps) = rule_re.captures(line) {
+            let rule = caps[1].to_owned();
+            if let Some(loc) = pending_location.take() {
+                warnings.entry(rule).or_default().push(loc);
+            }
+        }
+    }
+
+    if warnings.is_empty() {
+        let kept: Vec<&str> = raw.lines().filter(|l| !is_cargo_noise(l)).collect();
+        if kept.len() < raw.lines().count() {
+            let output = kept.join("\n");
+            return make_result(raw, output, FilterConfidence::Partial);
+        }
+        return make_result(raw, raw.to_owned(), FilterConfidence::Fallback);
+    }
+
+    let total: usize = warnings.values().map(Vec::len).sum();
+    let rules = warnings.len();
+    let mut output = String::new();
+
+    for (rule, locations) in &warnings {
+        let count = locations.len();
+        let label = if count == 1 { "warning" } else { "warnings" };
+        let _ = writeln!(output, "{rule} ({count} {label}):");
+        for loc in locations {
+            let _ = writeln!(output, "  {loc}");
+        }
+        output.push('\n');
+    }
+    let _ = write!(output, "{total} warnings total ({rules} rules)");
+
+    make_result(raw, output, FilterConfidence::Full)
+}
+
+fn apply_git_status(raw: &str) -> FilterResult {
+    let mut modified = 0u32;
+    let mut added = 0u32;
+    let mut deleted = 0u32;
+    let mut untracked = 0u32;
+
+    for line in raw.lines() {
+        let trimmed = line.trim();
+        if trimmed.starts_with("M ") || trimmed.starts_with("MM") || trimmed.starts_with(" M") {
+            modified += 1;
+        } else if trimmed.starts_with("A ") || trimmed.starts_with("AM") {
+            added += 1;
+        } else if trimmed.starts_with("D ") || trimmed.starts_with(" D") {
+            deleted += 1;
+        } else if trimmed.starts_with("??") {
+            untracked += 1;
+        } else if trimmed.starts_with("modified:") {
+            modified += 1;
+        } else if trimmed.starts_with("new file:") {
+            added += 1;
+        } else if trimmed.starts_with("deleted:") {
+            deleted += 1;
+        }
+    }
+
+    let total = modified + added + deleted + untracked;
+    if total == 0 {
+        return make_result(raw, raw.to_owned(), FilterConfidence::Fallback);
+    }
+
+    let mut output = String::new();
+    let _ = write!(
+        output,
+        "M {modified} files | A {added} files | D {deleted} files | ?? {untracked} files"
+    );
+    make_result(raw, output, FilterConfidence::Full)
+}
+
+fn apply_git_diff(raw: &str, max_diff_lines: usize) -> FilterResult {
+    let mut files: Vec<(String, i32, i32)> = Vec::new();
+    let mut current_file = String::new();
+    let mut additions = 0i32;
+    let mut deletions = 0i32;
+
+    for line in raw.lines() {
+        if line.starts_with("diff --git ") {
+            if !current_file.is_empty() {
+                files.push((current_file.clone(), additions, deletions));
+            }
+            line.strip_prefix("diff --git a/")
+                .and_then(|s| s.split(" b/").next())
+                .unwrap_or("unknown")
+                .clone_into(&mut current_file);
+            additions = 0;
+            deletions = 0;
+        } else if line.starts_with('+') && !line.starts_with("+++") {
+            additions += 1;
+        } else if line.starts_with('-') && !line.starts_with("---") {
+            deletions += 1;
+        }
+    }
+    if !current_file.is_empty() {
+        files.push((current_file, additions, deletions));
+    }
+
+    if files.is_empty() {
+        return make_result(raw, raw.to_owned(), FilterConfidence::Fallback);
+    }
+
+    let total_lines: usize = raw.lines().count();
+    let total_add: i32 = files.iter().map(|(_, a, _)| a).sum();
+    let total_del: i32 = files.iter().map(|(_, _, d)| d).sum();
+    let mut output = String::new();
+    for (file, add, del) in &files {
+        let _ = writeln!(output, "{file} | +{add} -{del}");
+    }
+    let _ = write!(
+        output,
+        "{} files changed, {} insertions(+), {} deletions(-)",
+        files.len(),
+        total_add,
+        total_del
+    );
+    if total_lines > max_diff_lines {
+        let _ = write!(output, " (truncated from {total_lines} lines)");
+    }
+    make_result(raw, output, FilterConfidence::Full)
+}
+
+fn apply_dedup(
+    raw: &str,
+    normalize_patterns: &[(Regex, String)],
+    max_unique_patterns: usize,
+) -> FilterResult {
+    let lines: Vec<&str> = raw.lines().collect();
+    if lines.len() < 3 {
+        return make_result(raw, raw.to_owned(), FilterConfidence::Fallback);
+    }
+
+    let mut pattern_counts: HashMap<String, (u64, String)> =
+        HashMap::with_capacity(max_unique_patterns.min(4096));
+    let mut order: Vec<String> = Vec::new();
+    let mut capped = false;
+
+    for line in &lines {
+        let normalized = dedup_normalize(line, normalize_patterns);
+        if let Some(entry) = pattern_counts.get_mut(&normalized) {
+            entry.0 += 1;
+        } else if pattern_counts.len() < max_unique_patterns {
+            order.push(normalized.clone());
+            pattern_counts.insert(normalized, (1, (*line).to_owned()));
+        } else {
+            capped = true;
+        }
+    }
+
+    let unique = order.len();
+    let total = lines.len();
+
+    if unique == total && !capped {
+        return make_result(raw, raw.to_owned(), FilterConfidence::Fallback);
+    }
+
+    let mut output = String::new();
+    for key in &order {
+        let (count, example) = &pattern_counts[key];
+        if *count > 1 {
+            let _ = writeln!(output, "{example} (x{count})");
+        } else {
+            let _ = writeln!(output, "{example}");
+        }
+    }
+    let _ = write!(output, "{unique} unique patterns ({total} total lines)");
+    if capped {
+        let _ = write!(output, " (capped at {max_unique_patterns})");
+    }
+
+    make_result(raw, output, FilterConfidence::Full)
+}
+
+fn dedup_normalize(line: &str, patterns: &[(Regex, String)]) -> String {
+    let mut s = line.to_owned();
+    for (re, replacement) in patterns {
+        s = re.replace_all(&s, replacement.as_str()).into_owned();
+    }
+    s
+}
+
+// ---------------------------------------------------------------------------
+// OutputFilter impl
+// ---------------------------------------------------------------------------
+
+impl OutputFilter for DeclarativeFilter {
+    fn name(&self) -> &'static str {
+        self.name
+    }
+
+    fn matcher(&self) -> &CommandMatcher {
+        &self.matcher
+    }
+
+    fn filter(&self, _command: &str, raw_output: &str, exit_code: i32) -> FilterResult {
+        let clean = sanitize_output(raw_output);
+        match &self.strategy {
+            CompiledStrategy::StripNoise { patterns } => {
+                let filtered: String = clean
+                    .lines()
+                    .filter(|line| !patterns.iter().any(|p| p.is_match(line)))
+                    .collect::<Vec<_>>()
+                    .join("\n");
+                if filtered.len() < clean.len() {
+                    make_result(raw_output, filtered, FilterConfidence::Full)
+                } else {
+                    make_result(raw_output, clean, FilterConfidence::Fallback)
+                }
+            }
+            CompiledStrategy::Truncate {
+                max_lines,
+                head,
+                tail,
+            } => {
+                let lines: Vec<&str> = clean.lines().collect();
+                if lines.len() <= *max_lines {
+                    return make_result(raw_output, clean, FilterConfidence::Fallback);
+                }
+                let omitted = lines.len() - head - tail;
+                let mut output = String::new();
+                for line in &lines[..*head] {
+                    output.push_str(line);
+                    output.push('\n');
+                }
+                let _ = write!(output, "\n... ({omitted} lines omitted) ...\n\n");
+                for line in &lines[lines.len() - tail..] {
+                    output.push_str(line);
+                    output.push('\n');
+                }
+                make_result(
+                    raw_output,
+                    output.trim_end().to_owned(),
+                    FilterConfidence::Partial,
+                )
+            }
+            CompiledStrategy::KeepMatching { patterns } => {
+                let kept: Vec<&str> = clean
+                    .lines()
+                    .filter(|line| patterns.iter().any(|p| p.is_match(line)))
+                    .collect();
+                if kept.is_empty() {
+                    return make_result(raw_output, clean, FilterConfidence::Fallback);
+                }
+                make_result(raw_output, kept.join("\n"), FilterConfidence::Full)
+            }
+            CompiledStrategy::StripAnnotated {
+                patterns,
+                summary_pattern,
+                long_output_threshold,
+                keep_head,
+                keep_tail,
+            } => apply_strip_annotated(
+                raw_output,
+                patterns,
+                summary_pattern.as_ref(),
+                *long_output_threshold,
+                *keep_head,
+                *keep_tail,
+                exit_code,
+            ),
+            CompiledStrategy::TestSummary {
+                max_failures,
+                truncate_stack_trace,
+            } => apply_test_summary(raw_output, exit_code, *max_failures, *truncate_stack_trace),
+            CompiledStrategy::GroupByRule {
+                location_re,
+                rule_re,
+            } => apply_group_by_rule(raw_output, exit_code, location_re, rule_re),
+            CompiledStrategy::GitStatus => apply_git_status(raw_output),
+            CompiledStrategy::GitDiff { max_diff_lines } => {
+                apply_git_diff(raw_output, *max_diff_lines)
+            }
+            CompiledStrategy::Dedup {
+                normalize_patterns,
+                max_unique_patterns,
+            } => apply_dedup(raw_output, normalize_patterns, *max_unique_patterns),
+        }
+    }
+}
+
+// ---------------------------------------------------------------------------
+// Loading
+// ---------------------------------------------------------------------------
+
+/// Load declarative filters from `config_dir/filters.toml`, falling back to
+/// embedded defaults when the file is absent or `config_dir` is `None`.
+pub(crate) fn load_declarative_filters(config_dir: Option<&Path>) -> Vec<Box<dyn OutputFilter>> {
+    let file_content = if let Some(dir) = config_dir {
+        let path = dir.join("filters.toml");
+        let load_result = std::fs::metadata(&path)
+            .map_err(|e| e.to_string())
+            .and_then(|meta| {
+                if meta.len() >= 1_048_576 {
+                    Err(format!(
+                        "filters.toml exceeds 1 MiB limit ({} bytes)",
+                        meta.len()
+                    ))
+                } else {
+                    std::fs::read_to_string(&path).map_err(|e| e.to_string())
+                }
+            });
+        match load_result {
+            Ok(content) => {
+                tracing::debug!(path = %path.display(), "loaded user filters.toml");
+                content
+            }
+            Err(e) => {
+                tracing::warn!(path = %path.display(), "failed to load filters.toml: {e}");
+                include_str!("default-filters.toml").to_owned()
+            }
+        }
+    } else {
+        include_str!("default-filters.toml").to_owned()
+    };
+
+    let parsed: DeclarativeFilterFile = match toml::from_str(&file_content) {
+        Ok(f) => f,
+        Err(e) => {
+            tracing::warn!("failed to parse filters.toml: {e}");
+            return Vec::new();
+        }
+    };
+
+    let mut filters: Vec<Box<dyn OutputFilter>> = Vec::new();
+    for rule in parsed.rules {
+        if !rule.enabled {
+            continue;
+        }
+        let name = rule.name.clone();
+        match DeclarativeFilter::compile(rule) {
+            Ok(f) => filters.push(Box::new(f)),
+            Err(e) => tracing::warn!("skipping rule '{name}': {e}"),
+        }
+    }
+    filters
+}
+
+// ---------------------------------------------------------------------------
+// Tests
+// ---------------------------------------------------------------------------
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    fn strip_noise_filter(patterns: &[&str]) -> DeclarativeFilter {
+        DeclarativeFilter {
+            name: "test-strip",
+            matcher: CommandMatcher::Prefix("cmd"),
+            strategy: CompiledStrategy::StripNoise {
+                patterns: patterns.iter().map(|p| Regex::new(p).unwrap()).collect(),
+            },
+        }
+    }
+
+    fn truncate_filter(max_lines: usize, head: usize, tail: usize) -> DeclarativeFilter {
+        DeclarativeFilter {
+            name: "test-truncate",
+            matcher: CommandMatcher::Prefix("cmd"),
+            strategy: CompiledStrategy::Truncate {
+                max_lines,
+                head,
+                tail,
+            },
+        }
+    }
+
+    fn keep_matching_filter(patterns: &[&str]) -> DeclarativeFilter {
+        DeclarativeFilter {
+            name: "test-keep",
+            matcher: CommandMatcher::Prefix("cmd"),
+            strategy: CompiledStrategy::KeepMatching {
+                patterns: patterns.iter().map(|p| Regex::new(p).unwrap()).collect(),
+            },
+        }
+    }
+
+    fn strip_annotated_filter(
+        patterns: &[&str],
+        summary_pattern: Option<&str>,
+    ) -> DeclarativeFilter {
+        DeclarativeFilter {
+            name: "test-annotated",
+            matcher: CommandMatcher::Prefix("cmd"),
+            strategy: CompiledStrategy::StripAnnotated {
+                patterns: patterns.iter().map(|p| Regex::new(p).unwrap()).collect(),
+                summary_pattern: summary_pattern.map(|p| Regex::new(p).unwrap()),
+                long_output_threshold: 30,
+                keep_head: 10,
+                keep_tail: 5,
+            },
+        }
+    }
+
+    fn test_summary_filter() -> DeclarativeFilter {
+        DeclarativeFilter {
+            name: "test-summary",
+            matcher: CommandMatcher::Prefix("cargo test"),
+            strategy: CompiledStrategy::TestSummary {
+                max_failures: 10,
+                truncate_stack_trace: 50,
+            },
+        }
+    }
+
+    fn group_by_rule_filter(location_pattern: &str, rule_pattern: &str) -> DeclarativeFilter {
+        DeclarativeFilter {
+            name: "test-group",
+            matcher: CommandMatcher::Prefix("cargo clippy"),
+            strategy: CompiledStrategy::GroupByRule {
+                location_re: Regex::new(location_pattern).unwrap(),
+                rule_re: Regex::new(rule_pattern).unwrap(),
+            },
+        }
+    }
+
+    fn git_status_filter() -> DeclarativeFilter {
+        DeclarativeFilter {
+            name: "test-git-status",
+            matcher: CommandMatcher::Prefix("git status"),
+            strategy: CompiledStrategy::GitStatus,
+        }
+    }
+
+    fn git_diff_filter(max_diff_lines: usize) -> DeclarativeFilter {
+        DeclarativeFilter {
+            name: "test-git-diff",
+            matcher: CommandMatcher::Prefix("git diff"),
+            strategy: CompiledStrategy::GitDiff { max_diff_lines },
+        }
+    }
+
+    fn dedup_filter() -> DeclarativeFilter {
+        DeclarativeFilter {
+            name: "test-dedup",
+            matcher: CommandMatcher::Prefix("journalctl"),
+            strategy: CompiledStrategy::Dedup {
+                normalize_patterns: vec![
+                    (
+                        Regex::new(
+                            r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}([.\d]*)?([Z+-][\d:]*)?",
+                        )
+                        .unwrap(),
+                        "<TIMESTAMP>".into(),
+                    ),
+                    (
+                        Regex::new(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
+                            .unwrap(),
+                        "<UUID>".into(),
+                    ),
+                    (
+                        Regex::new(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}").unwrap(),
+                        "<IP>".into(),
+                    ),
+                    (
+                        Regex::new(r"(?:port|pid|PID)[=: ]+\d+").unwrap(),
+                        "<PID>".into(),
+                    ),
+                ],
+                max_unique_patterns: 10_000,
+            },
+        }
+    }
+
+    // --- compile_match ---
+
+    #[test]
+    fn compile_match_exact() {
+        let m = MatchConfig {
+            exact: Some("ls".into()),
+            prefix: None,
+            regex: None,
+        };
+        let matcher = compile_match(&m).unwrap();
+        assert!(matches!(matcher, CommandMatcher::Exact("ls")));
+    }
+
+    #[test]
+    fn compile_match_prefix() {
+        let m = MatchConfig {
+            exact: None,
+            prefix: Some("docker ".into()),
+            regex: None,
+        };
+        let matcher = compile_match(&m).unwrap();
+        assert!(matches!(matcher, CommandMatcher::Prefix(_)));
+        assert!(matcher.matches("docker build ."));
+    }
+
+    #[test]
+    fn compile_match_regex() {
+        let m = MatchConfig {
+            exact: None,
+            prefix: None,
+            regex: Some(r"^npm\s+install".into()),
+        };
+        let matcher = compile_match(&m).unwrap();
+        assert!(matcher.matches("npm install"));
+        assert!(!matcher.matches("yarn install"));
+    }
+
+    #[test]
+    fn compile_match_invalid_regex_returns_error() {
+        let m = MatchConfig {
+            exact: None,
+            prefix: None,
+            regex: Some("[invalid".into()),
+        };
+        assert!(compile_match(&m).is_err());
+    }
+
+    #[test]
+    fn compile_match_empty_returns_error() {
+        let m = MatchConfig {
+            exact: None,
+            prefix: None,
+            regex: None,
+        };
+        assert!(compile_match(&m).is_err());
+    }
+
+    // --- compile_strategy ---
+
+    #[test]
+    fn compile_strategy_strip_noise_valid() {
+        let s = StrategyConfig::StripNoise {
+            patterns: vec![r"^\s*$".into(), r"^noise".into()],
+        };
+        let compiled = compile_strategy(s).unwrap();
+        assert!(matches!(compiled, CompiledStrategy::StripNoise { .. }));
+    }
+
+    #[test]
+    fn compile_strategy_strip_noise_invalid_pattern() {
+        let s = StrategyConfig::StripNoise {
+            patterns: vec!["[broken".into()],
+        };
+        assert!(compile_strategy(s).is_err());
+    }
+
+    #[test]
+    fn compile_strategy_truncate_valid() {
+        let s = StrategyConfig::Truncate {
+            max_lines: 50,
+            head: 10,
+            tail: 10,
+        };
+        let compiled = compile_strategy(s).unwrap();
+        assert!(matches!(
+            compiled,
+            CompiledStrategy::Truncate {
+                max_lines: 50,
+                head: 10,
+                tail: 10
+            }
+        ));
+    }
+
+    #[test]
+    fn compile_strategy_truncate_head_tail_exceeds_max() {
+        let s = StrategyConfig::Truncate {
+            max_lines: 10,
+            head: 8,
+            tail: 5,
+        };
+        assert!(compile_strategy(s).is_err());
+    }
+
+    #[test]
+    fn compile_strategy_keep_matching_valid() {
+        let s = StrategyConfig::KeepMatching {
+            patterns: vec!["->".into(), r"^To ".into()],
+        };
+        assert!(compile_strategy(s).is_ok());
+    }
+
+    #[test]
+    fn compile_strategy_group_by_rule_invalid_regex() {
+        let s = StrategyConfig::GroupByRule {
+            location_pattern: "[broken".into(),
+            rule_pattern: r"#\[warn\(([^)]+)\)\]".into(),
+        };
+        assert!(compile_strategy(s).is_err());
+    }
+
+    // --- DeclarativeFilter::filter (strip_noise) ---
+
+    #[test]
+    fn strip_noise_removes_matching_lines() {
+        let f = strip_noise_filter(&[r"^noise:", r"^\s*$"]);
+        let raw = "noise: ignore this\nkeep this\nnoise: also ignore\nkeep too";
+        let result = f.filter("cmd", raw, 0);
+        assert_eq!(result.confidence, FilterConfidence::Full);
+        assert!(result.output.contains("keep this"));
+        assert!(result.output.contains("keep too"));
+        assert!(!result.output.contains("noise:"));
+    }
+
+    #[test]
+    fn strip_noise_returns_fallback_when_nothing_removed() {
+        let f = strip_noise_filter(&[r"^NOMATCH"]);
+        let raw = "line one\nline two";
+        let result = f.filter("cmd", raw, 0);
+        assert_eq!(result.confidence, FilterConfidence::Fallback);
+        assert!(result.output.contains("line one"));
+    }
+
+    #[test]
+    fn strip_noise_strips_ansi_before_matching() {
+        let f = strip_noise_filter(&[r"^noise"]);
+        let raw = "\x1b[32mnoise\x1b[0m: colored noise\nclean line";
+        let result = f.filter("cmd", raw, 0);
+        assert_eq!(result.confidence, FilterConfidence::Full);
+        assert!(!result.output.contains("noise"));
+        assert!(result.output.contains("clean line"));
+    }
+
+    // --- DeclarativeFilter::filter (truncate) ---
+
+    #[test]
+    fn truncate_short_output_passthrough() {
+        let f = truncate_filter(50, 10, 10);
+        let raw = "line1\nline2\nline3";
+        let result = f.filter("cmd", raw, 0);
+        assert_eq!(result.confidence, FilterConfidence::Fallback);
+        assert!(result.output.contains("line1"));
+        assert!(result.output.contains("line3"));
+    }
+
+    #[test]
+    fn truncate_long_output_applies_head_tail() {
+        let f = truncate_filter(10, 3, 3);
+        let lines: Vec<String> = (0..20).map(|i| format!("line {i}")).collect();
+        let raw = lines.join("\n");
+        let result = f.filter("cmd", &raw, 0);
+        assert_eq!(result.confidence, FilterConfidence::Partial);
+        assert!(result.output.contains("line 0"));
+        assert!(result.output.contains("line 1"));
+        assert!(result.output.contains("line 2"));
+        assert!(result.output.contains("line 17"));
+        assert!(result.output.contains("line 18"));
+        assert!(result.output.contains("line 19"));
+        assert!(result.output.contains("lines omitted"));
+        assert!(!result.output.contains("line 3"));
+    }
+
+    #[test]
+    fn truncate_omitted_count_correct() {
+        let f = truncate_filter(10, 2, 2);
+        let lines: Vec<String> = (0..20).map(|i| format!("L{i}")).collect();
+        let raw = lines.join("\n");
+        let result = f.filter("cmd", &raw, 0);
+        assert!(result.output.contains("16 lines omitted"));
+    }
+
+    // --- keep_matching ---
+
+    #[test]
+    fn keep_matching_keeps_only_matching_lines() {
+        let f = keep_matching_filter(&["->", r"^To "]);
+        let raw = "\
+Enumerating objects: 5, done.
+To github.com:user/repo.git
+ abc1234..def5678 main -> main
+";
+        let result = f.filter("cmd", raw, 0);
+        assert_eq!(result.confidence, FilterConfidence::Full);
+        assert!(result.output.contains("->"));
+        assert!(result.output.contains("To github.com"));
+        assert!(!result.output.contains("Enumerating"));
+    }
+
+    #[test]
+    fn keep_matching_fallback_when_nothing_matches() {
+        let f = keep_matching_filter(&[r"^NOMATCH"]);
+        let raw = "some output\nno matches here";
+        let result = f.filter("cmd", raw, 0);
+        assert_eq!(result.confidence, FilterConfidence::Fallback);
+    }
+
+    // --- strip_annotated ---
+
+    #[test]
+    fn strip_annotated_removes_noise_with_count() {
+        let f = strip_annotated_filter(
+            &[r"^\s*Compiling ", r"^\s*Checking "],
+            Some(r"^\s*Finished "),
+        );
+        let raw = " Compiling serde v1.0\n Checking foo\n Finished dev in 1s\nerror: oops";
+        let result = f.filter("cargo build", raw, 0);
+        assert_eq!(result.confidence, FilterConfidence::Full);
+        assert!(result.output.contains("noise lines removed"));
+        assert!(result.output.contains("Finished"));
+        assert!(!result.output.contains("Compiling"));
+    }
+
+    #[test]
+    fn strip_annotated_passthrough_on_error_no_noise() {
+        let f = strip_annotated_filter(&[r"^\s*Compiling "], None);
+        let raw = "error[E0308]: mismatched types\n --> src/main.rs:10:5";
+        let result = f.filter("cargo build", raw, 1);
+        assert_eq!(result.confidence, FilterConfidence::Fallback);
+        assert_eq!(result.output, raw);
+    }
+
+    #[test]
+    fn strip_annotated_passthrough_short_no_noise() {
+        let f = strip_annotated_filter(&[r"^\s*Compiling "], None);
+        let raw = "short output\nno noise";
+        let result = f.filter("cargo build", raw, 0);
+        assert_eq!(result.confidence, FilterConfidence::Fallback);
+    }
+
+    // --- test_summary ---
+
+    #[test]
+    fn test_summary_success_compresses() {
+        let f = test_summary_filter();
+        let raw = "\
+running 3 tests
+test foo::test_a ... ok
+test foo::test_b ... ok
+test foo::test_c ... ok
+
+test result: ok. 3 passed; 0 failed; 0 ignored; 0 filtered out; finished in 0.01s
+";
+        let result = f.filter("cargo test", raw, 0);
+        assert_eq!(result.confidence, FilterConfidence::Full);
+        assert!(result.output.contains("3 passed"));
+        assert!(result.output.contains("0 failed"));
+        assert!(!result.output.contains("test_a"));
+        assert!(result.savings_pct() > 30.0);
+    }
+
+    #[test]
+    fn test_summary_failure_preserves_details() {
+        let f = test_summary_filter();
+        let raw = "\
+running 2 tests
+test foo::test_a ... ok
+test foo::test_b ... FAILED
+
+---- foo::test_b stdout ----
+thread 'foo::test_b' panicked at 'assertion failed: false'
+
+failures:
+    foo::test_b
+
+test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 filtered out; finished in 0.01s
+";
+        let result = f.filter("cargo test", raw, 1);
+        assert!(result.output.contains("FAILURES:"));
+        assert!(result.output.contains("assertion failed"));
+        assert!(result.output.contains("1 failed"));
+    }
+
+    #[test]
+    fn test_summary_no_summary_passthrough() {
+        let f = test_summary_filter();
+        let raw = "some random output with no test results";
+        let result = f.filter("cargo test", raw, 0);
+        assert_eq!(result.output, raw);
+        assert_eq!(result.confidence, FilterConfidence::Fallback);
+    }
+
+    // --- group_by_rule (clippy) ---
+
+    #[test]
+    fn group_by_rule_groups_warnings() {
+        let f = group_by_rule_filter(r"^\s*-->\s*(.+:\d+)", r"#\[warn\(([^)]+)\)\]");
+        let raw = "\
+warning: needless pass by value
+ --> src/foo.rs:12:5
+  |
+  = note: `#[warn(clippy::needless_pass_by_value)]` on by default
+
+warning: needless pass by value
+ --> src/bar.rs:45:10
+  |
+  = note: `#[warn(clippy::needless_pass_by_value)]` on by default
+
+warning: unused import
+ --> src/main.rs:5:1
+  |
+  = note: `#[warn(clippy::unused_imports)]` on by default
+";
+        let result = f.filter("cargo clippy", raw, 0);
+        assert_eq!(result.confidence, FilterConfidence::Full);
+        assert!(
+            result
+                .output
+                .contains("clippy::needless_pass_by_value (2 warnings):")
+        );
+        assert!(result.output.contains("src/foo.rs:12"));
+        assert!(
+            result
+                .output
+                .contains("clippy::unused_imports (1 warning):")
+        );
+        assert!(result.output.contains("3 warnings total (2 rules)"));
+    }
+
+    #[test]
+    fn group_by_rule_error_passthrough() {
+        let f = group_by_rule_filter(r"^\s*-->\s*(.+:\d+)", r"#\[warn\(([^)]+)\)\]");
+        let raw = "error[E0308]: mismatched types\n --> src/main.rs:10:5\nfull details here";
+        let result = f.filter("cargo clippy", raw, 1);
+        assert_eq!(result.output, raw);
+        assert_eq!(result.confidence, FilterConfidence::Fallback);
+    }
+
+    #[test]
+    fn group_by_rule_no_warnings_strips_cargo_noise() {
+        let f = group_by_rule_filter(r"^\s*-->\s*(.+:\d+)", r"#\[warn\(([^)]+)\)\]");
+        let raw = "Checking my-crate v0.1.0\n Finished dev [unoptimized] target(s)";
+        let result = f.filter("cargo clippy", raw, 0);
+        assert!(result.output.is_empty());
+        assert_eq!(result.confidence, FilterConfidence::Partial);
+    }
+
+    // --- git_status ---
+
+    #[test]
+    fn git_status_summarizes_short_format() {
+        let f = git_status_filter();
+        let raw = " M src/main.rs\n M src/lib.rs\n?? new_file.txt\nA added.rs\n";
+        let result = f.filter("git status --short", raw, 0);
+        assert!(result.output.contains("M 2 files"));
+        assert!(result.output.contains("?? 1 files"));
+        assert!(result.output.contains("A 1 files"));
+        assert_eq!(result.confidence, FilterConfidence::Full);
+    }
+
+    #[test]
+    fn git_status_summarizes_long_format() {
+        let f = git_status_filter();
+        let raw = "\
+On branch main
+Changes not staged for commit:
+ modified: src/main.rs
+ modified: src/lib.rs
+ deleted: old_file.rs
+
+Untracked files:
+ new_file.txt
+";
+        let result = f.filter("git status", raw, 0);
+        assert!(result.output.contains("M 2 files"));
+        assert!(result.output.contains("D 1 files"));
+    }
+
+    #[test]
+    fn git_status_empty_fallback() {
+        let f = git_status_filter();
+        let raw = "nothing to commit, working tree clean";
+        let result = f.filter("git status", raw, 0);
+        assert_eq!(result.confidence, FilterConfidence::Fallback);
+    }
+
+    // --- git_diff ---
+
+    #[test]
+    fn git_diff_compresses() {
+        let f = git_diff_filter(500);
+        let raw = "\
+diff --git a/src/main.rs b/src/main.rs
+index abc..def 100644
+--- a/src/main.rs
++++ b/src/main.rs
++new line 1
++new line 2
+-old line 1
+diff --git a/src/lib.rs b/src/lib.rs
+index ghi..jkl 100644
+--- a/src/lib.rs
++++ b/src/lib.rs
++added
+";
+        let result = f.filter("git diff", raw, 0);
+        assert!(result.output.contains("src/main.rs"));
+        assert!(result.output.contains("src/lib.rs"));
+        assert!(result.output.contains("2 files changed"));
+        assert!(result.output.contains("3 insertions(+)"));
+        assert!(result.output.contains("1 deletions(-)"));
+        assert_eq!(result.confidence, FilterConfidence::Full);
+    }
+
+    #[test]
+    fn git_diff_empty_fallback() {
+        let f = git_diff_filter(500);
+        let result = f.filter("git diff", "", 0);
+        assert_eq!(result.confidence, FilterConfidence::Fallback);
+    }
+
+    #[test]
+    fn git_diff_truncation_note() {
+        let f = git_diff_filter(5);
+        // Build a diff with more than 5 lines
+        let mut raw = "diff --git a/f b/f\n--- a/f\n+++ b/f\n".to_owned();
+        for i in 0..10 {
+            raw.push_str(&format!("+line {i}\n"));
+        }
+        let result = f.filter("git diff", &raw, 0);
+        assert!(result.output.contains("truncated from"));
+    }
+
+    // --- dedup ---
+
+    #[test]
+    fn dedup_deduplicates_log_lines() {
+        let f = dedup_filter();
+        let raw = "\
+2024-01-15T12:00:01Z INFO request handled path=/api/health
+2024-01-15T12:00:02Z INFO request handled path=/api/health
+2024-01-15T12:00:03Z INFO request handled path=/api/health
+2024-01-15T12:00:04Z WARN connection timeout addr=10.0.0.1
+2024-01-15T12:00:05Z WARN connection timeout addr=10.0.0.2
+2024-01-15T12:00:06Z ERROR database unreachable
+";
+        let result = f.filter("journalctl -u app", raw, 0);
+        assert_eq!(result.confidence, FilterConfidence::Full);
+        assert!(result.output.contains("(x3)"));
+        assert!(result.output.contains("(x2)"));
+        assert!(result.output.contains("3 unique patterns (6 total lines)"));
+        assert!(result.savings_pct() > 20.0);
+    }
+
+    #[test]
+    fn dedup_all_unique_fallback() {
+        let f = dedup_filter();
+        let raw = "line one\nline two\nline three";
+        let result = f.filter("cat app.log", raw, 0);
+        assert_eq!(result.output, raw);
+        assert_eq!(result.confidence, FilterConfidence::Fallback);
+    }
+
+    #[test]
+    fn dedup_short_fallback() {
+        let f = dedup_filter();
+        let raw = "single line";
+        let result = f.filter("cat app.log", raw, 0);
+        assert_eq!(result.confidence, FilterConfidence::Fallback);
+    }
+
+    #[test]
+    fn dedup_normalize_replaces_patterns() {
+        let patterns = vec![
+            (
+                Regex::new(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}([.\d]*)?([Z+-][\d:]*)?")
+                    .unwrap(),
+                "<TIMESTAMP>".into(),
+            ),
+            (
+                Regex::new(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
+                    .unwrap(),
+                "<UUID>".into(),
+            ),
+            (
+                Regex::new(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}").unwrap(),
+                "<IP>".into(),
+            ),
+            (
+                Regex::new(r"(?:port|pid|PID)[=: ]+\d+").unwrap(),
+                "<PID>".into(),
+            ),
+        ];
+        let line = "2024-01-15T12:00:00Z req=abc12345-1234-1234-1234-123456789012 addr=192.168.1.1 pid=1234";
+        let n = dedup_normalize(line, &patterns);
+        assert!(n.contains("<TIMESTAMP>"));
+        assert!(n.contains("<UUID>"));
+        assert!(n.contains("<IP>"));
+        assert!(n.contains("<PID>"));
+    }
+
+    // --- is_cargo_noise ---
+
+    #[test]
+    fn is_cargo_noise_detects_prefixes() {
+        assert!(is_cargo_noise(" Compiling foo v1.0"));
+        assert!(is_cargo_noise(" Finished dev profile"));
+        assert!(is_cargo_noise(" Checking foo v1.0"));
+        assert!(!is_cargo_noise("error[E0308]: mismatched types"));
+        assert!(!is_cargo_noise("warning: unused import"));
+    }
+
+    // --- load_declarative_filters ---
+
+    #[test]
+    fn embedded_defaults_parse_without_error() {
+        let filters = load_declarative_filters(None);
+        assert!(
+            !filters.is_empty(),
+            "embedded defaults should produce at least one filter"
+        );
+    }
+
+    #[test]
+    fn load_declarative_filters_from_missing_dir_uses_defaults() {
+        let tmp = std::path::Path::new("/tmp/zeph-test-nonexistent-99999");
+        let filters = load_declarative_filters(Some(tmp));
+        assert!(!filters.is_empty());
+    }
+
+    #[test]
+    fn load_declarative_filters_from_custom_file() {
+        let toml = r#"
+[[rules]]
+name = "custom-test"
+match = { prefix = "myapp" }
+strategy = { type = "strip_noise", patterns = ["^DEBUG"] }
+"#;
+        let dir = tempfile::tempdir().unwrap();
+        std::fs::write(dir.path().join("filters.toml"), toml).unwrap();
+        let filters = load_declarative_filters(Some(dir.path()));
+        assert_eq!(filters.len(), 1);
+        assert_eq!(filters[0].name(), "custom-test");
+    }
+
+    #[test]
+    fn load_declarative_filters_skips_disabled_rules() {
+        let toml = r#"
+[[rules]]
+name = "enabled-rule"
+match = { prefix = "cmd1" }
+strategy = { type = "strip_noise", patterns = ["^noise"] }
+enabled = true
+
+[[rules]]
+name = "disabled-rule"
+match = { prefix = "cmd2" }
+strategy = { type = "strip_noise", patterns = ["^noise"] }
+enabled = false
+"#;
+        let dir = tempfile::tempdir().unwrap();
+        std::fs::write(dir.path().join("filters.toml"), toml).unwrap();
+        let filters = load_declarative_filters(Some(dir.path()));
+        assert_eq!(filters.len(), 1);
+        assert_eq!(filters[0].name(), "enabled-rule");
+    }
+
+    #[test]
+    fn compile_match_regex_over_512_chars_rejected() {
+        let long_pattern = "a".repeat(513);
+        let m = MatchConfig {
+            exact: None,
+            prefix: None,
+            regex: Some(long_pattern),
+        };
+        let err = compile_match(&m).unwrap_err();
+        assert!(err.contains("512"), "error should mention limit: {err}");
+    }
+
+    #[test]
+    fn compile_match_regex_exactly_512_chars_accepted() {
+        let pattern = "a".repeat(512);
+        let m = MatchConfig {
+            exact: None,
+            prefix: None,
+            regex: Some(pattern),
+        };
+        assert!(compile_match(&m).is_ok());
+    }
+
+    #[test]
+    fn compile_strategy_strip_noise_pattern_over_512_chars_rejected() {
+        let long_pattern = "b".repeat(513);
+        let s = StrategyConfig::StripNoise {
+            patterns: vec![long_pattern],
+        };
+        match compile_strategy(s) {
+            Err(e) => assert!(e.contains("512"), "error should mention limit: {e}"),
+            Ok(_) => panic!("expected error for oversized pattern"),
+        }
+    }
+
+    #[test]
+    fn load_declarative_filters_oversized_file_uses_defaults() {
+        let dir = tempfile::tempdir().unwrap();
+        let path = dir.path().join("filters.toml");
+        let chunk = "# filler\n".repeat(120_000);
+        std::fs::write(&path, chunk).unwrap();
+        let filters = load_declarative_filters(Some(dir.path()));
+        assert!(!filters.is_empty(), "should fall back to embedded defaults");
+    }
+
+    #[test]
+    fn load_declarative_filters_invalid_toml_returns_empty() {
+        let dir = tempfile::tempdir().unwrap();
+        std::fs::write(dir.path().join("filters.toml"), "[[invalid toml {{{").unwrap();
+        let filters = load_declarative_filters(Some(dir.path()));
+        assert!(filters.is_empty());
+    }
+
+    #[test]
+    fn load_declarative_filters_skips_invalid_regex() {
+        let toml = r#"
+[[rules]]
+name = "bad-rule"
+match = { prefix = "cmd" }
+strategy = { type = "strip_noise", patterns = ["[broken"] }
+
+[[rules]]
+name = "good-rule"
+match = { prefix = "cmd" }
+strategy = { type = "strip_noise", patterns = ["^noise"] }
+"#;
+        let dir = tempfile::tempdir().unwrap();
+        std::fs::write(dir.path().join("filters.toml"), toml).unwrap();
+ let filters = load_declarative_filters(Some(dir.path())); + assert_eq!(filters.len(), 1); + assert_eq!(filters[0].name(), "good-rule"); + } + + // --- TOML parsing round-trips --- + + #[test] + fn toml_parse_strip_noise_rule() { + let toml = r#" +[[rules]] +name = "docker-build" +match = { prefix = "docker build" } +strategy = { type = "strip_noise", patterns = ["^Step \\d+", "^\\s*$"] } +"#; + let f: DeclarativeFilterFile = toml::from_str(toml).unwrap(); + assert_eq!(f.rules.len(), 1); + assert_eq!(f.rules[0].name, "docker-build"); + assert!(f.rules[0].enabled); + assert!(matches!( + f.rules[0].strategy, + StrategyConfig::StripNoise { .. } + )); + } + + #[test] + fn toml_parse_truncate_rule() { + let toml = r#" +[[rules]] +name = "make" +match = { prefix = "make" } +strategy = { type = "truncate", max_lines = 80, head = 15, tail = 15 } +"#; + let f: DeclarativeFilterFile = toml::from_str(toml).unwrap(); + assert_eq!(f.rules.len(), 1); + if let StrategyConfig::Truncate { + max_lines, + head, + tail, + } = f.rules[0].strategy + { + assert_eq!(max_lines, 80); + assert_eq!(head, 15); + assert_eq!(tail, 15); + } else { + panic!("expected truncate strategy"); + } + } + + #[test] + fn toml_parse_truncate_default_head_tail() { + let toml = r#" +[[rules]] +name = "big-output" +match = { exact = "big" } +strategy = { type = "truncate", max_lines = 100 } +"#; + let f: DeclarativeFilterFile = toml::from_str(toml).unwrap(); + if let StrategyConfig::Truncate { head, tail, .. 
} = f.rules[0].strategy { + assert_eq!(head, 20); + assert_eq!(tail, 20); + } else { + panic!("expected truncate strategy"); + } + } + + #[test] + fn toml_parse_test_summary_rule() { + let toml = r#" +[[rules]] +name = "cargo-test" +match = { regex = "^cargo\\s+test" } +strategy = { type = "test_summary", max_failures = 5, truncate_stack_trace = 30 } +"#; + let f: DeclarativeFilterFile = toml::from_str(toml).unwrap(); + if let StrategyConfig::TestSummary { + max_failures, + truncate_stack_trace, + } = f.rules[0].strategy + { + assert_eq!(max_failures, 5); + assert_eq!(truncate_stack_trace, 30); + } else { + panic!("expected test_summary strategy"); + } + } + + #[test] + fn toml_parse_git_status_rule() { + let toml = r#" +[[rules]] +name = "git-status" +match = { regex = "^git\\s+status" } +strategy = { type = "git_status" } +"#; + let f: DeclarativeFilterFile = toml::from_str(toml).unwrap(); + assert!(matches!(f.rules[0].strategy, StrategyConfig::GitStatus {})); + } + + #[test] + fn toml_parse_dedup_default_patterns() { + let toml = r#" +[[rules]] +name = "log-dedup" +match = { regex = "journalctl" } +strategy = { type = "dedup" } +"#; + let f: DeclarativeFilterFile = toml::from_str(toml).unwrap(); + if let StrategyConfig::Dedup { + normalize_patterns, + max_unique_patterns, + } = &f.rules[0].strategy + { + assert_eq!(normalize_patterns.len(), 4); + assert_eq!(*max_unique_patterns, 10_000); + } else { + panic!("expected dedup strategy"); + } + } + + #[test] + fn toml_parse_empty_rules() { + let f: DeclarativeFilterFile = toml::from_str("").unwrap(); + assert!(f.rules.is_empty()); + } + + // --- Integration: register in registry and apply --- + + #[test] + fn registry_applies_declarative_filter() { + use super::super::{FilterConfig, OutputFilterRegistry}; + + let toml = r#" +[[rules]] +name = "custom-npm" +match = { prefix = "npm install" } +strategy = { type = "strip_noise", patterns = ["^npm warn", "^npm notice"] } +"#; + let dir = tempfile::tempdir().unwrap(); + 
std::fs::write(dir.path().join("filters.toml"), toml).unwrap(); + + let mut config = FilterConfig::default(); + config.filters_path = Some(dir.path().to_path_buf()); + + let registry = OutputFilterRegistry::default_filters(&config); + let raw = "npm warn deprecated pkg\nnpm notice created tarball\nDone installing"; + let result = registry.apply("npm install lodash", raw, 0); + assert!(result.is_some()); + let out = result.unwrap(); + assert!(!out.output.contains("npm warn")); + assert!(!out.output.contains("npm notice")); + assert!(out.output.contains("Done installing")); + } + + // --- REQ-1: HashMap::with_capacity for dedup --- + + #[test] + fn dedup_cap_respected_does_not_panic_with_large_max_unique() { + // Validates that HashMap::with_capacity(max_unique.min(4096)) doesn't OOM + let f = DeclarativeFilter { + name: "test-dedup-cap", + matcher: CommandMatcher::Prefix("cmd"), + strategy: CompiledStrategy::Dedup { + normalize_patterns: vec![], + max_unique_patterns: usize::MAX, + }, + }; + let raw = "line a\nline b\nline c\nline d"; + let result = f.filter("cmd", raw, 0); + // All unique → fallback + assert_eq!(result.confidence, FilterConfidence::Fallback); + } + + // --- REQ-2: reject unescaped $ in Dedup replacement --- + + #[test] + fn compile_dedup_rejects_dollar_replacement() { + let s = StrategyConfig::Dedup { + normalize_patterns: vec![NormalizeEntry { + pattern: r"\d+".into(), + replacement: "$1".into(), + }], + max_unique_patterns: 100, + }; + match compile_strategy(s) { + Err(e) => assert!(e.contains("unescaped '$'"), "got: {e}"), + Ok(_) => panic!("expected error for unescaped '$' in replacement"), + } + } + + #[test] + fn compile_dedup_rejects_dollar_brace_replacement() { + let s = StrategyConfig::Dedup { + normalize_patterns: vec![NormalizeEntry { + pattern: r"\w+".into(), + replacement: "${name}".into(), + }], + max_unique_patterns: 100, + }; + assert!(compile_strategy(s).is_err()); + } + + #[test] + fn compile_dedup_accepts_plain_text_replacement() 
{ + let s = StrategyConfig::Dedup { + normalize_patterns: vec![NormalizeEntry { + pattern: r"\d{4}-\d{2}-\d{2}".into(), + replacement: "".into(), + }], + max_unique_patterns: 100, + }; + assert!(compile_strategy(s).is_ok()); + } + + // --- REQ-3: empty patterns rejected for strip_noise, keep_matching, strip_annotated --- + + #[test] + fn compile_strip_noise_empty_patterns_rejected() { + let s = StrategyConfig::StripNoise { patterns: vec![] }; + assert!(compile_strategy(s).is_err()); + } + + #[test] + fn compile_keep_matching_empty_patterns_rejected() { + let s = StrategyConfig::KeepMatching { patterns: vec![] }; + assert!(compile_strategy(s).is_err()); + } + + #[test] + fn compile_strip_annotated_empty_patterns_rejected() { + let s = StrategyConfig::StripAnnotated { + patterns: vec![], + summary_pattern: None, + long_output_threshold: 30, + keep_head: 10, + keep_tail: 5, + }; + assert!(compile_strategy(s).is_err()); + } + + // --- ADV-2: no panic when head+tail > remaining non-noise lines --- + + #[test] + fn strip_annotated_no_panic_when_head_tail_exceeds_kept() { + // keep_head=10, keep_tail=5, but only 3 non-noise lines remain after filtering + // long_output_threshold=2 so truncation path is triggered + let f = DeclarativeFilter { + name: "test-adv2", + matcher: CommandMatcher::Prefix("cmd"), + strategy: CompiledStrategy::StripAnnotated { + patterns: vec![Regex::new(r"^NOISE").unwrap()], + summary_pattern: None, + long_output_threshold: 2, + keep_head: 10, + keep_tail: 5, + }, + }; + // 10 noise lines + 3 kept → kept.len()=3 < long_output_threshold=2? 
No, 3>2, so truncation + let mut raw = String::new(); + for i in 0..10 { + raw.push_str(&format!("NOISE line {i}\n")); + } + raw.push_str("kept 1\nkept 2\nkept 3\n"); + // Must not panic + let result = f.filter("cmd", &raw, 0); + assert_eq!(result.confidence, FilterConfidence::Full); + } + + #[test] + fn strip_annotated_no_panic_single_kept_line_large_head_tail() { + let f = DeclarativeFilter { + name: "test-adv2-single", + matcher: CommandMatcher::Prefix("cmd"), + strategy: CompiledStrategy::StripAnnotated { + patterns: vec![Regex::new(r"^NOISE").unwrap()], + summary_pattern: None, + long_output_threshold: 0, + keep_head: 20, + keep_tail: 20, + }, + }; + let raw = "NOISE a\nNOISE b\nNOISE c\nonly kept line\n"; + let result = f.filter("cmd", &raw, 0); + assert_eq!(result.confidence, FilterConfidence::Full); + assert!(result.output.contains("only kept line")); + } + + // --- edge cases --- + + #[test] + fn strip_noise_empty_input_returns_fallback() { + let f = strip_noise_filter(&[r"^noise"]); + let result = f.filter("cmd", "", 0); + assert_eq!(result.confidence, FilterConfidence::Fallback); + } + + #[test] + fn truncate_empty_input_returns_fallback() { + let f = truncate_filter(10, 3, 3); + let result = f.filter("cmd", "", 0); + assert_eq!(result.confidence, FilterConfidence::Fallback); + } + + // --- snapshot tests (migrated from deleted modules) --- + + #[test] + fn cargo_build_filter_snapshot() { + let f = strip_annotated_filter( + &[ + r"^\s*Compiling ", + r"^\s*Downloading ", + r"^\s*Downloaded ", + r"^\s*Updating ", + r"^\s*Fetching ", + r"^\s*Fresh ", + r"^\s*Packaging ", + r"^\s*Verifying ", + r"^\s*Archiving ", + r"^\s*Locking ", + r"^\s*Adding ", + r"^\s*Removing ", + r"^\s*Checking ", + r"^\s*Documenting ", + r"^\s*Running ", + r"^\s*Loaded ", + r"^\s*Blocking ", + r"^\s*Unpacking ", + ], + Some(r"^\s*Finished "), + ); + let raw = "\ + Compiling zeph-core v0.11.0 + Compiling zeph-tools v0.11.0 + Compiling zeph-llm v0.11.0 +warning: unused import: 
`std::fmt` + --> crates/zeph-core/src/lib.rs:3:5 + | +3 | use std::fmt; + | ^^^^^^^^ + = note: `#[warn(unused_imports)]` on by default + Finished `dev` profile [unoptimized + debuginfo] target(s) in 4.23s"; + let result = f.filter("cargo build", raw, 0); + insta::assert_snapshot!(result.output); + } + + #[test] + fn cargo_build_error_snapshot() { + let f = strip_annotated_filter( + &[ + r"^\s*Compiling ", + r"^\s*Downloading ", + r"^\s*Downloaded ", + r"^\s*Updating ", + r"^\s*Fetching ", + r"^\s*Fresh ", + r"^\s*Packaging ", + r"^\s*Verifying ", + r"^\s*Archiving ", + r"^\s*Locking ", + r"^\s*Adding ", + r"^\s*Removing ", + r"^\s*Checking ", + r"^\s*Documenting ", + r"^\s*Running ", + r"^\s*Loaded ", + r"^\s*Blocking ", + r"^\s*Unpacking ", + ], + Some(r"^\s*Finished "), + ); + let raw = "\ + Compiling zeph-core v0.11.0 +error[E0308]: mismatched types + --> crates/zeph-core/src/lib.rs:10:5 + | +10 | return 42; + | ^^ expected `()`, found integer +error: could not compile `zeph-core` due to 1 previous error"; + let result = f.filter("cargo build", raw, 1); + insta::assert_snapshot!(result.output); + } + + #[test] + fn clippy_grouped_warnings_snapshot() { + let f = group_by_rule_filter(r"^\s*-->\s*(.+:\d+)", r"#\[warn\(([^)]+)\)\]"); + let raw = "\ +warning: needless pass by value + --> src/foo.rs:12:5 + | + = help: use a reference instead + = note: `#[warn(clippy::needless_pass_by_value)]` on by default + +warning: needless pass by value + --> src/bar.rs:45:10 + | + = help: use a reference instead + = note: `#[warn(clippy::needless_pass_by_value)]` on by default + +warning: unused import + --> src/main.rs:5:1 + | + = note: `#[warn(clippy::unused_imports)]` on by default + +warning: `my-crate` (lib) generated 3 warnings +"; + let result = f.filter("cargo clippy", raw, 0); + insta::assert_snapshot!(result.output); + } + + #[test] + fn filter_diff_snapshot() { + let f = git_diff_filter(500); + let raw = "\ +diff --git a/src/main.rs b/src/main.rs +index abc..def 100644 
+--- a/src/main.rs ++++ b/src/main.rs ++new line 1 +-old line 1 +diff --git a/src/lib.rs b/src/lib.rs +index ghi..jkl 100644 +--- a/src/lib.rs ++++ b/src/lib.rs ++added line +"; + let result = f.filter("git diff", raw, 0); + insta::assert_snapshot!(result.output); + } + + #[test] + fn filter_status_snapshot() { + let f = git_status_filter(); + let raw = " M src/main.rs\n M src/lib.rs\n?? new_file.txt\nA added.rs\n"; + let result = f.filter("git status --short", raw, 0); + insta::assert_snapshot!(result.output); + } + + // --- empty input edge cases --- + + #[test] + fn keep_matching_empty_input_returns_fallback() { + let f = keep_matching_filter(&[r"->"]); + let result = f.filter("cmd", "", 0); + assert_eq!(result.confidence, FilterConfidence::Fallback); + } + + #[test] + fn strip_annotated_empty_input_returns_fallback() { + let f = strip_annotated_filter(&[r"^\s*Compiling "], None); + let result = f.filter("cargo build", "", 0); + assert_eq!(result.confidence, FilterConfidence::Fallback); + } + + #[test] + fn test_summary_empty_input_returns_fallback() { + let f = test_summary_filter(); + let result = f.filter("cargo test", "", 0); + assert_eq!(result.confidence, FilterConfidence::Fallback); + } + + #[test] + fn group_by_rule_empty_input_returns_fallback() { + let f = group_by_rule_filter(r"^\s*-->\s*(.+:\d+)", r"#\[warn\(([^)]+)\)\]"); + let result = f.filter("cargo clippy", "", 0); + assert_eq!(result.confidence, FilterConfidence::Fallback); + } + + // --- compound command matching --- + + #[test] + fn compound_command_prefix_matches_last_segment() { + // "cd /path && cargo test" extracts "cargo test" as last segment, + // so a Prefix("cargo test") filter should apply to compound commands. 
+ let f = DeclarativeFilter { + name: "test-compound", + matcher: CommandMatcher::Prefix("cargo test"), + strategy: CompiledStrategy::StripNoise { + patterns: vec![Regex::new(r"^NOISE").unwrap()], + }, + }; + assert!(f.matcher().matches("cd /path && cargo test")); + assert!(f.matcher().matches("cargo test --lib")); + assert!(!f.matcher().matches("cd /path && npm test")); + } + + #[test] + fn compound_command_regex_match() { + // A regex matcher can be written to match compound commands. + let m = MatchConfig { + exact: None, + prefix: None, + regex: Some(r"cargo\s+test".into()), + }; + let matcher = compile_match(&m).unwrap(); + assert!(matcher.matches("cd /workspace && cargo test --lib")); + assert!(matcher.matches("cargo test --workspace")); + } + + // --- test_summary snapshot --- + + #[test] + fn test_summary_failures_snapshot() { + let f = test_summary_filter(); + let raw = "\ +running 3 tests +test foo::test_a ... ok +test foo::test_b ... FAILED +test foo::test_c ... ok + +---- foo::test_b stdout ---- +thread 'foo::test_b' panicked at 'assertion `left == right` failed + left: 1 + right: 2', src/foo.rs:42:9 + +failures: + foo::test_b + +test result: FAILED. 2 passed; 1 failed; 0 ignored; 0 filtered out; finished in 0.02s +"; + let result = f.filter("cargo test", raw, 1); + insta::assert_snapshot!(result.output); + } + + use proptest::prelude::*; + + proptest! 
{ + #[test] + fn declarative_filter_never_panics_strip_noise( + input in ".*", + cmd in ".*", + exit_code in -1i32..=255, + ) { + let f = strip_noise_filter(&[r"^noise", r"^\s*$"]); + let _ = f.filter(&cmd, &input, exit_code); + } + + #[test] + fn declarative_filter_never_panics_truncate( + input in ".*", + cmd in ".*", + exit_code in -1i32..=255, + ) { + let f = truncate_filter(10, 3, 3); + let _ = f.filter(&cmd, &input, exit_code); + } + + #[test] + fn declarative_filter_never_panics_test_summary( + input in ".*", + cmd in ".*", + exit_code in -1i32..=255, + ) { + let f = test_summary_filter(); + let _ = f.filter(&cmd, &input, exit_code); + } + + #[test] + fn declarative_filter_never_panics_dedup( + input in ".*", + cmd in ".*", + exit_code in -1i32..=255, + ) { + let f = dedup_filter(); + let _ = f.filter(&cmd, &input, exit_code); + } + } +} diff --git a/crates/zeph-tools/src/filter/default-filters.toml b/crates/zeph-tools/src/filter/default-filters.toml new file mode 100644 index 00000000..9bf9f044 --- /dev/null +++ b/crates/zeph-tools/src/filter/default-filters.toml @@ -0,0 +1,164 @@ +# Default declarative filter rules (embedded fallback). +# User can override by placing filters.toml next to config.toml. 
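+#
+# Example override (illustrative only — the "ansible" rule name and its
+# patterns are hypothetical, and same-name rules are assumed to replace
+# the embedded defaults):
+#
+#   [[rules]]
+#   name = "ansible"
+#   match = { regex = "^ansible-playbook" }
+#   strategy = { type = "strip_noise", patterns = ["^ok: ", "^skipping: "] }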
+
+# --- Cargo ecosystem ---
+
+[[rules]]
+name = "cargo-build"
+# NOTE: the regex crate has no look-ahead, so build-like subcommands are
+# listed explicitly; test/nextest/clippy have their own rules below.
+match = { regex = "^cargo\\s+(\\+\\S+\\s+)?(build|check|doc|bench|fix|run|install|update|fetch|add|remove|publish|package)\\b" }
+strategy = { type = "strip_annotated", patterns = [
+    "^\\s*Compiling ",
+    "^\\s*Downloading ",
+    "^\\s*Downloaded ",
+    "^\\s*Updating ",
+    "^\\s*Fetching ",
+    "^\\s*Fresh ",
+    "^\\s*Packaging ",
+    "^\\s*Verifying ",
+    "^\\s*Archiving ",
+    "^\\s*Locking ",
+    "^\\s*Adding ",
+    "^\\s*Removing ",
+    "^\\s*Checking ",
+    "^\\s*Documenting ",
+    "^\\s*Running ",
+    "^\\s*Loaded ",
+    "^\\s*Blocking ",
+    "^\\s*Unpacking ",
+], summary_pattern = "^\\s*Finished " }
+
+[[rules]]
+name = "cargo-test"
+match = { regex = "^cargo\\s+(\\+\\S+\\s+)?(test|nextest)" }
+strategy = { type = "test_summary", max_failures = 10, truncate_stack_trace = 50 }
+
+[[rules]]
+name = "cargo-clippy"
+match = { regex = "^cargo\\s+(\\+\\S+\\s+)?clippy" }
+strategy = { type = "group_by_rule", location_pattern = "^\\s*-->\\s*(.+:\\d+)", rule_pattern = "#\\[warn\\(([^)]+)\\)\\]" }
+
+# --- Git ---
+
+[[rules]]
+name = "git-status"
+match = { regex = "^git\\s+status" }
+strategy = { type = "git_status" }
+
+[[rules]]
+name = "git-diff"
+match = { regex = "^git\\s+diff" }
+strategy = { type = "git_diff", max_diff_lines = 500 }
+
+[[rules]]
+name = "git-log"
+match = { regex = "^git\\s+log" }
+strategy = { type = "truncate", max_lines = 20, head = 20, tail = 0 }
+
+[[rules]]
+name = "git-push"
+match = { regex = "^git\\s+push" }
+strategy = { type = "keep_matching", patterns = ["->", "^To ", "^Branch"] }
+
+# --- Directory listing ---
+
+[[rules]]
+name = "ls"
+match = { regex = "^ls(\\s|$)" }
+strategy = { type = "strip_annotated", patterns = [
+    "node_modules",
+    "^target$",
+    "^\\.git$",
+    "__pycache__",
+    "^\\.venv$",
+    "^venv$",
+    "^dist$",
+    "^build$",
+    "^\\.next$",
+    "^\\.cache$",
+] }
+
+# --- Log deduplication ---
+
+[[rules]]
+name = "log-dedup"
+match = { regex = "(journalctl|tail -f|docker logs|cat .+\\.log)" }
+strategy = { type = "dedup" }
+
+# --- External tools
--- + +[[rules]] +name = "docker-build" +match = { prefix = "docker build" } +strategy = { type = "strip_noise", patterns = [ + "^Step \\d+/\\d+ : ", + "^ ---> [a-f0-9]+$", + "^Removing intermediate container", + "^\\s*$", +] } + +[[rules]] +name = "docker-compose" +match = { prefix = "docker compose" } +strategy = { type = "strip_noise", patterns = [ + "^\\s*Network\\s+\\S+\\s+Created$", + "^\\s*Container\\s+\\S+\\s+(Creating|Created|Starting|Started)$", +] } + +[[rules]] +name = "npm-install" +match = { regex = "^(npm|yarn|pnpm)\\s+(install|ci|add)" } +strategy = { type = "strip_noise", patterns = [ + "^npm warn", + "^npm notice", + "^added \\d+ packages", + "^up to date", + "^\\s*$", +] } + +[[rules]] +name = "pip-install" +match = { regex = "^pip3?\\s+install" } +strategy = { type = "strip_noise", patterns = [ + "^\\s*Downloading\\s", + "^\\s*Installing collected", + "^\\s*Using cached", + "^\\s*Collecting\\s", + "^\\s*$", +] } + +[[rules]] +name = "make" +match = { prefix = "make" } +strategy = { type = "truncate", max_lines = 80, head = 15, tail = 15 } + +[[rules]] +name = "pytest" +match = { regex = "^(pytest|python -m pytest)" } +strategy = { type = "truncate", max_lines = 100, head = 20, tail = 30 } + +[[rules]] +name = "go-test" +match = { regex = "^go\\s+test" } +strategy = { type = "truncate", max_lines = 80, head = 15, tail = 20 } + +[[rules]] +name = "terraform-plan" +match = { regex = "^terraform\\s+(plan|apply)" } +strategy = { type = "truncate", max_lines = 60, head = 10, tail = 15 } + +[[rules]] +name = "kubectl-get" +match = { regex = "^kubectl\\s+(get|describe)" } +strategy = { type = "truncate", max_lines = 50, head = 10, tail = 10 } + +[[rules]] +name = "brew-install" +match = { regex = "^brew\\s+(install|upgrade)" } +strategy = { type = "strip_noise", patterns = [ + "^==> Downloading", + "^==> Fetching", + "^==> Installing", + "^==> Pouring", + "^Already downloaded", + "^\\s*$", +] } diff --git a/crates/zeph-tools/src/filter/dir_listing.rs 
b/crates/zeph-tools/src/filter/dir_listing.rs
deleted file mode 100644
index a981d2f5..00000000
--- a/crates/zeph-tools/src/filter/dir_listing.rs
+++ /dev/null
@@ -1,135 +0,0 @@
-use std::fmt::Write;
-use std::sync::LazyLock;
-
-use super::{
-    CommandMatcher, DirListingFilterConfig, FilterConfidence, FilterResult, OutputFilter,
-    make_result,
-};
-
-const NOISE_DIRS: &[&str] = &[
-    "node_modules",
-    "target",
-    ".git",
-    "__pycache__",
-    ".venv",
-    "venv",
-    "dist",
-    "build",
-    ".next",
-    ".cache",
-];
-
-static DIR_LISTING_MATCHER: LazyLock<CommandMatcher> = LazyLock::new(|| {
-    CommandMatcher::Custom(Box::new(|cmd| {
-        let c = cmd.trim_start();
-        c == "ls" || c.starts_with("ls ")
-    }))
-});
-
-pub struct DirListingFilter;
-
-impl DirListingFilter {
-    #[must_use]
-    pub fn new(_config: DirListingFilterConfig) -> Self {
-        Self
-    }
-}
-
-impl OutputFilter for DirListingFilter {
-    fn name(&self) -> &'static str {
-        "dir_listing"
-    }
-
-    fn matcher(&self) -> &CommandMatcher {
-        &DIR_LISTING_MATCHER
-    }
-
-    fn filter(&self, _command: &str, raw_output: &str, _exit_code: i32) -> FilterResult {
-        let mut kept = Vec::new();
-        let mut hidden: Vec<&str> = Vec::new();
-
-        for line in raw_output.lines() {
-            let entry = line.split_whitespace().last().unwrap_or(line);
-            let name = entry.trim_end_matches('/');
-
-            if NOISE_DIRS.contains(&name) {
-                hidden.push(name);
-            } else {
-                kept.push(line);
-            }
-        }
-
-        if hidden.is_empty() {
-            return make_result(
-                raw_output,
-                raw_output.to_owned(),
-                FilterConfidence::Fallback,
-            );
-        }
-
-        let mut output = kept.join("\n");
-        let names = hidden.join(", ");
-        let _ = write!(output, "\n(+ {} hidden: {names})", hidden.len());
-
-        make_result(raw_output, output, FilterConfidence::Full)
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    fn make_filter() -> DirListingFilter {
-        DirListingFilter::new(DirListingFilterConfig::default())
-    }
-
-    #[test]
-    fn matches_ls() {
-        let f = make_filter();
-        assert!(f.matcher().matches("ls"));
-        assert!(f.matcher().matches("ls -la"));
-        assert!(f.matcher().matches("ls /tmp"));
-        assert!(!f.matcher().matches("lsof"));
-        assert!(!f.matcher().matches("cargo build"));
-    }
-
-    #[test]
-    fn filter_hides_noise_dirs() {
-        let f = make_filter();
-        let raw = "Cargo.toml\nsrc\ntarget\nnode_modules\nREADME.md\n.git";
-        let result = f.filter("ls", raw, 0);
-        assert!(result.output.contains("Cargo.toml"));
-        assert!(result.output.contains("src"));
-        assert!(result.output.contains("README.md"));
-        assert!(!result.output.contains("\ntarget\n"));
-        assert!(
-            result
-                .output
-                .contains("(+ 3 hidden: target, node_modules, .git)")
-        );
-        assert_eq!(result.confidence, FilterConfidence::Full);
-    }
-
-    #[test]
-    fn filter_no_noise_passthrough() {
-        let f = make_filter();
-        let raw = "Cargo.toml\nsrc\nREADME.md";
-        let result = f.filter("ls", raw, 0);
-        assert_eq!(result.output, raw);
-        assert_eq!(result.confidence, FilterConfidence::Fallback);
-    }
-
-    #[test]
-    fn filter_ls_la_format() {
-        let f = make_filter();
-        let raw = "\
-drwxr-xr-x 5 user staff 160 Jan 1 12:00 src
-drwxr-xr-x 20 user staff 640 Jan 1 12:00 node_modules
--rw-r--r-- 1 user staff 200 Jan 1 12:00 Cargo.toml
-drwxr-xr-x 8 user staff 256 Jan 1 12:00 target";
-        let result = f.filter("ls -la", raw, 0);
-        assert!(result.output.contains("src"));
-        assert!(result.output.contains("Cargo.toml"));
-        assert!(result.output.contains("(+ 2 hidden: node_modules, target)"));
-    }
-}
diff --git a/crates/zeph-tools/src/filter/git.rs b/crates/zeph-tools/src/filter/git.rs
deleted file mode 100644
index 446e83c2..00000000
--- a/crates/zeph-tools/src/filter/git.rs
+++ /dev/null
@@ -1,332 +0,0 @@
-use std::fmt::Write;
-use std::sync::LazyLock;
-
-use super::{
-    CommandMatcher, FilterConfidence, FilterResult, GitFilterConfig, OutputFilter, make_result,
-};
-
-static GIT_MATCHER: LazyLock<CommandMatcher> =
-    LazyLock::new(|| CommandMatcher::Custom(Box::new(|cmd| cmd.trim_start().starts_with("git "))));
-
-pub struct GitFilter {
-    config:
GitFilterConfig, -} - -impl GitFilter { - #[must_use] - pub fn new(config: GitFilterConfig) -> Self { - Self { config } - } -} - -impl OutputFilter for GitFilter { - fn name(&self) -> &'static str { - "git" - } - - fn matcher(&self) -> &CommandMatcher { - &GIT_MATCHER - } - - fn filter(&self, command: &str, raw_output: &str, _exit_code: i32) -> FilterResult { - let subcmd = command - .trim_start() - .strip_prefix("git ") - .unwrap_or("") - .split_whitespace() - .next() - .unwrap_or(""); - - match subcmd { - "status" => filter_status(raw_output), - "diff" => filter_diff(raw_output, self.config.max_diff_lines), - "log" => filter_log(raw_output, self.config.max_log_entries), - "push" => filter_push(raw_output), - _ => make_result( - raw_output, - raw_output.to_owned(), - FilterConfidence::Fallback, - ), - } - } -} - -fn filter_status(raw: &str) -> FilterResult { - let mut modified = 0u32; - let mut added = 0u32; - let mut deleted = 0u32; - let mut untracked = 0u32; - - for line in raw.lines() { - let trimmed = line.trim(); - if trimmed.starts_with("M ") || trimmed.starts_with("MM") || trimmed.starts_with(" M") { - modified += 1; - } else if trimmed.starts_with("A ") || trimmed.starts_with("AM") { - added += 1; - } else if trimmed.starts_with("D ") || trimmed.starts_with(" D") { - deleted += 1; - } else if trimmed.starts_with("??") { - untracked += 1; - } else if trimmed.starts_with("modified:") { - modified += 1; - } else if trimmed.starts_with("new file:") { - added += 1; - } else if trimmed.starts_with("deleted:") { - deleted += 1; - } - } - - let total = modified + added + deleted + untracked; - if total == 0 { - return make_result(raw, raw.to_owned(), FilterConfidence::Fallback); - } - - let mut output = String::new(); - let _ = write!( - output, - "M {modified} files | A {added} files | D {deleted} files | ?? 
{untracked} files" - ); - make_result(raw, output, FilterConfidence::Full) -} - -fn filter_diff(raw: &str, max_diff_lines: usize) -> FilterResult { - let mut files: Vec<(String, i32, i32)> = Vec::new(); - let mut current_file = String::new(); - let mut additions = 0i32; - let mut deletions = 0i32; - - for line in raw.lines() { - if line.starts_with("diff --git ") { - if !current_file.is_empty() { - files.push((current_file.clone(), additions, deletions)); - } - line.strip_prefix("diff --git a/") - .and_then(|s| s.split(" b/").next()) - .unwrap_or("unknown") - .clone_into(&mut current_file); - additions = 0; - deletions = 0; - } else if line.starts_with('+') && !line.starts_with("+++") { - additions += 1; - } else if line.starts_with('-') && !line.starts_with("---") { - deletions += 1; - } - } - if !current_file.is_empty() { - files.push((current_file, additions, deletions)); - } - - if files.is_empty() { - return make_result(raw, raw.to_owned(), FilterConfidence::Fallback); - } - - let total_lines: usize = raw.lines().count(); - let total_add: i32 = files.iter().map(|(_, a, _)| a).sum(); - let total_del: i32 = files.iter().map(|(_, _, d)| d).sum(); - let mut output = String::new(); - for (file, add, del) in &files { - let _ = writeln!(output, "{file} | +{add} -{del}"); - } - let _ = write!( - output, - "{} files changed, {} insertions(+), {} deletions(-)", - files.len(), - total_add, - total_del - ); - if total_lines > max_diff_lines { - let _ = write!(output, " (truncated from {total_lines} lines)"); - } - make_result(raw, output, FilterConfidence::Full) -} - -fn filter_log(raw: &str, max_entries: usize) -> FilterResult { - let lines: Vec<&str> = raw.lines().collect(); - if lines.len() <= max_entries { - return make_result(raw, raw.to_owned(), FilterConfidence::Fallback); - } - - let mut output: String = lines[..max_entries].join("\n"); - let remaining = lines.len() - max_entries; - let _ = write!(output, "\n... 
and {remaining} more commits"); - make_result(raw, output, FilterConfidence::Full) -} - -fn filter_push(raw: &str) -> FilterResult { - let mut output = String::new(); - for line in raw.lines() { - let trimmed = line.trim(); - if trimmed.contains("->") || trimmed.starts_with("To ") || trimmed.starts_with("Branch") { - if !output.is_empty() { - output.push('\n'); - } - output.push_str(trimmed); - } - } - if output.is_empty() { - return make_result(raw, raw.to_owned(), FilterConfidence::Fallback); - } - make_result(raw, output, FilterConfidence::Full) -} - -#[cfg(test)] -mod tests { - use super::*; - - fn make_filter() -> GitFilter { - GitFilter::new(GitFilterConfig::default()) - } - - #[test] - fn matches_git_commands() { - let f = make_filter(); - assert!(f.matcher().matches("git status")); - assert!(f.matcher().matches("git diff --stat")); - assert!(f.matcher().matches("git log --oneline")); - assert!(f.matcher().matches("git push origin main")); - assert!(!f.matcher().matches("cargo build")); - assert!(!f.matcher().matches("github-cli")); - } - - #[test] - fn filter_status_summarizes() { - let f = make_filter(); - let raw = " M src/main.rs\n M src/lib.rs\n?? new_file.txt\nA added.rs\n"; - let result = f.filter("git status --short", raw, 0); - assert!(result.output.contains("M 2 files")); - assert!(result.output.contains("?? 
1 files"));
-        assert!(result.output.contains("A 1 files"));
-        assert_eq!(result.confidence, FilterConfidence::Full);
-    }
-
-    #[test]
-    fn filter_diff_compresses() {
-        let f = make_filter();
-        let raw = "\
-diff --git a/src/main.rs b/src/main.rs
-index abc..def 100644
---- a/src/main.rs
-+++ b/src/main.rs
-+new line 1
-+new line 2
--old line 1
-diff --git a/src/lib.rs b/src/lib.rs
-index ghi..jkl 100644
---- a/src/lib.rs
-+++ b/src/lib.rs
-+added
-";
-        let result = f.filter("git diff", raw, 0);
-        assert!(result.output.contains("src/main.rs"));
-        assert!(result.output.contains("src/lib.rs"));
-        assert!(result.output.contains("2 files changed"));
-        assert!(result.output.contains("3 insertions(+)"));
-        assert!(result.output.contains("1 deletions(-)"));
-    }
-
-    #[test]
-    fn filter_log_truncates() {
-        let f = make_filter();
-        let lines: Vec<String> = (0..50)
-            .map(|i| format!("abc{i:04} feat: commit {i}"))
-            .collect();
-        let raw = lines.join("\n");
-        let result = f.filter("git log --oneline", &raw, 0);
-        assert!(result.output.contains("abc0000"));
-        assert!(result.output.contains("abc0019"));
-        assert!(!result.output.contains("abc0020"));
-        assert!(result.output.contains("and 30 more commits"));
-        assert_eq!(result.confidence, FilterConfidence::Full);
-    }
-
-    #[test]
-    fn filter_log_short_passthrough() {
-        let f = make_filter();
-        let raw = "abc1234 feat: something\ndef5678 fix: other";
-        let result = f.filter("git log --oneline", raw, 0);
-        assert_eq!(result.output, raw);
-        assert_eq!(result.confidence, FilterConfidence::Fallback);
-    }
-
-    #[test]
-    fn filter_push_extracts_summary() {
-        let f = make_filter();
-        let raw = "\
-Enumerating objects: 5, done.
-Counting objects: 100% (5/5), done.
-Delta compression using up to 10 threads
-Compressing objects: 100% (3/3), done.
-Writing objects: 100% (3/3), 1.20 KiB | 1.20 MiB/s, done.
-Total 3 (delta 2), reused 0 (delta 0) -To github.com:user/repo.git - abc1234..def5678 main -> main -"; - let result = f.filter("git push origin main", raw, 0); - assert!(result.output.contains("main -> main")); - assert!(result.output.contains("To github.com")); - assert!(!result.output.contains("Enumerating")); - } - - #[test] - fn filter_status_long_form() { - let f = make_filter(); - let raw = "\ -On branch main -Changes not staged for commit: - modified: src/main.rs - modified: src/lib.rs - deleted: old_file.rs - -Untracked files: - new_file.txt -"; - let result = f.filter("git status", raw, 0); - assert!(result.output.contains("M 2 files")); - assert!(result.output.contains("D 1 files")); - } - - #[test] - fn filter_diff_empty_passthrough() { - let f = make_filter(); - let raw = ""; - let result = f.filter("git diff", raw, 0); - assert_eq!(result.output, raw); - } - - #[test] - fn filter_unknown_subcommand_passthrough() { - let f = make_filter(); - let raw = "some output"; - let result = f.filter("git stash list", raw, 0); - assert_eq!(result.output, raw); - assert_eq!(result.confidence, FilterConfidence::Fallback); - } - - #[test] - fn filter_diff_snapshot() { - let f = make_filter(); - let raw = "\ -diff --git a/src/main.rs b/src/main.rs -index abc..def 100644 ---- a/src/main.rs -+++ b/src/main.rs -+new line 1 --old line 1 -diff --git a/src/lib.rs b/src/lib.rs -index ghi..jkl 100644 ---- a/src/lib.rs -+++ b/src/lib.rs -+added line -"; - let result = f.filter("git diff", raw, 0); - insta::assert_snapshot!(result.output); - } - - #[test] - fn filter_status_snapshot() { - let f = make_filter(); - let raw = " M src/main.rs\n M src/lib.rs\n?? 
new_file.txt\nA added.rs\n"; - let result = f.filter("git status --short", raw, 0); - insta::assert_snapshot!(result.output); - } -} diff --git a/crates/zeph-tools/src/filter/log_dedup.rs b/crates/zeph-tools/src/filter/log_dedup.rs deleted file mode 100644 index 14cd9de3..00000000 --- a/crates/zeph-tools/src/filter/log_dedup.rs +++ /dev/null @@ -1,179 +0,0 @@ -use std::collections::HashMap; -use std::fmt::Write; -use std::sync::LazyLock; - -use regex::Regex; - -use super::{ - CommandMatcher, FilterConfidence, FilterResult, LogDedupFilterConfig, OutputFilter, make_result, -}; - -const MAX_UNIQUE_PATTERNS: usize = 10_000; - -static LOG_DEDUP_MATCHER: LazyLock = LazyLock::new(|| { - CommandMatcher::Custom(Box::new(|cmd| { - let c = cmd.to_lowercase(); - c.contains("journalctl") - || c.contains("tail -f") - || c.contains("docker logs") - || (c.contains("cat ") && c.contains(".log")) - })) -}); - -static TIMESTAMP_RE: LazyLock = LazyLock::new(|| { - Regex::new(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}([.\d]*)?([Z+-][\d:]*)?").unwrap() -}); -static UUID_RE: LazyLock = LazyLock::new(|| { - Regex::new(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}").unwrap() -}); -static IP_RE: LazyLock = - LazyLock::new(|| Regex::new(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}").unwrap()); -static PORT_PID_RE: LazyLock = - LazyLock::new(|| Regex::new(r"(?:port|pid|PID)[=: ]+\d+").unwrap()); - -pub struct LogDedupFilter; - -impl LogDedupFilter { - #[must_use] - pub fn new(_config: LogDedupFilterConfig) -> Self { - Self - } -} - -impl OutputFilter for LogDedupFilter { - fn name(&self) -> &'static str { - "log_dedup" - } - - fn matcher(&self) -> &CommandMatcher { - &LOG_DEDUP_MATCHER - } - - fn filter(&self, _command: &str, raw_output: &str, _exit_code: i32) -> FilterResult { - let lines: Vec<&str> = raw_output.lines().collect(); - if lines.len() < 3 { - return make_result( - raw_output, - raw_output.to_owned(), - FilterConfidence::Fallback, - ); - } - - let mut pattern_counts: 
HashMap = HashMap::new(); - let mut order: Vec = Vec::new(); - - let mut capped = false; - for line in &lines { - let normalized = normalize(line); - if let Some(entry) = pattern_counts.get_mut(&normalized) { - entry.0 += 1; - } else if pattern_counts.len() < MAX_UNIQUE_PATTERNS { - order.push(normalized.clone()); - pattern_counts.insert(normalized, (1, (*line).to_owned())); - } else { - capped = true; - } - } - - let unique = order.len(); - let total = lines.len(); - - if unique == total && !capped { - return make_result( - raw_output, - raw_output.to_owned(), - FilterConfidence::Fallback, - ); - } - - let mut output = String::new(); - for key in &order { - let (count, example) = &pattern_counts[key]; - if *count > 1 { - let _ = writeln!(output, "{example} (x{count})"); - } else { - let _ = writeln!(output, "{example}"); - } - } - let _ = write!(output, "{unique} unique patterns ({total} total lines)"); - if capped { - let _ = write!(output, " (capped at {MAX_UNIQUE_PATTERNS})"); - } - - make_result(raw_output, output, FilterConfidence::Full) - } -} - -fn normalize(line: &str) -> String { - let s = TIMESTAMP_RE.replace_all(line, ""); - let s = UUID_RE.replace_all(&s, ""); - let s = IP_RE.replace_all(&s, ""); - PORT_PID_RE.replace_all(&s, "").to_string() -} - -#[cfg(test)] -mod tests { - use super::*; - - fn make_filter() -> LogDedupFilter { - LogDedupFilter::new(LogDedupFilterConfig::default()) - } - - #[test] - fn matches_log_commands() { - let f = make_filter(); - assert!(f.matcher().matches("journalctl -u nginx")); - assert!(f.matcher().matches("tail -f /var/log/syslog")); - assert!(f.matcher().matches("docker logs -f container")); - assert!(f.matcher().matches("cat /var/log/app.log")); - assert!(!f.matcher().matches("cat file.txt")); - assert!(!f.matcher().matches("cargo build")); - } - - #[test] - fn filter_deduplicates() { - let f = make_filter(); - let raw = "\ -2024-01-15T12:00:01Z INFO request handled path=/api/health -2024-01-15T12:00:02Z INFO request 
handled path=/api/health -2024-01-15T12:00:03Z INFO request handled path=/api/health -2024-01-15T12:00:04Z WARN connection timeout addr=10.0.0.1 -2024-01-15T12:00:05Z WARN connection timeout addr=10.0.0.2 -2024-01-15T12:00:06Z ERROR database unreachable -"; - let result = f.filter("journalctl -u app", raw, 0); - assert!(result.output.contains("(x3)")); - assert!(result.output.contains("(x2)")); - assert!(result.output.contains("3 unique patterns (6 total lines)")); - assert!(result.savings_pct() > 20.0); - assert_eq!(result.confidence, FilterConfidence::Full); - } - - #[test] - fn filter_all_unique_passthrough() { - let f = make_filter(); - let raw = "line one\nline two\nline three"; - let result = f.filter("cat app.log", raw, 0); - assert_eq!(result.output, raw); - assert_eq!(result.confidence, FilterConfidence::Fallback); - } - - #[test] - fn filter_short_passthrough() { - let f = make_filter(); - let raw = "single line"; - let result = f.filter("cat app.log", raw, 0); - assert_eq!(result.output, raw); - assert_eq!(result.confidence, FilterConfidence::Fallback); - } - - #[test] - fn normalize_replaces_patterns() { - let line = "2024-01-15T12:00:00Z req=abc12345-1234-1234-1234-123456789012 addr=192.168.1.1 pid=1234"; - let n = normalize(line); - assert!(n.contains("")); - assert!(n.contains("")); - assert!(n.contains("")); - assert!(n.contains("")); - } -} diff --git a/crates/zeph-tools/src/filter/mod.rs b/crates/zeph-tools/src/filter/mod.rs index cacd4c5d..2fb0db59 100644 --- a/crates/zeph-tools/src/filter/mod.rs +++ b/crates/zeph-tools/src/filter/mod.rs @@ -1,25 +1,14 @@ //! Command-aware output filtering pipeline. 
-pub(crate) mod cargo_build;
-mod clippy;
-mod dir_listing;
-mod git;
-mod log_dedup;
+pub(crate) mod declarative;
 pub mod security;
-mod test_output;
+use std::path::PathBuf;
 use std::sync::{LazyLock, Mutex};
 use regex::Regex;
 use serde::{Deserialize, Serialize};
-pub use self::cargo_build::CargoBuildFilter;
-pub use self::clippy::ClippyFilter;
-pub use self::dir_listing::DirListingFilter;
-pub use self::git::GitFilter;
-pub use self::log_dedup::LogDedupFilter;
-pub use self::test_output::TestOutputFilter;
-
 // ---------------------------------------------------------------------------
 // FilterConfidence (#440)
 // ---------------------------------------------------------------------------
@@ -64,6 +53,7 @@ pub enum CommandMatcher {
     Exact(&'static str),
     Prefix(&'static str),
     Regex(regex::Regex),
+    #[cfg(test)]
     Custom(Box<dyn Fn(&str) -> bool + Send + Sync>),
 }
@@ -79,6 +69,7 @@ impl CommandMatcher {
             Self::Exact(s) => command == *s,
             Self::Prefix(s) => command.starts_with(s),
             Self::Regex(re) => re.is_match(command),
+            #[cfg(test)]
             Self::Custom(f) => f(command),
         }
     }
@@ -113,6 +104,7 @@ impl std::fmt::Debug for CommandMatcher {
             Self::Exact(s) => write!(f, "Exact({s:?})"),
             Self::Prefix(s) => write!(f, "Prefix({s:?})"),
             Self::Regex(re) => write!(f, "Regex({:?})", re.as_str()),
+            #[cfg(test)]
             Self::Custom(_) => write!(f, "Custom(...)"),
         }
     }
@@ -248,157 +240,35 @@ impl Default for FilterMetrics {
 // FilterConfig (#444)
 // ---------------------------------------------------------------------------
-fn default_true() -> bool {
+pub(crate) fn default_true() -> bool {
     true
 }
-fn default_max_failures() -> usize {
-    10
-}
-
-fn default_stack_trace_lines() -> usize {
-    50
-}
-
-fn default_max_log_entries() -> usize {
-    20
-}
-
-fn default_max_diff_lines() -> usize {
-    500
-}
-
 /// Configuration for output filters.
 #[derive(Debug, Clone, Deserialize, Serialize)]
 pub struct FilterConfig {
     #[serde(default = "default_true")]
     pub enabled: bool,
-    #[serde(default)]
-    pub test: TestFilterConfig,
-
-    #[serde(default)]
-    pub git: GitFilterConfig,
-
-    #[serde(default)]
-    pub clippy: ClippyFilterConfig,
-
-    #[serde(default)]
-    pub cargo_build: CargoBuildFilterConfig,
-
-    #[serde(default)]
-    pub dir_listing: DirListingFilterConfig,
-
-    #[serde(default)]
-    pub log_dedup: LogDedupFilterConfig,
-
     #[serde(default)]
     pub security: SecurityFilterConfig,
+
+    /// Directory containing a `filters.toml` override file.
+    /// Falls back to embedded defaults when `None` or when the file is absent.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub filters_path: Option<PathBuf>,
 }
 
 impl Default for FilterConfig {
     fn default() -> Self {
         Self {
             enabled: true,
-            test: TestFilterConfig::default(),
-            git: GitFilterConfig::default(),
-            clippy: ClippyFilterConfig::default(),
-            cargo_build: CargoBuildFilterConfig::default(),
-            dir_listing: DirListingFilterConfig::default(),
-            log_dedup: LogDedupFilterConfig::default(),
             security: SecurityFilterConfig::default(),
+            filters_path: None,
         }
     }
 }
 
-#[derive(Debug, Clone, Deserialize, Serialize)]
-pub struct TestFilterConfig {
-    #[serde(default = "default_true")]
-    pub enabled: bool,
-    #[serde(default = "default_max_failures")]
-    pub max_failures: usize,
-    #[serde(default = "default_stack_trace_lines")]
-    pub truncate_stack_trace: usize,
-}
-
-impl Default for TestFilterConfig {
-    fn default() -> Self {
-        Self {
-            enabled: true,
-            max_failures: default_max_failures(),
-            truncate_stack_trace: default_stack_trace_lines(),
-        }
-    }
-}
-
-#[derive(Debug, Clone, Deserialize, Serialize)]
-pub struct GitFilterConfig {
-    #[serde(default = "default_true")]
-    pub enabled: bool,
-    #[serde(default = "default_max_log_entries")]
-    pub max_log_entries: usize,
-    #[serde(default = "default_max_diff_lines")]
-    pub max_diff_lines: usize,
-}
-
-impl Default for GitFilterConfig {
-    fn
default() -> Self { - Self { - enabled: true, - max_log_entries: default_max_log_entries(), - max_diff_lines: default_max_diff_lines(), - } - } -} - -#[derive(Debug, Clone, Deserialize, Serialize)] -pub struct ClippyFilterConfig { - #[serde(default = "default_true")] - pub enabled: bool, -} - -impl Default for ClippyFilterConfig { - fn default() -> Self { - Self { enabled: true } - } -} - -#[derive(Debug, Clone, Deserialize, Serialize)] -pub struct CargoBuildFilterConfig { - #[serde(default = "default_true")] - pub enabled: bool, -} - -impl Default for CargoBuildFilterConfig { - fn default() -> Self { - Self { enabled: true } - } -} - -#[derive(Debug, Clone, Deserialize, Serialize)] -pub struct DirListingFilterConfig { - #[serde(default = "default_true")] - pub enabled: bool, -} - -impl Default for DirListingFilterConfig { - fn default() -> Self { - Self { enabled: true } - } -} - -#[derive(Debug, Clone, Deserialize, Serialize)] -pub struct LogDedupFilterConfig { - #[serde(default = "default_true")] - pub enabled: bool, -} - -impl Default for LogDedupFilterConfig { - fn default() -> Self { - Self { enabled: true } - } -} - #[derive(Debug, Clone, Deserialize, Serialize)] pub struct SecurityFilterConfig { #[serde(default = "default_true")] @@ -465,23 +335,8 @@ impl OutputFilterRegistry { ), metrics: Mutex::new(FilterMetrics::new()), }; - if config.test.enabled { - r.register(Box::new(TestOutputFilter::new(config.test.clone()))); - } - if config.clippy.enabled { - r.register(Box::new(ClippyFilter::new(config.clippy.clone()))); - } - if config.cargo_build.enabled { - r.register(Box::new(CargoBuildFilter::new(config.cargo_build.clone()))); - } - if config.git.enabled { - r.register(Box::new(GitFilter::new(config.git.clone()))); - } - if config.dir_listing.enabled { - r.register(Box::new(DirListingFilter::new(config.dir_listing.clone()))); - } - if config.log_dedup.enabled { - r.register(Box::new(LogDedupFilter::new(config.log_dedup.clone()))); + for f in 
declarative::load_declarative_filters(config.filters_path.as_deref()) { + r.register(f); } r } @@ -722,63 +577,23 @@ mod tests { let toml_str = "enabled = true"; let c: FilterConfig = toml::from_str(toml_str).unwrap(); assert!(c.enabled); - assert!(c.test.enabled); - assert!(c.git.enabled); - assert!(c.clippy.enabled); assert!(c.security.enabled); } #[test] - fn filter_config_deserialize_full() { + fn filter_config_deserialize_security() { let toml_str = r#" enabled = true -[test] -enabled = true -max_failures = 5 -truncate_stack_trace = 30 - -[git] -enabled = true -max_log_entries = 10 -max_diff_lines = 200 - -[clippy] -enabled = true - [security] enabled = true extra_patterns = ["TODO: security review"] "#; let c: FilterConfig = toml::from_str(toml_str).unwrap(); - assert_eq!(c.test.max_failures, 5); - assert_eq!(c.test.truncate_stack_trace, 30); - assert_eq!(c.git.max_log_entries, 10); - assert_eq!(c.git.max_diff_lines, 200); - assert!(c.clippy.enabled); + assert!(c.enabled); assert_eq!(c.security.extra_patterns, vec!["TODO: security review"]); } - #[test] - fn disabled_filter_excluded_from_registry() { - let config = FilterConfig { - test: TestFilterConfig { - enabled: false, - ..TestFilterConfig::default() - }, - ..FilterConfig::default() - }; - let r = OutputFilterRegistry::default_filters(&config); - assert!( - r.apply( - "cargo test", - "test result: ok. 5 passed; 0 failed; 0 ignored; 0 filtered out", - 0 - ) - .is_none() - ); - } - // CommandMatcher tests #[test] fn command_matcher_exact() { @@ -901,20 +716,6 @@ extra_patterns = ["TODO: security review"] } // Pipeline tests - #[test] - fn pipeline_single_stage() { - let config = FilterConfig::default(); - let filter = TestOutputFilter::new(config.test.clone()); - let mut pipeline = FilterPipeline::new(); - pipeline.push(&filter); - let result = pipeline.run( - "cargo test", - "test result: ok. 
5 passed; 0 failed; 0 ignored; 0 filtered out", - 0, - ); - assert!(result.output.contains("5 passed")); - } - #[test] fn confidence_aggregation() { assert_eq!( diff --git a/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__cargo_build__tests__cargo_build_error_snapshot.snap b/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__declarative__tests__cargo_build_error_snapshot.snap similarity index 73% rename from crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__cargo_build__tests__cargo_build_error_snapshot.snap rename to crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__declarative__tests__cargo_build_error_snapshot.snap index 2392fb1d..046f9214 100644 --- a/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__cargo_build__tests__cargo_build_error_snapshot.snap +++ b/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__declarative__tests__cargo_build_error_snapshot.snap @@ -1,8 +1,8 @@ --- -source: crates/zeph-tools/src/filter/cargo_build.rs +source: crates/zeph-tools/src/filter/declarative.rs expression: result.output --- -(1 compile/fetch lines removed) +(1 noise lines removed) error[E0308]: mismatched types --> crates/zeph-core/src/lib.rs:10:5 diff --git a/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__cargo_build__tests__cargo_build_filter_snapshot.snap b/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__declarative__tests__cargo_build_filter_snapshot.snap similarity index 76% rename from crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__cargo_build__tests__cargo_build_filter_snapshot.snap rename to crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__declarative__tests__cargo_build_filter_snapshot.snap index 08098d0f..7e8fdd6c 100644 --- a/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__cargo_build__tests__cargo_build_filter_snapshot.snap +++ b/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__declarative__tests__cargo_build_filter_snapshot.snap @@ -1,9 +1,9 
@@ --- -source: crates/zeph-tools/src/filter/cargo_build.rs +source: crates/zeph-tools/src/filter/declarative.rs expression: result.output --- Finished `dev` profile [unoptimized + debuginfo] target(s) in 4.23s -(4 compile/fetch lines removed) +(4 noise lines removed) warning: unused import: `std::fmt` --> crates/zeph-core/src/lib.rs:3:5 diff --git a/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__clippy__tests__clippy_grouped_warnings_snapshot.snap b/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__declarative__tests__clippy_grouped_warnings_snapshot.snap similarity index 79% rename from crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__clippy__tests__clippy_grouped_warnings_snapshot.snap rename to crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__declarative__tests__clippy_grouped_warnings_snapshot.snap index 538aa47d..4ff25eb8 100644 --- a/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__clippy__tests__clippy_grouped_warnings_snapshot.snap +++ b/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__declarative__tests__clippy_grouped_warnings_snapshot.snap @@ -1,5 +1,5 @@ --- -source: crates/zeph-tools/src/filter/clippy.rs +source: crates/zeph-tools/src/filter/declarative.rs expression: result.output --- clippy::needless_pass_by_value (2 warnings): diff --git a/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__git__tests__filter_diff_snapshot.snap b/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__declarative__tests__filter_diff_snapshot.snap similarity index 71% rename from crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__git__tests__filter_diff_snapshot.snap rename to crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__declarative__tests__filter_diff_snapshot.snap index 1403818c..60861203 100644 --- a/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__git__tests__filter_diff_snapshot.snap +++ 
b/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__declarative__tests__filter_diff_snapshot.snap @@ -1,5 +1,5 @@ --- -source: crates/zeph-tools/src/filter/git.rs +source: crates/zeph-tools/src/filter/declarative.rs expression: result.output --- src/main.rs | +1 -1 diff --git a/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__git__tests__filter_status_snapshot.snap b/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__declarative__tests__filter_status_snapshot.snap similarity index 62% rename from crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__git__tests__filter_status_snapshot.snap rename to crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__declarative__tests__filter_status_snapshot.snap index 9c0417ae..5cee0323 100644 --- a/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__git__tests__filter_status_snapshot.snap +++ b/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__declarative__tests__filter_status_snapshot.snap @@ -1,5 +1,5 @@ --- -source: crates/zeph-tools/src/filter/git.rs +source: crates/zeph-tools/src/filter/declarative.rs expression: result.output --- M 2 files | A 1 files | D 0 files | ?? 1 files diff --git a/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__declarative__tests__summary_failures_snapshot.snap b/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__declarative__tests__summary_failures_snapshot.snap new file mode 100644 index 00000000..21124c9c --- /dev/null +++ b/crates/zeph-tools/src/filter/snapshots/zeph_tools__filter__declarative__tests__summary_failures_snapshot.snap @@ -0,0 +1,15 @@ +--- +source: crates/zeph-tools/src/filter/declarative.rs +assertion_line: 2268 +expression: result.output +--- +FAILURES: + +---- foo::test_b stdout ---- +thread 'foo::test_b' panicked at 'assertion `left == right` failed + left: 1 + right: 2', src/foo.rs:42:9 + +failures: + +test result: FAILED. 
2 passed; 1 failed; 0 ignored; 0 filtered out diff --git a/crates/zeph-tools/src/filter/test_output.rs b/crates/zeph-tools/src/filter/test_output.rs deleted file mode 100644 index a8fce88f..00000000 --- a/crates/zeph-tools/src/filter/test_output.rs +++ /dev/null @@ -1,245 +0,0 @@ -use std::fmt::Write; -use std::sync::LazyLock; - -use super::{ - CommandMatcher, FilterConfidence, FilterResult, OutputFilter, TestFilterConfig, make_result, -}; - -static TEST_MATCHER: LazyLock = LazyLock::new(|| { - CommandMatcher::Custom(Box::new(|command| { - let cmd = command.to_lowercase(); - let tokens: Vec<&str> = cmd.split_whitespace().collect(); - if tokens.first() != Some(&"cargo") { - return false; - } - tokens - .iter() - .skip(1) - .any(|t| *t == "test" || *t == "nextest") - })) -}); - -pub struct TestOutputFilter { - config: TestFilterConfig, -} - -impl TestOutputFilter { - #[must_use] - pub fn new(config: TestFilterConfig) -> Self { - Self { config } - } -} - -impl OutputFilter for TestOutputFilter { - fn name(&self) -> &'static str { - "test" - } - - fn matcher(&self) -> &CommandMatcher { - &TEST_MATCHER - } - - fn filter(&self, _command: &str, raw_output: &str, exit_code: i32) -> FilterResult { - let mut passed = 0u64; - let mut failed = 0u64; - let mut ignored = 0u64; - let mut filtered_out = 0u64; - let mut failure_blocks: Vec = Vec::new(); - let mut in_failure_block = false; - let mut current_block = String::new(); - let mut has_summary = false; - - for line in raw_output.lines() { - let trimmed = line.trim(); - - if trimmed.starts_with("FAIL [") || trimmed.starts_with("FAIL [") { - failed += 1; - continue; - } - if trimmed.starts_with("PASS [") || trimmed.starts_with("PASS [") { - passed += 1; - continue; - } - - // Standard cargo test failure block - if trimmed.starts_with("---- ") && trimmed.ends_with(" stdout ----") { - in_failure_block = true; - current_block.clear(); - current_block.push_str(line); - current_block.push('\n'); - continue; - } - - if 
in_failure_block { - current_block.push_str(line); - current_block.push('\n'); - if trimmed == "failures:" || trimmed.starts_with("---- ") { - failure_blocks.push(current_block.clone()); - in_failure_block = trimmed.starts_with("---- "); - if in_failure_block { - current_block.clear(); - current_block.push_str(line); - current_block.push('\n'); - } - } - continue; - } - - if trimmed == "failures:" && !current_block.is_empty() { - failure_blocks.push(current_block.clone()); - current_block.clear(); - } - - // Parse summary line - if trimmed.starts_with("test result:") { - has_summary = true; - for part in trimmed.split(';') { - let part = part.trim(); - if let Some(n) = extract_count(part, "passed") { - passed += n; - } else if let Some(n) = extract_count(part, "failed") { - failed += n; - } else if let Some(n) = extract_count(part, "ignored") { - ignored += n; - } else if let Some(n) = extract_count(part, "filtered out") { - filtered_out += n; - } - } - } - - if trimmed.contains("tests run:") { - has_summary = true; - } - } - - if in_failure_block && !current_block.is_empty() { - failure_blocks.push(current_block); - } - - if !has_summary && passed == 0 && failed == 0 { - return make_result( - raw_output, - raw_output.to_owned(), - FilterConfidence::Fallback, - ); - } - - let mut output = String::new(); - - if exit_code != 0 && !failure_blocks.is_empty() { - format_failures(&mut output, &failure_blocks, &self.config); - } - - let status = if failed > 0 { "FAILED" } else { "ok" }; - let _ = write!( - output, - "test result: {status}. 
{passed} passed; {failed} failed; \ - {ignored} ignored; {filtered_out} filtered out" - ); - - make_result(raw_output, output, FilterConfidence::Full) - } -} - -fn format_failures(output: &mut String, blocks: &[String], config: &TestFilterConfig) { - output.push_str("FAILURES:\n\n"); - let max = config.max_failures; - for block in blocks.iter().take(max) { - let lines: Vec<&str> = block.lines().collect(); - if lines.len() > config.truncate_stack_trace { - for line in &lines[..config.truncate_stack_trace] { - output.push_str(line); - output.push('\n'); - } - let remaining = lines.len() - config.truncate_stack_trace; - let _ = writeln!(output, "... ({remaining} more lines)"); - } else { - output.push_str(block); - } - output.push('\n'); - } - if blocks.len() > max { - let _ = writeln!(output, "... and {} more failure(s)", blocks.len() - max); - } -} - -fn extract_count(s: &str, label: &str) -> Option { - let idx = s.find(label)?; - let before = s[..idx].trim(); - let num_str = before.rsplit_once(' ').map_or(before, |(_, n)| n); - let num_str = num_str.trim_end_matches('.'); - let num_str = num_str.rsplit('.').next().unwrap_or(num_str).trim(); - num_str.parse().ok() -} - -#[cfg(test)] -mod tests { - use super::*; - - fn make_filter() -> TestOutputFilter { - TestOutputFilter::new(TestFilterConfig::default()) - } - - #[test] - fn matches_cargo_test() { - let f = make_filter(); - assert!(f.matcher().matches("cargo test")); - assert!(f.matcher().matches("cargo test --workspace")); - assert!(f.matcher().matches("cargo +nightly test")); - assert!(f.matcher().matches("cargo nextest run")); - assert!(!f.matcher().matches("cargo build")); - assert!(!f.matcher().matches("cargo test-helper")); - assert!(!f.matcher().matches("cargo install cargo-nextest")); - } - - #[test] - fn filter_success_compresses() { - let f = make_filter(); - let raw = "\ -running 3 tests -test foo::test_a ... ok -test foo::test_b ... ok -test foo::test_c ... ok - -test result: ok. 
3 passed; 0 failed; 0 ignored; 0 filtered out; finished in 0.01s -"; - let result = f.filter("cargo test", raw, 0); - assert!(result.output.contains("3 passed")); - assert!(result.output.contains("0 failed")); - assert!(!result.output.contains("test_a")); - assert!(result.savings_pct() > 30.0); - assert_eq!(result.confidence, FilterConfidence::Full); - } - - #[test] - fn filter_failure_preserves_details() { - let f = make_filter(); - let raw = "\ -running 2 tests -test foo::test_a ... ok -test foo::test_b ... FAILED - ----- foo::test_b stdout ---- -thread 'foo::test_b' panicked at 'assertion failed: false' - -failures: - foo::test_b - -test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 filtered out; finished in 0.01s -"; - let result = f.filter("cargo test", raw, 1); - assert!(result.output.contains("FAILURES:")); - assert!(result.output.contains("foo::test_b")); - assert!(result.output.contains("assertion failed")); - assert!(result.output.contains("1 failed")); - } - - #[test] - fn filter_no_summary_passthrough() { - let f = make_filter(); - let raw = "some random output with no test results"; - let result = f.filter("cargo test", raw, 0); - assert_eq!(result.output, raw); - assert_eq!(result.confidence, FilterConfidence::Fallback); - } -} diff --git a/docs/src/advanced/tools.md b/docs/src/advanced/tools.md index 493b0402..729219a1 100644 --- a/docs/src/advanced/tools.md +++ b/docs/src/advanced/tools.md @@ -117,19 +117,13 @@ Before tool output reaches the LLM context, it passes through a command-aware fi ### Compound Command Matching -LLMs often generate compound shell expressions like `cd /path && cargo test 2>&1 | tail -80`. Filter matchers automatically extract the last command segment after `&&` or `;` separators and strip trailing pipes and redirections before matching. This means `cd /Users/me/project && cargo clippy --workspace -- -D warnings 2>&1` correctly matches the `ClippyFilter` — no special configuration needed. 
+LLMs often generate compound shell expressions like `cd /path && cargo test 2>&1 | tail -80`. Filter matchers automatically extract the last command segment after `&&` or `;` separators and strip trailing pipes and redirections before matching. This means `cd /Users/me/project && cargo clippy --workspace -- -D warnings 2>&1` correctly matches the Clippy rules — no special configuration needed.
 
-### Built-in Filters
+### Built-in Rules
 
-| Filter | Matches | What it removes |
-|--------|---------|----------------|
-| `TestOutputFilter` | `cargo test`, `cargo nextest`, `pytest`, `go test` | Passing test lines, verbose output; keeps failures and summary |
-| `ClippyFilter` | `cargo clippy` | Duplicate diagnostic paths, redundant `help:` lines |
-| `GitFilter` | `git log`, `git diff` | Limits log entries (default: 20), diff line count (default: 500) |
-| `DirListingFilter` | `ls`, `find`, `tree` | Collapses redundant whitespace and deduplicates paths |
-| `LogDedupFilter` | any command with repetitive log output | Deduplicates consecutive identical lines |
+All 19 built-in rules are implemented in the declarative TOML engine and cover: Cargo test/nextest, Clippy, git status, git diff/log, directory listings, log deduplication, Docker, npm/yarn/pnpm, pip, Make, pytest, Go test, Terraform, kubectl, and Homebrew.
 
-All filters also strip ANSI escape sequences, carriage-return progress bars, and collapse consecutive blank lines (`sanitize_output`).
+All rules also strip ANSI escape sequences, carriage-return progress bars, and collapse consecutive blank lines (`sanitize_output`).
 
 ### Security Pass
 
@@ -157,37 +151,91 @@ In CLI mode, after each filtered tool execution a one-line summary is printed to
 
 This appears only when lines were actually removed. It lets you verify the filter is working and estimate token savings without opening the TUI.
 
-### Configuration
+### Declarative Filters
+
+All filtering is driven by a declarative TOML engine.
Rules are loaded at startup from a `filters.toml` file and compiled into the pipeline.
+
+When no user file is present, Zeph uses 19 embedded built-in rules that cover `cargo test`, `cargo nextest`, `cargo clippy`, `git status`, `git diff`, `git log`, directory listings (`ls`, `find`, `tree`), log deduplication, `docker build`, `npm`/`yarn`/`pnpm install`, `pip install`, `make`, `pytest`, `go test`, `terraform`, `kubectl`, and `brew`.
+
+To override, place a `filters.toml` next to your `config.toml` or set `filters_path`:
 
 ```toml
 [tools.filters]
-enabled = true # Master switch (default: true)
+filters_path = "/path/to/my/filters.toml"
+```
 
-[tools.filters.test]
-enabled = true
-max_failures = 10 # Max failing tests to show (default: 10)
-truncate_stack_trace = 50 # Stack trace line limit (default: 50)
+#### Rule format
 
-[tools.filters.git]
-enabled = true
-max_log_entries = 20 # Max git log entries (default: 20)
-max_diff_lines = 500 # Max diff lines (default: 500)
+Each rule has a `name`, a `match` block, and a `strategy` block:
 
-[tools.filters.clippy]
-enabled = true
+```toml
+[[rules]]
+name = "docker-build"
+match = { prefix = "docker build" }
+strategy = { type = "strip_noise", patterns = [
+  "^Step \\d+/\\d+ : ",
+  "^ ---> [a-f0-9]+$",
+  "^Removing intermediate container",
+  "^\\s*$",
+] }
+
+[[rules]]
+name = "make"
+match = { prefix = "make" }
+strategy = { type = "truncate", max_lines = 80, head = 15, tail = 15 }
+
+[[rules]]
+name = "npm-install"
+match = { regex = "^(npm|yarn|pnpm)\\s+(install|ci|add)" }
+strategy = { type = "strip_noise", patterns = ["^npm warn", "^npm notice"] }
+enabled = false # disable without removing
+```
 
-[tools.filters.dir_listing]
-enabled = true
+#### Match types
 
-[tools.filters.log_dedup]
-enabled = true
+| Field | Description |
+|-------|-------------|
+| `exact` | Matches the command string exactly |
+| `prefix` | Matches if the command starts with the value |
+| `regex` | Matches the command against a regex (max 512 chars)
|
+
+Exactly one of `exact`, `prefix`, or `regex` must be set.
+
+#### Strategies
+
+Nine strategy types are available:
+
+| Strategy | Description |
+|----------|-------------|
+| `strip_noise` | Removes lines matching any of the provided regex patterns. `Full` confidence when lines removed, `Fallback` otherwise. |
+| `truncate` | Keeps the first `head` lines and last `tail` lines when output exceeds `max_lines`. `Partial` confidence when truncated. Defaults: `head = 20`, `tail = 20`. |
+| `keep_matching` | Keeps only lines matching at least one of the provided regex patterns; discards the rest. |
+| `strip_annotated` | Strips lines that carry a specific annotation prefix (e.g. `note:`, `help:`). |
+| `test_summary` | Parses test runner output (Cargo test/nextest, pytest, Go test); retains failures and the final summary, discards passing lines. |
+| `group_by_rule` | Groups diagnostic lines (e.g. Clippy warnings) by lint rule and emits one block per rule. |
+| `git_status` | Compact-formats `git status` output; preserves branch, staged, and unstaged sections. |
+| `git_diff` | Limits diff output to `max_diff_lines` (default: 500); preserves file headers. |
+| `dedup` | Normalizes timestamps and UUIDs, then deduplicates consecutive identical lines, annotating repeat counts. |
+
+#### Safety limits
+
+- `filters.toml` files larger than 1 MiB are rejected (falls back to defaults).
+- Regex patterns longer than 512 characters are rejected.
+- Invalid rules are skipped with a warning; valid rules in the same file still load.
+
+### Configuration
+
+```toml
+[tools.filters]
+enabled = true # Master switch (default: true)
+filters_path = "" # Custom filters.toml path (default: config dir)
 
 [tools.filters.security]
 enabled = true
 extra_patterns = [] # Additional regex patterns to flag as credentials
 ```
 
-Individual filters can be disabled without affecting others.
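+The override mechanics above can be shown with a small user file. A minimal sketch, assuming user rules merge over the embedded defaults and reuse the documented `match`/`strategy` fields (the rule names and patterns here are illustrative, not shipped defaults):
+
+```toml
+# Illustrative filters.toml override; names and patterns are examples,
+# not the embedded defaults.
+
+# Keep only the lines that matter from `terraform plan` runs.
+[[rules]]
+name = "terraform-plan-keep"
+match = { prefix = "terraform plan" }
+strategy = { type = "keep_matching", patterns = ["^Error:", "^Warning:", "^Plan:"] }
+
+# Deduplicate repetitive journalctl output.
+[[rules]]
+name = "journalctl-dedup"
+match = { prefix = "journalctl" }
+strategy = { type = "dedup" }
+```
+
+Because invalid rules are skipped with a warning, a typo in one rule does not prevent the remaining rules in the file from loading.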
+Individual rules can be disabled via `enabled = false` in the rule definition without removing them from the file.
 
 ## Configuration
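The compound-command matching described under Compound Command Matching reduces to a small string transformation. A sketch of the idea in Rust; `last_segment` is a hypothetical stand-in, not the actual `zeph-tools` API:

```rust
/// Return the segment of a compound shell command that filter matchers
/// should see: the last `&&`/`;` segment, minus pipes and redirections.
/// Hypothetical helper; the real implementation lives inside zeph-tools.
fn last_segment(command: &str) -> &str {
    // Last `;`-separated segment, then last `&&`-separated segment.
    let seg = command.rsplit(';').next().unwrap_or(command);
    let seg = seg.rsplit("&&").next().unwrap_or(seg);
    // Drop anything piped into another command.
    let seg = seg.split('|').next().unwrap_or(seg);
    // Drop redirections such as `2>&1` or `> out.txt`.
    let seg = seg.split("2>").next().unwrap_or(seg);
    let seg = seg.split('>').next().unwrap_or(seg);
    seg.trim()
}

fn main() {
    let cmd = "cd /Users/me/project && cargo clippy --workspace 2>&1";
    assert_eq!(last_segment(cmd), "cargo clippy --workspace");
    println!("matched against: {}", last_segment(cmd));
}
```

The `exact`/`prefix`/`regex` rules are then applied to this normalized segment, which is why a command wrapped in `cd … && … 2>&1` still hits the right rule.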