Merged
14 changes: 10 additions & 4 deletions CHANGELOG.md
@@ -7,17 +7,23 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased]

### Added
- `ToolRegistry` with typed `ToolDef` definitions for 7 built-in tools (bash, read, edit, write, glob, grep, web_scrape) (#239)
- `FileExecutor` for sandboxed file operations: read, write, edit, glob, grep (#242)
- `ToolCall` struct and `execute_tool_call()` on `ToolExecutor` trait for structured tool invocation (#241)
- `CompositeExecutor` routes structured tool calls to correct sub-executor by tool_id (#243)
- Tool catalog section in system prompt via `ToolRegistry::format_for_prompt()` (#244)
- Configurable `max_tool_iterations` (default 10, previously hardcoded 3) via TOML and `ZEPH_AGENT_MAX_TOOL_ITERATIONS` env var (#245)
- Doom-loop detection: breaks the agent loop after 3 consecutive identical tool outputs
- Context budget check: stops tool iteration at 80% of the context budget, before the context window overflows
- `IndexWatcher` for incremental code index updates on file changes via `notify` file watcher (#233)
- `watch` config field in `[index]` section (default `true`) to enable/disable file watching
- Repo map cache with configurable TTL (`repo_map_ttl_secs`, default 300s) to avoid per-message filesystem traversal (#231)
- Cross-session memory score threshold (`cross_session_score_threshold`, default 0.35) to filter low-relevance results (#232)
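The doom-loop entry above amounts to a sliding-window equality check over recent tool outputs; a minimal sketch (`is_doom_loop` is an illustrative name, not Zeph's actual API):

```rust
// Minimal sketch of the doom-loop check: stop when the last `window`
// tool outputs are all identical strings.
fn is_doom_loop(history: &[String], window: usize) -> bool {
    if history.len() < window {
        return false;
    }
    let recent = &history[history.len() - window..];
    // Adjacent pairs all equal => every output in the window is identical.
    recent.windows(2).all(|w| w[0] == w[1])
}

fn main() {
    let stuck = vec!["ls: no such file".to_owned(); 3];
    let healthy = vec!["a".to_owned(), "b".to_owned(), "b".to_owned()];
    assert!(is_doom_loop(&stuck, 3));
    assert!(!is_doom_loop(&healthy, 3));
    println!("ok");
}
```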

### Fixed
- Persist `MessagePart` data to SQLite via `remember_with_parts()` — pruning state now survives session restarts (#229)
- Clear tool output body from memory after Tier 1 pruning to reclaim heap (#230)

## [0.9.4] - 2026-02-14

### Added
2 changes: 2 additions & 0 deletions Cargo.lock
2 changes: 2 additions & 0 deletions Cargo.toml
@@ -20,6 +20,7 @@ crossterm = "0.29"
axum = "0.8"
blake3 = "1.8"
criterion = "0.8"
glob = "0.3.3"
futures = "0.3"
ignore = "0.4"
hf-hub = { version = "0.4", default-features = false, features = ["tokio", "rustls-tls", "ureq"] }
@@ -32,6 +33,7 @@ ollama-rs = { version = "0.3", default-features = false, features = ["rustls", "
pulldown-cmark = "0.13"
qdrant-client = { version = "1.16", default-features = false }
ratatui = "0.30"
regex = "1.12"
reqwest = { version = "0.13", default-features = false }
rmcp = "0.14"
scrape-core = "0.2.2"
8 changes: 4 additions & 4 deletions README.md
@@ -17,11 +17,11 @@ Lightweight AI agent that routes tasks across **Ollama, Claude, OpenAI, and Hugg

**Token-efficient by design.** Most agent frameworks inject every tool and instruction into every prompt. Zeph embeds skills and MCP tools as vectors, then selects only the top-K relevant ones per query via cosine similarity. Prompt size stays O(K) — not O(N) — regardless of how many capabilities are installed.

**Intelligent context management.** Two-tier context pruning: Tier 1 selectively removes old tool outputs (clearing bodies from memory after persisting to SQLite) before falling back to Tier 2 LLM-based compaction, reducing unnecessary LLM calls. A token-based protection zone preserves recent context from pruning. Cross-session memory transfers knowledge between conversations with relevance filtering. Proportional budget allocation (8% summaries, 8% semantic recall, 4% cross-session, 30% code context, 50% recent history) keeps conversations efficient. Tool outputs are truncated at 30K chars with optional LLM-based summarization for large outputs. ZEPH.md project config discovery walks up the directory tree and injects project-specific context when available. Config hot-reload applies runtime-safe fields (timeouts, security, memory limits) on file change without restart.
**Intelligent context management.** Two-tier context pruning: Tier 1 selectively removes old tool outputs (clearing bodies from memory after persisting to SQLite) before falling back to Tier 2 LLM-based compaction, reducing unnecessary LLM calls. A token-based protection zone preserves recent context from pruning. Cross-session memory transfers knowledge between conversations with relevance filtering. Proportional budget allocation (8% summaries, 8% semantic recall, 4% cross-session, 30% code context, 50% recent history) keeps conversations efficient. Tool outputs are truncated at 30K chars with optional LLM-based summarization for large outputs. Doom-loop detection breaks runaway tool cycles after 3 identical consecutive outputs, with configurable iteration limits (default 10). ZEPH.md project config discovery walks up the directory tree and injects project-specific context when available. Config hot-reload applies runtime-safe fields (timeouts, security, memory limits) on file change without restart.
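The proportional budget split quoted above can be sketched with integer arithmetic (struct and function names here are illustrative, not Zeph's actual API):

```rust
// Illustrative split of a context-token budget into the proportions named
// in the README: 8% summaries, 8% semantic recall, 4% cross-session,
// 30% code context, 50% recent history.
struct BudgetSplit {
    summaries: usize,
    semantic_recall: usize,
    cross_session: usize,
    code_context: usize,
    recent_history: usize,
}

fn split_budget(max_tokens: usize) -> BudgetSplit {
    BudgetSplit {
        summaries: max_tokens * 8 / 100,
        semantic_recall: max_tokens * 8 / 100,
        cross_session: max_tokens * 4 / 100,
        code_context: max_tokens * 30 / 100,
        recent_history: max_tokens * 50 / 100,
    }
}

fn main() {
    let b = split_budget(100_000);
    // The named percentages sum to 100, so nothing is left unallocated.
    assert_eq!(
        b.summaries + b.semantic_recall + b.cross_session + b.code_context + b.recent_history,
        100_000
    );
    assert_eq!(b.recent_history, 50_000);
    println!("ok");
}
```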

**Run anywhere.** Local models via Ollama or Candle (GGUF with Metal/CUDA), cloud APIs (Claude, OpenAI, GPT-compatible endpoints like Together AI and Groq), or all of them at once through the multi-model orchestrator with automatic fallback chains.

**Production-ready security.** Shell sandboxing with path restrictions, command filtering (12 blocked patterns), destructive command confirmation, secret redaction, audit logging, SSRF protection, and Trivy-scanned container images with 0 HIGH/CRITICAL CVEs.
**Production-ready security.** Shell sandboxing with path restrictions, command filtering (12 blocked patterns), destructive command confirmation, file operation sandbox with path traversal protection, secret redaction, audit logging, SSRF protection, and Trivy-scanned container images with 0 HIGH/CRITICAL CVEs.
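The path-traversal protection mentioned above is typically built on canonicalization; a minimal sketch of the idea (assumed shape — Zeph's actual file sandbox may differ):

```rust
use std::path::{Path, PathBuf};

// Resolve `requested` inside `root`, rejecting anything that escapes the
// sandbox. Canonicalization collapses `..` segments and symlinks before
// the prefix check, so traversal tricks cannot slip past it.
fn resolve_sandboxed(root: &Path, requested: &str) -> Option<PathBuf> {
    let canonical = root.join(requested).canonicalize().ok()?;
    let root = root.canonicalize().ok()?;
    canonical.starts_with(&root).then_some(canonical)
}

fn main() {
    let root = std::env::temp_dir().join("zeph_sandbox_demo");
    std::fs::create_dir_all(&root).unwrap();
    std::fs::write(root.join("ok.txt"), "hi").unwrap();
    assert!(resolve_sandboxed(&root, "ok.txt").is_some());
    // A traversal attempt resolves outside the root (or fails to
    // canonicalize) and is rejected either way.
    assert!(resolve_sandboxed(&root, "../no_such_escape_target").is_none());
    println!("ok");
}
```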

**Self-improving.** Skills evolve through failure detection, self-reflection, and LLM-generated improvements — with optional manual approval before activation.

@@ -99,7 +99,7 @@ cargo build --release --features tui
| **Self-Learning** | Skills evolve via failure detection and LLM-generated improvements | [Self-Learning](https://bug-ops.github.io/zeph/guide/self-learning.html) |
| **TUI Dashboard** | ratatui terminal UI with markdown rendering, deferred model warmup, scrollbar, mouse scroll, thinking blocks, conversation history, splash screen, live metrics, message queueing (max 10, FIFO with Ctrl+K clear) | [TUI](https://bug-ops.github.io/zeph/guide/tui.html) |
| **Multi-Channel I/O** | CLI, Telegram, and TUI with streaming support | [Channels](https://bug-ops.github.io/zeph/guide/channels.html) |
| **Defense-in-Depth** | Shell sandbox, command filter, secret redaction, audit log, SSRF protection | [Security](https://bug-ops.github.io/zeph/security.html) |
| **Defense-in-Depth** | Shell sandbox, file sandbox with path traversal protection, command filter, secret redaction, audit log, SSRF protection, doom-loop detection | [Security](https://bug-ops.github.io/zeph/security.html) |

## Architecture

@@ -111,7 +111,7 @@ zeph (binary)
├── zeph-memory — SQLite + Qdrant, semantic recall, summarization
├── zeph-index — AST-based code indexing, semantic retrieval, repo map (optional)
├── zeph-channels — Telegram adapter (teloxide) with streaming
├── zeph-tools — shell executor, web scraper, composite tool dispatch
├── zeph-tools — 7 built-in tools (bash, read, edit, write, glob, grep, web_scrape), tool registry, composite dispatch
├── zeph-mcp — MCP client, multi-server lifecycle, unified tool matching
├── zeph-a2a — A2A client + server, agent discovery, JSON-RPC 2.0
└── zeph-tui — ratatui TUI dashboard with live agent metrics (optional)
2 changes: 2 additions & 0 deletions config/default.toml
@@ -1,6 +1,8 @@
[agent]
# Agent display name
name = "Zeph"
# Maximum tool execution iterations per user message (doom-loop protection)
max_tool_iterations = 10

[llm]
# LLM provider: "ollama" for local models or "claude" for Claude API
105 changes: 98 additions & 7 deletions crates/zeph-core/src/agent.rs
@@ -26,8 +26,7 @@ use crate::context::{ContextBudget, EnvironmentContext, build_system_prompt};
use crate::redact::redact_secrets;
use zeph_memory::semantic::estimate_tokens;

// TODO(M14): Make configurable via AgentConfig (currently hardcoded for MVP)
const MAX_SHELL_ITERATIONS: usize = 3;
const DOOM_LOOP_WINDOW: usize = 3;
const MAX_QUEUE_SIZE: usize = 10;
const MESSAGE_MERGE_WINDOW: Duration = Duration::from_millis(500);
const RECALL_PREFIX: &str = "[semantic recall]\n";
@@ -100,6 +99,8 @@ pub struct Agent<P: LlmProvider + Clone + 'static, C: Channel, T: ToolExecutor>
#[cfg(feature = "index")]
repo_map_ttl: std::time::Duration,
warmup_ready: Option<watch::Receiver<bool>>,
max_tool_iterations: usize,
doom_loop_history: Vec<String>,
}

impl<P: LlmProvider + Clone + 'static, C: Channel, T: ToolExecutor> Agent<P, C, T> {
@@ -118,7 +119,7 @@ impl<P: LlmProvider + Clone + 'static, C: Channel, T: ToolExecutor> Agent<P, C,
.filter_map(|m| registry.get_skill(&m.name).ok())
.collect();
let skills_prompt = format_skills_prompt(&all_skills, std::env::consts::OS);
let system_prompt = build_system_prompt(&skills_prompt, None);
let system_prompt = build_system_prompt(&skills_prompt, None, None);
tracing::debug!(len = system_prompt.len(), "initial system prompt built");
tracing::trace!(prompt = %system_prompt, "full system prompt");

@@ -182,9 +183,17 @@
#[cfg(feature = "index")]
repo_map_ttl: std::time::Duration::from_secs(300),
warmup_ready: None,
max_tool_iterations: 10,
doom_loop_history: Vec::new(),
}
}

#[must_use]
pub fn with_max_tool_iterations(mut self, max: usize) -> Self {
self.max_tool_iterations = max;
self
}

#[must_use]
pub fn with_memory(
mut self,
@@ -1605,7 +1614,7 @@
.collect();
let skills_prompt = format_skills_prompt(&all_skills, std::env::consts::OS);
self.last_skills_prompt.clone_from(&skills_prompt);
let system_prompt = build_system_prompt(&skills_prompt, None);
let system_prompt = build_system_prompt(&skills_prompt, None, None);
if let Some(msg) = self.messages.first_mut() {
msg.content = system_prompt;
}
@@ -1653,6 +1662,7 @@
tracing::info!("config reloaded");
}

#[allow(clippy::too_many_lines)]
async fn rebuild_system_prompt(&mut self, query: &str) {
let all_meta = self.registry.all_meta();
let matched_indices: Vec<usize> = if let Some(matcher) = &self.matcher {
@@ -1710,8 +1720,18 @@
let catalog_prompt = format_skills_catalog(&remaining_skills);
self.last_skills_prompt.clone_from(&skills_prompt);
let env = EnvironmentContext::gather(&self.model_name);
let tool_catalog = {
let defs = self.tool_executor.tool_definitions();
if defs.is_empty() {
None
} else {
let reg = zeph_tools::ToolRegistry::new();
Some(reg.format_for_prompt())
}
};
#[allow(unused_mut)]
let mut system_prompt = build_system_prompt(&skills_prompt, Some(&env));
let mut system_prompt =
build_system_prompt(&skills_prompt, Some(&env), tool_catalog.as_deref());

if !catalog_prompt.is_empty() {
system_prompt.push_str("\n\n");
@@ -1832,9 +1852,33 @@
}

async fn process_response(&mut self) -> anyhow::Result<()> {
for _ in 0..MAX_SHELL_ITERATIONS {
self.doom_loop_history.clear();

for iteration in 0..self.max_tool_iterations {
self.channel.send_typing().await?;

// Context budget check at 80% threshold
if let Some(ref budget) = self.context_budget {
let used: usize = self
.messages
.iter()
.map(|m| estimate_tokens(&m.content))
.sum();
let threshold = budget.max_tokens() * 4 / 5;
if used >= threshold {
tracing::warn!(
iteration,
used,
threshold,
"stopping tool loop: context budget nearing limit"
);
self.channel
.send("Stopping: context window is nearly full.")
.await?;
break;
}
}

let Some(response) = self.call_llm_with_timeout().await? else {
return Ok(());
};
@@ -1869,6 +1913,25 @@
if !self.handle_tool_result(&response, result).await? {
return Ok(());
}

// Doom-loop detection: compare last N outputs by string equality
if let Some(last_msg) = self.messages.last() {
self.doom_loop_history.push(last_msg.content.clone());
if self.doom_loop_history.len() >= DOOM_LOOP_WINDOW {
let recent =
&self.doom_loop_history[self.doom_loop_history.len() - DOOM_LOOP_WINDOW..];
if recent.windows(2).all(|w| w[0] == w[1]) {
tracing::warn!(
iteration,
"doom-loop detected: {DOOM_LOOP_WINDOW} consecutive identical outputs"
);
self.channel
.send("Stopping: detected repeated identical tool outputs.")
.await?;
break;
}
}
}
}

Ok(())
@@ -3382,7 +3445,7 @@ mod agent_tests {
.iter()
.filter(|m| m.role == Role::Assistant)
.count();
assert!(assistant_count <= MAX_SHELL_ITERATIONS);
assert!(assistant_count <= 10);
}

#[test]
@@ -4560,4 +4623,32 @@
assert_eq!(filtered[0].summary_text, "high score");
assert_eq!(filtered[1].summary_text, "at threshold");
}

#[test]
fn doom_loop_detection_triggers_on_identical_outputs() {
let s = "same output".to_owned();
let history = vec![s.clone(), s.clone(), s];
let recent = &history[history.len() - DOOM_LOOP_WINDOW..];
assert!(recent.windows(2).all(|w| w[0] == w[1]));
}

#[test]
fn doom_loop_detection_no_trigger_on_different_outputs() {
let history = vec![
"output a".to_owned(),
"output b".to_owned(),
"output c".to_owned(),
];
let recent = &history[history.len() - DOOM_LOOP_WINDOW..];
assert!(!recent.windows(2).all(|w| w[0] == w[1]));
}

#[test]
fn context_budget_80_percent_threshold() {
let budget = ContextBudget::new(1000, 0.20);
let threshold = budget.max_tokens() * 4 / 5;
assert_eq!(threshold, 800);
assert!(800 >= threshold); // at threshold → should stop
assert!(799 < threshold); // below threshold → should continue
}
}
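The `with_max_tool_iterations` setter added in this diff follows the consuming-builder pattern (`#[must_use]`, takes and returns `self`); a self-contained mirror of that pattern — `MiniAgent` is illustrative only, not Zeph's actual type:

```rust
// Stand-in for the Agent type, showing the builder-style override of the
// tool-iteration limit (default 10, matching config/default.toml).
struct MiniAgent {
    max_tool_iterations: usize,
}

impl MiniAgent {
    fn new() -> Self {
        Self {
            max_tool_iterations: 10,
        }
    }

    // Consuming setter: takes ownership, mutates, returns self for chaining.
    #[must_use]
    fn with_max_tool_iterations(mut self, max: usize) -> Self {
        self.max_tool_iterations = max;
        self
    }
}

fn main() {
    let agent = MiniAgent::new().with_max_tool_iterations(25);
    assert_eq!(agent.max_tool_iterations, 25);
    assert_eq!(MiniAgent::new().max_tool_iterations, 10);
    println!("ok");
}
```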
7 changes: 7 additions & 0 deletions crates/zeph-core/src/config.rs
@@ -32,9 +32,15 @@ pub struct Config {
pub secrets: ResolvedSecrets,
}

fn default_max_tool_iterations() -> usize {
10
}

#[derive(Debug, Deserialize)]
pub struct AgentConfig {
pub name: String,
#[serde(default = "default_max_tool_iterations")]
pub max_tool_iterations: usize,
}

#[derive(Debug, Deserialize)]
@@ -864,6 +870,7 @@ impl Config {
Self {
agent: AgentConfig {
name: "Zeph".into(),
max_tool_iterations: 10,
},
llm: LlmConfig {
provider: "ollama".into(),
23 changes: 17 additions & 6 deletions crates/zeph-core/src/context.rs
@@ -37,14 +37,25 @@ the user explicitly asks about a skill by name.\n\
- Do not execute commands that could cause data loss without confirmation.";

#[must_use]
pub fn build_system_prompt(skills_prompt: &str, env: Option<&EnvironmentContext>) -> String {
pub fn build_system_prompt(
skills_prompt: &str,
env: Option<&EnvironmentContext>,
tool_catalog: Option<&str>,
) -> String {
let mut prompt = BASE_PROMPT.to_string();

if let Some(env) = env {
prompt.push_str("\n\n");
prompt.push_str(&env.format());
}

if let Some(catalog) = tool_catalog
&& !catalog.is_empty()
{
prompt.push_str("\n\n");
prompt.push_str(catalog);
}

if !skills_prompt.is_empty() {
prompt.push_str("\n\n");
prompt.push_str(skills_prompt);
@@ -187,14 +198,14 @@ mod tests {

#[test]
fn without_skills() {
let prompt = build_system_prompt("", None);
let prompt = build_system_prompt("", None, None);
assert!(prompt.starts_with("You are Zeph"));
assert!(!prompt.contains("available_skills"));
}

#[test]
fn with_skills() {
let prompt = build_system_prompt("<available_skills>test</available_skills>", None);
let prompt = build_system_prompt("<available_skills>test</available_skills>", None, None);
assert!(prompt.contains("You are Zeph"));
assert!(prompt.contains("<available_skills>"));
}
@@ -308,23 +319,23 @@ mod tests {
os: "linux".into(),
model_name: "test".into(),
};
let prompt = build_system_prompt("skills here", Some(&env));
let prompt = build_system_prompt("skills here", Some(&env), None);
assert!(prompt.contains("You are Zeph"));
assert!(prompt.contains("<environment>"));
assert!(prompt.contains("skills here"));
}

#[test]
fn build_system_prompt_without_env() {
let prompt = build_system_prompt("skills here", None);
let prompt = build_system_prompt("skills here", None, None);
assert!(prompt.contains("You are Zeph"));
assert!(!prompt.contains("<environment>"));
assert!(prompt.contains("skills here"));
}

#[test]
fn base_prompt_contains_guidelines() {
let prompt = build_system_prompt("", None);
let prompt = build_system_prompt("", None, None);
assert!(prompt.contains("## Tool Use"));
assert!(prompt.contains("## Guidelines"));
assert!(prompt.contains("## Security"));
2 changes: 2 additions & 0 deletions crates/zeph-tools/Cargo.toml
@@ -7,6 +7,8 @@ license.workspace = true
repository.workspace = true

[dependencies]
glob.workspace = true
regex.workspace = true
reqwest = { workspace = true, features = ["rustls"] }
scrape-core.workspace = true
serde = { workspace = true, features = ["derive"] }