Description
Problem
Every LLM call rescans the entire message list to estimate prompt tokens, performing 50-100 iterator steps + divisions per call.
Files: crates/zeph-core/src/agent/streaming.rs lines 132-136, 431-435
Current code:
```rust
let prompt_estimate: u64 = self
    .messages
    .iter()
    .map(|m| u64::try_from(m.content.len()).unwrap_or(0) / 4)
    .sum();
```
Impact
- CPU: 2-5% overhead per LLM call
- Latency: +1-3ms per call (10-30ms cumulative over a 10-iteration tool loop)
Solution
Maintain a cached counter:
```rust
struct Agent {
    // u64 to match the existing prompt_estimate type
    cached_prompt_tokens: u64,
}

// Update on message push/drain:
self.cached_prompt_tokens += estimate_tokens(&msg.content);
```
Priority: P0
Effort: Medium (2-3 hours, needs careful state tracking)
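A minimal sketch of the bookkeeping, assuming a simplified `Agent` holding a `Vec<Message>`; `push_message`, `drain_oldest`, and `prompt_estimate` are illustrative names, not the existing zeph-core API:

```rust
// Illustrative sketch only; the real Agent in zeph-core has more fields.
struct Message {
    content: String,
}

struct Agent {
    messages: Vec<Message>,
    /// Running token estimate, kept in sync with `messages` so LLM calls
    /// no longer rescan the whole list.
    cached_prompt_tokens: u64,
}

/// Same heuristic as the current rescan: ~4 bytes per token.
fn estimate_tokens(content: &str) -> u64 {
    u64::try_from(content.len()).unwrap_or(0) / 4
}

impl Agent {
    fn push_message(&mut self, msg: Message) {
        self.cached_prompt_tokens += estimate_tokens(&msg.content);
        self.messages.push(msg);
    }

    /// When old messages are trimmed, subtract their estimates so the
    /// counter stays consistent.
    fn drain_oldest(&mut self, n: usize) {
        let n = n.min(self.messages.len());
        for msg in self.messages.drain(..n) {
            self.cached_prompt_tokens = self
                .cached_prompt_tokens
                .saturating_sub(estimate_tokens(&msg.content));
        }
    }

    /// Replaces the per-call scan at streaming.rs:132-136 and 431-435.
    fn prompt_estimate(&self) -> u64 {
        self.cached_prompt_tokens
    }
}
```

Because the estimate is computed per message with the same `/ 4` heuristic, the cached sum matches what the current rescan would produce, so behavior is unchanged; the careful part is ensuring every path that mutates `messages` goes through helpers like these.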
Related to #391