Problem
Audit workflows (like ci-doctor) read 24 hours of logs daily for trend analysis. As a project scales to hundreds of daily workflow runs, analyzing 7-30 day trends requires re-reading gigabytes of raw logs at full token cost.
There's no compaction mechanism for historical logs — every analysis pass pays the full token cost even for weeks-old data where only the structured outcome matters.
Proposal
Add a log compaction step that replaces old raw logs with structured summaries:
```json
{
  "workflow": "ci-doctor",
  "run_id": "12345",
  "status": "failure",
  "errors": ["missing tool: jq"],
  "duration_s": 45,
  "token_count": 45230,
  "summary": "Failed due to missing jq in container, retried 3x before giving up"
}
```

Configuration:
- `compact_after: 7d`: keep raw logs for 7 days, then compact (see the sketch after this list)
- Summaries retain diagnostic value (error patterns, failure categories, token costs)
- Raw logs archived (not deleted) for forensic access if needed
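For concreteness, here is one possible shape for that setting, as a sketch only: `compact_after` is the key proposed here, while the enclosing `logs:` block and the `archive_dir` key are hypothetical placeholders, not existing gh aw configuration.

```yaml
# Hypothetical sketch: only compact_after is proposed above;
# the logs: block and archive_dir key are illustrative placeholders.
logs:
  compact_after: 7d          # keep raw logs for 7 days, then compact
  archive_dir: logs/archive  # compacted raw logs move here instead of being deleted
```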
Expected savings: 80-90% token reduction for historical trend analysis. A 45K-token workflow log compacts to ~200 tokens without losing the structured signal agents need to learn from failures.
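To make the mechanics concrete, here is a minimal sketch of what a compaction pass could look like. This is not a gh aw implementation; the `logs/` directory layout, the `.log` extension, and the `summarize` callback are assumptions for illustration.

```go
// Sketch of a compaction pass: for each raw log older than the threshold,
// write a small structured summary next to it and move the raw log to an
// archive directory instead of deleting it.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// RunSummary mirrors the structured summary format proposed above.
type RunSummary struct {
	Workflow   string   `json:"workflow"`
	RunID      string   `json:"run_id"`
	Status     string   `json:"status"`
	Errors     []string `json:"errors"`
	DurationS  int      `json:"duration_s"`
	TokenCount int      `json:"token_count"`
	Summary    string   `json:"summary"`
}

// compactAfter matches the proposed compact_after: 7d default.
const compactAfter = 7 * 24 * time.Hour

// compactLogs walks logsDir; for every .log file older than compactAfter it
// writes <name>.summary.json beside it and moves the raw file into archiveDir.
func compactLogs(logsDir, archiveDir string, summarize func(path string) (RunSummary, error)) error {
	cutoff := time.Now().Add(-compactAfter)
	entries, err := os.ReadDir(logsDir)
	if err != nil {
		return err
	}
	if err := os.MkdirAll(archiveDir, 0o755); err != nil {
		return err
	}
	for _, e := range entries {
		if e.IsDir() || filepath.Ext(e.Name()) != ".log" {
			continue
		}
		info, err := e.Info()
		if err != nil || info.ModTime().After(cutoff) {
			continue // keep recent raw logs untouched
		}
		raw := filepath.Join(logsDir, e.Name())
		sum, err := summarize(raw) // how the summary is produced is out of scope here
		if err != nil {
			return fmt.Errorf("summarize %s: %w", raw, err)
		}
		data, err := json.MarshalIndent(sum, "", "  ")
		if err != nil {
			return err
		}
		// Write the compact summary, then archive (not delete) the raw log.
		if err := os.WriteFile(raw+".summary.json", data, 0o644); err != nil {
			return err
		}
		if err := os.Rename(raw, filepath.Join(archiveDir, e.Name())); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Plug in whatever summarizer produces the JSON above (an agent call, a log parser, etc.).
	stub := func(path string) (RunSummary, error) {
		return RunSummary{Workflow: "ci-doctor", Status: "unknown", Summary: "stub for " + path}, nil
	}
	if err := compactLogs("logs", "logs/archive", stub); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Whether this ends up as a `gh aw compact-logs` command or as a scheduled workflow step (see the questions below), the core loop is the same: summarize, write the compact record, archive the raw file.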
Why this matters
This builds on the existing token optimization work (#14355, #14395, #7081). The principle: agents analyzing past runs need structured error patterns and outcomes, not full execution traces. Similar to how databases materialize aggregates instead of re-querying raw tables.
Questions for maintainers
- Would this fit as a built-in command (`gh aw compact-logs`) or as a workflow action?
- Should compaction be opt-in per workflow or global?
- Any concerns about losing diagnostic detail in the summaries?
Happy to implement if this direction makes sense.