RFC: Log compaction for token-efficient historical analysis #14603

@anupamchugh

Problem

Audit workflows (like ci-doctor) read 24 hours of logs daily for trend analysis. As a project scales to hundreds of daily workflow runs, analyzing 7-30 day trends requires re-reading gigabytes of raw logs at full token cost.

There is no compaction mechanism for historical logs, so every analysis pass pays the full token cost, even for weeks-old data where only the structured outcome matters.

Proposal

Add a log compaction step that replaces old raw logs with structured summaries:

{
  "workflow": "ci-doctor",
  "run_id": "12345",
  "status": "failure",
  "errors": ["missing tool: jq"],
  "duration_s": 45,
  "token_count": 45230,
  "summary": "Failed due to missing jq in container, retried 3x before giving up"
}
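As a rough illustration of how a record like the one above might be produced, here is a minimal Python sketch. The `RunSummary` class, the `compact_run` function, and the `error:` line prefix used to pick out errors are all assumptions made for this example; none of them is an existing `gh aw` API, and the free-text `summary` is passed in rather than generated.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class RunSummary:
    """Hypothetical compacted record; field names mirror the JSON example."""
    workflow: str
    run_id: str
    status: str
    errors: list = field(default_factory=list)
    duration_s: int = 0
    token_count: int = 0
    summary: str = ""


def compact_run(workflow, run_id, status, raw_log, duration_s, token_count, summary):
    """Reduce a raw log to a RunSummary.

    Assumption: error lines start with "error:" (case-insensitive).
    """
    errors = []
    for line in raw_log.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("error:"):
            msg = stripped[len("error:"):].strip()
            if msg not in errors:  # drop duplicates, e.g. from retries
                errors.append(msg)
    return RunSummary(workflow, run_id, status, errors, duration_s, token_count, summary)


raw = """step 1: setup
error: missing tool: jq
retrying...
error: missing tool: jq
"""
record = compact_run(
    "ci-doctor", "12345", "failure", raw, 45, 45230,
    "Failed due to missing jq in container, retried 3x before giving up",
)
print(json.dumps(asdict(record), indent=2))
```

Repeated error lines collapse into one entry, so retries do not inflate the summary.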

Configuration:

  • compact_after: 7d — keep raw logs for 7 days, then compact
  • Summaries retain diagnostic value (error patterns, failure categories, token costs)
  • Raw logs archived (not deleted) for forensic access if needed
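The retention rule above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `parse_duration` and `archive_old_logs` helpers are hypothetical, the `7d`/`12h` duration syntax is guessed from the `compact_after: 7d` example, log age is taken from file modification time, and raw logs are assumed to be `*.log` files in one directory. Writing the summary itself is left out.

```python
import re
import shutil
import time
from pathlib import Path

_UNITS = {"h": 3600, "d": 86400}  # seconds per unit (assumed syntax)


def parse_duration(text):
    """Parse a value such as "7d" or "12h" into seconds."""
    match = re.fullmatch(r"(\d+)([hd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def archive_old_logs(log_dir, archive_dir, compact_after="7d", now=None):
    """Move raw logs older than compact_after into archive_dir.

    Files are moved, not deleted, so they remain available for forensics.
    Returns the sorted names of the files that were archived.
    """
    now = time.time() if now is None else now
    cutoff = now - parse_duration(compact_after)
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(log_dir).glob("*.log")):
        if path.stat().st_mtime < cutoff:
            shutil.move(str(path), str(archive / path.name))
            moved.append(path.name)
    return moved
```

Moving files rather than deleting them matches the "archived (not deleted)" requirement; a real implementation would likely write the summary before moving the raw log.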

Expected savings: an estimated 80-90% token reduction for historical trend analysis. For example, a 45K-token workflow log compacts to ~200 tokens without losing the structured signal agents need to learn from failures.
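To make the arithmetic behind these figures easier to check, here is a small Python sketch. It assumes every run's log has the same size (45,230 raw tokens, ~200 compacted, taken from the example above) and the same number of runs per day; both are simplifications that are not stated in this proposal, and the function names are hypothetical.

```python
def per_log_reduction(raw_tokens, compacted_tokens):
    """Fraction of tokens saved when a single log is compacted."""
    return 1 - compacted_tokens / raw_tokens


def window_reduction(window_days, raw_days, raw_tokens, compacted_tokens):
    """Fraction saved over an analysis window in which the newest
    raw_days are still uncompacted and the rest are summaries.

    Assumes uniform log size and a constant number of runs per day.
    """
    raw_days = min(raw_days, window_days)
    baseline = window_days * raw_tokens
    compacted = raw_days * raw_tokens + (window_days - raw_days) * compacted_tokens
    return 1 - compacted / baseline


print(f"{per_log_reduction(45230, 200):.1%}")        # 99.6%
print(f"{window_reduction(30, 7, 45230, 200):.1%}")  # 76.3%
```

Under these assumptions a single compacted log shrinks by more than 99%, while the saving over a whole window depends mostly on what share of that window is still raw.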

Why this matters

This builds on the existing token optimization work (#14355, #14395, #7081). The principle: agents analyzing past runs need structured error patterns and outcomes, not full execution traces. This is similar to how databases materialize aggregates instead of re-querying raw tables.

Questions for maintainers

  1. Would this fit as a built-in command (gh aw compact-logs) or as a workflow action?
  2. Should compaction be opt-in per workflow or global?
  3. Any concerns about losing diagnostic detail in the summaries?

Happy to implement if this direction makes sense.
