feat(sdk): Track and display auxiliary LLM costs #115

Merged
dcramer merged 12 commits into main from feat/auxiliary-usage-tracking
Feb 7, 2026

Conversation

dcramer (Member) commented Feb 7, 2026

Track and display costs from auxiliary LLM calls that were previously invisible.

Warden makes two direct Anthropic API calls (not through the Claude Code SDK) whose costs were untracked:

  1. Extraction repair (extractFindingsWithLLM): Uses claude-haiku-4-5 when regex extraction fails
  2. Semantic dedup (findSemanticDuplicates): Uses claude-haiku-4-5 to detect duplicate findings

Both now capture response.usage, convert it to UsageStats via a new pricing module, and surface it in all output formats. The data model uses AuxiliaryUsageMap (a record of agent name to UsageStats) on SkillReport, designed to accommodate future auxiliary agents.
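As a rough illustration of the conversion described above, here is a minimal sketch of turning raw API token counts into a cost-bearing stats object. The field names on UsageStats, the shape of the pricing table, the price values, and the apiUsageToStats signature are all assumptions made for this example; only the input_tokens and output_tokens fields come from the Anthropic API response. Cache-related token fields are left out for brevity.

```typescript
// Raw usage as returned on an Anthropic API response (subset of fields).
interface ApiUsage {
  input_tokens: number;
  output_tokens: number;
}

// Hypothetical shape; the real UsageStats in Warden may carry more fields.
interface UsageStats {
  inputTokens: number;
  outputTokens: number;
  costUSD: number;
}

// Record of agent name to UsageStats, as described in the PR.
type AuxiliaryUsageMap = Record<string, UsageStats>;

// Placeholder prices in USD per million tokens; illustrative values only.
const MODEL_PRICING: Record<string, { inputPerMTok: number; outputPerMTok: number }> = {
  "claude-haiku-4-5": { inputPerMTok: 1, outputPerMTok: 5 },
};

// Assumed signature: converts raw token counts into stats with a cost.
function apiUsageToStats(model: string, usage: ApiUsage): UsageStats {
  const price = MODEL_PRICING[model];
  // Unknown models still report token counts, with cost left at zero.
  const costUSD = price
    ? (usage.input_tokens * price.inputPerMTok +
        usage.output_tokens * price.outputPerMTok) / 1_000_000
    : 0;
  return {
    inputTokens: usage.input_tokens,
    outputTokens: usage.output_tokens,
    costUSD,
  };
}

const auxiliaryUsage: AuxiliaryUsageMap = {
  extraction: apiUsageToStats("claude-haiku-4-5", {
    input_tokens: 1000,
    output_tokens: 200,
  }),
};
```

Keying the map by agent name means a future auxiliary agent only needs a new key, not a schema change.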

What changed:

  • New AuxiliaryUsageMapSchema type and optional auxiliaryUsage field on SkillReport
  • New src/sdk/pricing.ts with model pricing constants and apiUsageToStats()
  • New aggregateAuxiliaryUsage() and mergeAuxiliaryUsage() helpers in usage.ts
  • Extraction repair in extract.ts now returns usage; threaded through hunk -> file -> report in both runSkill() and runSkillTask() code paths
  • Semantic dedup in dedup.ts now returns usage; merged into report in poster.ts
  • formatStatsCompact() shows total cost with per-agent breakdown: $0.0060 (+extraction: $0.0012)
  • GitHub check summaries, PR comment renderer, and JSONL output all include auxiliary costs
  • All new fields are optional for backward compatibility
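The merge and compact-format behavior listed above could look roughly like the sketch below. The UsageStats fields and the mergeAuxiliaryUsage signature are assumptions, and formatCostCompact is a hypothetical stand-in for the cost portion of formatStatsCompact(); the real implementations may differ.

```typescript
// Hypothetical shape; the real UsageStats in Warden may carry more fields.
interface UsageStats {
  inputTokens: number;
  outputTokens: number;
  costUSD: number;
}

type AuxiliaryUsageMap = Record<string, UsageStats>;

// Assumed signature: sums stats per agent name across two optional maps.
function mergeAuxiliaryUsage(
  a: AuxiliaryUsageMap | undefined,
  b: AuxiliaryUsageMap | undefined,
): AuxiliaryUsageMap | undefined {
  if (!a && !b) return undefined;
  const merged: AuxiliaryUsageMap = {};
  for (const map of [a, b]) {
    if (!map) continue;
    for (const [agent, stats] of Object.entries(map)) {
      const prev = merged[agent];
      merged[agent] = prev
        ? {
            inputTokens: prev.inputTokens + stats.inputTokens,
            outputTokens: prev.outputTokens + stats.outputTokens,
            costUSD: prev.costUSD + stats.costUSD,
          }
        : { ...stats };
    }
  }
  return merged;
}

// Hypothetical helper: total cost (primary + auxiliary) with a per-agent suffix.
function formatCostCompact(primary: UsageStats, aux?: AuxiliaryUsageMap): string {
  const entries = Object.entries(aux ?? {});
  const total = entries.reduce((sum, [, s]) => sum + s.costUSD, primary.costUSD);
  const base = `$${total.toFixed(4)}`;
  if (entries.length === 0) return base;
  const breakdown = entries
    .map(([agent, s]) => `+${agent}: $${s.costUSD.toFixed(4)}`)
    .join(", ");
  return `${base} (${breakdown})`;
}

const primaryUsage: UsageStats = { inputTokens: 4000, outputTokens: 160, costUSD: 0.0048 };
const auxUsage: AuxiliaryUsageMap = {
  extraction: { inputTokens: 1000, outputTokens: 40, costUSD: 0.0012 },
};
const compact = formatCostCompact(primaryUsage, auxUsage);
```

Because every auxiliary field is optional, a report without auxiliaryUsage formats as the plain total with no suffix.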

Depends on #114.

Refs #108

dcramer and others added 2 commits February 6, 2026 14:19
Route interrupt message through Ink rendering pipeline (TTY) and
logPlain (non-TTY) instead of raw stderr writes that corrupt Ink's
cursor tracking. The abort signal triggers the message in both paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Warden makes two auxiliary LLM calls (direct Anthropic API) whose costs
were invisible: extraction repair and semantic dedup. Both now capture
usage data and surface it in all output formats.

Add AuxiliaryUsageMap type (record of agent name to UsageStats) and
optional auxiliaryUsage field on SkillReport. Create pricing module for
calculating costs from raw API token counts. Thread auxiliary usage from
extraction repair through hunk -> file -> skill report in both runSkill
and runSkillTask code paths. Capture semantic dedup usage and merge it
into reports in the review poster.

Update formatStatsCompact to show total cost (primary + auxiliary) with
per-agent breakdown suffix. Update GitHub check summaries, PR comment
renderer, and JSONL output to include auxiliary costs.

Refs #108

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

vercel bot commented Feb 7, 2026

The latest updates on your projects.

Project: warden
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): Feb 7, 2026 3:58am

dcramer and others added 2 commits February 6, 2026 16:33
…elpers

Hoist usage variable outside try block in findSemanticDuplicates so
API usage is preserved even when response parsing fails. Also extract
shared auxiliary cost formatting helpers from github-checks.ts to
formatters.ts to eliminate duplication.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
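The hoisting fix described in the commit above can be sketched as follows. This is an illustration of the pattern only: findDuplicatesSketch, CallModel, and DedupResult are hypothetical names, and the real findSemanticDuplicates takes different inputs.

```typescript
interface ApiUsage {
  input_tokens: number;
  output_tokens: number;
}

// Hypothetical result shape for this sketch.
interface DedupResult {
  duplicates: number[];
  usage?: ApiUsage;
}

// Hypothetical stand-in for the direct API call.
type CallModel = () => Promise<{ text: string; usage: ApiUsage }>;

async function findDuplicatesSketch(callModel: CallModel): Promise<DedupResult> {
  // Declared outside the try block so the catch path can still return it.
  let usage: ApiUsage | undefined;
  try {
    const response = await callModel();
    usage = response.usage; // captured before parsing, which may throw
    const duplicates = JSON.parse(response.text) as number[];
    return { duplicates, usage };
  } catch {
    // The call or the parse failed; any usage captured so far is still reported.
    return { duplicates: [], usage };
  }
}
```

If usage were declared inside the try block, a parse failure would discard the token counts even though the request had already been billed.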
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded MODEL_PRICING table with JSON generated from the
open-source pydantic/genai-prices repository. Adds scripts/update-pricing.ts
to fetch and normalize Anthropic pricing data (handling tiered pricing),
and commits the generated model-pricing.json so the library works without
running the script. Also includes per-file report tracking (FileReport),
ink-runner file completion display, and JSONL per-file records.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use Promise.all return values to collect results in input order instead
of pushing to shared arrays from concurrent async functions. The previous
side-effect approach produced non-deterministic ordering for report.files
and findings when parallel=true.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
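To make the ordering issue in the commit above concrete, the sketch below contrasts the two collection styles. processFile and the input list are hypothetical; the delays only simulate files finishing out of order.

```typescript
// Hypothetical per-file worker; the delay simulates variable processing time.
async function processFile(name: string, delayMs: number): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  return `report:${name}`;
}

const inputs = [
  { name: "a.ts", delayMs: 30 },
  { name: "b.ts", delayMs: 10 },
  { name: "c.ts", delayMs: 20 },
];

// Side-effect approach: entries land in completion order, which varies run to run.
async function collectByPush(): Promise<string[]> {
  const results: string[] = [];
  await Promise.all(
    inputs.map(async (f) => {
      results.push(await processFile(f.name, f.delayMs));
    }),
  );
  return results;
}

// Return-value approach: Promise.all resolves to values in input order,
// regardless of which promise settles first.
async function collectByReturn(): Promise<string[]> {
  return Promise.all(inputs.map((f) => processFile(f.name, f.delayMs)));
}
```

Promise.all guarantees that the resolved array matches the order of the input iterable, so the second form is deterministic without any extra sorting.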
The action was calling runSkill() without callbacks, producing zero
output during skill execution. In CI this meant minutes of silence
after "Running trigger". Wire up onFileStart, onHunkStart, and
onFileComplete so file and hunk progress appears in the Actions log.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the ad-hoc console.log callbacks in the trigger executor with
the same runSkillTask + createDefaultCallbacks log-mode reporter the
CLI uses. This gives CI output consistent formatting: timestamped lines
with file progress, hunk ranges, duration, cost, and finding counts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add null/type check before iterating anthropic.models in
  update-pricing.ts (external data source may omit the field)
- Fix JSONL spec example: findings array now has 2 entries matching
  the "2 issues (1 high, 1 medium)" summary
- Code simplifier: extract AuxiliaryUsageEntry type (was repeated 7x),
  FileProcessResult interface, simplify aux collection with flatMap

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
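The defensive check on external pricing data mentioned in the commit above might look like the following sketch. The data shape (a provider object with a models array whose entries have id and prices) and the function name extractPricedModelIds are assumptions for illustration, not the actual genai-prices schema or the code in update-pricing.ts.

```typescript
// Hypothetical guard: returns the ids of models that carry price data,
// tolerating a missing or malformed `models` field on the provider object.
function extractPricedModelIds(provider: unknown): string[] {
  if (typeof provider !== "object" || provider === null) return [];
  const models = (provider as { models?: unknown }).models;
  // The external source may omit the field or supply a non-array value.
  if (!Array.isArray(models)) return [];

  const ids: string[] = [];
  for (const model of models) {
    if (typeof model !== "object" || model === null) continue;
    const { id, prices } = model as { id?: unknown; prices?: unknown };
    // Skip entries without a string id or without any pricing data.
    if (typeof id !== "string" || prices == null) continue;
    ids.push(id);
  }
  return ids;
}
```

Treating the fetched JSON as unknown and narrowing it step by step keeps the script from throwing when the upstream format changes.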
The JSONL summary was only aggregating main SDK usage, omitting
auxiliary costs (extraction LLM calls). Consumers reading only the
summary line would undercount total costs compared to GitHub checks.
Now aggregates auxiliaryUsage across all skill reports using
mergeAuxiliaryUsage, matching the GitHub check summary behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add null check on model.prices in update-pricing script to skip models
without pricing data. Add batchDelayMs rate-limiting between file
batches in runSkillTask, matching the existing behavior in runSkill.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
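The batch delay mentioned in the commit above can be illustrated with a small helper. runInBatches and its parameters are hypothetical names for this sketch; only the batchDelayMs option name comes from the commit message, and the real runSkillTask loop may be structured differently.

```typescript
// Hypothetical helper: processes items in fixed-size batches, pausing
// batchDelayMs between batches to stay under API rate limits.
async function runInBatches<T, R>(
  items: T[],
  batchSize: number,
  batchDelayMs: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    // No delay before the first batch, only between consecutive batches.
    if (i > 0 && batchDelayMs > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, batchDelayMs));
    }
    const batch = items.slice(i, i + batchSize);
    // Items within a batch run concurrently; results keep input order.
    results.push(...(await Promise.all(batch.map(worker))));
  }
  return results;
}
```

For example, runInBatches(files, 5, 1000, analyzeFile) would process five files at a time with a one-second pause between batches, where files and analyzeFile are placeholders.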

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

findingsBySeverity: Record<string, number>;
totalDurationMs?: number;
totalUsage?: UsageStats;
totalAuxiliaryUsage?: AuxiliaryUsageMap;

Inconsistent indentation for new totalAuxiliaryUsage properties

Low Severity

The totalAuxiliaryUsage property is indented with 2 spaces in both the return type declaration (line 67) and the return object literal (line 93), while all sibling properties use 4 spaces (type) and 6 spaces (object literal) respectively. This misalignment breaks the visual structure of the code block.

@dcramer dcramer marked this pull request as ready for review February 7, 2026 05:13
@dcramer dcramer merged commit bf8bc00 into main Feb 7, 2026
12 checks passed
@dcramer dcramer deleted the feat/auxiliary-usage-tracking branch February 7, 2026 05:13