
chore: improve CI performance by upgrading runner and removing test overhead#2076

Merged
nick-inkeep merged 4 commits into main from chore/ci-performance-improvements
Feb 17, 2026

Conversation

@nick-inkeep (Collaborator) commented Feb 17, 2026

Summary

Targeted CI performance improvements to reduce CI check times from 8-20 minutes to an expected 3-6 minutes for typical PRs.

Changes

1. Remove OpenTelemetry from test setup (biggest per-worker impact)

  • Removed NodeSDK initialization with getNodeAutoInstrumentations() from agents-api/src/__tests__/setup.ts
  • This was monkey-patching ~20 Node.js modules in every Vitest worker thread
  • Removed now-unused @opentelemetry/exporter-trace-otlp-proto and @opentelemetry/sdk-metrics devDependencies
  • No tests depend on real OTel — tests that reference tracing already use vi.mock

2. Tune Vitest worker threads (reduce CPU contention)

  • agents-api: maxThreads 10→8, minThreads 4→2
  • Aligns with the 4 vCPU runner to reduce context switching overhead
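
As a rough illustration of the new thread bounds, the sketch below writes them as a plain object. It assumes the `pool` / `poolOptions.threads` layout that Vitest 1.x through 3.x accepts under `test`; the actual shape of agents-api/vitest.config.ts is not shown in this PR, and the constant name is hypothetical.

```typescript
// Assumed layout: Vitest 1.x-3.x `test.pool` and `test.poolOptions.threads`.
// Only the two numbers come from this PR; everything else is illustrative.
const agentsApiThreadOptions = {
  pool: "threads",
  poolOptions: {
    threads: {
      minThreads: 2, // was 4 before this PR
      maxThreads: 8, // was 10 before this PR
    },
  },
} as const;

// Sanity check: the lower bound must not exceed the upper bound.
const { minThreads, maxThreads } = agentsApiThreadOptions.poolOptions.threads;
if (minThreads > maxThreads) {
  throw new RangeError("minThreads must not exceed maxThreads");
}
```

In a real config this object would be spread into the `test` block passed to `defineConfig`.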

3. Upgrade CI runner (more resources)

  • Changed runs-on: ubuntu-latest → runs-on: ubuntu-16gb (matches Cypress workflow)
  • Added timeout-minutes: 30 safety guardrail

4. Fix turbo cache cascade invalidation (biggest systemic impact)

  • Exclude test files from build inputs: !**/*.test.*, !**/*.spec.*, !**/__tests__/** — prevents test file changes in core packages from cascading build hash invalidation to all 11+ downstream packages. Uses officially documented $TURBO_DEFAULT$ + negation glob pattern (turbo 1.12+).
  • Remove transit dependency from lint: transit is a no-op coordination task (no package defines a transit script) but its hash changes on any file change, cascading to all downstream packages' lint tasks. Lint only reads local source files and doesn't need this ordering.
  • Move TURBO_TOKEN/TURBO_TEAM to job-level env: Ensures all turbo invocations (pnpm check, pnpm knip) use remote cache — previously only pnpm check had access.
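
To make the effect of the build-input exclusions concrete, here is a minimal sketch. The `buildInputs` array repeats the globs listed above; the matcher is a hand-written approximation of those three negation patterns, not turbo's own glob engine, and the helper names and file paths are hypothetical.

```typescript
// Input globs as described in this PR for the build task in turbo.json.
const buildInputs: string[] = [
  "$TURBO_DEFAULT$",
  "!**/*.test.*",
  "!**/*.spec.*",
  "!**/__tests__/**",
];

// Hypothetical helper: approximates the three negation globs by hand.
function isExcludedFromBuildHash(path: string): boolean {
  const segments = path.split("/");
  const fileName = segments[segments.length - 1] ?? "";
  const inTestsDir = segments.slice(0, -1).includes("__tests__");
  const isTestFile = /\.test\.[^/]*$/.test(fileName);
  const isSpecFile = /\.spec\.[^/]*$/.test(fileName);
  return inTestsDir || isTestFile || isSpecFile;
}

// Keeps only the changed files that would still feed the build hash.
function filesAffectingBuildHash(changed: string[]): string[] {
  return changed.filter((p) => !isExcludedFromBuildHash(p));
}

// Hypothetical change set: two test-only files and one source file.
const changed = [
  "src/utils/foo.test.ts",
  "src/__tests__/setup.ts",
  "src/index.ts",
];
console.log(buildInputs.filter((g) => g.startsWith("!")).length); // 3
console.log(filesAffectingBuildHash(changed)); // [ 'src/index.ts' ]
```

Under this approximation, a PR that touches only test files leaves the build hash of the package unchanged, so downstream packages keep their cached builds.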

Evidence

Before (PR #2068 — changed 2 test files):

  • 36 out of 45 turbo tasks were cache misses (80%)
  • Total CI time: 20m25s

Expected after:

  • Same 2-file change would cause ~6 cache misses instead of 36
  • Combined with OTel removal and thread tuning, typical CI: 3-6 minutes
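
The percentages above follow from simple ratios. A small sketch of the arithmetic, using a hypothetical helper name and treating the "~6" estimate as exactly 6:

```typescript
// Hypothetical helper: share of `part` in `total`, rounded to a whole percent.
function percent(part: number, total: number): number {
  if (total <= 0) {
    throw new RangeError("total must be positive");
  }
  return Math.round((part / total) * 100);
}

console.log(percent(36, 45)); // 80 (cache-miss rate observed on PR #2068)
console.log(percent(6, 45));  // 13 (expected miss rate for the same change)
```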

Test plan

  • agents-api tests pass (97 passed, 4 skipped, 4 pre-existing CORS failures from PR #2066, "gate cors for local dev")
  • agents-core tests pass (84 passed, all green)
  • turbo check --dry validates config (lint has no transit dependency, build excludes test files)
  • Build works correctly (turbo build --filter=@inkeep/agents-sdk succeeds)
  • Lint works without transit dependency (turbo lint --filter=@inkeep/agents-sdk succeeds)
  • CI pipeline passes on this PR

🤖 Generated with Claude Code

…verhead

- Upgrade ci job runner from ubuntu-latest to ubuntu-16gb for more resources
- Remove OpenTelemetry NodeSDK initialization from test setup (was creating
  full auto-instrumentation per worker thread with no benefit in unit tests)
- Reduce agents-api vitest maxThreads from 10 to 8 and minThreads from 4 to 2
  to better match runner core count and reduce per-worker initialization cost

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

changeset-bot bot commented Feb 17, 2026

⚠️ No Changeset found

Latest commit: 9ac0c84

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.


@claude claude bot (Contributor) left a comment

PR Review Summary

(1) Total Issues | Risk: Low

🔴❗ Critical (0) ❗🔴

None.

🟠⚠️ Major (0) 🟠⚠️

None.

🟡 Minor (0) 🟡

None.

💭 Consider (1) 💭

💭 1) agents-api/package.json Potentially unused OTel devDependencies

Issue: With the removal of OTel NodeSDK initialization from test setup, two devDependencies appear to no longer have any consumers in agents-api:

  • @opentelemetry/exporter-trace-otlp-proto (line 98)
  • @opentelemetry/sdk-metrics (line 99)

Why: These were only imported in the removed test setup code. Production instrumentation uses @opentelemetry/exporter-trace-otlp-http (not -proto) and doesn't use sdk-metrics. Leaving unused deps adds install overhead and may trigger false positives in dependency analysis.

Fix: Run pnpm knip to confirm these are flagged as unused, then remove if confirmed. This can be addressed in a follow-up PR.

🧹 While You're Here (1) 🧹

🧹 1) .github/workflows/ci.yml CI job missing timeout-minutes

Issue: The ci job lacks a timeout-minutes setting, while the Cypress workflow (which also uses ubuntu-16gb) sets timeout-minutes: 30.

Why: Without a timeout, a hung build/test step could consume runner minutes until the 360-minute GitHub default. Given this PR aims to reduce CI time, an explicit timeout documents expected duration and provides a safeguard.

Fix: Add timeout-minutes: 30 (or appropriate value) to the ci job definition, similar to the Cypress workflow.


✅ APPROVE

Summary: Clean CI performance optimization. The OTel removal is verified safe (tests mock the API layer directly, production instrumentation is unaffected), the runner upgrade matches the existing Cypress workflow pattern, and the thread pool tuning is well-reasoned for the target runner specs. The Consider item (unused devDeps) is a minor cleanup opportunity, and the While You're Here item (timeout) is a pre-existing improvement opportunity. Ship it! 🚀

Discarded (2)

| Location | Issue | Reason Discarded |
| --- | --- | --- |
| .github/workflows/ci.yml | Missing explicit permissions: block | Pre-existing, consistent with other workflows in repo, very low priority security hygiene |
| agents-api/vitest.config.ts | Thread pool values lack explanatory comment | AGENTS.md guidance: "No Comments: Do not add comments unless explicitly requested" |

Reviewers (2)

| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
| --- | --- | --- | --- | --- | --- | --- | --- |
| pr-review-devops | 5 | 0 | 1 | 1 | 0 | 0 | 2 |
| pr-review-standards | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 5 | 0 | 1 | 1 | 0 | 0 | 2 |

Note: pr-review-standards confirmed no bugs, security issues, or AGENTS.md compliance problems. One finding ("thread pool comment now stale") was discarded as it conflicts with AGENTS.md guidance on comments.

@github-actions github-actions bot deleted a comment from claude bot Feb 17, 2026
Remove @opentelemetry/exporter-trace-otlp-proto and @opentelemetry/sdk-metrics
from agents-api devDependencies since they were only used in the test setup
OTel initialization that was removed in the previous commit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
nick-inkeep and others added 2 commits February 17, 2026 09:46
Three targeted fixes to prevent catastrophic cache invalidation in CI:

1. Exclude test files from build task inputs - prevents test file changes
   in core packages from cascading build hash invalidation to all 11+
   downstream packages. Uses officially documented $TURBO_DEFAULT$ with
   negation globs (turbo 1.12+).

2. Remove transit dependency from lint task - transit is a no-op
   coordination task (no package defines a transit script) but its hash
   changes on any file change, cascading to all downstream lint tasks.
   Lint only reads local source files and doesn't need dependency ordering.

3. Move TURBO_TOKEN/TURBO_TEAM to job-level env - ensures all turbo
   invocations (check, knip) use remote cache, not just pnpm check.
   Also adds timeout-minutes: 30 as a safety guardrail.

Evidence: PR #2068 changed 2 test files but caused 36/45 cache misses.
With these fixes, the same change would cause ~6 misses (only the
directly affected packages).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude claude bot (Contributor) left a comment

PR Review Summary

(0) Total Issues | Risk: Low

🔴❗ Critical (0) ❗🔴

None.

🟠⚠️ Major (0) 🟠⚠️

None.

🟡 Minor (0) 🟡

None.

💭 Consider (0) 💭

None.

🕐 Pending Recommendations (0)

None — both items from the previous review have been addressed.


✅ APPROVE

Summary: Clean delta addressing all prior feedback. The two commits since last review implement exactly what was suggested: (1) timeout-minutes: 30 added to the CI job matching the Cypress workflow pattern, and (2) unused OTel devDependencies removed from agents-api/package.json. The additional turbo cache optimizations (test file exclusions from build inputs, transit dependency removal from lint) are correctly implemented — the devops reviewer verified these don't affect build correctness since test files are already excluded from tsdown bundling and no package defines a transit script. Ship it! 🚀

Discarded (4)

| Location | Issue | Reason Discarded |
| --- | --- | --- |
| .github/workflows/ci.yml:19 | Previous feedback addressed verification | INFO-level confirmation, not an actionable issue |
| .github/workflows/ci.yml:20-24 | TURBO env vars hoisting verification | INFO-level confirmation, not an actionable issue |
| turbo.json:31-34 | Build input exclusions verification | INFO-level confirmation, not an actionable issue |
| turbo.json:74-76 | Lint transit dependency removal verification | INFO-level confirmation, not an actionable issue |

Reviewers (1)

| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
| --- | --- | --- | --- | --- | --- | --- | --- |
| pr-review-devops | 4 | 0 | 0 | 0 | 0 | 0 | 4 |
| Total | 4 | 0 | 0 | 0 | 0 | 0 | 4 |

Note: All devops findings were INFO-level verifications confirming the delta changes are correctly implemented. No issues identified.

@github-actions github-actions bot deleted a comment from claude bot Feb 17, 2026
@nick-inkeep
Collaborator Author

CI Status Note

The CI failure is caused by 4 pre-existing CORS test failures (isOriginAllowed for 127.0.0.1/localhost) introduced in PR #2066. These are not caused by this PR — the last 3 main branch CI runs all fail with the same tests:

  • Run 22109208025 (main, 53d4fe6d): failure
  • Run 22108085178 (main, 18ffaef5): failure
  • Run 22107819410 (main, e88b38f9): failure

Turbo cache improvement signal

On the re-run, turbo cached 42 out of 45 tasks (93%), with "Run CI checks" completing in 6m42s. For comparison, PR #2068 (before these changes) cached only 7/45 tasks (15%) and took 17+ minutes for the same step.

The full impact of the cache cascade fixes will be visible on subsequent PRs, where test-only file changes in agents-core will no longer invalidate all 36 downstream build/lint/typecheck/test tasks.

@nick-inkeep nick-inkeep merged commit 25e4125 into main Feb 17, 2026
6 checks passed
@nick-inkeep nick-inkeep deleted the chore/ci-performance-improvements branch February 17, 2026 18:35