evaluation docs by shagun-singh-inkeep · Pull Request #2123 · inkeep/agents

shagun-singh-inkeep · 2026-02-18T16:07:51Z

No description provided.

vercel · 2026-02-18T16:07:56Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agents-api	Ready	Preview, Comment	Feb 18, 2026 4:51pm
agents-docs	Ready	Preview, Comment	Feb 18, 2026 4:51pm
agents-manage-ui	Ready	Preview, Comment	Feb 18, 2026 4:51pm

changeset-bot · 2026-02-18T16:07:59Z

⚠️ No Changeset found

Latest commit: f398b46

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


claude

PR Review Summary

(4) Total Issues | Risk: Medium

🟠⚠️ Major (2) 🟠⚠️

🟠 1) agents-docs/content/typescript-sdk/evaluations.mdx:9-11 Missing EvaluationClient initialization example

Issue: The documentation references client.createEvaluator(), client.createEvaluationSuiteConfig(), and client.triggerBatchEvaluation() but never shows how to instantiate the EvaluationClient. Users copying the code examples will encounter undefined client errors.

Why: This is a blocking usability issue. Developers cannot use any of the documented examples without first knowing how to create an EvaluationClient instance, what parameters it requires (tenantId, projectId, apiUrl, apiKey), and where to import it from.

Fix: Add a Quick Start or Setup section at the beginning of the page showing how to instantiate the client:

```typescript
import { EvaluationClient } from '@inkeep/agents-sdk';

const client = new EvaluationClient({
  tenantId: 'YOUR_TENANT_ID',
  projectId: 'YOUR_PROJECT_ID',
  apiUrl: 'https://api.inkeep.com', // or your self-hosted URL
  apiKey: process.env.INKEEP_API_KEY,
});
```

Refs:

EvaluationClient constructor — shows required config params
write-docs skill — Reference Pattern — recommends Quick Start section

🟠 2) agents-docs/content/typescript-sdk/evaluations.mdx:4-9 Datasets mentioned but not documented

Issue: The frontmatter description says "Manage datasets, evaluators, and run evaluations" and line 9 mentions "manage datasets... trigger batch evaluations or dataset runs" — but the page contains no documentation for dataset operations. The EvaluationClient has extensive dataset methods (createDataset, listDatasets, createDatasetItem, triggerDatasetRun) that are completely undocumented.

Why: Users searching for dataset documentation will find this page via the keywords/description but won't find what they need. This creates a mismatch between what the page promises and what it delivers.

Fix: Either:

Add a Datasets section documenting createDataset, createDatasetItem, triggerDatasetRun, etc., OR
Remove datasets from the description/keywords if intentionally out of scope:
- Update description to: "Manage evaluators and run evaluations programmatically with the TypeScript SDK"
- Remove "datasets" and "dataset runs" from keywords

Refs:

EvaluationClient dataset methods — extensive dataset API
write-docs skill — Philosophy — "Accuracy over speed"

🟡 Minor (2) 🟡

🟡 1) multi-file Terminology inconsistency: "suite config" vs "continuous test"

files: agents-docs/content/typescript-sdk/evaluations.mdx, agents-docs/content/visual-builder/evaluations.mdx

Issue: The SDK docs introduce "evaluation suite configs" while the Visual Builder docs call the same concept "continuous tests". The SDK docs attempt to bridge with "(evaluation run configs)" parenthetical but this conflates two different API entities (suite configs vs run configs are separate resources).

Why: Customers reading both docs will be confused about the relationship between these terms. The Visual Builder UI uses "Continuous Tests" tab label, so users of that interface won't recognize "evaluation suite config" terminology.

Fix: Establish clearer terminology mapping:

Add a brief callout in the SDK docs explaining: "In the Visual Builder UI, Evaluation Suite Configs are the underlying configuration for what the UI calls 'Continuous Tests'."
Or use consistent user-facing terminology across both docs.

Refs:

🟡 2) agents-docs/content/typescript-sdk/evaluations.mdx:8-9 Missing Overview section header

Issue: The page jumps directly into introductory prose without an ## Overview heading. This is inconsistent with peer TypeScript SDK documentation pages.

Why: Peer SDK files (skills.mdx, context-fetchers.mdx, memory.mdx, external-agents.mdx) all begin with an ## Overview section after frontmatter. Consistent structure aids navigation and scannability.

Fix: Add ## Overview heading before the introductory paragraph. Same applies to the Visual Builder evaluations page.

Refs:

typescript-sdk/skills.mdx:8-20 — peer file with Overview section

Inline Comments:

🟡 Minor: evaluations.mdx:12 Missing blank line before heading
🟡 Minor: evaluations.mdx:66 Double space after period
💭 Consider: evaluations.mdx:89 Remove "via suite configs" implementation detail

🚫 REQUEST CHANGES

Summary: Good documentation foundation for the Evaluations feature. The two blocking issues are: (1) users cannot run any examples without knowing how to instantiate EvaluationClient, and (2) the page promises dataset documentation that doesn't exist. Address these and the terminology clarity, and this will be ready to ship.

Discarded (4)

Location	Issue	Reason Discarded
`evaluations.mdx:3`	Redundant sidebarTitle (matches title)	LOW confidence — developer preference, matches title is acceptable
`evaluations.mdx:93`	Related section naming inconsistency	MEDIUM confidence — multiple conventions exist in the codebase, no clear standard
`evaluationSuiteConfig` example	Potentially misleading evaluatorIds handling	VERIFIED ACCURATE — API route does accept and process evaluatorIds
`evaluations.mdx:88-89`	Date range behavior not clarified	Valid but not actionable — would require API investigation to document correctly

Reviewers (3)

Reviewer	Returned	Main Findings	Consider	Inline Comments	Discarded
`pr-review-docs`	7	2	0	2	3
`pr-review-product`	5	2	1	0	2
`pr-review-consistency`	7	1	0	1	5
Total	19	5	1	3	10

Note: Some findings were merged across reviewers (e.g., missing client initialization was raised by both docs and product reviewers).

agents-docs/content/typescript-sdk/evaluations.mdx

agents-docs/content/visual-builder/evaluations.mdx

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

claude

PR Review Summary

(2) Total Issues | Risk: Medium

Delta Review

This re-review is scoped to the 3 commits since the last automated review. The delta addressed all 3 inline comment suggestions from the prior review:

Fixed	Issue
✅	Missing blank line before `## Evaluators` heading
✅	Double space after period in continuous tests description
✅	Removed "via suite configs" implementation detail from summary table

No new issues introduced in the delta.

🕐 Pending Recommendations (2)

These Major issues were raised in the prior review and remain unaddressed:

🟠 typescript-sdk/evaluations.mdx:9-11 Missing EvaluationClient initialization example — Users cannot run any code examples without knowing how to instantiate the client (import, required params: tenantId, projectId, apiUrl, apiKey)
🟠 typescript-sdk/evaluations.mdx:4-9 Datasets mentioned but not documented — The frontmatter/intro promises dataset operations but the page contains no dataset documentation. Either add dataset docs or remove the mentions.

🚫 REQUEST CHANGES

Summary: Great progress — all the inline formatting suggestions are now addressed. The two Major issues from the prior review still need attention before this is ready to ship: (1) add a Quick Start section showing how to instantiate EvaluationClient, and (2) either document the dataset operations or remove datasets from the description/keywords. Once those are resolved, this will be good to go! 🎉

Reviewers (0)

Reviewer	Returned	Main Findings	Consider	While You're Here	Inline Comments	Pending Recs	Discarded
None dispatched	—	—	—	—	—	—	—

Note: Delta was purely formatting fixes — no substantive changes requiring reviewer dispatch.

claude

PR Review Summary

(1) Total Issues | Risk: Medium

Delta Review

This re-review is scoped to the 1 commit since the last automated review. The delta adds a "Setup: create a client" section (24 lines) to agents-docs/content/typescript-sdk/evaluations.mdx.

Status	Issue
✅ Addressed	Missing EvaluationClient initialization example — the new Setup section correctly documents all required parameters (`tenantId`, `projectId`, `apiUrl`) and the optional `apiKey`
✅ Verified	The code example accurately matches the `EvaluationClientConfig` interface in `packages/agents-sdk/src/evaluationClient.ts`
✅ Verified	Parameter table includes correct types and descriptions

No new issues introduced in the delta. The Setup section follows documentation standards with a clear heading, runnable code example using environment variables, and a complete parameter reference table.
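As a small illustration of the required/optional split noted above (`tenantId`, `projectId`, `apiUrl` required; `apiKey` optional), here is a minimal sketch of building the client config from environment-style variables and failing fast when a required value is missing. It assumes a hypothetical local type `EvaluationClientConfigSketch` that only mirrors the parameters listed in this review; it is not the SDK's actual `EvaluationClientConfig`. The helper `configFromEnv` and the variable names `INKEEP_TENANT_ID`, `INKEEP_PROJECT_ID`, and `INKEEP_API_URL` are also hypothetical; only `INKEEP_API_KEY` appears in the suggested example.

```typescript
// Hypothetical stand-in for the config shape described in the review.
// NOT the real SDK type; see packages/agents-sdk/src/evaluationClient.ts.
interface EvaluationClientConfigSketch {
  tenantId: string;
  projectId: string;
  apiUrl: string;
  apiKey?: string; // optional
}

type Env = Record<string, string | undefined>;

// Hypothetical helper: reads config values from an env-like object and
// throws if any required value is missing or empty.
function configFromEnv(env: Env): EvaluationClientConfigSketch {
  const tenantId = env.INKEEP_TENANT_ID; // hypothetical variable name
  const projectId = env.INKEEP_PROJECT_ID; // hypothetical variable name
  const apiUrl = env.INKEEP_API_URL; // hypothetical variable name
  const apiKey = env.INKEEP_API_KEY;

  if (!tenantId || !projectId || !apiUrl) {
    const missing = [
      !tenantId && "tenantId",
      !projectId && "projectId",
      !apiUrl && "apiUrl",
    ].filter(Boolean);
    throw new Error(`Missing required config: ${missing.join(", ")}`);
  }

  // apiKey is optional: include it only when set (empty string counts as unset).
  return apiKey
    ? { tenantId, projectId, apiUrl, apiKey }
    : { tenantId, projectId, apiUrl };
}
```

The resulting object could then be passed to the client constructor shown in the Setup section, e.g. `new EvaluationClient(configFromEnv(process.env))`, assuming the real config type accepts these fields.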

🕐 Pending Recommendations (1)

This Major issue was raised in prior reviews and remains unaddressed:

🟠 typescript-sdk/evaluations.mdx:4-9 Datasets mentioned but not documented — The frontmatter description says "Manage datasets, evaluators, and run evaluations" and the intro references "manage datasets... trigger batch evaluations or dataset runs." However, the page contains no dataset operations documentation. The EvaluationClient has extensive dataset methods (createDataset, listDatasets, createDatasetItem, triggerDatasetRun) that remain undocumented. Either add a Datasets section or remove datasets from the description/keywords to accurately reflect the current scope.

🚫 REQUEST CHANGES

Summary: Excellent progress! The delta successfully addresses the client initialization issue — the new Setup section is accurate, well-structured, and follows documentation standards. The one remaining blocker is the datasets discrepancy: the page promises dataset documentation but doesn't deliver it. Once that's resolved (either by adding dataset docs or removing the dataset mentions from frontmatter/intro), this will be ready to ship. 🎉

Discarded (0)

No findings were discarded in this review.

Reviewers (2)

Reviewer	Returned	Pending Recs
`pr-review-docs`	1	0
`pr-review-product`	1	1
Total	2	1

Note: The product reviewer's finding about datasets was already raised in prior reviews and is tracked as a Pending Recommendation rather than a new Main finding.

claude

PR Review Summary

(0) Total Issues | Risk: Low

Delta Review

This re-review is scoped to the 1 commit since the last automated review (f398b466b). The delta updates the frontmatter and intro paragraph in agents-docs/content/typescript-sdk/evaluations.mdx (3 lines changed).

Status	Change
✅ Addressed	Removed "datasets" from description — now accurately says "Manage evaluators programmatically"
✅ Addressed	Removed "datasets" and "dataset runs" from keywords
✅ Addressed	Updated intro paragraph to remove dataset mentions — now says "manage evaluators, evaluation suite configs, trigger batch evaluations"

No new issues introduced in the delta. The documentation now accurately reflects its scope.

Summary of All Addressed Issues

Over the course of this PR's review iterations, all feedback has been incorporated:

Issue	Status	Commit
Missing `EvaluationClient` initialization example	✅ Fixed	`1c55dca`
Missing blank line before heading	✅ Fixed	`65f3b1c`
Double space after period	✅ Fixed	`4d88e8a`
"via suite configs" implementation detail exposure	✅ Fixed	`5d466d2`
Datasets mentioned but not documented	✅ Fixed	`f398b46`

✅ APPROVE

Summary: Excellent work! 🎉 All prior review feedback has been addressed. The documentation is now accurate, well-structured, and ready to ship. The TypeScript SDK and Visual Builder pages provide clear, actionable guidance for users working with evaluations.

Reviewers (0)

Reviewer	Returned	Main Findings	Consider	While You're Here	Inline Comments	Pending Recs	Discarded
None dispatched	—	—	—	—	—	—	—

Note: Delta was a targeted fix for prior feedback — no substantive changes requiring reviewer dispatch.

github-actions · 2026-02-18T16:50:36Z

🔎💬 Inkeep AI search and chat service is syncing content for source 'Inkeep Agent Framework Docs'

* eval docs
* Update agents-docs/content/typescript-sdk/evaluations.mdx
  Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
* Update agents-docs/content/visual-builder/evaluations.mdx
  Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
* Update agents-docs/content/visual-builder/evaluations.mdx
  Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
* claude
* claude

---------

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

eval docs

ff25add

vercel bot deployed to Preview – agents-api February 18, 2026 16:09 View deployment

vercel bot deployed to Preview – agents-docs February 18, 2026 16:10 View deployment

vercel bot deployed to Preview – agents-manage-ui February 18, 2026 16:11 View deployment

claude bot reviewed Feb 18, 2026

agents-docs/content/typescript-sdk/evaluations.mdx (resolved)

agents-docs/content/visual-builder/evaluations.mdx (outdated, resolved)

agents-docs/content/visual-builder/evaluations.mdx (outdated, resolved)

github-actions bot deleted a comment from claude bot Feb 18, 2026

shagun-singh-inkeep and others added 3 commits February 18, 2026 11:22

Update agents-docs/content/typescript-sdk/evaluations.mdx

65f3b1c

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

Update agents-docs/content/visual-builder/evaluations.mdx

4d88e8a

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

Update agents-docs/content/visual-builder/evaluations.mdx

5d466d2

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

vercel bot deployed to Preview – agents-api February 18, 2026 16:24 View deployment

claude bot reviewed Feb 18, 2026

github-actions bot deleted a comment from claude bot Feb 18, 2026

vercel bot deployed to Preview – agents-docs February 18, 2026 16:25 View deployment

vercel bot deployed to Preview – agents-manage-ui February 18, 2026 16:26 View deployment

claude

1c55dca

vercel bot deployed to Preview – agents-api February 18, 2026 16:37 View deployment

vercel bot deployed to Preview – agents-manage-ui February 18, 2026 16:39 View deployment

vercel bot deployed to Preview – agents-docs February 18, 2026 16:39 View deployment

claude bot reviewed Feb 18, 2026

github-actions bot deleted a comment from claude bot Feb 18, 2026

claude

f398b46

vercel bot deployed to Preview – agents-api February 18, 2026 16:49 View deployment

claude bot reviewed Feb 18, 2026

vercel bot deployed to Preview – agents-docs February 18, 2026 16:49 View deployment

github-actions bot deleted a comment from claude bot Feb 18, 2026

shagun-singh-inkeep merged commit 54d2295 into main Feb 18, 2026
10 of 11 checks passed

shagun-singh-inkeep deleted the evaluation-docs branch February 18, 2026 16:50

vercel bot deployed to Preview – agents-manage-ui February 18, 2026 16:51 View deployment

shagun-singh-inkeep merged 6 commits into main from evaluation-docs