evaluation docs #2123 (Merged, +214 −0)

Commits (6, all by shagun-singh-inkeep):

- ff25add: eval docs
- 65f3b1c: Update agents-docs/content/typescript-sdk/evaluations.mdx
- 4d88e8a: Update agents-docs/content/visual-builder/evaluations.mdx
- 5d466d2: Update agents-docs/content/visual-builder/evaluations.mdx
- 1c55dca: claude
- f398b46: claude
File: agents-docs/content/typescript-sdk/evaluations.mdx

---
title: Evaluations
sidebarTitle: Evaluations
description: Manage evaluators programmatically with the TypeScript SDK
icon: LuFlaskConical
keywords: evaluations, evaluators, batch evaluation
---
The TypeScript SDK provides an **EvaluationClient** that talks to the Evaluations API, so you can manage evaluators and evaluation suite configs, trigger batch evaluations, and read results, all from code.

For full endpoint details and request/response shapes, see the [Evaluations API reference](/api-reference/evaluations).
## Setup: create a client

Create an evaluation client with your tenant ID, project ID, API base URL, and an optional API key.
```typescript
import { EvaluationClient } from "@inkeep/agents-sdk";

const client = new EvaluationClient({
  tenantId: process.env.INKEEP_TENANT_ID!,
  projectId: process.env.INKEEP_PROJECT_ID!,
  apiUrl: "https://api.inkeep.com",
  apiKey: process.env.INKEEP_API_KEY,
});
```
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `tenantId` | string | Yes | Your tenant (organization) ID |
| `projectId` | string | Yes | Your project ID |
| `apiUrl` | string | Yes | API base URL (e.g. `https://api.inkeep.com` or your self-hosted URL) |
| `apiKey` | string | No | Bearer token for authenticated requests. Omit for unauthenticated or custom auth. |

Use `client` in the examples below (e.g. `client.createEvaluator(...)`).
## Evaluators

Evaluators define how to score agent outputs, for example with a prompt, a model, and optional pass criteria.

### Creating an evaluator

Pass an object with `name`, `description`, `prompt`, `schema` (a JSON schema for the evaluator output), and `model` (a model identifier plus optional provider options). Optionally include `passCriteria` to define pass/fail conditions on the schema fields.
```typescript
const evaluator = await client.createEvaluator({
  name: "Helpfulness",
  description: "Scores how helpful the agent response is (0-1)",
  prompt: `You are an expert evaluator. Score how helpful the assistant's response is to the user on a scale of 0.0 to 1.0.
Consider clarity, relevance, and completeness. Respond with a JSON object with a "score" field.`,
  schema: {
    type: "object",
    properties: {
      score: { type: "number", description: "Helpfulness score from 0 to 1" },
    },
    required: ["score"],
  },
  model: {
    model: "gpt-4o-mini",
    providerOptions: {},
  },
  passCriteria: {
    operator: "and",
    conditions: [{ field: "score", operator: ">=", value: 0.8 }],
  },
});
```
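The `passCriteria` object combines one or more conditions under an `and`/`or` operator. The evaluation service applies these server-side; the sketch below is only a conceptual illustration of the semantics, and the full set of comparison operators beyond the `>=` shown above is an assumption, not confirmed SDK behavior:

```typescript
// Conceptual sketch of passCriteria semantics. The type and helper names here
// are illustrative; the real evaluation happens server-side.
type Op = ">=" | "<=" | ">" | "<" | "==";
type Condition = { field: string; operator: Op; value: number };
type PassCriteria = { operator: "and" | "or"; conditions: Condition[] };

const compare: Record<Op, (a: number, b: number) => boolean> = {
  ">=": (a, b) => a >= b,
  "<=": (a, b) => a <= b,
  ">": (a, b) => a > b,
  "<": (a, b) => a < b,
  "==": (a, b) => a === b,
};

// Evaluate each condition against the evaluator's structured output,
// then combine with "and" (all must hold) or "or" (any may hold).
function passes(output: Record<string, number>, criteria: PassCriteria): boolean {
  const results = criteria.conditions.map(
    (c) => compare[c.operator](output[c.field], c.value)
  );
  return criteria.operator === "and" ? results.every(Boolean) : results.some(Boolean);
}
```

Under the example criteria above, an output of `{ score: 0.9 }` would pass and `{ score: 0.5 }` would not.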
## Evaluation suite configs

Suite configs group evaluators with optional agent filters and a sample rate. They are used by **continuous tests** (evaluation run configs) to decide which conversations to evaluate automatically.

### Creating an evaluation suite config

Pass **evaluatorIds** (required, at least one) and optionally **sampleRate** (0–1) and **filters** (e.g. `agentIds` to restrict which agents' conversations are evaluated). The suite can then be attached to a continuous test (evaluation run config).
```typescript
const suiteConfig = await client.createEvaluationSuiteConfig({
  evaluatorIds: ["eval-helpfulness", "eval-accuracy"],
  sampleRate: 0.1,
  filters: {
    agentIds: ["agent-support-bot"],
  },
});
```
| Option | Type | Required | Description |
|--------|------|----------|-------------|
| `evaluatorIds` | string[] | Yes | At least one evaluator ID to run in this suite |
| `sampleRate` | number | No | Fraction of matching conversations to evaluate (0–1). Omit to evaluate all. |
| `filters` | object | No | Restrict which conversations are in scope, e.g. `{ agentIds: ["agent-id"] }` |
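`sampleRate` is a fraction: `0.1` means roughly one in ten matching conversations is evaluated. The sampling itself happens server-side; conceptually it behaves something like this sketch (the function name is illustrative, not part of the SDK):

```typescript
// Illustrative only: per-conversation sampling at a given rate.
// sampleRate = 0.1 evaluates roughly 10% of matching conversations;
// 0 never samples, 1 always samples.
function shouldEvaluate(sampleRate: number): boolean {
  return Math.random() < sampleRate;
}
```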
## Batch evaluation

Trigger a one-off batch evaluation over conversations, optionally filtered by conversation IDs or date range:
```typescript
const result = await client.triggerBatchEvaluation({
  evaluatorIds: ["eval-1", "eval-2"],
  name: "Weekly quality check",
  dateRange: {
    startDate: "2025-02-01",
    endDate: "2025-02-07",
  },
});
// result: { message, evaluationJobConfigId, evaluatorIds }
```
| Option | Type | Required | Description |
|--------|------|----------|-------------|
| `evaluatorIds` | string[] | Yes | IDs of evaluators to run |
| `name` | string | No | Name for the job (defaults to a timestamped name) |
| `conversationIds` | string[] | No | Limit to these conversations |
| `dateRange` | object | No | `startDate` and `endDate` strings (`YYYY-MM-DD`); limit to conversations in this range |
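Since `dateRange` takes `YYYY-MM-DD` strings, a small helper (illustrative, not part of the SDK) can build a rolling window such as "last 7 days" for a recurring quality check:

```typescript
// Illustrative helper (not part of the SDK): build a YYYY-MM-DD date range
// covering the last `days` days, suitable for triggerBatchEvaluation's dateRange.
// Note: dates are formatted in UTC via toISOString.
function lastNDays(days: number, now: Date = new Date()): { startDate: string; endDate: string } {
  const toYmd = (d: Date) => d.toISOString().slice(0, 10);
  const start = new Date(now.getTime() - days * 24 * 60 * 60 * 1000);
  return { startDate: toYmd(start), endDate: toYmd(now) };
}
```

For example, `lastNDays(6)` evaluated on 2025-02-07 yields the same window as the hard-coded range in the example above.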
To list results by job or by run config, use the [Evaluations API](/api-reference/evaluations), e.g. get evaluation results by job config ID or by run config ID.
## Related

- [Evaluations API reference](/api-reference/evaluations) — Full list of evaluation endpoints and schemas
- [Visual Builder: Evaluations](/visual-builder/evaluations) — Configure evaluators, batch evaluations, and continuous tests in the UI
File: agents-docs/content/visual-builder/evaluations.mdx

---
title: Evaluations
sidebarTitle: Evaluations
description: Configure evaluators, batch evaluations, and continuous tests in the Visual Builder
icon: LuFlaskConical
keywords: evaluations, evaluators, batch evaluations, continuous tests, datasets
---
The Visual Builder lets you define, manage, and run evaluations. You define evaluators (how to score agents), then run them in two ways: **batch evaluations** (one-time jobs over selected conversations) and **continuous tests** (automatic evaluation of a sample of live conversations).

## Where to find evaluations

1. Open your project in the Visual Builder.
2. In the project sidebar, go to **Evaluations** for evaluators, batch jobs, and continuous tests.

<Note>
You need **Edit** permission on the project to create or change evaluators, batch evaluations, and continuous tests. See [Access control](/visual-builder/access-control) for roles and permissions.
</Note>
## Evaluators

Evaluators define how agent responses are scored. Each evaluator has a **prompt**, a **schema**, and optional **pass criteria**, which together produce a score or structured output.

### Creating an evaluator

1. Go to **Evaluations** and open the **Evaluators** tab.
2. Click **New evaluator**.
3. Fill in:
   - **Name** and optional **Description**
   - **Prompt** — instructions for the model (e.g. what to score and how)
   - **Schema** — JSON schema for structured output (e.g. numeric score, categories)
   - **Model** — the model used to run the evaluator
   - **Pass criteria** (optional) — conditions on numeric schema fields that define pass/fail (e.g. `score >= 0.8`)
4. Save. The evaluator is then available for batch evaluations and continuous tests.
### Example evaluator

<Image
  src="/images/evaluator-example.png"
  alt="Evaluator form in the Visual Builder showing name, prompt, schema, model, and pass criteria fields"
/>

### Editing or deleting

From the Evaluators list, open an evaluator to view or edit it, or use the delete action.
## Batch evaluations

Batch evaluations run selected evaluators once over a set of conversations. You choose which evaluators to run and over what date range.

### Creating a batch evaluation

1. Go to **Evaluations** and open the **Batch Evaluations** tab.
2. Click **New batch evaluation**.
3. Select one or more **Evaluators**.
4. Narrow the scope with a **Date range**; only conversations within that range are evaluated.
5. Start the job. A new batch evaluation job is created and runs asynchronously.
### Viewing results

From the Batch Evaluations list, open a job to see its **results**: per-conversation evaluation outputs, pass/fail where pass criteria are set, and job status. You can filter and inspect individual results.
## Continuous tests

Continuous tests evaluate a sample of **live** conversations automatically. You specify which evaluators to run, which agents to include (optional), and a **sample rate** (e.g. 10% of conversations).

### Creating a continuous test

1. Go to **Evaluations** and open the **Continuous Tests** tab.
2. Click **New continuous test**.
3. Set a **Name** and optional **Description**.
4. Make sure the config is **Active** so it runs on new conversations.
5. Choose the **Evaluators** to run.
6. Optionally restrict by **Agents** (only evaluate conversations for those agents).
7. Set a **Sample rate** (0–1) to evaluate a fraction of matching conversations.
8. Save. Once active, matching conversations are evaluated according to the sample rate.
### Viewing results

From the Continuous Tests list, open a config to see its **Results**: all evaluation results triggered by that continuous test, with filters.
## Summary

| Area | Purpose |
|------|---------|
| **Evaluators** | Define how to score agent outputs (prompt, model, schema, pass criteria). |
| **Batch evaluations** | Run evaluators once over a scoped set of conversations (date range). |
| **Continuous tests** | Automatically run evaluators on a sample of live conversations. |

For programmatic access to the same concepts, see [TypeScript SDK: Evaluations](/typescript-sdk/evaluations) and the [Evaluations API reference](/api-reference/evaluations).
The PR also adds an `"evaluations"` entry to a docs navigation list (diff fragment, context lines shown):

```json
  "context-fetchers",
  "access-control",
  "skills",
  "evaluations",
  "..."
]
}
```