docs: for configuration file, minor fixes elsewhere #95

Merged: 3 commits, Apr 8, 2024
16 changes: 12 additions & 4 deletions README.md
@@ -1,4 +1,4 @@
# empirical.run
# Empirical

[![npm](https://img.shields.io/npm/v/@empiricalrun/cli)](https://npmjs.com/package/@empiricalrun/cli)
[![Discord](https://img.shields.io/badge/discord-empirical.run-blue?logo=discord&logoColor=white&color=5d68e8)](https://discord.gg/NeR6jj8dw9)
@@ -19,6 +19,8 @@ With Empirical, you can:

## Usage

[See quick start on docs →](https://docs.empirical.run/quickstart)

Empirical bundles together a CLI and a web app. The CLI handles running tests and
the web app visualizes results.

@@ -28,16 +30,22 @@ Everything runs locally, with a JSON configuration file, `empiricalrc.json`.

### Start with a basic example

This example converts incoming unstructured user messages into structured JSON objects
using an LLM.
In this example, we will ask an LLM to parse user messages to extract entities and
give us structured JSON output. For example, "I'm Alice from Maryland" will
become `{"name": "Alice", "location": "Maryland"}`.

Our test will succeed if the model outputs valid JSON.

1. Use the CLI to create a sample configuration file called `empiricalrc.json`.

```sh
npx @empiricalrun/cli init
cat empiricalrc.json
```

2. Run the test samples against the models with the `run` command.
2. Run the test samples against the models with the `run` command. This step requires
the `OPENAI_API_KEY` environment variable to authenticate with OpenAI. With the
selected models, this execution costs approximately $0.0026.

```sh
npx @empiricalrun/cli run
56 changes: 56 additions & 0 deletions docs/configuration.mdx
@@ -0,0 +1,56 @@
---
title: 'Configuration file'
description: 'Use a JSON file to configure your tests'
---

Empirical uses a JSON configuration file, called `empiricalrc.json`, to describe
the test to run. The configuration is declarative: you define what you want to
test, and Empirical implements the expected behavior internally.

## Configuration reference

The `empiricalrc.json` configuration file has two high-level properties:

- `runs`: Use this to define [model providers](./models/basics) and [scoring functions](./scoring/basics)
- `dataset`: Use this to define the [scenarios to test](./dataset/basics)
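
For orientation, here is a minimal configuration sketch, assembled from the fields shown
in the quickstart example (your providers, prompts, scorers, and samples will differ):

```json empiricalrc.json
{
  "runs": [
    {
      "type": "model",
      "provider": "openai",
      "model": "gpt-3.5-turbo",
      "prompt": "Hey I'm {{user_name}}",
      "scorers": [{ "type": "is-json" }]
    }
  ],
  "dataset": {
    "samples": [{ "inputs": { "user_name": "Alice" } }]
  }
}
```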

## Code editor integration

Your code editor can provide auto-completion and catch linting errors for this configuration
file, using a [JSON Schema](https://json-schema.org/) definition hosted by Empirical.

There are two ways to configure the schema definition.

### `$schema` property

Use the `$schema` property in the configuration file to specify the JSON schema URL.

```json empiricalrc.json
{
"$schema": "https://assets.empirical.run/config/schema/v1.14.json",
"runs": [
// ...
],
"dataset": {
// ...
}
}
```

### Visual Studio Code

Add the `json.schemas` property to your VS Code configuration (user or workspace). This maps
the `empiricalrc.json` file to the JSON schema.

```json settings.json
{
"json.schemas": [
{
"fileMatch": [
"empiricalrc.json"
],
"url": "https://assets.empirical.run/config/schema/v1.14.json"
}
]
}
```
1 change: 1 addition & 0 deletions docs/mint.json
@@ -42,6 +42,7 @@
"introduction",
"quickstart",
"examples",
"configuration",
"running-in-ci"
]
},
30 changes: 19 additions & 11 deletions docs/models/basics.mdx
@@ -3,9 +3,9 @@ title: 'Basics'
description: 'Choose model providers to test with'
---

Empirical tests how different models and model configurations work for your
application. Choose the models and configurations by defining the configuration
for model providers.
Empirical can test how different models and model configurations work for your
application. You can define which models and configurations to test in the
[configuration file](../configuration).

Empirical supports two types of model providers:

@@ -18,12 +18,12 @@ The rest of this doc focuses on the `model` type.

## Run configuration for LLMs

Specify the `provider`, `model` and `prompt` keys to configure this. See below
for supported providers.
To test an LLM, specify the following properties in the configuration:

Use placeholders in the prompt (like `{{user_name}}`) to replace the placeholder with the
actual value from the sample inputs. See [dataset](../dataset/basics) to learn more about
sample inputs.
- `provider`: Name of the inference provider (e.g. `openai`, or other [supported providers](#supported-providers))
- `model`: Name of the model (e.g. `gpt-3.5-turbo` or `claude-3-haiku`)
- `prompt`: Prompt sent to the model, with optional [placeholders](#placeholders)
- `name` [optional]: A name or label for this run (auto-generated if not specified)

You can configure as many model providers as you like. These models will be shown in a
side-by-side comparison view in the web reporter.
@@ -35,10 +35,18 @@
"provider": "openai",
"model": "gpt-3.5-turbo",
"prompt": "Hey I'm {{user_name}}"
},
}
]
```

### Placeholders

Define placeholders in the prompt with Handlebars syntax (like `{{user_name}}`) to inject values
from the dataset sample. These placeholders will be replaced with the corresponding input value
during execution.

See [dataset](../dataset/basics) to learn more about sample inputs.
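
Conceptually, the substitution is plain string templating. A rough sketch in TypeScript,
for illustration only (this is not Empirical's actual implementation):

```typescript
// Replace each {{placeholder}} in the prompt with the matching
// value from a dataset sample's inputs.
function renderPrompt(prompt: string, inputs: Record<string, string>): string {
  return prompt.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, key) => inputs[key] ?? "");
}

console.log(renderPrompt("Hey I'm {{user_name}}", { user_name: "Alice" }));
// Hey I'm Alice
```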

## Supported providers

| Provider | Description |
@@ -82,7 +90,7 @@ You can add other parameters or override this behavior with [passthrough](#passt
"parameters": {
"temperature": 0.1
}
},
}
]
```

@@ -104,7 +112,7 @@ For example, Mistral models support a `safePrompt` parameter for [guardrailing](
"temperature": 0.1,
"safePrompt": true
}
},
}
]
```

107 changes: 101 additions & 6 deletions docs/quickstart.mdx
@@ -8,30 +8,125 @@ the web app visualizes results.

Everything runs locally, with a JSON configuration file, `empiricalrc.json`.

Required: [Node.js](https://nodejs.org/en) 20+ needs to be installed on your system.
Required: Node.js 20+ needs to be installed on your system.

## Start with a basic example

This example converts incoming unstructured user messages into structured JSON objects
using an LLM.
In this example, we will ask an LLM to parse user messages to extract entities and
give us structured JSON output. For example, "I'm Alice from Maryland" will
become `{"name": "Alice", "location": "Maryland"}`.

1. Use the CLI to create a sample configuration file called `empiricalrc.json`.
Our test will succeed if the model outputs valid JSON.

<Steps>
<Step title="Set up Empirical">
Use the CLI to create a sample configuration file in `empiricalrc.json`.

```sh
npx @empiricalrun/cli init
```

2. Run the test samples against the models with the `run` command.
Read the file to see the models and dataset samples that are configured for this
test. The default configuration uses models from OpenAI.

```sh
cat empiricalrc.json
```
</Step>

<Step title="Run the test">
Run the test samples against the models with the `run` command.

```sh
npx @empiricalrun/cli run
```

3. Use the `ui` command to open the reporter web app in your web browser and see side-by-side results.
This step requires the `OPENAI_API_KEY` environment variable to authenticate with
OpenAI. With the selected models, this execution costs approximately $0.0026.
</Step>

<Step title="See results">
Use the `ui` command to open the reporter web app in your web browser and see
side-by-side results.

```sh
npx @empiricalrun/cli ui
```
</Step>

<Step title="[Bonus] Fix GPT-4 Turbo">
GPT-4 Turbo tends to fail our JSON syntax check because it wraps its output in
Markdown code fences (with backticks ` ```json `). We can fix this behavior by enabling
[JSON mode](https://platform.openai.com/docs/guides/text-generation/json-mode).

```json
{
"model": "gpt-4-turbo-preview",
// ...
// Existing properties
"parameters": {
"response_format": {
"type": "json_object"
}
}
}
```

<Accordion title="empiricalrc.json: Updated with JSON mode">
```json empiricalrc.json
{
"runs": [
{
"type": "model",
"provider": "openai",
"model": "gpt-3.5-turbo",
"prompt": "Extract the name, age and location from the message, and respond with a JSON object. If an entity is missing, respond with null.\n\nMessage: {{user_message}}",
"scorers": [
{
"type": "is-json"
}
]
},
{
"type": "model",
"provider": "openai",
"model": "gpt-4-turbo-preview",
"parameters": {
"response_format": {
"type": "json_object"
}
},
"prompt": "Extract the name, age and location from the message, and respond with a JSON object. If an entity is missing, respond with null.\n\nMessage: {{user_message}}",
"scorers": [
{
"type": "is-json"
}
]
}
],
"dataset": {
"samples": [
{
"inputs": {
"user_message": "Hi my name is John Doe. I'm 26 years old and I work in real estate."
}
},
{
"inputs": {
"user_message": "This is Alice. I am a nurse from Maryland. I was born in 1990."
}
}
]
}
}
```
</Accordion>

Re-running the test with `npx @empiricalrun/cli run` will give us better results
for GPT-4 Turbo.
</Step>
</Steps>


## Make it yours

4 changes: 2 additions & 2 deletions docs/running-in-ci.mdx
@@ -1,6 +1,6 @@
---
title: 'Running in CI'
description: 'Automate test execution and reporting in your CI pipeline'
title: 'Run in GitHub Actions'
description: 'Automate continuous testing and reporting with CI'
---

The Empirical CLI is optimized to run in CI/CD environments. This enables your team to
2 changes: 0 additions & 2 deletions examples/basic/empiricalrc.json
@@ -3,7 +3,6 @@
"runs": [
{
"type": "model",
"name": "gpt-3.5-turbo run",
"provider": "openai",
"model": "gpt-3.5-turbo",
"prompt": "Extract the name, age and location from the message, and respond with a JSON object. If an entity is missing, respond with null.\n\nMessage: {{user_message}}",
@@ -15,7 +14,6 @@
},
{
"type": "model",
"name": "gpt-4-turbo-preview run",
"provider": "openai",
"model": "gpt-4-turbo-preview",
"prompt": "Extract the name, age and location from the message, and respond with a JSON object. If an entity is missing, respond with null.\n\nMessage: {{user_message}}",
18 changes: 13 additions & 5 deletions packages/cli/README.md
@@ -1,7 +1,7 @@
# empirical.run CLI
# Empirical CLI

[![npm](https://img.shields.io/npm/v/@empiricalrun/cli)](https://npmjs.com/package/@empiricalrun/cli)
[![Discord](https://dcbadge.vercel.app/api/server/NeR6jj8dw9?style=flat&compact=true)](https://discord.gg/NeR6jj8dw9)
[![Discord](https://img.shields.io/badge/discord-empirical.run-blue?logo=discord&logoColor=white&color=5d68e8)](https://discord.gg/NeR6jj8dw9)

Empirical is the fastest way to test different LLMs, prompts and other model configurations, across all the scenarios
that matter for your application.
@@ -19,6 +19,8 @@ With Empirical, you can:

## Usage

[See quick start on docs →](https://docs.empirical.run/quickstart)

Empirical bundles together a CLI and a web app. The CLI handles running tests and
the web app visualizes results.

@@ -28,16 +30,22 @@ Everything runs locally, with a JSON configuration file, `empiricalrc.json`.

### Start with a basic example

This example converts incoming unstructured user messages into structured JSON objects
using an LLM.
In this example, we will ask an LLM to parse user messages to extract entities and
give us structured JSON output. For example, "I'm Alice from Maryland" will
become `{"name": "Alice", "location": "Maryland"}`.

Our test will succeed if the model outputs valid JSON.

1. Use the CLI to create a sample configuration file called `empiricalrc.json`.

```sh
npx @empiricalrun/cli init
cat empiricalrc.json
```

2. Run the test samples against the models with the `run` command.
2. Run the test samples against the models with the `run` command. This step requires
the `OPENAI_API_KEY` environment variable to authenticate with OpenAI. With the
selected models, this execution costs approximately $0.0026.

```sh
npx @empiricalrun/cli run
2 changes: 0 additions & 2 deletions packages/cli/src/runs/config/defaults/index.ts
@@ -4,7 +4,6 @@ export const config: RunsConfig = {
runs: [
{
type: "model",
name: "gpt-3.5-turbo run",
provider: "openai",
model: "gpt-3.5-turbo",
prompt:
@@ -17,7 +16,6 @@
},
{
type: "model",
name: "gpt-4-turbo-preview run",
provider: "openai",
model: "gpt-4-turbo-preview",
prompt: