chore: add docs for reporter #106

Merged
merged 6 commits into from
Apr 10, 2024
Binary file added docs/images/terminal-reporter-summary.png
Binary file added docs/images/web-app-reporter-summary.png
6 changes: 6 additions & 0 deletions docs/mint.json
@@ -66,6 +66,12 @@
"scoring/llm",
"scoring/python"
]
},
{
"group": "Reporter",
"pages": [
"reporter/basics"
]
}
],
"footerSocials": {
72 changes: 72 additions & 0 deletions docs/reporter/basics.mdx
@@ -0,0 +1,72 @@
---
title: 'Basics'
description: 'How to view results of a test'
---

After executing the test, you can access the results through a reporter.

Empirical provides 2 built-in reporter types:
- `terminal` : Displays test summary in the console.
- `webapp` : Launches a web server to view test summary in a browser.

## Terminal reporter
A terminal reporter presents a test summary in the console.
This summary is available immediately after running test samples using the `npx @empiricalrun/cli run` command.

![Terminal reporter](./images/terminal-reporter-summary.png)

The summary includes:
- Table containing statistical summaries for each configured run
- Total number of [dataset](./../dataset/basics) samples
- Duration of the test run
- Errors during the test run, if any

### Statistical summary
The summary table includes statistics for the following metrics:
- `output` : Percentage of successful outputs from the run provider. A score lower than 100% means that the run provider failed to respond for some samples.
- `scorer` : Average score of the configured [scorer](./../scoring/basics), expressed as a percentage.
  - This metric is shown separately, by name, for each configured [scorer](./../scoring/basics).
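
The two metrics above can be illustrated with a minimal sketch. The `SampleResult` and `RunSummary` types and the `summarize` function are hypothetical names used only for illustration and are not part of Empirical's API. The sketch also assumes that scorer scores lie between 0 and 1, and that a scorer's average is taken over the samples that have a score for it.

```typescript
// Hypothetical result shape for one dataset sample (an assumption, not Empirical's types).
interface SampleResult {
  output: string | null; // null when the run provider failed to respond
  scores: Record<string, number>; // scorer name -> score, assumed to be in [0, 1]
}

interface RunSummary {
  outputPercent: number; // the `output` metric
  scorerPercent: Record<string, number>; // one entry per scorer name
}

function summarize(samples: SampleResult[]): RunSummary {
  const total = samples.length;
  if (total === 0) {
    return { outputPercent: 0, scorerPercent: {} };
  }

  // `output`: share of samples for which the provider returned something.
  const succeeded = samples.filter((s) => s.output !== null).length;

  // `scorer`: per-scorer sum and count, averaged over the samples that were scored.
  const sums: Record<string, number> = {};
  const counts: Record<string, number> = {};
  for (const sample of samples) {
    for (const [name, score] of Object.entries(sample.scores)) {
      sums[name] = (sums[name] ?? 0) + score;
      counts[name] = (counts[name] ?? 0) + 1;
    }
  }

  const scorerPercent: Record<string, number> = {};
  for (const name of Object.keys(sums)) {
    scorerPercent[name] = (sums[name] / counts[name]) * 100;
  }

  return { outputPercent: (succeeded / total) * 100, scorerPercent };
}

// Example: four samples, one of which got no response from the provider.
const summary = summarize([
  { output: "a", scores: { "is-json": 1 } },
  { output: "b", scores: { "is-json": 0.5 } },
  { output: "c", scores: { "is-json": 0 } },
  { output: null, scores: {} },
]);
console.log(summary);
```

In this example the `output` metric is 75% because one of the four samples has no output, and the hypothetical `is-json` scorer averages 50% over the three scored samples.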


## Webapp reporter
The webapp reporter launches a local web server so you can view a detailed summary of the runs in the browser.

To start the webapp reporter, run the following command after running the test:
```sh
npx @empiricalrun/cli ui
```

Running this command opens a view in the browser similar to the following:
![Web app reporter](./images/web-app-reporter-summary.png)

The webapp reporter view contains:

- `inputs` : A list of all [dataset](./../dataset/basics) test samples
- `runs` : Outputs for each [run](./../configuration) configured
- `statistics` : Statistical summary for each run

The webapp also allows you to:
- Modify [run](./../configuration) and execute it
- Delete a [run](./../configuration)

The webapp's interactivity saves you from repeatedly going back to `empiricalrc.json` to configure, run and compare results.
With the webapp, you can modify, execute and delete [runs](./../configuration) directly in the browser, which shortens the iteration loop.

### Modifying a run
A [run](./../configuration) configuration can be modified and executed in 2 steps:
<Steps>
<Step title='Click "Show config" button next to the run'>
A run configuration box opens, where you can update the prompt and other parameters of the config as needed.

</Step>
<Step title='Hit "Run"'>
Click the "Run" button to execute the run with the updated parameters.

After execution, a new run is added to the table, so you can compare the results side by side.
</Step>
</Steps>
<video muted controls autoPlay loop playsInline src="https://assets.empirical.run/docs%2Fvideos%2Fedit-config-ui.mp4" />

### Deleting a run
To delete a run, click on the `(-)` button next to the run you want to remove. This will permanently remove the run from the webapp.
12 changes: 6 additions & 6 deletions package.json
@@ -1,18 +1,18 @@
  {
- "name": "turbo",
+ "name": "empiricalrun",
  "private": true,
  "scripts": {
  "build": "turbo build",
+ "changeset": "changeset",
  "dev": "npm exec turbowatch turbowatch.ts",
+ "docs:install": "cd docs && pnpm --ignore-workspace install",
+ "docs:dev": "cd docs && pnpm dev",
  "format": "prettier --write \"**/*.{ts,tsx,md}\"",
  "gen:pkg:lib": "turbo gen pkg:lib",
  "lint": "turbo lint",
- "test": "turbo test",
- "test:watch": "turbo run test:watch",
  "publish-packages": "changeset publish",
- "changeset": "changeset",
- "docs:install": "cd docs && pnpm --ignore-workspace install",
- "docs:dev": "cd docs && pnpm dev"
+ "test": "turbo test",
+ "test:watch": "turbo run test:watch"
  },
  "devDependencies": {
  "@changesets/cli": "^2.27.1",
4 changes: 2 additions & 2 deletions packages/cli/src/bin/index.ts
@@ -1,5 +1,5 @@
  #!/usr/bin/env node
- import { green, red, yellow, bold } from "picocolors";
+ import { green, red, yellow, bold, cyan, underline } from "picocolors";
  import { promises as fs } from "fs";
  import { program } from "commander";
  import cliProgress from "cli-progress";
@@ -225,7 +225,7 @@ program
  });
  const fullUrl = `http://localhost:${availablePort}`;
  app.listen(availablePort, () => {
- console.log(`Empirical app running on ${fullUrl}`);
+ console.log(cyan(`Empirical app running on ${underline(fullUrl)}`));
  opener(fullUrl);
  });
  });