v0.3.0
Released by arjunattam on 04 Apr 09:28
Empirical is a CLI and web UI for developers to test different LLMs, prompts, and other model configurations across all the scenarios that matter. Try it out with the quick start →
This is our first open-source release, and it lays out the core primitives and capabilities of the product.
Capabilities
Configuration
- Empirical has a declarative configuration file for your tests in `empiricalrc.json` (see an example)
- Each config has three parts: model providers (what to test), datasets (scenarios to test), and scorers (measure output quality); a sketch of such a file follows this list
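To make the three parts concrete, here's a rough sketch of an `empiricalrc.json`. The key names and values below (`runs`, `dataset`, `scorers`, the `is-json` scorer) are illustrative assumptions, so refer to the linked example for the actual schema:

```json
{
  "runs": [
    {
      "type": "model",
      "provider": "openai",
      "model": "gpt-3.5-turbo",
      "prompt": "Extract the name mentioned in: {{user_message}}"
    }
  ],
  "dataset": {
    "samples": [
      { "inputs": { "user_message": "Hi there, this is Alice!" } }
    ]
  },
  "scorers": [{ "type": "is-json" }]
}
```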
Model providers
- Empirical can run tests against off-the-shelf LLMs and custom models or apps (e.g. RAG)
- Off-the-shelf LLMs: Models hosted by OpenAI, Anthropic, Mistral, Google, and Fireworks are supported today
- Custom models or apps: You can write a Python script that behaves as an entry point to run tests against custom models or RAG applications (see the sketch after this list)
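As a rough sketch of what such an entry point could look like, assuming a hypothetical contract where Empirical imports the script and calls an `execute` function with the sample inputs (the function name, signature, and return shape are assumptions here, not the documented interface):

```python
# custom_model.py: a hypothetical entry point for a custom model or RAG app.
# The function name, signature, and return shape are assumptions; check the
# Empirical docs for the contract the Python script provider actually expects.

def execute(inputs: dict, parameters: dict) -> dict:
    """Run one dataset sample against a custom pipeline and return its output."""
    question = inputs["user_message"]
    answer = my_rag_pipeline(question)
    return {"value": answer}

def my_rag_pipeline(question: str) -> str:
    # Placeholder for your retrieval + generation logic.
    return f"Answer to: {question}"
```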
Datasets
- Specify scenarios to test as samples in the configuration, or import them from a file (a sketch of a file-based dataset follows)
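If you import from a file, one plausible shape (an assumption on our part, not a documented format) is a JSONL file where each line carries the `inputs` for one sample, referenced from the config by its path:

```jsonl
{"inputs": {"user_message": "Hi there, this is Alice!"}}
{"inputs": {"user_message": "Hello, Bob here."}}
```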
Scorers
- Measure the quality of your output with built-in scoring functions, or write your own scorers as LLM prompts or Python functions (a Python scorer is sketched below)
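Here's a minimal sketch of a custom Python scorer, assuming a hypothetical `evaluate` function that receives the model output and the sample inputs and returns a list of score objects (the signature and return shape are assumptions; the docs define the real contract):

```python
# length_scorer.py: a hypothetical custom scorer that checks answer length.
# The evaluate signature and the score object's fields are assumptions.

def evaluate(output: dict, inputs: dict) -> list[dict]:
    """Score 1 if the model's answer stays under 280 characters, else 0."""
    answer = output.get("value", "")
    passed = len(answer) <= 280
    return [{
        "name": "under-280-chars",
        "score": 1 if passed else 0,
        "message": "" if passed else f"Answer is {len(answer)} characters long",
    }]
```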
Web UI
- Review results and compare model configurations side-by-side in our web UI
- This brings the Empirical playground and comparison pages to your local environment
Continuous evaluation in CI
- Run your tests in GitHub Actions and get results reported as a PR comment
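A workflow for this could look like the sketch below; the package name (`@empiricalrun/cli`), the `run` command, and the secret wiring are assumptions, so check the docs for the exact invocation and how results get posted as a PR comment:

```yaml
# .github/workflows/empirical.yml: sketch of running Empirical tests in CI.
name: empirical
on: [pull_request]
jobs:
  run-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Run the test suite defined in empiricalrc.json (command is an assumption).
      - run: npx @empiricalrun/cli run
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```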
Get in touch
File an issue or join our Discord; we look forward to hearing from you ^_^