v0.3.0

@arjunattam released this 04 Apr 09:28
e674d26

Empirical is a CLI and web UI for developers to test different LLMs, prompts and other model configurations — across all the scenarios that matter. Try it out with the quick start →

This is our first open-source release. It lays out the core primitives and capabilities of the product.

Capabilities

Configuration

  • Empirical has a declarative configuration file for your tests in empiricalrc.json (see an example)
    • Each config has 3 parts: model providers (what to test), datasets (scenarios to test), scorers (measure output quality)
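To make the three-part structure concrete, here is a rough Python sketch that assembles a config with model providers, datasets, and scorers, then serializes it to JSON. The key names (`providers`, `datasets`, `scorers`), the nested fields, and the `missing_parts` helper are assumptions for illustration only; the actual empiricalrc.json schema may differ, so check the linked example before copying anything.

```python
import json

# Hypothetical layout: every key and value below is illustrative,
# not the confirmed empiricalrc.json schema.
config = {
    # What to test: one entry per model configuration.
    "providers": [
        {"provider": "openai", "model": "gpt-3.5-turbo",
         "prompt": "Extract the name from: {{user_message}}"},
    ],
    # Scenarios to test: each sample supplies inputs for the prompt.
    "datasets": [
        {"inputs": {"user_message": "Hi, I am John Doe."}},
    ],
    # How to measure output quality.
    "scorers": [
        {"type": "is-json"},
    ],
}

REQUIRED_PARTS = ("providers", "datasets", "scorers")


def missing_parts(cfg):
    """Return the required top-level parts that are absent or empty."""
    return [part for part in REQUIRED_PARTS if not cfg.get(part)]


# Serialize the sketch as it might appear in a JSON config file.
serialized = json.dumps(config, indent=2)
```

A check like `missing_parts` is one way to catch an incomplete config before running tests; it is not part of Empirical itself.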

Model providers

  • Empirical can run tests against off-the-shelf LLMs and custom models or apps (e.g. RAG)
    • Off-the-shelf LLMs: Models hosted by OpenAI, Anthropic, Mistral, Google and Fireworks are supported today
    • Custom models or apps: You can write a Python script that behaves as an entry point to run tests against custom models or RAG applications
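As a sketch of what a Python entry point for a custom app might look like, the script below wraps a toy RAG pipeline in a single function that takes a scenario's inputs and returns an output. The function name `execute`, its `(inputs, parameters)` signature, and the returned dictionary shape are assumptions made for this example; the interface Empirical actually expects is defined in its documentation, not here.

```python
def retrieve(question, documents):
    """Toy retrieval: keep documents sharing at least one word with the question."""
    words = set(question.lower().split())
    return [doc for doc in documents if words & set(doc.lower().split())]


def execute(inputs, parameters):
    """Hypothetical entry point: run the app on one dataset sample.

    `inputs` holds the sample's fields and `parameters` holds app settings;
    both names and shapes are assumptions for illustration.
    """
    question = inputs.get("question", "")
    context = retrieve(question, parameters.get("documents", []))
    # A real app would call a model here; this sketch returns the top document.
    answer = context[0] if context else "No relevant context found."
    return {"output": answer, "context": context}


result = execute(
    {"question": "What is Empirical?"},
    {"documents": ["Empirical is a CLI and web UI.", "Unrelated text here"]},
)
```

Keeping the whole app behind one function means the same dataset and scorers can be applied to it as to an off-the-shelf LLM.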

Datasets

Scorers

Web UI

  • Review results and compare model configurations side-by-side in our web UI

Continuous evaluation in CI

Get in touch

File an issue or join our Discord — we look forward to hearing from you ^_^