Blog #271
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
--- | ||
title: "Launch Week #1, Day 5. Online evaluations" | ||
date: "2024-12-06" | ||
description: "Online evaluations are a way to monitor and assess LLM behavior in real time" | ||
author: | ||
name: Robert Kim | ||
url: https://x.com/skull8888888888 | ||
image: /blog/2024-12-06-online-evals.jpg | ||
tags: ["online evaluations"] | ||
--- | ||
|
||
At Laminar, we're excited to announce our newest feature: Online Evaluations. This feature allows engineering teams to run custom evaluators, either LLM-based or Python-based, on their LLM calls as they happen in production. | ||
|
||
## What are Online Evaluations? | ||
Online evaluations run automated checks on your LLM calls as they happen in production and attach labels to them. Instead of collecting data for post-hoc analysis, Laminar evaluates each model call in real time by analyzing the inputs and outputs of your LLM spans. | ||
|
||
## Why We Built It | ||
When thousands of LLM calls happen every day, it's hard to know whether your LLMs are behaving as expected. Online evaluations let you monitor the quality of your LLM outputs in real time, collect performance statistics, and detect issues before they impact users. | ||
|
||
## How It Works | ||
Laminar's online evaluations system is built around three core concepts: | ||
|
||
### 1. Span Paths | ||
|
||
Span paths uniquely identify where LLM calls happen in your code. They're automatically constructed from the location of the call, making it easy to track specific functions and endpoints. | ||
|
||
### 2. Span Labels | ||
|
||
Labels are values attached to spans. Each label records the result of an evaluation. | ||
|
||
### 3. Evaluators | ||
|
||
Evaluators analyze inputs and outputs to generate labels. Laminar supports two types of evaluators: | ||
|
||
- LLM-based evaluators | ||
- Python-based evaluators | ||
|
||
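To make the Python-based option concrete, here is a minimal sketch of an evaluator that turns a span's output into a label. The function name, its `(span_input, span_output)` signature, and the `valid` / `invalid` label values are assumptions for illustration; see the documentation for the exact contract Laminar expects.

```python
import json


def evaluate_json_format(span_input: str, span_output: str) -> str:
    """Hypothetical evaluator: label the output by whether it is a JSON object."""
    try:
        parsed = json.loads(span_output)
    except (json.JSONDecodeError, TypeError):
        # Not parseable as JSON at all (or not a string).
        return "invalid"
    # Only a top-level JSON object counts as valid in this sketch.
    return "valid" if isinstance(parsed, dict) else "invalid"
```

A deterministic check like this is cheap to run on every call; judgments that are harder to express in code, such as tone or relevance, are the natural fit for an LLM-based evaluator.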
|
||
## Setting Up Evaluations | ||
![Setting up evaluations](/blog/2024-12-06-online-evals-example.png) | ||
|
||
Getting started with Laminar's online evaluations is straightforward: | ||
|
||
1. Navigate to "Traces" in your Laminar dashboard | ||
2. Select the span you want to evaluate | ||
3. Click "Add Label" and create or choose a label class | ||
4. Configure your evaluator: | ||
|
||
- Choose between Python code or LLM-based evaluation | ||
- Test your evaluator directly in the UI | ||
5. Save and enable for production | ||
|
||
Once enabled, Laminar automatically runs your evaluator on your LLM calls and attaches the resulting labels to the spans. These labels are marked as `AUTO` in the dashboard. | ||
|
||
![Evaluations in action](/blog/2024-12-06-online-evals-test-label.png) | ||
|
||
## Best Practices | ||
|
||
Start Simple | ||
|
||
- Begin with basic format and content checks | ||
- Add more sophisticated evaluations gradually | ||
- Monitor evaluator performance impact | ||
|
||
|
||
Layer Your Checks | ||
|
||
- Technical validation (format, structure) | ||
- Content validation (completeness, relevance) | ||
- Quality metrics (coherence, accuracy) | ||
|
||
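One way to picture the layering is a chain of checks that stops at the first failure. This is a sketch under stated assumptions: the check functions, the label names, and the expectation that the output is a JSON object with an `answer` field are all hypothetical, and the quality layer is a crude stand-in for what would more likely be an LLM-based evaluator.

```python
import json
from typing import Callable, Optional

# A check returns a failure label, or None when the output passes.
Check = Callable[[str, str], Optional[str]]


def technical_check(span_input: str, span_output: str) -> Optional[str]:
    """Format layer: the output must parse as a JSON object."""
    try:
        parsed = json.loads(span_output)
    except json.JSONDecodeError:
        return "bad_format"
    return None if isinstance(parsed, dict) else "bad_format"


def content_check(span_input: str, span_output: str) -> Optional[str]:
    """Content layer: the (assumed) `answer` field must be a non-empty string."""
    answer = json.loads(span_output).get("answer")
    if not isinstance(answer, str) or not answer.strip():
        return "incomplete"
    return None


def quality_check(span_input: str, span_output: str) -> Optional[str]:
    """Quality layer (crude proxy): the answer must not just echo the input."""
    answer = json.loads(span_output)["answer"]
    if answer.strip().lower() == span_input.strip().lower():
        return "low_quality"
    return None


# Cheapest checks first; later layers can rely on earlier ones having passed.
LAYERS: list[Check] = [technical_check, content_check, quality_check]


def run_layers(span_input: str, span_output: str) -> str:
    """Return the first failure label, or "pass" if every layer succeeds."""
    for check in LAYERS:
        failure = check(span_input, span_output)
        if failure is not None:
            return failure
    return "pass"
```

Ordering the layers from cheapest to most expensive means malformed outputs are labeled without ever reaching the costlier quality check.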
Monitor Results | ||
|
||
- Track evaluation trends over time | ||
- Regularly review and refine criteria | ||
|
||
|
||
## Conclusion | ||
Online evaluations represent a significant step forward in LLM operations, bringing immediate quality feedback to production systems. With Laminar's implementation, teams can maintain high standards while gathering valuable insights about their models' behavior. | ||
Try out online evaluations today and let us know what you think! Check out our [documentation](https://docs.lmnr.ai/evaluations/online-evaluations) for detailed setup instructions and best practices. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,17 +14,18 @@ interface BlogMetaProps { | |
|
||
export default function BlogMeta({ data }: BlogMetaProps) { | ||
return ( | ||
<div className="flex flex-col space-y-1 items-start"> | ||
<h1 className="text-5xl font-bold py-2">{data.title}</h1> | ||
{/* <p className="text-secondary-foreground">{data.description}</p> */} | ||
<p className="text-secondary-foreground"> {formatUTCDate(data.date)} </p> | ||
{data.author.url | ||
? <Label className="text-secondary-foreground hover:text-primary"><Link href={data.author.url}>{data.author.name}</Link></Label> | ||
: <Label className="text-secondary-foreground">{data.author.name}</Label> | ||
} | ||
<div className="flex flex-col gap-8 items-center"> | ||
<div className="flex flex-col w-full md:w-[700px] gap-4"> | ||
<h1 className="text-5xl font-bold">{data.title}</h1> | ||
<p className="text-secondary-foreground"> {formatUTCDate(data.date)} </p> | ||
{data.author.url | ||
? <Label className="text-secondary-foreground hover:text-primary"><Link href={data.author.url}>{data.author.name}</Link></Label> | ||
: <Label className="text-secondary-foreground">{data.author.name}</Label> | ||
} | ||
</div> | ||
{data.image && | ||
<div className="w-full flex items-center py-4"> | ||
<Image src={data.image} alt={data.title} width={1200} height={800} /> | ||
<div className="w-full flex rounded overflow-hidden"> | ||
Review comment: while we're at it, do we need a border here?
Reply: no
||
<Image src={data.image} alt={data.title} width={1000} height={800} /> | ||
</div> | ||
} | ||
</div> | ||
|
Review comment: nit: maybe add a doc link to https://docs.lmnr.ai/tracing/structure#grouping-spans-into-traces
Not necessary, though; those docs don't focus on span paths specifically.