Evaluating OpenAI Agents #1752

jannikmaierhoefer · 2025-03-31T13:38:20Z

Summary

This PR adds a guide on evaluating OpenAI agents using Langfuse.

Motivation

This cookbook guides users through the typical evaluation process involved in developing AI agents using the open-source tool Langfuse.

It shows how to perform offline evaluation by looping over a dataset and varying agent parameters (e.g., the model or the search tool). It also explains how to do online evaluation, i.e., assessing metrics like cost and latency in a live production environment.
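
As a rough illustration of the offline loop described above, here is a minimal sketch in plain Python. It does not use the Langfuse or OpenAI Agents SDK APIs: `run_agent`, the two-item `dataset`, and the model names `model-a` and `model-b` are hypothetical placeholders, and exact-match scoring is only one possible metric.

```python
import time
from statistics import mean

# Hypothetical stand-in for a real agent call (assumption: not an SDK function).
def run_agent(question: str, model: str) -> str:
    canned = {"What is the capital of Germany?": "Berlin"}
    return canned.get(question, "unknown")

# Hypothetical evaluation dataset: inputs paired with expected outputs.
dataset = [
    {"input": "What is the capital of Germany?", "expected": "Berlin"},
    {"input": "What is the capital of France?", "expected": "Paris"},
]

def evaluate(model: str) -> dict:
    """Run every dataset item through the agent and aggregate simple metrics."""
    scores, latencies = [], []
    for item in dataset:
        start = time.perf_counter()
        output = run_agent(item["input"], model)
        latencies.append(time.perf_counter() - start)
        # Exact-match scoring: 1.0 if the output equals the expected answer.
        scores.append(1.0 if output == item["expected"] else 0.0)
    return {
        "model": model,
        "accuracy": mean(scores),
        "avg_latency_s": mean(latencies),
    }

# Vary one agent parameter (here, the model name) across runs.
results = [evaluate(m) for m in ("model-a", "model-b")]
```

In a real setup, the stub would be replaced by the actual agent invocation, and the per-item scores would be logged to the evaluation tool rather than kept in a local list.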

For new content

When contributing new content, read through our contribution guidelines, and mark the following action items as completed:

I have added a new entry in registry.yaml (and, optionally, in authors.yaml) so that my content renders on the cookbook website.
I have conducted a self-review of my content based on the contribution guidelines:
- Relevance: This content is related to building with OpenAI technologies and is useful to others.
- Uniqueness: I have searched for related examples in the OpenAI Cookbook, and verified that my content offers new insights or unique information compared to existing documentation.
- Spelling and Grammar: I have checked for spelling or grammatical mistakes.
- Clarity: I have done a final read-through and verified that my submission is well-organized and easy to understand.
- Correctness: The information I include is correct and all of my code executes successfully.
- Completeness: I have explained everything fully, including all necessary references and citations.

We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.

jannikmaierhoefer · 2025-04-10T11:47:32Z

Hi @lspacagna-oai what do you think about this addition? Let us know if you have any feedback :)

vishnu-oai

approving for @lspacagna-oai

lspacagna-oai · 2025-05-28T13:52:15Z

> Hi @lspacagna-oai what do you think about this addition? Let us know if you have any feedback :)

Thanks for contributing, apologies for the delay in reviewing!

jannikmaierhoefer added 2 commits March 31, 2025 15:23

docs: add cookbook on evaluating openai agents

f9658a9

edit tree and author

0de20a5

jannikmaierhoefer changed the title ~~Add evaluation cookbook~~ Evaluating OpenAI Agents Mar 31, 2025

Merge branch 'main' into add-evaluation-cookbook

5836880

Merge branch 'main' into add-evaluation-cookbook

628ea4a

vishnu-oai approved these changes May 28, 2025

lspacagna-oai merged commit e1d2bc0 into openai:main May 28, 2025
1 check passed
