
Evaluating OpenAI Agents #1752


Merged
merged 4 commits into from
May 28, 2025


jannikmaierhoefer (Contributor)

Summary

This PR adds a guide on evaluating OpenAI agents using Langfuse.

Motivation

This cookbook guides users through the typical evaluation process involved in developing AI agents using the open-source tool Langfuse.

It shows how to perform offline evaluation by looping over a dataset and iterating on agent configurations (e.g., model, search tool). It also explains how to perform online evaluation, i.e., tracking metrics such as cost and latency in a live production environment.


For new content

When contributing new content, read through our contribution guidelines, and mark the following action items as completed:

  • I have added a new entry in registry.yaml (and, optionally, in authors.yaml) so that my content renders on the cookbook website.
  • I have conducted a self-review of my content based on the contribution guidelines:
    • Relevance: This content is related to building with OpenAI technologies and is useful to others.
    • Uniqueness: I have searched for related examples in the OpenAI Cookbook, and verified that my content offers new insights or unique information compared to existing documentation.
    • Spelling and Grammar: I have checked for spelling or grammatical mistakes.
    • Clarity: I have done a final read-through and verified that my submission is well-organized and easy to understand.
    • Correctness: The information I include is correct and all of my code executes successfully.
    • Completeness: I have explained everything fully, including all necessary references and citations.

We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.

@jannikmaierhoefer changed the title from "Add evaluation cookbook" to "Evaluating OpenAI Agents" on Mar 31, 2025
@jannikmaierhoefer (Contributor, Author)

Hi @lspacagna-oai, what do you think about this addition? Let us know if you have any feedback :)

@vishnu-oai (Contributor) left a comment

approving for @lspacagna-oai

@lspacagna-oai lspacagna-oai merged commit e1d2bc0 into openai:main May 28, 2025
1 check passed
@lspacagna-oai (Contributor)

Thanks for contributing, apologies for the delay in reviewing!


3 participants