openai / evals Public

Fork 2.8k
Star 17.4k

Code
Issues 111
Pull requests 51
Discussions
Actions
Projects
Security
Insights

Pull requests: openai/evals

Labels 10 Milestones 0

51 Open 1,249 Closed

Pull requests list

Add tnengoy_citations.dev.v0 (model-graded factuality eval)

#1603 opened Oct 12, 2025 by TheodorNEngoy

Update custom-eval.md

#1598 opened Aug 19, 2025 by rajeshkp

13 tasks

Add context poisoning quarantine evaluation for testing prompt injection vulnerabilities

#1597 opened Aug 16, 2025 by jscaldwell55

Fix typos

#1585 opened May 30, 2025 by GameRoMan

Fix AttributeError: Update OpenAI error imports (Closes #1564)

#1577 opened Jan 27, 2025 by SaiKrishna-KK

6 of 13 tasks

Update completion-fn-protocol.md

#1575 opened Jan 18, 2025 by NinoRisteski

13 tasks

Fix TypeError in add_token_usage_to_result when non-integer usage data is present

#1574 opened Jan 4, 2025 by masihmoloodian

Ice linguistic benchmark

#1561 opened Oct 1, 2024 by bjarkiarmanns

1 task

Add support for new models (gpt-4o, o1-preview and o1-mini)

#1558 opened Sep 15, 2024 by sakher

Bugfixing completion stats break with new reasoning tokens release

#1555 opened Sep 13, 2024 by lucapericlp

anthropic_solver.py

#1554 opened Sep 4, 2024 by iHuydang

13 tasks done

Fix a bug in examples/mmlu.ipynb when using gpt-4o or gpt-4o-mini

#1551 opened Aug 25, 2024 by RobinWitch

13 tasks done

Fix the is_chat_model function to work with gpt-4o

#1550 opened Aug 22, 2024 by LoryPack

3 tasks done

Added Icelandic QA evaluation data from news texts

#1548 opened Aug 20, 2024 by thorunna

12 of 13 tasks

Added Icelandic QA evaluation data from Wikipedia

#1547 opened Aug 20, 2024 by thorunna

12 of 13 tasks

Updating make-me-say to be compatible with Solvers

#1546 opened Aug 18, 2024 by lennart-finke

1 task done

Fix Information exposure alert through an exception #1543

#1545 opened Aug 8, 2024 by arpitjain099

13 tasks done

Fix log injection error

#1544 opened Aug 8, 2024 by arpitjain099

13 tasks done

Remove global OpenAI client initialization

#1539 opened Jul 21, 2024 by michaelAlvarino

Fix Unit Test Failures in OpenAI, Anthropic, and Google Gemini Resolvers

#1537 opened Jun 24, 2024 by sakher

Fix problematic sample in Schelling Point

#1534 opened May 22, 2024 by JunShern

Update README: Add Langtrace as an Eval vendor

#1531 opened May 21, 2024 by karthikscale3

5 of 13 tasks

Add support for gpt-4o

#1530 opened May 16, 2024 by androettop

show evals in wandb weave

#1522 opened Apr 19, 2024 by yogeshg • Draft

13 tasks

Added Quran Eval & Simple Fact Model-Graded Definition

#1511 opened Apr 1, 2024 by sakher

13 tasks done
