🔧 Compare how Agent systems perform on several benchmarks. 📊🚀

aymeric-roucher/agent_reasoning_benchmark

Benchmark agent workflows: try the models of your choice on the framework that you want

This repo is the engine for the evaluations displayed in our Agents v2.0 announcement post.

You can use it to test agents on different frameworks:

On different benchmarks:

And with different models (cf benchmark below).

We also implement LLM-judge evaluation, with parallel processing for faster results.
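As a rough illustration of the idea (not this repo's actual implementation), the sketch below scores several agent answers concurrently with a thread pool. The names `toy_judge` and `judge_in_parallel` and the example fields are hypothetical, and the judge is a simple string check standing in for a real LLM call so that the snippet runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_judge(question: str, answer: str, reference: str) -> int:
    """Hypothetical stand-in for an LLM-judge call.

    Returns 1 if the reference string appears in the answer
    (case-insensitive), else 0. A real judge would prompt an LLM instead.
    """
    return int(reference.strip().lower() in answer.strip().lower())


def judge_in_parallel(examples, judge=toy_judge, max_workers=4):
    """Score every example concurrently; results keep the input order."""

    def score(example):
        return judge(example["question"], example["answer"], example["reference"])

    # Threads suit this workload because real judge calls are I/O-bound
    # (they mostly wait on a network response).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score, examples))


# Made-up examples, only to exercise the sketch.
examples = [
    {"question": "What is 2 + 2?", "answer": "The result is 4.", "reference": "4"},
    {"question": "Capital of France?", "answer": "I believe it is Lyon.", "reference": "Paris"},
]
scores = judge_in_parallel(examples)
print(scores)  # [1, 0]
```

Swapping `toy_judge` for a function that calls an actual model would keep the same structure, since `pool.map` returns scores in the order the examples were submitted.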

[Figure: benchmark]
