clembench/clembench-runs

Benchmark Runs

Leaderboard available here: Clem Leaderboard

Versions

See CHANGELOG

Supported Models

Supported open and closed (commercial) models are listed in the model registry.

Game-play files

Each model has its own folder, with one subfolder per game. Outputs are organised as /model/game/experiment. Each episode of an experiment includes the following files:

  • instance.json: information about the episode, including the prompt text
  • interactions.json: the interactions between the players and the game master
  • requests.json: the inputs given to the tested model and the outputs it generated
  • scores.json: scores at the episode and turn level
  • transcript.html: transcript of the dialogue in HTML
  • transcript.tex: transcript of the dialogue in LaTeX
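
The layout above can be traversed with a few lines of Python. The following is a minimal sketch, not part of the benchmark tooling: the helper names `load_episode` and `iter_episodes` are hypothetical, and it assumes that `run_dir` is the folder directly containing the model folders and that each episode lives in its own subfolder below /model/game/experiment. Only the file names come from the list above; the contents of the JSON files are not interpreted.

```python
import json
from pathlib import Path

# File names taken from the list above; everything else is an assumption.
EPISODE_JSON_FILES = (
    "instance.json",
    "interactions.json",
    "requests.json",
    "scores.json",
)


def load_episode(episode_dir):
    """Return {file stem: parsed JSON} for the JSON files found in one episode folder."""
    episode_dir = Path(episode_dir)
    loaded = {}
    for name in EPISODE_JSON_FILES:
        path = episode_dir / name
        if path.is_file():  # tolerate episodes where a file is missing
            with path.open(encoding="utf-8") as handle:
                loaded[path.stem] = json.load(handle)
    return loaded


def iter_episodes(run_dir):
    """Yield (model, game, experiment, episode_dir) for each folder holding a scores.json."""
    run_dir = Path(run_dir)
    for scores_path in sorted(run_dir.rglob("scores.json")):
        episode_dir = scores_path.parent
        parts = episode_dir.relative_to(run_dir).parts
        # Assumed layout: model/game/experiment/<episode folder>
        if len(parts) >= 4:
            yield parts[0], parts[1], parts[2], episode_dir
```

A caller could then loop over `iter_episodes(run_dir)` and pass each `episode_dir` to `load_episode` to inspect, for example, the `scores` entry of every episode.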

Results files

Each run of the benchmark generates CSV and HTML files for all tested models across all games (results.csv & results.html).
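
As a rough illustration, results.csv can be read with Python's standard `csv` module. The column names are not documented here, so this sketch (with a hypothetical helper `read_results`) makes no assumption about them and simply returns the header alongside the rows.

```python
import csv


def read_results(csv_path):
    """Return (column names, rows as dicts) from a results.csv file.

    No column names are assumed; they are read from the header row.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), rows
```

Inspecting the returned column names first is a safe way to find out which games and metrics a given results.csv actually contains.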

About

The full outputs generated by running the benchmark on different LLMs.
