Benchmark Runs

Leaderboard available here: Clem Leaderboard

Versions

Supported Models

The list of supported open and closed (commercial) models can be found here: model registry

Game-play files

Each model has its own folder, with one subfolder per game. Outputs are organised as /model/game/experiment, and each episode of an experiment includes the following files:

instance.json: information about the episode, including the prompt text
interactions.json: the interactions between the players and the game master
requests.json: the inputs given to the tested model and the outputs it generated
scores.json: scores computed at the episode and turn level
transcript.html: transcript of the dialogue in HTML
transcript.tex: transcript of the dialogue in LaTeX
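
The layout above can be walked with a short script. The following is a minimal sketch, not part of the repository's own tooling: it assumes that each episode lives in its own folder directly below the experiment folder (the README does not state the episode folder naming), and the helper names `iter_episodes`, `missing_files`, and `load_scores` are hypothetical.

```python
import json
from pathlib import Path

# File names listed in this README for every episode.
EPISODE_FILES = (
    "instance.json",
    "interactions.json",
    "requests.json",
    "scores.json",
    "transcript.html",
    "transcript.tex",
)


def iter_episodes(version_dir):
    """Yield (model, game, experiment, episode_dir) for each episode folder.

    Assumption: episodes sit one level below /model/game/experiment and are
    recognised by the presence of a scores.json file.
    """
    root = Path(version_dir)
    for scores_path in sorted(root.glob("*/*/*/*/scores.json")):
        episode_dir = scores_path.parent
        model, game, experiment = episode_dir.relative_to(root).parts[:3]
        yield model, game, experiment, episode_dir


def missing_files(episode_dir):
    """Return the expected episode files that are absent from episode_dir."""
    return [
        name for name in EPISODE_FILES
        if not (Path(episode_dir) / name).is_file()
    ]


def load_scores(episode_dir):
    """Parse scores.json for one episode and return it as Python data."""
    with open(Path(episode_dir) / "scores.json", encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `for model, game, experiment, episode in iter_episodes("v2.0"): print(model, game, missing_files(episode))` would report incomplete episodes, provided the version folder follows the assumed layout.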

Results files

Each run of the benchmark generates CSV and HTML files for all tested models across all games (results.csv & results.html).
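
As a rough illustration, results.csv can be loaded with the Python standard library. This sketch makes no assumption about the column names, which this README does not specify; it only assumes that the first row is a header. The function name `read_results` is hypothetical.

```python
import csv


def read_results(csv_path):
    """Read a results CSV and return (header, rows).

    header is the list of column names taken from the first row;
    rows is a list of dicts mapping column name to the raw string value.
    """
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        rows = list(reader)
        header = list(reader.fieldnames or [])
    return header, rows
```

Values are returned as strings, so any numeric columns need to be converted by the caller once the actual column layout is known.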

Folders and files

.github/workflows
Addenda
v0.9
v1.0
v1.5
v1.5_quantized
v1.6.5_ascii
v1.6.5_multimodal
v1.6
v1.6_backends
v1.6_multimodal
v1.6_quantized
v2.0
.gitignore
CHANGELOG.md
LICENSE
README.md
benchmark_runs.json
calculate_latency.py
calculate_tokens.py
check_csv_parity.py
check_scores_files.py
requirements.txt
test_files.py


clembench/clembench-runs
