ComputeScaling-Replication

A replication of part of the Hugging Face blog post at https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute

The blog post does not describe its grading implementation in enough detail to reproduce its results, so I adapted the grading code from https://github.com/openai/prm800k

Replication of math-psa (https://huggingface.co/openreasoner/Math-psa/tree/main)

Using "last" as the aggregation method:

Using "mean" as the aggregation method:

Using "min" as the aggregation method:
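The three aggregation methods named above each reduce a solution's per-step reward scores to a single score, which can then be used to pick the best of several sampled solutions. The following is a minimal sketch of that idea, not the code used in this repository: the function names, the `(answer, step_scores)` input format, and the example scores are all assumptions made for illustration.

```python
from statistics import fmean


def aggregate_step_scores(step_scores, method="last"):
    """Reduce a list of per-step scores to one solution-level score.

    Hypothetical helper: `method` is one of "last", "mean", or "min".
    """
    if not step_scores:
        raise ValueError("step_scores must be non-empty")
    if method == "last":
        # Score of the final reasoning step only.
        return step_scores[-1]
    if method == "mean":
        # Average score over all steps.
        return fmean(step_scores)
    if method == "min":
        # Score of the weakest step.
        return min(step_scores)
    raise ValueError(f"unknown aggregation method: {method!r}")


def best_of_n(candidates, method="last"):
    """Return the answer whose aggregated score is highest.

    `candidates` is an assumed format: a list of (answer, step_scores) pairs.
    """
    best = max(candidates, key=lambda c: aggregate_step_scores(c[1], method))
    return best[0]


# Made-up scores for two sampled solutions.
candidates = [
    ("42", [0.9, 0.8, 0.3]),
    ("41", [0.5, 0.5, 0.6]),
]
for method in ("last", "mean", "min"):
    print(method, best_of_n(candidates, method))
```

With these made-up scores, "mean" prefers the first solution while "last" and "min" prefer the second, which is why the choice of aggregation method can change the measured accuracy.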

Repository files:

.vscode
results_by_category
search
.gitignore
README.md
batch_outputs_merge.py
calculate_metric.py
calculate_metric_by_category.py
calculate_metric_by_category_multi_in_one.py
calculate_metric_by_category_y_aligned.py
evaluation_in_parallel.sh
metric.sh
mmlu_overlap.json
results_split_by_category.py
split.sh
storage.sh

cychomatica/Inference-Scaling
