Replication of part of the Hugging Face blog post https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute
Because the blog post does not describe its grading implementation in enough detail to reproduce its results, I adapted the grading code from https://github.com/openai/prm800k
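To illustrate what grading a solution involves, here is a minimal sketch of answer checking. It is not the prm800k code and not the adapted grader used here; it assumes the final answer is wrapped in `\boxed{...}` and compares answers after a few string normalizations. The names `extract_boxed`, `normalize`, and `is_correct` are hypothetical.

```python
import re
from typing import Optional


def extract_boxed(solution: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `solution`, or None."""
    marker = "\\boxed{"
    start = solution.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    for ch in solution[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed


def normalize(answer: str) -> str:
    """Apply a few illustrative (assumed) normalizations before comparing."""
    answer = answer.strip().rstrip(".")
    for token in ("\\!", "\\,", "$", "\\left", "\\right"):
        answer = answer.replace(token, "")
    # Treat \dfrac and \tfrac as plain \frac.
    answer = answer.replace("\\dfrac", "\\frac").replace("\\tfrac", "\\frac")
    return re.sub(r"\s+", "", answer)


def is_correct(solution: str, ground_truth: str) -> bool:
    """Grade by exact match of the normalized boxed answer."""
    predicted = extract_boxed(solution)
    if predicted is None:
        return False
    return normalize(predicted) == normalize(ground_truth)
```

Exact string matching after normalization misses answers that are mathematically equal but written differently (for example `0.5` and `\frac{1}{2}`), so a sketch like this can only under-count correct solutions relative to a grader that checks equivalence.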
Replication of math-psa (https://huggingface.co/openreasoner/Math-psa/tree/main)
Using "last" as the aggregation method: