Thanks @dmelcer9! It makes sense :) We didn't think about this when developing the initial tasks. We will incorporate this change in the next dataset release.
EvalPlus version
v0_1_0_hf
Output of running ls ~/.cache/bigcodebench
BigCodeBench-v0.1.0_hf.jsonl
Task ID of the programming task
BigCodeBench/211, BigCodeBench/215, probably some others as well
The original test
Your proposed new test
Description
The LLM sometimes (reasonably!) generates code like:
But this fails the test.
Other context
No response