Empty Generations / Failing to Reproduce 40% on HumanEval #148
Comments
Can you try again using the framework we used for evaluation: https://github.com/bigcode-project/bigcode-evaluation-harness? There's an argument for adding a prefix. In your code it's not clear whether you stripped the prompts (this impacts performance), and we also use more stop words.
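For reference, the Codex paper's stop set is larger than the two sequences quoted below in this thread; "more stop words" presumably means something closer to the following (whether the harness uses exactly this set is an assumption):

```python
# Stop sequences from the Codex paper's HumanEval evaluation.
STOP_SEQS = ["\nclass", "\ndef", "\n#", "\nif", "\nprint"]
```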
I'm having a similar problem: lots of empty generations on a straightforward prompt from HumanEval. For example, the code I tried just generates an empty output.
Hi, this prompt is not stripped; you need to remove the trailing whitespace.
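A minimal sketch of the stripping being suggested, assuming the prompt string comes straight from a HumanEval task's "prompt" field:

```python
# "problem" is a hypothetical HumanEval task dict with a "prompt" field.
# HumanEval prompts typically end with a newline; trailing whitespace left
# in the prompt can make the model more likely to emit EOS right away.
prompt = problem["prompt"].rstrip()
```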
Hi all, I've set up StarCoder as follows:
The stop tokens I'm using are a subset of those found in the Codex paper:
STOP_SEQS = ["\nclass", "\ndef"]
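For comparison, one hedged way to enforce such stop sequences with transformers (a sketch, not necessarily what was used here):

```python
from transformers import StoppingCriteria

class StopOnSequences(StoppingCriteria):
    """Stop once any stop string appears in the newly generated text."""

    def __init__(self, stop_seqs, tokenizer, prompt_len):
        self.stop_seqs = stop_seqs
        self.tokenizer = tokenizer
        self.prompt_len = prompt_len  # number of prompt tokens to skip

    def __call__(self, input_ids, scores, **kwargs):
        # Decode only the tokens generated after the prompt (first sequence).
        new_text = self.tokenizer.decode(input_ids[0][self.prompt_len:])
        return any(s in new_text for s in self.stop_seqs)
```

Note that returning True here halts the whole batch, so batched generation would need a per-sequence variant.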
Somehow, it looks like I'm consistently getting empty generations, however -- just an EOS token. Concretely, around 20% of my generations on HumanEval are empty.
I'm using the suggested prompt prefix as well, i.e.
"<filename>solutions/solution_1.py\n# Here is the correct implementation of the code exercise\n"
I'm getting around 15% on HumanEval, not the 40% stated in the paper. I'm setting TEMP = 0.2 and NEW_TOKENS = 128. Would somebody be able to point out what might be going wrong?
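For concreteness, a minimal end-to-end sketch with the settings described above, assuming the standard transformers API and the bigcode/starcoder checkpoint (build_prompt and StopOnSequences are the hypothetical helpers sketched earlier in the thread):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteriaList

checkpoint = "bigcode/starcoder"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.float16, device_map="auto"
)

STOP_SEQS = ["\nclass", "\ndef"]
prompt = build_prompt(task["prompt"])  # "task" is a hypothetical HumanEval entry
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
prompt_len = inputs["input_ids"].shape[1]

out = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.2,      # TEMP
    max_new_tokens=128,   # NEW_TOKENS
    stopping_criteria=StoppingCriteriaList(
        [StopOnSequences(STOP_SEQS, tokenizer, prompt_len)]
    ),
    pad_token_id=tokenizer.eos_token_id,
)
completion = tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True)
```

If generations are still empty under a setup like this, comparing against the evaluation harness linked above is the quickest way to isolate the difference.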