Too Long Problems #22
Comments
Where in the code is this? It's hard to search based on screenshots.
generate_gpt_codes.py #176
We can make that into a variable that gets passed in, and possibly throw a warning when the value is less than some threshold like 10. Let me know if you think this would be a good compromise.
Do you have a proposal, then? Some models have a max input size that needs to be respected. You could consider preprocessing your inputs so they use fewer tokens. Open to alternative suggestions.
I encountered the same problem. In the end I set truncation=True while encoding, but will this cause other problems?
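For reference, what truncation=True does at encoding time is simply drop token ids beyond the model's context window. A minimal pure-Python sketch of that behavior (the helper name and numbers are illustrative, not the repo's code):

```python
MAX_INPUT = 1024  # GPT-2's context window, as discussed below

def encode_truncated(token_ids, max_length=MAX_INPUT):
    """Keep at most max_length tokens, silently dropping the tail."""
    return token_ids[:max_length]

long_problem = list(range(3000))  # stand-in for a long encoded problem
ids = encode_truncated(long_problem)
print(len(ids))  # 1024
```

The "other problem" this can cause is exactly what the thread discusses next: the dropped tail may contain part of the problem statement, and the surviving prefix still competes with the answer for the same context window.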
I used the data provided by APPS and did not change it. As far as I know, GPT-2 has a max input size of 1024 tokens and GPT-Neo has a max input size of 2048, but they do not place an explicit limit on the output. If there is really no way around it, we may have to discard these very long questions.
Sorry for the confusion. |
There are some long problems in APPS, so I truncated them after encoding. But the model's output takes the form "problem + answer", so the output is necessarily longer than the input. max_length is set to 1024 - len(input_ids) for the output. Actually, if the output needs to fit within that budget, the input's length must be much less than 1024; otherwise we won't get a complete answer even if no error is reported. Is that right? Also, why is the output's max_length set to 1024 - len(input_ids)?
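The context-budget arithmetic discussed above can be sketched as follows: GPT-2's 1024-token window is shared by problem and answer, so the room left for the generated solution is 1024 minus the input length. The function name and the warning threshold (the value of 10 proposed earlier in the thread) are illustrative assumptions, not the repo's code:

```python
import warnings

CONTEXT_WINDOW = 1024    # GPT-2; GPT-Neo would be 2048
MIN_OUTPUT_TOKENS = 10   # hypothetical warning floor from the thread

def output_budget(input_ids, context_window=CONTEXT_WINDOW):
    """Tokens left for the answer after the problem fills the window."""
    budget = context_window - len(input_ids)
    if budget < MIN_OUTPUT_TOKENS:
        warnings.warn(f"only {budget} tokens left for the answer")
    return budget

print(output_budget(list(range(900))))   # 124 tokens left for the answer
print(output_budget(list(range(1020))))  # warns, returns 4
```

This is also why max_length is set to 1024 - len(input_ids): generation appends answer tokens after the prompt inside the same fixed window, so a long problem directly shrinks the longest complete answer the model can emit.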