Strip trailing whitespace from prompt file #80
Many/most text editors save with trailing whitespace.
Bystander question - does the trailing whitespace introduce additional tokens?
Maybe just remove a trailing |
@bengarney New lines do add tokens
The one limitation I see here is that you cannot intentionally add trailing newlines at the end of a file (to introduce a new paragraph). I could solve that by only stripping a single trailing '\n' or '\r\n' (the one usually added by the editor) and leaving any additional whitespace untouched.
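For illustration, the "strip only a single trailing line ending" idea described above could look roughly like the following. This is a minimal sketch in Python, not the code from this PR; the function name `strip_one_trailing_newline` is hypothetical.

```python
def strip_one_trailing_newline(text: str) -> str:
    # Hypothetical helper, not taken from the PR itself.
    # Remove at most one trailing line ending: CRLF first, then LF.
    if text.endswith("\r\n"):
        return text[:-2]
    if text.endswith("\n"):
        return text[:-1]
    # No trailing line ending: return the prompt unchanged,
    # including any trailing spaces the author may have wanted.
    return text


# Only the final line ending (the one an editor typically adds) is dropped;
# a deliberate extra blank line and trailing spaces survive.
print(repr(strip_one_trailing_newline("Hello\n")))
print(repr(strip_one_trailing_newline("Hello\n\n")))
print(repr(strip_one_trailing_newline("Hello  ")))
```

Checking for '\r\n' before '\n' matters here: otherwise a Windows-style ending would leave a stray '\r' at the end of the prompt.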
I don't think this is a good change. Machine Learning models should not teach people how to use text editors. What if I want to keep whitespace in a prompt? This change makes it impossible. And it's trivial to create a text file with no line endings at the end: |
Usually single spaces do not add tokens, because the space is already part of many tokens. If I add a lot of spaces after "Building", I get:
|
OK, I almost feel like you're being smart here. Once a prompt is long enough (several sentences or multiple lines), I'm not going to want to edit it straight on the CLI just to avoid the unintended line break. This isn't a polished consumer product, but a tiny quality of life feature seems justifiable. @leszekhanusz Good point, hadn't thought of that (some tokenizers truly do ignore whitespace). Though trailing whitespace at the end of a text file is still usually not intended. Maybe a |
I decided to drop the trailing newline from file prompts: 70f01cb