
fixes infill incorrect tokenization #3508

Merged · 15 commits into ggerganov:master · Oct 10, 2023

Conversation

vvhg1 (Contributor) commented Oct 6, 2023

fixes #3503

@vvhg1 vvhg1 marked this pull request as draft October 6, 2023 17:07
@vvhg1 vvhg1 changed the title from "fixes infill incorrect tokenization #3503" to "fixes infill incorrect tokenization" Oct 6, 2023
…ing space token from suffix when params.escape
std::vector<llama_token> inp_sfx = ::llama_tokenize(ctx, params.input_suffix, false); // tokenize the suffix, no BOS
const int space_token = 29871; // id of the single-space token " " in the Llama vocab
if (params.escape && inp_sfx.size() > 1 && inp_sfx[0] == space_token) {
ggerganov (Owner)

I think we always want to remove the leading space, regardless of whether params.escape is set:

Suggested change
if (params.escape && inp_sfx.size() > 1 && inp_sfx[0] == space_token) {
if (inp_sfx.size() > 1 && inp_sfx[0] == space_token) {

vvhg1 (Contributor, Author)

I think we need the params.escape check, since only then do we add an extra space to the suffix.

Comment on lines +236 to +240
bool suff_rm_leading_spc = params.escape; // only strip a space that escape processing re-adds
if (suff_rm_leading_spc && params.input_suffix.find_first_of(" ") == 0 && params.input_suffix.size() > 1) {
    // drop the surplus space from the raw string *before* tokenizing,
    // so it cannot fuse with the tokenizer's own prepended space
    params.input_suffix.erase(0, 1);
    suff_rm_leading_spc = false;
}
ggerganov (Owner)

Hm, I could be missing something, but this whole section should not be needed.

If you keep everything else and just replace this block with:

Suggested change
bool suff_rm_leading_spc = params.escape;
if (suff_rm_leading_spc && params.input_suffix.find_first_of(" ") == 0 && params.input_suffix.size() > 1) {
params.input_suffix.erase(0, 1);
suff_rm_leading_spc = false;
}
const bool suff_rm_leading_spc = true;

Can you find a case that does not match the Python version?

vvhg1 (Contributor, Author) commented Oct 7, 2023

Yes, I can find cases where it misbehaves:

With params.escape, if the suffix starts with a space, the tokenizer adds another space and then tokenizes the resulting double space as 259. At that point we can no longer remove the surplus space; we would have to "translate" the 259 back into 29871 first. The same problem occurs with more than one leading space.

Without params.escape, if the suffix starts with a space we remove it, so the tokenized suffix no longer reflects the original.

I'm not sure what the Python version does, as I can't test it, but the behaviour above seems wrong.
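
To make that failure mode concrete, here is a minimal sketch (my illustration, not code from this PR) built on the same ::llama_tokenize helper that infill.cpp already uses; the token ids are the ones quoted in this thread (29871 = " ", 259 = "  " in the Llama vocab):

#include <string>
#include <vector>
#include "common.h" // for ::llama_tokenize, as used by examples/infill

// Returns true if the surplus leading space can still be dropped as a single
// token. When the raw suffix already starts with a space, the tokenizer's own
// prepended space fuses with it into 259 ("  "), so there is no leading 29871
// left to erase after tokenization.
static bool leading_space_is_separable(llama_context * ctx, const std::string & suffix) {
    const llama_token space_token = 29871; // " "
    std::vector<llama_token> toks = ::llama_tokenize(ctx, suffix, false);
    return toks.size() > 1 && toks[0] == space_token;
}

Stripping one raw ' ' from the string before tokenizing, as this PR does, sidesteps that token-level ambiguity entirely.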

ggerganov (Owner)

Ok, that makes sense. Let's double check, just in case:
@kherud Could you please run the Python codellama with a suffix that has 2 leading spaces and see what tokens are produced for infill?

kherud (Contributor) commented Oct 8, 2023

'<PRE> <SUF>': 1, 32007, 259, 32008, 32009
'<PRE> <SUF> ': 1, 32007, 259, 32008, 29871, 32009
'<PRE> <SUF>  ': 1, 32007, 259, 32008, 259, 32009
'<PRE> <SUF>   ': 1, 32007, 259, 32008, 1678, 32009
'<PRE> <SUF>    ': 1, 32007, 259, 32008, 268, 32009
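
(Here 1 is BOS, and 32007/32008/32009 are presumably CodeLlama's <PRE>/<SUF>/<MID> special tokens.) A quick sketch, assuming the common.h helper used elsewhere in this thread, for producing the llama.cpp side of the comparison:

#include <cstdio>
#include <string>
#include <vector>
#include "common.h" // for ::llama_tokenize

// Print the ids llama.cpp assigns to a run of n spaces, to compare with the
// tokens the Python version emits between <SUF> (32008) and <MID> (32009).
static void dump_space_tokens(llama_context * ctx, int n_spaces) {
    const std::string text(n_spaces, ' ');
    const std::vector<llama_token> toks = ::llama_tokenize(ctx, text, false);
    printf("%d space(s):", n_spaces);
    for (const llama_token t : toks) {
        printf(" %d", t);
    }
    printf("\n");
}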

@vvhg1 vvhg1 marked this pull request as ready for review October 7, 2023 07:59
@vvhg1 vvhg1 marked this pull request as draft October 7, 2023 08:18
vvhg1 (Contributor, Author) commented Oct 7, 2023

There seems to be another issue too: in interactive mode, -e only acts on the initial input, because the escape processing happens once when the params are loaded (common.cpp:617).
Is that a bug? It is one for infill, so I fixed it there, but what about main and the other examples?
Also, params.cfg_negative_prompt does not seem to be escaped anywhere; is that another bug or desired behaviour?
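
For illustration, a hypothetical sketch of the interactive-mode fix described above; unescape_line and its escape set are my stand-ins for the processing common.cpp applies at load time, re-applied here to each interactive line:

#include <string>

// Hypothetical helper: process a minimal set of backslash escapes in a line
// of interactive input, mirroring what is done for the initial prompt at load.
static void unescape_line(std::string & line) {
    std::string out;
    out.reserve(line.size());
    for (size_t i = 0; i < line.size(); i++) {
        if (line[i] == '\\' && i + 1 < line.size()) {
            switch (line[++i]) {
                case 'n':  out += '\n'; break;
                case 't':  out += '\t'; break;
                case '\\': out += '\\'; break;
                default:   out += '\\'; out += line[i]; break;
            }
        } else {
            out += line[i];
        }
    }
    line = out;
}

// In the interactive loop, applied only when -e was given:
//     if (params.escape) { unescape_line(buffer); }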

@vvhg1 vvhg1 marked this pull request as ready for review October 8, 2023 11:21
ggerganov (Owner)

This is probably an oversight - we should escape the user input and params.cfg_negative_prompt as well

vvhg1 (Contributor, Author) commented Oct 9, 2023

> This is probably an oversight - we should escape the user input and params.cfg_negative_prompt as well

Should be an easy fix, as I am doing that already in the infill script.
Do you want the fix here or in a separate PR?

ggerganov (Owner) left a comment

Any way you prefer.

If the results match the Python version now, I think we can merge.
I can do some testing later, or we can wait for some feedback from other people.

@ggerganov ggerganov added the "need feedback" label (Testing and feedback with results are needed) Oct 9, 2023
vvhg1 (Contributor, Author) commented Oct 9, 2023

> Any way you prefer.

I'd keep it separate, as the issues are related but not identical.

> If the results match the Python version now, I think we can merge. I can do some testing later, or we can wait for some feedback from other people.

I think it matches now, your call to merge or wait.

ggerganov (Owner)

Ok, will merge this first thing tomorrow unless somebody notices any problems.

@ggerganov ggerganov merged commit 11ea5c7 into ggerganov:master Oct 10, 2023
39 checks passed
joelkuiper added a commit to vortext/llama.cpp that referenced this pull request Oct 12, 2023
…example

* 'master' of github.com:ggerganov/llama.cpp: (34 commits)
  examples: support LLaVA v1.5 (multimodal model) (ggerganov#3436)
  docs : fix typo GOMP_CPU_AFFINITY (ggerganov#3597)
  cmake : fix add_compile_options on macOS
  typo : it is `--n-gpu-layers` not `--gpu-layers` (ggerganov#3592)
  ci : check if there is enough VRAM (ggerganov#3596)
  server : add completion mode (no chat) (ggerganov#3582)
  prompts : add mnemonics.txt
  server : fix kv cache management (ggerganov#3588)
  main : fix session loading bug (ggerganov#3400)
  server : add parameter -tb N, --threads-batch N (ggerganov#3584)
  common : fix mirostat state when using multiple sequences (ggerganov#3543)
  batched : add bench tool (ggerganov#3545)
  examples : add batched.swift + improve CI for swift (ggerganov#3562)
  Add MPT model to supported models in README.md (ggerganov#3574)
  Minor improvements in GPT2 tokenizer (ggerganov#3567)
  readme : add bloom (ggerganov#3570)
  llm : add bloom models (ggerganov#3553)
  swift : improvements and fixes (ggerganov#3564)
  llm : add MPT support (ggerganov#3417)
  infill. : fix tokenization (ggerganov#3508)
  ...
Labels
need feedback (Testing and feedback with results are needed)

Successfully merging this pull request may close these issues.

Infill Incorrect Tokenization (#3503)

3 participants