Workaround for #3454 #3455

goerch · 2023-10-03T14:24:03Z

Accepting all unsupported token types and handling them like CONTROL tokens.

MaggotHATE · 2023-10-03T14:50:04Z

mistral_7b_openorca doesn't stop generating with this PR. Tested on main in interactive mode.

goerch · 2023-10-03T14:58:36Z

mistral_7b_openorca doesn't stop generating with this PR. Tested on main in interactive mode.

OK. Does this mean main expects <|im_end|> and <|im_start|> in the output stream from detokenization?

staviq · 2023-10-03T15:00:54Z

mistral_7b_openorca doesn't stop generating with this PR. Tested on main in interactive mode.

OK. Does this mean main expects <|im_end|> and <|im_start|> in the output stream from detokenization?

I'm currently testing this, it appears that not using <|im_start|> makes the model never use <|im_end|>, hence no EOS happening, but I'm not entirely certain that's exactly what happens.

MaggotHATE · 2023-10-03T15:24:15Z

I'm currently testing this, it appears that not using <|im_start|> makes the model never use <|im_end|>, hence no EOS happening, but I'm not entirely certain that's exactly what happens.

However, it's not consistent: it added <|im_end|> after the first prompt, but didn't after the second.

<|im_start|>system: complete the given task with precision.<|im_end|><|im_start|>user: Write a short joke.

<|im_end|>

A mathematician, an engineer, and a physicist are having lunch together when their food arrives. The waiter accidentally brings the wrong order for each of them. The mathematician starts to figure out how to divide the plates equally among all three. Meanwhile, the engineer comes up with an efficient way to rearrange the food so everyone gets what they need. The physicist, on the other hand, begins to theorize about the fundamental particles that make up the sandwich and how to create a new form of matter by combining them.

<|im_start|>user: Give a list of job titles in this joke.

1. Mathematician
2. Engineer
3. Physicist
4. Waiter (indirectly involved)

<|im_end|>

<|im_start|>user:

Maybe this model just requires using prefix and suffix.

staviq · 2023-10-03T15:49:34Z

@MaggotHATE i believe the correct prompt format expects "system" "user" "assistant" followed by endline instead of ":"

https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca#example-prompt-exchange

Edit: if you add -e to main arguments, you can use \n for endline character

goerch · 2023-10-03T16:25:46Z

I tried this according to suggested format, and is seems to work

It seems to me you are testing a rather new model? W.r.t. #3525 it is probably important to distinguish regressions (which the assertions disabled in this patch are).

So your fix works, however naively changing USER_DEFINED to CONTROL

@staviq : I believe with this PR we should be backwards compatible for sentencepiece tokenizers, which is probably more important for now. I'll really have to look into this added_tokens stuff...

MaggotHATE · 2023-10-03T16:38:20Z

Edit: if you add -e to main arguments, you can use \n for endline character

@staviq Thank you! Forgot about it completely.

With -e, however, if the prompt format is slightly different, it breaks - for example, if adding \n after <|im_start|> in antiprompt, the model doesn't even consider it:

[1696348822] n_remain: -68
[1696348822] eval: [ '>':28767 ]
[1696348822] n_past = 101
[1696348822] top 10 candidates:
[1696348822]  -  1838: '        user' (0.592)
[1696348822]  -    13: '           
' (0.210)
[1696348822]  -  6574: '      system' (0.114)
[1696348822]  -  5292: '          LL' (0.018)
[1696348822]  -  1585: '          ai' (0.007)
[1696348822]  -  5553: '        task' (0.006)
[1696348822]  -   584: '          ll' (0.006)
[1696348822]  -   489: '         ass' (0.006)
[1696348822]  - 11741: '          AI' (0.005)
[1696348822]  - 16704: '          LL' (0.005)
[1696348822] sampled token:  1838: 'user'

<|im_start|>system
complete the given task with precision.
<|im_end|>
<|im_start|>user
 Write only 3 words that describe Vulcan API best.
<|im_end|>
<|im_start|>LLM
 Reliable, Efficient, and Flexible
<|im_end|>

<|im_start|>user
 What is the Vulcan API used for?
<|im_end|>
<|im_start|>LLM:
 The Vulcan API is a software development tool that enables developers to build applications with greater efficiency and flexibility. It is specifically designed for building scalable, distributed applications that can handle high-volume transactions and complex workflows. By utilizing the Vulcan API, developers can create reliable, efficient, and flexible applications that can adapt to various environments and requirements.
<|im_end|> <|im_start|>user\n Explain in simple terms.
<|im_end|>
<|im_start|>LLM
 In simpler words, the Vulcan API is a tool that helps programmers create powerful and flexible applications. It makes it easier to build apps that can handle lots of data and complex tasks, making them more efficient and reliable.
<|im_end|> <|im_start|>user\n

In this example, my inputs were only first and last, it just didn't stop on the second one. Maybe it's because \n is added at the end of each input.

And finally, it works best without suffix, completing the format for you. Still not ideal, though.

<|im_start|>system
complete the given task with precision.
<|im_end|>
<|im_start|>user
 Write 3 words that describe Vulcan API best.
<|im_end|>
<|im_start|>assistant: Vulcan API is a powerful and versatile tool for managing data and resources in your applications, so here are three words that describe it best:

1. Efficient - The Vulcan API allows developers to optimize their application performance by providing efficient ways of handling data and resource management.

2. Scalable - It's built with the ability to grow as your application needs evolve. Its scalability ensures your app can easily handle increasing amounts of traffic without breaking a sweat.

3. Secure - The Vulcan API comes with robust security features, ensuring that your data and applications are well protected against potential threats.
<|im_end|> <|im_start|>user\n Explain the first one in simple terms.
<|im_end|>
<|im_start|>assistant: Efficient means that Vulcan API makes it easy for developers to manage data and resources with minimal effort, resulting in a smooth-running application.
<|im_end|>
<|im_start|>user\n What about the second one?
<|im_end|>
<|im_start|>assistant: Scalable means that Vulcan API can grow with your app's needs, allowing it to handle more data and users without slowing down or breaking. It's like a suit that expands as you grow.
<|im_end|>
<|im_start|>user\n

which is also a waste of tokens, I guess, since it generates all that extra formatting each time.

Still, it's just a 7b model, so it's hard to tell when it's hallucinating and when it works as intended.

staviq · 2023-10-03T16:40:37Z

I tried this according to suggested format, and is seems to work

It seems to me you are testing a rather new model? W.r.t. #3525 it is probably important to distinguish regressions (which the assertions disabled in this patch are).

So your fix works, however naively changing USER_DEFINED to CONTROL

@staviq : I believe with this PR we should be backwards compatible for sentencepiece tokenizers, which is probably more important for now. I'll really have to look into this added_tokens stuff...

I already tested it and it does solve that particular problem with assert crash, though I would prefer @cebtenzzre to confirm, he's been following recent tokenizer changes.

Convert and quantize actually take less than 10mins on fairly old CPU, if that would make it easier for you, but it's not a problem for me either.

staviq · 2023-10-03T21:02:21Z

@MaggotHATE

I think I found what's wrong with prompt format

Reverse prompt -r is not being processed for escape sequences and \n makes it through as string.

Can you try editing common/common.cpp and exactly below this line:

llama.cpp/common/common.cpp

Line 618 in 79f34ab

process_escapes(params.input_suffix);

paste this:

        for (auto & antiprompt : params.antiprompt) {
            process_escapes(antiprompt);
        }

so It looks like this:

    if (params.escape) {
        process_escapes(params.prompt);
        process_escapes(params.input_prefix);
        process_escapes(params.input_suffix);
        for (auto & antiprompt : params.antiprompt) {
            process_escapes(antiprompt);
        }
    }

After that, do make clean and rebuild, and try those parameters ( actual main prompt text isn't important here ):

I just tested this and managed to have couple of pages worth of conversation without a single problem.

jxy · 2023-10-03T22:47:28Z

I'm confused as what's been tested here. main does not even tokenize <|im_start|> and <|im_end|> to 32001 and 32000.

./main -m models/mistral-7b-openorca.Q8_0.gguf --verbose-prompt -n 1 -p '<|im_start|>system
lol
<|im_end|>'
[... omitted ...]
main: prompt: '<|im_start|>system
lol
<|im_end|>'
main: number of tokens in prompt = 20
     1 -> ''
   523 -> ' <'
 28766 -> '|'
   321 -> 'im'
 28730 -> '_'
  2521 -> 'start'
 28766 -> '|'
 28767 -> '>'
  6574 -> 'system'
    13 -> '
'
 28714 -> 'l'
   328 -> 'ol'
    13 -> '
'
 28789 -> '<'
 28766 -> '|'
   321 -> 'im'
 28730 -> '_'
   416 -> 'end'
 28766 -> '|'
 28767 -> '>'
[... omitted ...]

staviq · 2023-10-03T23:08:53Z

I'm confused as what's been tested here. main does not even tokenize <|im_start|> and <|im_end|> to 32001 and 32000.

It does detokenize it when the model produces it though:

I checked logs and indeed those tags do get tokenized as normal text, yet model responds correctly with those tokens

jxy · 2023-10-04T01:44:02Z

Yes, main can detokenize them because they are in the vocabulary. But the prompt and the in-prefix and in-suffix are not tokenized properly.

MaggotHATE · 2023-10-04T06:58:43Z

@staviq thank you! Processing for escape sequences fixes the issue with stopping (sometimes even without <|im_start|> and <|im_end|>).

For example, questions about Vulkan API seem to test the model best - it stops properly only with <|im_start|> and <|im_end|>, while with simpler questions or conversations it can work without special tokens.

Overall, I think this is just a quirky model that is too much of an effort to work with at the moment, unless there is a way to insert the tokens as tokens into prompt and antiprompt. Thank you for help anyway!

cebtenzzre · 2023-10-04T18:10:24Z

I still haven't gotten a chance to read over the BPE tokenizer PR. Could we just revert whatever was changed in the sentencepiece tokenizer? It wasn't intended to be part of the PR's scope anyway, right?

goerch · 2023-10-06T07:34:35Z

I still haven't gotten a chance to read over the BPE tokenizer PR. Could we just revert whatever was changed in the sentencepiece tokenizer? It wasn't intended to be part of the PR's scope anyway, right?

I believe adding assertion(s) are the only changes done to SPM.

cebtenzzre

This seems to more or less revert 17ca832 for SPM, so it should be fine.

Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion

…example * 'master' of github.com:ggerganov/llama.cpp: py : change version of numpy requirement to 1.24.4 (ggerganov#3515) quantize : fail fast on write errors (ggerganov#3521) metal : support default.metallib load & reuse code for swift package (ggerganov#3522) llm : support Adept Persimmon 8B (ggerganov#3410) Fix for ggerganov#3454 (ggerganov#3455) readme : update models, cuda + ppl instructions (ggerganov#3510) server : docs fix default values and add n_probs (ggerganov#3506)

Workaround for ggerganov#3454

ee8e2b2

goerch mentioned this pull request Oct 3, 2023

[bug] mistral-7b-openorca crashes main.exe after BPE update. #3454

Closed

4 tasks

goerch requested a review from staviq October 3, 2023 14:25

goerch requested a review from cebtenzzre October 3, 2023 17:19

staviq mentioned this pull request Oct 3, 2023

Process escape sequences in reverse prompts #3461

Merged

staviq mentioned this pull request Oct 4, 2023

Tokenizer not picking the right tokens ( mistral openorca ) #3475

Closed

goerch mentioned this pull request Oct 6, 2023

Please add support for kfkas llama-2-ko-7b-chat #2877

Closed

cebtenzzre approved these changes Oct 6, 2023

View reviewed changes

cwillu mentioned this pull request Oct 6, 2023

main: build = 1336 (9ca79d5) - Load mistral-7b-openorca.Q8_0.gguf - after first prompt "hello" llama crashing - windows build - some time ago was ok - 30 builds before? #3516

Closed

goerch merged commit 3a716b4 into ggerganov:master Oct 7, 2023
32 checks passed

yusiwen pushed a commit to yusiwen/llama.cpp that referenced this pull request Oct 7, 2023

Fix for ggerganov#3454 (ggerganov#3455)

240db82

Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion

goerch deleted the accept-unsupported-token-types branch October 22, 2023 15:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Workaround for #3454 #3455

Workaround for #3454 #3455

goerch commented Oct 3, 2023

MaggotHATE commented Oct 3, 2023

goerch commented Oct 3, 2023

staviq commented Oct 3, 2023

MaggotHATE commented Oct 3, 2023

staviq commented Oct 3, 2023 •

edited

Loading

goerch commented Oct 3, 2023

MaggotHATE commented Oct 3, 2023

staviq commented Oct 3, 2023

staviq commented Oct 3, 2023

jxy commented Oct 3, 2023

staviq commented Oct 3, 2023

jxy commented Oct 4, 2023

MaggotHATE commented Oct 4, 2023 •

edited

Loading

cebtenzzre commented Oct 4, 2023

goerch commented Oct 6, 2023

cebtenzzre left a comment

Workaround for #3454 #3455

Workaround for #3454 #3455

Conversation

goerch commented Oct 3, 2023

MaggotHATE commented Oct 3, 2023

goerch commented Oct 3, 2023

staviq commented Oct 3, 2023

MaggotHATE commented Oct 3, 2023

staviq commented Oct 3, 2023 • edited Loading

goerch commented Oct 3, 2023

MaggotHATE commented Oct 3, 2023

staviq commented Oct 3, 2023

staviq commented Oct 3, 2023

jxy commented Oct 3, 2023

staviq commented Oct 3, 2023

jxy commented Oct 4, 2023

MaggotHATE commented Oct 4, 2023 • edited Loading

cebtenzzre commented Oct 4, 2023

goerch commented Oct 6, 2023

cebtenzzre left a comment

Choose a reason for hiding this comment

staviq commented Oct 3, 2023 •

edited

Loading

MaggotHATE commented Oct 4, 2023 •

edited

Loading