Always "failed to tokenize string!" #290
Comments
Can you provide the command line and a checksum of the model file?
Same problem with ggml-model-q4_0.bin; md5sum is 919e4f8aee6ce4f3fbabb6cbcd7756db
./main -m ./models/7B/ggml-model-q4_0.bin -p "china" -n 512 checksum:
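For anyone who needs to compare checksums on a system without `md5sum`, here is a minimal sketch in Python. The helper name `md5_of_file` and the chunk size are my own choices for illustration, not part of llama.cpp; the result should match what `md5sum` prints for the same file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`.

    `md5_of_file` is a hypothetical helper name; it is not part of llama.cpp.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so a multi-gigabyte model file
        # never has to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `md5_of_file("./models/7B/ggml-model-q4_0.bin")` returns the digest as a hex string that can be compared directly with the one posted above.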
The files look good, though they are in the "old" format; you'll have to regenerate them if you update to the latest master. The old tokenizer should recognize three tokens:
The new tokenizer gives different tokens:
I really can't explain this, unless you have some strange terminal encoding set?
Possibly a duplicate of #113.
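One way to rule the terminal encoding in or out is to look at the raw bytes the shell would hand to the program. The sketch below is only a diagnostic aid under the assumption that the tokenizer expects UTF-8 input; the function names are hypothetical and nothing here calls into llama.cpp. A plain ASCII prompt such as "china" is valid UTF-8 regardless of locale, so an encoding problem would be surprising in this particular case.

```python
import locale
import sys


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def terminal_encodings() -> dict:
    """Report the encodings Python sees for the locale and for stdout."""
    return {
        "locale": locale.getpreferredencoding(False),
        # stdout.encoding can be None when output is redirected in some setups.
        "stdout": getattr(sys.stdout, "encoding", None),
    }
```

For example, `is_valid_utf8("china".encode("utf-8"))` is true, while the GBK bytes `b"\xd6\xd0\xb9\xfa"` are rejected. If `terminal_encodings()` reports something other than UTF-8, a non-ASCII prompt typed in that terminal may reach the program as bytes the tokenizer cannot match.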
failed to tokenize string!
system_info: n_threads = 16 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
failed to tokenize string!
main: prompt: ' china'
main: number of tokens in prompt = 1
1 -> ''
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
曲ー! /S部ュース / KSHErsLAheLUE - THE NEW CH`,MEgeERSION IS HERE@ÿThis entry was вер in news on JuneSASSSASS8 by adminS [end of text]