Commit c9f670a (Implement non-greedy tokenizer that tries to maximize token lengths) breaks llama? #280
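For context, commit c9f670a changed the tokenizer to prefer longer vocabulary entries over shorter ones. The following is only a rough sketch of the general "maximize token length" idea (a hypothetical `vocab` set and helper, not llama.cpp's actual implementation): at each position, emit the longest vocabulary entry that matches the remaining text.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical longest-match tokenizer, for illustration only.
// At each position, take the longest vocab entry matching the remaining text.
std::vector<std::string> tokenize_longest_match(
        const std::string &text, const std::set<std::string> &vocab) {
    std::vector<std::string> tokens;
    size_t pos = 0;
    while (pos < text.size()) {
        size_t best_len = 0;
        // Try progressively longer substrings starting at pos.
        for (size_t len = 1; len <= text.size() - pos; ++len) {
            if (vocab.count(text.substr(pos, len))) best_len = len;
        }
        if (best_len == 0) best_len = 1; // unknown character: emit as-is
        tokens.push_back(text.substr(pos, best_len));
        pos += best_len;
    }
    return tokens;
}
```

With a vocabulary containing both `"lis"` and `"list"`, this picks the single token `"list"` rather than splitting, which is exactly the kind of change in token boundaries that can alter model output for the same prompt.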
Could you please check how it behaves with the BPE tokenizer, which is not yet merged? Could you also copy here the tokens that were generated for the "list all US states..." prompt in the current version (they are printed when llama starts)?
List of tokens and output:

```
main: prompt: ' list all US states in alphabetical order:'
 list the 50 state capitals (in no particular order): [end of text]
```

The version you linked complains that my model files are too old: `(too old, regenerate your model files!)`. After remaking the model files (converting from pth and quantizing) it still doesn't work right:

```
.\build\Release\llama.exe -m .\models\30B\ggml-model-q4_0.bin -t 10 -n 256 --seed 100 --temp 0.2 -p " list all US states in alphabetical order:"
main: prompt: ' list all US states in alphabetical order:'
sampling parameters: temp = 0.200000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000

 list all US states in alphabetical order:
```
That's interesting, I'm getting really different results with:

```
system_info: n_threads = 10 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
```

```
system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
```
Tried with different thread counts, and it seems this affects not only performance but also the core inference quality. It looks like 1, 4, and 8 threads are safe on my machine.
Wow, you're right. In my case it answers correctly with 4 threads but not with 8 or 10. Same prompt, same seed; the only difference is the number of threads.
The number of threads affects the output due to floating-point rounding; this is known: #95
After more testing I think we can close this one. The new version either matches or outperforms the old one in most tasks. The number of threads affecting the output is still a problem, but that wasn't caused by this commit.
Old version:
New release (after commit c9f670a):