Commit c9f670a (Implement non-greedy tokenizer that tries to maximize token lengths) breaks llama? #280
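For context, commit c9f670a changed the tokenizer to prefer longer vocabulary entries over shorter ones. The following is only a rough sketch of the general "maximize token length" idea (a hypothetical `vocab` set and helper, not llama.cpp's actual implementation): at each position, emit the longest vocabulary entry that matches the remaining text.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical longest-match tokenizer, for illustration only.
// At each position, take the longest vocab entry matching the remaining text.
std::vector<std::string> tokenize_longest_match(
        const std::string &text, const std::set<std::string> &vocab) {
    std::vector<std::string> tokens;
    size_t pos = 0;
    while (pos < text.size()) {
        size_t best_len = 0;
        // Try progressively longer substrings starting at pos.
        for (size_t len = 1; len <= text.size() - pos; ++len) {
            if (vocab.count(text.substr(pos, len))) best_len = len;
        }
        if (best_len == 0) best_len = 1; // unknown character: emit as-is
        tokens.push_back(text.substr(pos, best_len));
        pos += best_len;
    }
    return tokens;
}
```

With a vocabulary containing both `"lis"` and `"list"`, this picks the single token `"list"` rather than splitting, which is exactly the kind of change in token boundaries that can alter model output for the same prompt.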
Could you please check how it behaves with the BPE tokenizer, which is not yet merged? Could you also copy here the tokens that were generated for the "list all US states..." prompt in the current version (they are printed when llama starts)?
List of tokens and output:

```
main: prompt: ' list all US states in alphabetical order:'
 list the 50 state capitals (in no particular order): [end of text]
```

The version you linked complains that my model files are too old: `(too old, regenerate your model files!)`. After remaking the model files (converting from pth and quantizing) it still doesn't work right:

```
.\build\Release\llama.exe -m .\models\30B\ggml-model-q4_0.bin -t 10 -n 256 --seed 100 --temp 0.2 -p " list all US states in alphabetical order:"
main: prompt: ' list all US states in alphabetical order:'
sampling parameters: temp = 0.200000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000

 list all US states in alphabetical order:
```
That's interesting, I'm getting really different results with:

```
system_info: n_threads = 10 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
```

```
system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
```
Tried with different thread counts, and it seems this affects not only performance but also the core inference quality. It looks like 1, 4, and 8 threads are safe on my machine.
Wow, you're right. In my case it answers correctly with 4 threads but not with 8 or 10. Same prompt, same seed; the only difference is the number of threads.
The number of threads affects the output due to floating-point rounding; this is known: #95
After more testing I think we can close this one. The new version either matches or outperforms the old one in most tasks. The number of threads affecting the output is still a problem, but that wasn't caused by this commit.
Old version:
New release (after commit c9f670a):