Closed
Description
I'm not sure if the tokenizer is here to blame or something else, I've quantized the 7B model and running on my Mac and the output of any prompt is just garbage.
❯ ./main -m ggml-model-q4_0.bin -t 10 -p "Building a website can be done in 10 simple steps:" -n 512
main: seed = 1678546145
llama_model_load: loading model from 'ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from 'ggml-model-q4_0.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
main: prompt: 'Building a website can be done in 10 simple steps:'
main: number of tokens in prompt = 15
1 -> ''
8893 -> 'Build'
292 -> 'ing'
263 -> ' a'
4700 -> ' website'
508 -> ' can'
367 -> ' be'
2309 -> ' done'
297 -> ' in'
29871 -> ' '
29896 -> '1'
29900 -> '0'
2560 -> ' simple'
6576 -> ' steps'
29901 -> ':'
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000
Building a website can be done in 10 simple steps:tegr extremely“œurconnectommensrc périалheader ferm cas inde_" ENDeperCONT knowing Hud Source Dopo UPDATE sig Mobileclerût clean constraintsügel DrathelessOff intituléельm складу oltre\{\Readarrison Santa indicates Clear MongoDBasserControllerisp online Сове вла ingårLAśćcolors zawod Bus cult спWebachivrificeл brotherestyicumtmpjquery takéiveness dopolections^C
Or is it due to the fact that quantization was done on x86 arch, but somehow the weights are saved in architecture specific format?