Description
- Issue one: I use the following code to load the model:

```python
model, tokenizer = LlamaCppModel.from_pretrained(MODEL_PATH)
```

and get this output:

```
llama_model_load: loading model from '/Users/jiheng/Documents/meta/code/fc/llama.cpp/models/ggml-model-q4_1_FineTuned.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 5120
llama_model_load: n_mult = 256
llama_model_load: n_head = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot = 128
llama_model_load: f16 = 3
llama_model_load: n_ff = 13824
llama_model_load: n_parts = 2
llama_model_load: type = 2
llama_model_load: ggml map size = 9311.39 MB
llama_model_load: ggml ctx size = 101.25 KB
llama_model_load: mem required = 11359.49 MB (+ 3216.00 MB per state)
llama_model_load: loading tensors from '/Users/jiheng/Documents/meta/code/fc/llama.cpp/models/ggml-model-q4_1_FineTuned.bin'
llama_model_load: model size = 9310.96 MB / num tensors = 363
llama_init_from_file: kv self size = 800.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
```
But when loading the same model with llama.cpp/main, I get:

```
llama_model_load: loading model from 'models/ggml-model-q4_1_Finetuned.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 5120
llama_model_load: n_mult = 256
llama_model_load: n_head = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot = 128
llama_model_load: f16 = 3
llama_model_load: n_ff = 13824
llama_model_load: n_parts = 2
llama_model_load: type = 2
llama_model_load: ggml map size = 9311.39 MB
llama_model_load: ggml ctx size = 101.25 KB
llama_model_load: mem required = 11359.49 MB (+ 1608.00 MB per state)
llama_model_load: loading tensors from 'models/ggml-model-q4_1_Finetuned.bin'
llama_model_load: model size = 9310.96 MB / num tensors = 363
llama_init_from_file: kv self size = 400.00 MB
system_info: n_threads = 7 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
```
The kv self size is different: 800.00 MB from the Python binding versus 400.00 MB from main (and likewise 3216.00 MB versus 1608.00 MB per state).
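The factor of two is consistent with the precision of the KV cache: llama.cpp allocates one key and one value per layer, context slot, and embedding dimension, so an f32 cache is exactly twice the size of an f16 one. A quick sanity check with the numbers from the logs above (a minimal sketch; the formula reflects my reading of how llama.cpp sized the KV cache at this point, not an official API):

```python
# KV cache holds one key and one value per (layer, context position, embedding dim).
n_layer, n_ctx, n_embd = 40, 512, 5120
elements = 2 * n_layer * n_ctx * n_embd  # keys + values = 209,715,200 elements

print(f"f32 cache: {elements * 4 / 1024**2:.2f} MB")  # 800.00 MB -- the Python binding
print(f"f16 cache: {elements * 2 / 1024**2:.2f} MB")  # 400.00 MB -- llama.cpp/main
```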
- For this specific model, I couldn't get any result back from llama-cpp-python, but llama.cpp/main gives a correct response (a possible workaround is sketched below).
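If the binding exposes llama.cpp's context parameters, forcing a half-precision KV cache should at least make the two runs comparable. A hedged sketch using the llama-cpp-python `Llama` class rather than the `LlamaCppModel` wrapper above (the `f16_kv` flag existed in early llama-cpp-python releases; whether this fixes the empty-result problem is an assumption, and the prompt is only illustrative):

```python
from llama_cpp import Llama

MODEL_PATH = "models/ggml-model-q4_1_Finetuned.bin"

# f16_kv=True requests the half-precision KV cache that main uses by default;
# left at f32, the binding reports the 800 MB cache seen in the first log.
llm = Llama(model_path=MODEL_PATH, n_ctx=512, f16_kv=True)

output = llm("### Instruction: say hello\n### Response:", max_tokens=32)
print(output["choices"][0]["text"])
```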