
KV cache size is twice as large as in llama.cpp/main #66

Closed
@hengjiUSTC

Description

  1. Issue one
    I used the following code to load the model: model, tokenizer = LlamaCppModel.from_pretrained(MODEL_PATH). It printed this:
llama_model_load: loading model from '/Users/jiheng/Documents/meta/code/fc/llama.cpp/models/ggml-model-q4_1_FineTuned.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 5120
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot   = 128
llama_model_load: f16     = 3
llama_model_load: n_ff    = 13824
llama_model_load: n_parts = 2
llama_model_load: type    = 2
llama_model_load: ggml map size = 9311.39 MB
llama_model_load: ggml ctx size = 101.25 KB
llama_model_load: mem required  = 11359.49 MB (+ 3216.00 MB per state)
llama_model_load: loading tensors from '/Users/jiheng/Documents/meta/code/fc/llama.cpp/models/ggml-model-q4_1_FineTuned.bin'
llama_model_load: model size =  9310.96 MB / num tensors = 363
llama_init_from_file: kv self size  =  800.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 

But when loading the same model with llama.cpp/main, I get:

llama_model_load: loading model from 'models/ggml-model-q4_1_Finetuned.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 5120
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot   = 128
llama_model_load: f16     = 3
llama_model_load: n_ff    = 13824
llama_model_load: n_parts = 2
llama_model_load: type    = 2
llama_model_load: ggml map size = 9311.39 MB
llama_model_load: ggml ctx size = 101.25 KB
llama_model_load: mem required  = 11359.49 MB (+ 1608.00 MB per state)
llama_model_load: loading tensors from 'models/ggml-model-q4_1_Finetuned.bin'
llama_model_load: model size =  9310.96 MB / num tensors = 363
llama_init_from_file: kv self size  =  400.00 MB

system_info: n_threads = 7 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 

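The kv self size is different: 800.00 MB when loading through llama-cpp-python versus 400.00 MB with llama.cpp/main. The per-state memory doubles as well (3216.00 MB versus 1608.00 MB), even though every model hyperparameter in the two logs is identical.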
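The factor of two can be checked against the logged hyperparameters. A minimal sketch, assuming the KV cache holds one K tensor and one V tensor of n_ctx × n_embd elements per layer, and that the log's "MB" means 1024 × 1024 bytes: stored as f16 (2 bytes per element) the cache comes to 400 MB, matching llama.cpp/main, and stored as f32 (4 bytes per element) it comes to 800 MB, matching the llama-cpp-python run. This suggests, but does not confirm, that the Python path allocates the KV cache in f32. The helper name kv_cache_mib is hypothetical.

```python
# Hypothetical helper: estimate the KV cache size from the hyperparameters
# printed by llama_model_load. Assumes one K and one V tensor per layer,
# each holding n_ctx * n_embd elements, and that "MB" in the log is 1024*1024 bytes.
MIB = 1024 * 1024


def kv_cache_mib(n_layer: int, n_ctx: int, n_embd: int, bytes_per_elem: int) -> float:
    n_elements = 2 * n_layer * n_ctx * n_embd  # factor 2: K cache + V cache
    return n_elements * bytes_per_elem / MIB


# Values copied from the two logs above (identical in both runs).
n_layer, n_ctx, n_embd = 40, 512, 5120

f16_size = kv_cache_mib(n_layer, n_ctx, n_embd, 2)  # f16: 2 bytes per element
f32_size = kv_cache_mib(n_layer, n_ctx, n_embd, 4)  # f32: 4 bytes per element

print(f"f16 KV cache: {f16_size:.2f} MB")  # f16 KV cache: 400.00 MB
print(f"f32 KV cache: {f32_size:.2f} MB")  # f32 KV cache: 800.00 MB

# The per-state overhead reported in the two logs shows the same 2x ratio.
per_state_ratio = 3216.00 / 1608.00
print(per_state_ratio)  # 2.0
```

If the element type is the cause, the difference affects memory use only for the same context length; whether it also explains the missing output in the second issue is not established by these logs.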

  2. Issue two
    For this specific model, I couldn't get any output from llama-cpp-python, but llama.cpp/main gives a correct response.
