Skip to content

Misc. bug: Docker Image llama-quantize Segmentation fault #11196

Closed
@aria3ppp

Description

@aria3ppp

Name and Version

root@f7545b6b4f65:/app# ./llama-cli --version
load_backend: loaded CPU backend from ./libggml-cpu-alderlake.so
version: 4460 (ba8a1f9)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux, Other? (Please let us know in description)

Which llama.cpp modules do you know to be affected?

llama-quantize

Command line

❯ docker run --rm -it \                                                                                                                                          
  -v ./models:/models \
  ghcr.io/ggerganov/llama.cpp:full \
  --quantize /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf Q4_K_M

Problem description & steps to reproduce

just try to quantize a model and you'll get the segfault

❯ docker run --rm -it \                                    
  -v ./models:/models \
  ghcr.io/ggerganov/llama.cpp:full \
  --quantize /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf Q4_K_M
main: build = 4460 (ba8a1f9c)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: quantizing '/models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf' to '/models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf' as Q4_K_M
llama_model_loader: loaded meta data with 30 key-value pairs and 197 tensors from /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = bert
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Bge Small En v1.5
llama_model_loader: - kv   3:                            general.version str              = v1.5
llama_model_loader: - kv   4:                           general.finetune str              = en
llama_model_loader: - kv   5:                           general.basename str              = bge
llama_model_loader: - kv   6:                         general.size_label str              = small
llama_model_loader: - kv   7:                            general.license str              = mit
llama_model_loader: - kv   8:                               general.tags arr[str,5]       = ["sentence-transformers", "feature-ex...
llama_model_loader: - kv   9:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  10:                           bert.block_count u32              = 12
llama_model_loader: - kv  11:                        bert.context_length u32              = 512
llama_model_loader: - kv  12:                      bert.embedding_length u32              = 384
llama_model_loader: - kv  13:                   bert.feed_forward_length u32              = 1536
llama_model_loader: - kv  14:                  bert.attention.head_count u32              = 12
llama_model_loader: - kv  15:          bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv  16:                          general.file_type u32              = 0
llama_model_loader: - kv  17:                      bert.attention.causal bool             = false
llama_model_loader: - kv  18:                          bert.pooling_type u32              = 2
llama_model_loader: - kv  19:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  20:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  21:                         tokenizer.ggml.pre str              = jina-v2-en
llama_model_loader: - kv  22:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  23:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  24:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  25:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  26:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  27:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  28:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - kv  29:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  197 tensors
Segmentation fault (core dumped)

First Bad Commit

No response

Relevant log output

No response

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions