Name and Version
(venv) calibri@devtest:~/experiments/quantization_benchmark$ ./llama.cpp/build/bin/llama-cli
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
build: 5215 (5f5e39e1) with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3090) - 23872 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA GeForce RTX 3090) - 23870 MiB free
gguf_init_from_file: failed to open GGUF file 'models/7B/ggml-model-f16.gguf'
llama_model_load: error loading model: llama_model_loader: failed to load model from models/7B/ggml-model-f16.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'models/7B/ggml-model-f16.gguf'
main: error: unable to load model
Operating systems
Linux
GGML backends
CUDA
Hardware
AMD Ryzen 7 5800X 8-Core Processor + 2x NVIDIA GeForce RTX 3090
Models
Qwen/Qwen3-0.6B
Problem description & steps to reproduce
I used a wrapper around llama.cpp to produce Qwen3 quantizations (https://github.com/JohnConnor123/quantization-benchmark), but got the error below. However, GGUF files for this model already exist on the Hugging Face Hub. A minimal sketch of the failing load path follows.
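For reference, this is roughly the call path the wrapper takes (reduced from load_model_and_tokenizer() in quantization_benchmark.py, per the traceback below); the directory and GGUF filename here are placeholders, not the exact paths the wrapper uses:

```python
# Minimal sketch of the failing load; paths below are hypothetical placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "models/Qwen3-0.6B"      # hypothetical directory holding the GGUF
gguf_file = "qwen3-0.6b-f16.gguf"    # hypothetical GGUF filename

# transformers reads the architecture from the GGUF metadata; with the
# transformers version installed in this venv the call raises
#   ValueError: Architecture qwen3 not supported
model = AutoModelForCausalLM.from_pretrained(model_dir, gguf_file=gguf_file)
tokenizer = AutoTokenizer.from_pretrained(model_dir, gguf_file=gguf_file)
```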
First Bad Commit
No response
Relevant log output
Traceback (most recent call last):
File "/home/calibri/experiments/quantization_benchmark/quantization_benchmark.py", line 340, in <module>
results_df = compare_quantizations(
File "/home/calibri/experiments/quantization_benchmark/quantization_benchmark.py", line 285, in compare_quantizations
result = evaluate_quantization(
File "/home/calibri/experiments/quantization_benchmark/quantization_benchmark.py", line 225, in evaluate_quantization
model, tokenizer = load_model_and_tokenizer(quantized_path)
File "/home/calibri/experiments/quantization_benchmark/quantization_benchmark.py", line 199, in load_model_and_tokenizer
model = AutoModelForCausalLM.from_pretrained(model_dir, gguf_file=gguf_file, **common_kwargs)
File "/home/calibri/experiments/quantization_benchmark/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 526, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "/home/calibri/experiments/quantization_benchmark/venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1021, in from_pretrained
config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/calibri/experiments/quantization_benchmark/venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 590, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/calibri/experiments/quantization_benchmark/venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 681, in _get_config_dict
config_dict = load_gguf_checkpoint(resolved_config_file, return_tensors=False)["config"]
File "/home/calibri/experiments/quantization_benchmark/venv/lib/python3.10/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 326, in load_gguf_checkpoint
raise ValueError(f"Architecture {architecture + model_size} not supported")
ValueError: Architecture qwen3 not supported