bug: ValueError: Architecture qwen3 not supported #13157

@JohnConnor123

Description

Name and Version

(venv) calibri@devtest:~/experiments/quantization_benchmark$ ./llama.cpp/build/bin/llama-cli
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
build: 5215 (5f5e39e1) with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3090) - 23872 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA GeForce RTX 3090) - 23870 MiB free
gguf_init_from_file: failed to open GGUF file 'models/7B/ggml-model-f16.gguf'
llama_model_load: error loading model: llama_model_loader: failed to load model from models/7B/ggml-model-f16.gguf

llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'models/7B/ggml-model-f16.gguf'
main: error: unable to load model

Operating systems

Linux

GGML backends

CUDA

Hardware

AMD Ryzen 7 5800X 8-Core Processor + 2x RTX 3090

Models

Qwen/Qwen3-0.6B

Problem description & steps to reproduce

I used a wrapper around llama.cpp to produce Qwen3 quantizations (https://github.com/JohnConnor123/quantization-benchmark), but loading the resulting GGUF fails with the error below. However, GGUF files for this model are already available on the Hugging Face Hub, so the format itself should be convertible. A minimal sketch of the failing load follows.
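For context, the wrapper's loading step boils down to roughly the following (the directory and file names here are placeholders, not the exact paths from my setup):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder paths -- the real ones come from the wrapper's output directory.
model_dir = "quantized/Qwen3-0.6B"
gguf_file = "Qwen3-0.6B-Q4_K_M.gguf"

# transformers dequantizes the GGUF on load; with my installed version this
# raises "ValueError: Architecture qwen3 not supported" from
# modeling_gguf_pytorch_utils.load_gguf_checkpoint (see traceback below).
model = AutoModelForCausalLM.from_pretrained(model_dir, gguf_file=gguf_file)
tokenizer = AutoTokenizer.from_pretrained(model_dir, gguf_file=gguf_file)

Since the error is raised inside the transformers GGUF loader rather than by llama.cpp, my guess is that the installed transformers version simply has no qwen3 entry in its GGUF architecture mapping, not that the GGUF produced by llama.cpp is broken.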

First Bad Commit

No response

Relevant log output

Traceback (most recent call last):
  File "/home/calibri/experiments/quantization_benchmark/quantization_benchmark.py", line 340, in <module>
    results_df = compare_quantizations(
  File "/home/calibri/experiments/quantization_benchmark/quantization_benchmark.py", line 285, in compare_quantizations
    result = evaluate_quantization(
  File "/home/calibri/experiments/quantization_benchmark/quantization_benchmark.py", line 225, in evaluate_quantization
    model, tokenizer = load_model_and_tokenizer(quantized_path)
  File "/home/calibri/experiments/quantization_benchmark/quantization_benchmark.py", line 199, in load_model_and_tokenizer
    model = AutoModelForCausalLM.from_pretrained(model_dir, gguf_file=gguf_file, **common_kwargs)
  File "/home/calibri/experiments/quantization_benchmark/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 526, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/home/calibri/experiments/quantization_benchmark/venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1021, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/calibri/experiments/quantization_benchmark/venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 590, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/calibri/experiments/quantization_benchmark/venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 681, in _get_config_dict
    config_dict = load_gguf_checkpoint(resolved_config_file, return_tensors=False)["config"]
  File "/home/calibri/experiments/quantization_benchmark/venv/lib/python3.10/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 326, in load_gguf_checkpoint
    raise ValueError(f"Architecture {architecture + model_size} not supported")
ValueError: Architecture qwen3 not supported
