Gemma2 GGUF: modeling_gguf_pytorch_utils.py: ValueError: Architecture gemma2 not supported #32577

Closed · alllexx88 opened this issue Aug 10, 2024 · 9 comments

alllexx88 commented Aug 10, 2024

System Info

  • transformers version: 4.44.0
  • Platform: Linux-6.5.0-44-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.24.5
  • Safetensors version: 0.4.4
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: no
  • Using GPU in script?: yes
  • GPU type: NVIDIA RTX A4000

Who can help?

@SunMarc

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run this script:

from transformers import AutoModelForCausalLM

model_id = "bartowski/gemma-2-27b-it-GGUF"
filename = "gemma-2-27b-it-Q2_K.gguf"

model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)

Expected behavior

The model should load, but it fails with:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/vllm/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 524, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/opt/vllm/venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 976, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/vllm/venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 632, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/vllm/venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 719, in _get_config_dict
    config_dict = load_gguf_checkpoint(resolved_config_file, return_tensors=False)["config"]
  File "/opt/vllm/venv/lib/python3.10/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 100, in load_gguf_checkpoint
    raise ValueError(f"Architecture {architecture} not supported")
ValueError: Architecture gemma2 not supported
@julien-c (Member)

I think you need to open a PR to add a gemma2<>gguf tensor name mapping, in this code:

GGUF_TENSOR_MAPPING = {
    "llama": {
        "token_embd": "model.embed_tokens",
        "blk": "model.layers",
        "ffn_up": "mlp.up_proj",
        "ffn_down": "mlp.down_proj",
        "ffn_gate": "mlp.gate_proj",
        "ffn_norm": "post_attention_layernorm",
        "attn_norm": "input_layernorm",
        "attn_q": "self_attn.q_proj",
        "attn_v": "self_attn.v_proj",
        "attn_k": "self_attn.k_proj",
        "attn_output": "self_attn.o_proj",
        "output.weight": "lm_head.weight",
        "output_norm": "model.norm",
    },
    "mistral": {
        "token_embd": "model.embed_tokens",
        "blk": "model.layers",
        "ffn_up": "mlp.up_proj",
        "ffn_down": "mlp.down_proj",
        "ffn_gate": "mlp.gate_proj",
        "ffn_norm": "post_attention_layernorm",
        "attn_norm": "input_layernorm",
        "attn_q": "self_attn.q_proj",
        "attn_v": "self_attn.v_proj",
        "attn_k": "self_attn.k_proj",
        "attn_output": "self_attn.o_proj",
        "output.weight": "lm_head.weight",
        "output_norm": "model.norm",
    },
    "qwen2": {
        "token_embd": "model.embed_tokens",
        "blk": "model.layers",
        "ffn_up": "mlp.up_proj",
        "ffn_down": "mlp.down_proj",
        "ffn_gate": "mlp.gate_proj",
        "ffn_norm": "post_attention_layernorm",
        "attn_norm": "input_layernorm",
        "attn_q": "self_attn.q_proj",
        "attn_v": "self_attn.v_proj",
        "attn_k": "self_attn.k_proj",
        "attn_output": "self_attn.o_proj",
        "output.weight": "lm_head.weight",
        "output_norm": "model.norm",
    },
}

alllexx88 (Author) commented Aug 12, 2024

@julien-c thanks, I'm able to load the model with this patch:

--- a/src/transformers/integrations/ggml.py
+++ b/src/transformers/integrations/ggml.py
@@ -117,6 +117,23 @@ GGUF_TENSOR_MAPPING = {
         "output.weight": "lm_head.weight",
         "output_norm": "model.norm",
     },
+    "gemma2": {
+        "token_embd": "model.embed_tokens",
+        "blk": "model.layers",
+        "ffn_up": "mlp.up_proj",
+        "ffn_down": "mlp.down_proj",
+        "ffn_gate": "mlp.gate_proj",
+        "ffn_norm": "post_attention_layernorm",
+        "post_ffw_norm": "post_feedforward_layernorm",
+        "post_attention_norm": "pre_feedforward_layernorm",
+        "attn_norm": "input_layernorm",
+        "attn_q": "self_attn.q_proj",
+        "attn_v": "self_attn.v_proj",
+        "attn_k": "self_attn.k_proj",
+        "attn_output": "self_attn.o_proj",
+        "output.weight": "lm_head.weight",
+        "output_norm": "model.norm",
+    },
 }
 
 
@@ -161,6 +178,18 @@ GGUF_CONFIG_MAPPING = {
         "attention.layer_norm_rms_epsilon": "rms_norm_eps",
         "vocab_size": "vocab_size",
     },
+    "gemma2": {
+        "context_length": "max_position_embeddings",
+        "block_count": "num_hidden_layers",
+        "feed_forward_length": "intermediate_size",
+        "embedding_length": "hidden_size",
+        "rope.dimension_count": None,
+        "rope.freq_base": "rope_theta",
+        "attention.head_count": "num_attention_heads",
+        "attention.head_count_kv": "num_key_value_heads",
+        "attention.layer_norm_rms_epsilon": "rms_norm_eps",
+        "vocab_size": "vocab_size",
+    },
     "tokenizer": {
         "ggml.bos_token_id": "bos_token_id",
         "ggml.eos_token_id": "eos_token_id",

However, I can't get the model and tokenizer together to produce meaningful output. If I load the tokenizer as in the example code at https://huggingface.co/docs/transformers/main/gguf :

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bartowski/gemma-2-2b-it-GGUF"
filename = "gemma-2-2b-it-Q6_K.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)

input_text = "What is your name?"
input_ids = tokenizer.encode(input_text, return_tensors='pt')

generated_ids = model.generate(input_ids, max_length=30)
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

print(generated_text)

I get this error:

The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'BartTokenizer'. 
The class this function is called from is 'GemmaTokenizerFast'.
Traceback (most recent call last):
  File "/home/alex/T7/src/vllm/tr.py", line 6, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alex/T7/src/transformers/src/transformers/models/auto/tokenization_auto.py", line 918, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alex/T7/src/transformers/src/transformers/tokenization_utils_base.py", line 2271, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/alex/T7/src/transformers/src/transformers/tokenization_utils_base.py", line 2505, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alex/T7/src/transformers/src/transformers/models/gemma/tokenization_gemma_fast.py", line 103, in __init__
    super().__init__(
  File "/home/alex/T7/src/transformers/src/transformers/tokenization_utils_fast.py", line 124, in __init__
    fast_tokenizer, additional_kwargs = convert_gguf_tokenizer(architecture, tokenizer_dict)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alex/T7/src/transformers/src/transformers/integrations/ggml.py", line 743, in convert_gguf_tokenizer
    converter = GGUF_TO_FAST_CONVERTERS[tokenizer_class_name](tokenizer_dict)
                ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'gemma2'

If I use the tokenizer from google/gemma-2-2b-it directly, I get gibberish output instead:

from os import environ
environ['HF_TOKEN'] = '<my token>'
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bartowski/gemma-2-2b-it-GGUF"
filename = "gemma-2-2b-it-Q6_K.gguf"

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)

input_text = "What is your name?"
input_ids = tokenizer.encode(input_text, return_tensors='pt')

generated_ids = model.generate(input_ids, max_length=30)
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

print(generated_text)

The output is:

Converting and de-quantizing GGUF tensors...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 288/288 [00:20<00:00, 14.24it/s]
What is your name?шTIMESเทศเทศเทศเทศ ITIS ITIS ITIS ITIS ITIS ITIS ITIS ITISنسMenuView stayingLleg GetEnumeratorsedes})*/})*/})*/})*/

SunMarc (Member) commented Aug 13, 2024

Hey @alllexx88, you also need to define the tokenizer for gemma2. Have a look at how qwen2 gguf was added: https://github.com/huggingface/transformers/pull/31175/files
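
(For reference, a minimal sketch of what registering such a tokenizer converter in src/transformers/integrations/ggml.py might look like, modeled on the existing llama/qwen2 entries; the GGUFGemma2Converter class and its subclassing of GGUFLlamaConverter below are assumptions, not the final implementation.)

# Hypothetical sketch for src/transformers/integrations/ggml.py; names are
# modeled on the existing converters in that module and may differ in the actual PR.
class GGUFGemma2Converter(GGUFLlamaConverter):
    # Gemma's GGUF tokenizer metadata is SentencePiece-based like llama's,
    # so reusing the llama converter is a plausible starting point; special
    # tokens and normalizers would likely need gemma-specific adjustments.
    pass


GGUF_TO_FAST_CONVERTERS = {
    "llama": GGUFLlamaConverter,   # existing entry
    "qwen2": GGUFQwen2Converter,   # existing entry, added in the qwen2 gguf PR
    "gemma2": GGUFGemma2Converter,  # new entry so convert_gguf_tokenizer can find it
}

With such an entry in place, the KeyError: 'gemma2' raised by convert_gguf_tokenizer should no longer occur, though the converter itself would still need to be validated against the GGUF tokenizer metadata.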

PolRF commented Sep 2, 2024

May I submit a PR for this issue? @alllexx88 @SunMarc

SunMarc (Member) commented Sep 3, 2024

If you have a working solution @PolRF, feel free to submit a PR!

@alllexx88 (Author)

I second @SunMarc 's words, @PolRF . I haven't managed to solve this, so a PR is very welcome!


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this as completed Oct 6, 2024
SunMarc (Member) commented Oct 7, 2024

I'm leaving this issue closed, as we centralized GGUF model addition requests in this issue: #33260

@FireAngelx

@SunMarc I got a similar error while using vllm to deploy chatglm4-gguf:
raise ValueError(f"Architecture {architecture} not supported")
ValueError: Architecture chatglm not supported
However, llama.cpp seems to handle all these models well. Am I right to understand that GGUF support here is not yet complete?
