Gemma2 GGUF: modeling_gguf_pytorch_utils.py: ValueError: Architecture gemma2 not supported #32577

Closed · alllexx88 opened this issue Aug 10, 2024 · 9 comments

alllexx88 commented Aug 10, 2024

System Info

  • transformers version: 4.44.0
  • Platform: Linux-6.5.0-44-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.24.5
  • Safetensors version: 0.4.4
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: no
  • Using GPU in script?: yes
  • GPU type: NVIDIA RTX A4000

Who can help?

@SunMarc

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run this script:

from transformers import AutoModelForCausalLM

model_id = "bartowski/gemma-2-27b-it-GGUF"
filename = "gemma-2-27b-it-Q2_K.gguf"

model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)

Expected behavior

The model should load, but it fails with:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/vllm/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 524, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/opt/vllm/venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 976, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/vllm/venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 632, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/vllm/venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 719, in _get_config_dict
    config_dict = load_gguf_checkpoint(resolved_config_file, return_tensors=False)["config"]
  File "/opt/vllm/venv/lib/python3.10/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 100, in load_gguf_checkpoint
    raise ValueError(f"Architecture {architecture} not supported")
ValueError: Architecture gemma2 not supported
@julien-c (Member)

I think you need to open a PR to add a gemma2<>gguf tensor name mapping, in this code:

GGUF_TENSOR_MAPPING = {
    "llama": {
        "token_embd": "model.embed_tokens",
        "blk": "model.layers",
        "ffn_up": "mlp.up_proj",
        "ffn_down": "mlp.down_proj",
        "ffn_gate": "mlp.gate_proj",
        "ffn_norm": "post_attention_layernorm",
        "attn_norm": "input_layernorm",
        "attn_q": "self_attn.q_proj",
        "attn_v": "self_attn.v_proj",
        "attn_k": "self_attn.k_proj",
        "attn_output": "self_attn.o_proj",
        "output.weight": "lm_head.weight",
        "output_norm": "model.norm",
    },
    "mistral": {
        "token_embd": "model.embed_tokens",
        "blk": "model.layers",
        "ffn_up": "mlp.up_proj",
        "ffn_down": "mlp.down_proj",
        "ffn_gate": "mlp.gate_proj",
        "ffn_norm": "post_attention_layernorm",
        "attn_norm": "input_layernorm",
        "attn_q": "self_attn.q_proj",
        "attn_v": "self_attn.v_proj",
        "attn_k": "self_attn.k_proj",
        "attn_output": "self_attn.o_proj",
        "output.weight": "lm_head.weight",
        "output_norm": "model.norm",
    },
    "qwen2": {
        "token_embd": "model.embed_tokens",
        "blk": "model.layers",
        "ffn_up": "mlp.up_proj",
        "ffn_down": "mlp.down_proj",
        "ffn_gate": "mlp.gate_proj",
        "ffn_norm": "post_attention_layernorm",
        "attn_norm": "input_layernorm",
        "attn_q": "self_attn.q_proj",
        "attn_v": "self_attn.v_proj",
        "attn_k": "self_attn.k_proj",
        "attn_output": "self_attn.o_proj",
        "output.weight": "lm_head.weight",
        "output_norm": "model.norm",
    },
}

alllexx88 (Author) commented Aug 12, 2024

@julien-c thanks, I'm able to load the model with this patch:

--- a/src/transformers/integrations/ggml.py
+++ b/src/transformers/integrations/ggml.py
@@ -117,6 +117,23 @@ GGUF_TENSOR_MAPPING = {
         "output.weight": "lm_head.weight",
         "output_norm": "model.norm",
     },
+    "gemma2": {
+        "token_embd": "model.embed_tokens",
+        "blk": "model.layers",
+        "ffn_up": "mlp.up_proj",
+        "ffn_down": "mlp.down_proj",
+        "ffn_gate": "mlp.gate_proj",
+        "ffn_norm": "post_attention_layernorm",
+        "post_ffw_norm": "post_feedforward_layernorm",
+        "post_attention_norm": "pre_feedforward_layernorm",
+        "attn_norm": "input_layernorm",
+        "attn_q": "self_attn.q_proj",
+        "attn_v": "self_attn.v_proj",
+        "attn_k": "self_attn.k_proj",
+        "attn_output": "self_attn.o_proj",
+        "output.weight": "lm_head.weight",
+        "output_norm": "model.norm",
+    },
 }
 
 
@@ -161,6 +178,18 @@ GGUF_CONFIG_MAPPING = {
         "attention.layer_norm_rms_epsilon": "rms_norm_eps",
         "vocab_size": "vocab_size",
     },
+    "gemma2": {
+        "context_length": "max_position_embeddings",
+        "block_count": "num_hidden_layers",
+        "feed_forward_length": "intermediate_size",
+        "embedding_length": "hidden_size",
+        "rope.dimension_count": None,
+        "rope.freq_base": "rope_theta",
+        "attention.head_count": "num_attention_heads",
+        "attention.head_count_kv": "num_key_value_heads",
+        "attention.layer_norm_rms_epsilon": "rms_norm_eps",
+        "vocab_size": "vocab_size",
+    },
     "tokenizer": {
         "ggml.bos_token_id": "bos_token_id",
         "ggml.eos_token_id": "eos_token_id",

However, I can't get the model and tokenizer together to produce meaningful output. If I load the tokenizer as in the example code at https://huggingface.co/docs/transformers/main/gguf :

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bartowski/gemma-2-2b-it-GGUF"
filename = "gemma-2-2b-it-Q6_K.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)

input_text = "What is your name?"
input_ids = tokenizer.encode(input_text, return_tensors='pt')

generated_ids = model.generate(input_ids, max_length=30)
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

print(generated_text)

I get this error:

The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'BartTokenizer'. 
The class this function is called from is 'GemmaTokenizerFast'.
Traceback (most recent call last):
  File "/home/alex/T7/src/vllm/tr.py", line 6, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alex/T7/src/transformers/src/transformers/models/auto/tokenization_auto.py", line 918, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alex/T7/src/transformers/src/transformers/tokenization_utils_base.py", line 2271, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/alex/T7/src/transformers/src/transformers/tokenization_utils_base.py", line 2505, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alex/T7/src/transformers/src/transformers/models/gemma/tokenization_gemma_fast.py", line 103, in __init__
    super().__init__(
  File "/home/alex/T7/src/transformers/src/transformers/tokenization_utils_fast.py", line 124, in __init__
    fast_tokenizer, additional_kwargs = convert_gguf_tokenizer(architecture, tokenizer_dict)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alex/T7/src/transformers/src/transformers/integrations/ggml.py", line 743, in convert_gguf_tokenizer
    converter = GGUF_TO_FAST_CONVERTERS[tokenizer_class_name](tokenizer_dict)
                ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'gemma2'

If I use the tokenizer from google/gemma-2-2b-it directly, I get gibberish output instead:

from os import environ
environ['HF_TOKEN'] = '<my token>'
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bartowski/gemma-2-2b-it-GGUF"
filename = "gemma-2-2b-it-Q6_K.gguf"

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)

input_text = "What is your name?"
input_ids = tokenizer.encode(input_text, return_tensors='pt')

generated_ids = model.generate(input_ids, max_length=30)
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

print(generated_text)

The output is:

Converting and de-quantizing GGUF tensors...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 288/288 [00:20<00:00, 14.24it/s]
What is your name?шTIMESเทศเทศเทศเทศ ITIS ITIS ITIS ITIS ITIS ITIS ITIS ITISنسMenuView stayingLleg GetEnumeratorsedes})*/})*/})*/})*/

SunMarc (Member) commented Aug 13, 2024

Hey @alllexx88, you also need to define the tokenizer for gemma2. Have a look at how qwen2 gguf was added: https://github.com/huggingface/transformers/pull/31175/files
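
(For reference, a minimal sketch of what registering such a tokenizer converter in src/transformers/integrations/ggml.py might look like, modeled on the existing llama/qwen2 entries; the GGUFGemma2Converter class and its subclassing of GGUFLlamaConverter below are assumptions, not the final implementation.)

# Hypothetical sketch for src/transformers/integrations/ggml.py; names are
# modeled on the existing converters in that module and may differ in the actual PR.
class GGUFGemma2Converter(GGUFLlamaConverter):
    # Gemma's GGUF tokenizer metadata is SentencePiece-based like llama's,
    # so reusing the llama converter is a plausible starting point; special
    # tokens and normalizers would likely need gemma-specific adjustments.
    pass


GGUF_TO_FAST_CONVERTERS = {
    "llama": GGUFLlamaConverter,   # existing entry
    "qwen2": GGUFQwen2Converter,   # existing entry, added in the qwen2 gguf PR
    "gemma2": GGUFGemma2Converter,  # new entry so convert_gguf_tokenizer can find it
}

With such an entry in place, the KeyError: 'gemma2' raised by convert_gguf_tokenizer should no longer occur, though the converter itself would still need to be validated against the GGUF tokenizer metadata.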

PolRF commented Sep 2, 2024

May I submit a PR for this issue? @alllexx88 @SunMarc

SunMarc (Member) commented Sep 3, 2024

If you have a working solution @PolRF, feel free to submit a PR!

@alllexx88 (Author)

I second @SunMarc 's words, @PolRF . I haven't managed to solve this, so a PR is very welcome!


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this as completed Oct 6, 2024
SunMarc (Member) commented Oct 7, 2024

I'm leaving this issue closed, as we centralized GGUF model addition requests in this issue: #33260

@FireAngelx

@SunMarc I got a similar error while using vllm to deploy chatglm4-gguf:
raise ValueError(f"Architecture {architecture} not supported")
ValueError: Architecture chatglm not supported
However, llama.cpp seems to handle all these models well. Am I right to understand that GGUF support here is not yet complete?
