System Info
- transformers version: 4.56.2
- Platform: Linux-6.8.0-59-generic-x86_64-with-glibc2.31
- Python version: 3.10.14
- Huggingface_hub version: 0.35.0
- Safetensors version: 0.4.3
- Accelerate version: 1.10.1
- Accelerate config: - compute_environment: LOCAL_MACHINE
- distributed_type: MULTI_GPU
- mixed_precision: no
- use_cpu: False
- debug: False
- num_processes: 2
- machine_rank: 0
- num_machines: 1
- gpu_ids: 0,1
- rdzv_backend: static
- same_network: True
- main_training_function: main
- enable_cpu_affinity: False
- downcast_bf16: no
- tpu_use_cluster: False
- tpu_use_sudo: False
- tpu_env: []
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.8.0+cu128 (CUDA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: distributed
- Using GPU in script?: yes
- GPU type: NVIDIA H100 PCIe
Who can help?
I am trying to use model.generate() with use_cache=True, but it raises the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[11], line 1
----> 1 output = model.generate(**masked_encoded_sequence, max_new_tokens=15*2+2, use_cache=True, do_sample=False)

File /usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py:120, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    117 @functools.wraps(func)
    118 def decorate_context(*args, **kwargs):
    119     with ctx_factory():
--> 120         return func(*args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:2399, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, use_model_defaults, custom_generate, **kwargs)
   2393 if (
   2394     inputs_tensor.shape[1] != input_ids_length
   2395     and model_input_name == "inputs_embeds"
   2396     and not self.config.is_encoder_decoder
   2397 ):
   2398     max_cache_length += inputs_tensor.shape[1]
-> 2399 self._prepare_cache_for_generation(
   2400     generation_config, model_kwargs, assistant_model, batch_size, max_cache_length
   2401 )
   2403 # 8. determine generation mode
   2404 generation_mode = generation_config.get_generation_mode(assistant_model)

File /usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:2007, in GenerationMixin._prepare_cache_for_generation(self, generation_config, model_kwargs, assistant_model, batch_size, max_cache_length)
   1999 model_kwargs[cache_name] = DynamicCache(**dynamic_cache_kwargs)
   2001 # Use DynamicCache instance by default. This will avoid back and forth from legacy format that
   2002 # keeps copying the cache thus using much more memory
   2003 else:
   2004     model_kwargs[cache_name] = (
   2005         DynamicCache(**dynamic_cache_kwargs)
   2006         if not requires_cross_attention_cache
-> 2007         else EncoderDecoderCache(DynamicCache(**dynamic_cache_kwargs), DynamicCache(**dynamic_cache_kwargs))
   2008     )

File /usr/local/lib/python3.10/dist-packages/transformers/cache_utils.py:1018, in DynamicCache.__init__(self, ddp_cache_data, config, offloading, offload_only_non_sliding)
   1014 layer_types = getattr(config, "layer_types", None)
   1015 if layer_types is None:
   1016     layer_types = [
   1017         "sliding_attention" if sliding_window is not None else "full_attention"
-> 1018         for _ in range(config.num_hidden_layers)
   1019     ]
   1020 # Some models have shared layers thus no cache is needed for them (e.g. Gemma3n)
   1021 if hasattr(config, "num_kv_shared_layers"):

File /usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py:207, in PretrainedConfig.__getattribute__(self, key)
    205 if key != "attribute_map" and key in super().__getattribute__("attribute_map"):
    206     key = super().__getattribute__("attribute_map")[key]
--> 207 return super().__getattribute__(key)

AttributeError: 'T5GemmaConfig' object has no attribute 'num_hidden_layers'

Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
import torch
from transformers import T5GemmaConfig, T5GemmaForConditionalGeneration
from transformers.models.t5gemma.configuration_t5gemma import T5GemmaModuleConfig

encoder_config = T5GemmaModuleConfig(
vocab_size=33,
hidden_size=32,
intermediate_size=128,
num_hidden_layers=2,
num_attention_heads=4,
num_key_value_heads=4,
head_dim=32,
max_position_embeddings=1024, # noqa
tie_word_embeddings=False,
layer_types=["full_attention"] * 2,
rope_theta=10000,
bos_token_id=0,
eos_token_id=1,
pad_token_id=2,
)
decoder_config = T5GemmaModuleConfig(
vocab_size=33,
hidden_size=32,
intermediate_size=128,
num_hidden_layers=2,
num_attention_heads=4,
num_key_value_heads=4,
head_dim=32,
max_position_embeddings=1024, # noqa
tie_word_embeddings=False,
layer_types=["full_attention"] * 2,
rope_theta=10000,
bos_token_id=0,
eos_token_id=1,
pad_token_id=2,
)
t5_gemma_config = T5GemmaConfig(
encoder=encoder_config,
decoder=decoder_config,
vocab_size=33,
attn_implementation="eager",
)
model = T5GemmaForConditionalGeneration(t5_gemma_config)
model.generate(torch.randint(0, 33, (1, 10)), use_cache=True)
Expected behavior
I expect generate() to work with use_cache=True: the cache-preparation code should detect whether the model is an encoder-decoder and, in that case, read num_hidden_layers from the decoder sub-config (i.e. config.decoder.num_hidden_layers) rather than from the top-level config.