
AttributeError: 'GenerationConfig' object has no attribute 'task_to_id' #25084

Closed
2 of 4 tasks
AmgadHasan opened this issue Jul 25, 2023 · 7 comments · Fixed by #25298

Comments

@AmgadHasan

System Info

I am following the Audio course and tried to perform translation using the automatic speech recognition pipeline, but got a weird error.

Code:

from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model='arbml/whisper-largev2-ar', device=0)

res = asr(
    audio_file_path,
    max_new_tokens=256,
    generate_kwargs={"task": "translate"},
    chunk_length_s=30,
    batch_size=8,
)

Error:
AttributeError: 'GenerationConfig' object has no attribute 'task_to_id'

This was using the Colab free tier on a T4 GPU.
transformers version:

import transformers
transformers.__version__
>>> '4.31.0'

This error arises when using generate_kwargs={"task": "translate"} or generate_kwargs={"task": "transcribe"}
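To illustrate the failure mode (a minimal stand-in, not the actual transformers code): the auto-created generation config simply has no task_to_id attribute, so the lookup performed when a task is requested raises AttributeError:

```python
from types import SimpleNamespace

# Stand-in for a GenerationConfig created automatically from the model
# config: it carries only basic token ids, no Whisper task/language maps.
sparse_config = SimpleNamespace(bos_token_id=50257, task="translate")

message = ""
try:
    # Whisper's generate() effectively performs this lookup when a task
    # is requested:
    sparse_config.task_to_id[sparse_config.task]
except AttributeError as err:
    message = str(err)

print(message)  # the message names the missing 'task_to_id' attribute
```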

Tagging @Narsil to help with pipeline issues.

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model='arbml/whisper-largev2-ar', device=0)

res = asr(
    audio_file_path,
    max_new_tokens=256,
    generate_kwargs={"task": "translate"},
    chunk_length_s=30,
    batch_size=8,
)

Expected behavior

Should return a Python dict with a key named text holding the English text.

@sgugger
Collaborator

sgugger commented Jul 25, 2023

Also cc @sanchit-gandhi since it comes from the audio course.

@Narsil
Contributor

Narsil commented Jul 25, 2023

Can you link the full stack trace if possible? This might help us narrow it down faster.

@sanchit-gandhi
Contributor

+1 on the full stack trace. It might require an update to your generation config, since this is a fine-tuned checkpoint and the API was updated to take the task/language as arguments rather than reading them from the config's forced_decoder_ids (see #21878 (comment) for details)

@AmgadHasan
Author

@sanchit-gandhi
Contributor

sanchit-gandhi commented Jul 27, 2023

Thanks for the notebook @AmgadHasan! The generation config for this model is indeed missing, meaning it is created automatically from the config in the call to .generate, and is only populated with some basic information:

from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model='arbml/whisper-largev2-ar')
print(pipe.model.generation_config)

Print Output:

GenerationConfig {
  "_from_model_config": true,
  "begin_suppress_tokens": [
    220,
    50257
  ],
  "bos_token_id": 50257,
  "decoder_start_token_id": 50258,
  "eos_token_id": 50257,
  "max_length": 448,
  "pad_token_id": 50257,
  "transformers_version": "4.32.0.dev0",
  "use_cache": false
}

If we compare this to the most recent generation config, i.e. the one for Whisper large-v2, we see that it is missing both the language and task token id mappings:

GenerationConfig {
  "begin_suppress_tokens": [
    220,
    50257
  ],
 ...
  "task_to_id": {
    "transcribe": 50359,
    "translate": 50358
  },
 ...
}

These language/task token mappings are used in the call to .generate to get the correct language/task token ids respectively:

forced_decoder_ids.append((2, generation_config.task_to_id[generation_config.task]))
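Concretely, here is a simplified sketch of how the mappings feed into the forced decoder ids. The task_to_id values are the ones quoted above; the Arabic language token id is an assumed value for illustration only:

```python
# Simplified sketch of how Whisper builds forced_decoder_ids from the
# generation config's language/task mappings.
lang_to_id = {"<|ar|>": 50272}   # assumed id, for illustration only
task_to_id = {"transcribe": 50359, "translate": 50358}

# Position 1 holds the language token, position 2 the task token:
forced_decoder_ids = [(1, lang_to_id["<|ar|>"])]
forced_decoder_ids.append((2, task_to_id["translate"]))
print(forced_decoder_ids)  # [(1, 50272), (2, 50358)]
```

Without the mappings on the config, the dictionary lookups above have nothing to resolve against, which is exactly where the AttributeError surfaces.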

Support for passing the language/task arguments to the .generate method was added together with the generation config update, so these features only work with an updated generation config.

Probably what we can do here @ArthurZucker is throw an error when the user calls .generate with the language/task arguments but the generation config is missing the language/task token id mappings? Happy to open a PR to fix this.
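Such a guard might look like this (a sketch of the suggested check, not the actual transformers implementation; resolve_task_id is a hypothetical helper name):

```python
def resolve_task_id(generation_config, task):
    """Raise a clear error when `task` is requested but the generation
    config lacks the `task_to_id` mapping needed to resolve it."""
    task_to_id = getattr(generation_config, "task_to_id", None)
    if task_to_id is None:
        raise ValueError(
            "The generation config has no `task_to_id` mapping; update it "
            "from a recent Whisper checkpoint before passing `task`."
        )
    if task not in task_to_id:
        raise ValueError(
            f"Unsupported task {task!r}; expected one of {list(task_to_id)}."
        )
    return task_to_id[task]
```

This turns the opaque AttributeError into an actionable message pointing at the stale generation config.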

A quick fix for this issue @AmgadHasan is updating the generation config for the model checkpoint (as per my previous comment)

@AmgadHasan
Author

Thanks @sanchit-gandhi! This solved the issue.

@sanchit-gandhi
Contributor

The simplest way of updating the generation config is as follows:

from transformers import GenerationConfig

MODEL_ID = "arbml/whisper-largev2-ar"  # set to your model id on the Hub
MULTILINGUAL = True  # set True for multilingual models, False for English-only

if MULTILINGUAL:
    generation_config = GenerationConfig.from_pretrained("openai/whisper-large-v2")
else:
    generation_config = GenerationConfig.from_pretrained("openai/whisper-medium.en")

generation_config.push_to_hub(MODEL_ID)
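If you'd rather not push anything to the Hub, the same fields can be patched onto the loaded model's generation config at runtime. The sketch below uses a SimpleNamespace as a stand-in for asr.model.generation_config (so it runs offline); the token ids are the ones quoted earlier in this thread:

```python
from types import SimpleNamespace

# Stand-in for asr.model.generation_config as loaded from the fine-tuned
# checkpoint: sparse, with no task mapping.
generation_config = SimpleNamespace(bos_token_id=50257)

# Copy the missing Whisper task mapping onto it at runtime.
generation_config.task_to_id = {"transcribe": 50359, "translate": 50358}

print(generation_config.task_to_id["translate"])  # 50358
```

Pushing an updated config to the Hub (as above) is the more durable fix, since the runtime patch has to be repeated in every session.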
