
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results. #33498

asmith26 opened this issue Sep 15, 2024 · 8 comments · Fixed by #33509


asmith26 commented Sep 15, 2024

System Info

  • transformers version: 4.44.2
  • Platform: Linux-6.8.0-44-generic-x86_64-with-glibc2.39
  • Python version: 3.12.3
  • Huggingface_hub version: 0.24.7
  • Safetensors version: 0.4.5
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.1+cu121 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No

Who can help?

speech models: @ylacombe, @eustlb
pipelines: @Rocketknight1

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch 
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base.en",
    device="cpu",
    torch_dtype=torch.float32,
)

# https://github.com/openai/whisper/blob/main/tests/jfk.flac
pipe("./jfk.flac")

Expected behavior

This does return the expected output:

{'text': ' And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.'}

But it also prints the following warning, so it would be nice to fix or suppress it:

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.

Thanks!

@asmith26 asmith26 added the bug label Sep 15, 2024
asmith26 (Author) commented:

Related: openai/whisper#2335

Rocketknight1 (Member) commented:

@asmith26 thanks for the issue! I've reproduced it here, will open a PR to fix in a sec.

ritwikmishra commented:

I observed this when I was fine-tuning an LLM with the PPO trainer. To resolve the warning, I passed the attention mask as a named argument to the generate function, following this.

outputs = model.generate(
    inputs['input_ids'],
    attention_mask=attention_mask,
    pad_token_id=tokenizer.eos_token_id,
)

But then I observed an error, "IndexError: too many indices for tensor of dimension 1", raised on this line of

lib/python3.9/site-packages/transformers/models/gemma/modeling_gemma.py
position_ids_expanded = position_ids[:, None, :].float() # let us call this line_e

I turned off the attention mask and, using print statements before that line_e, inspected what its normal behavior looks like. The original warning came back, but I ignored it. I saw that the position ids are fed in one by one as 1-D tensors, so to resolve the error I simply unsqueezed the attention mask:

outputs = model.generate(
    inputs['input_ids'],
    attention_mask=attention_mask.unsqueeze(0),
    pad_token_id=tokenizer.eos_token_id,
)

and it worked fine.
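
For context, a minimal sketch of where such an attention mask usually comes from, assuming a standard causal LM setup; the gpt2 checkpoint and prompt below are illustrative only and not taken from the comment above:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; any causal LM works the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The tokenizer returns both input_ids and attention_mask; passing the mask
# to generate() avoids the "attention mask is not set" warning when the
# pad token is the same as the eos token.
inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=20,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))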


asmith26 commented Nov 4, 2024

Thanks for your help with this @Rocketknight1. Just thought I'd mention I still seem to be getting the same warning (I'm currently running transformers == 4.47.0.dev0).

Thanks again!

Rocketknight1 (Member) commented:

@asmith26 I'm not getting that warning when I run the code sample above anymore. Did you change anything about it?


asmith26 commented Nov 5, 2024

Interesting, thanks for the info @Rocketknight1.

I've determined that if I add `chunk_length_s=30` (i.e. `outputs = pipe("./jfk.flac", chunk_length_s=30)`, following this tutorial), I get the `The attention mask is not set and...` warning.

Happy to remove this argument for my need. Thanks again! :)
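
For reference, a minimal sketch of the chunked call that still triggers the warning, assuming the same setup as the original reproduction above:

import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base.en",
    device="cpu",
    torch_dtype=torch.float32,
)

# Enabling chunked long-form decoding re-triggers the attention-mask warning.
outputs = pipe("./jfk.flac", chunk_length_s=30)
print(outputs["text"])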

Rocketknight1 (Member) commented:

That's still potentially an issue we should address, though! Even though you've found a fix, I'll reopen to make sure we don't lose track of it.


github-actions bot commented Dec 4, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@eustlb eustlb reopened this Dec 19, 2024