
Misleading warning message about padding_side when passing tokenizer instance to pipeline #29379

Closed
4 tasks
tilmanbeck opened this issue Feb 29, 2024 · 3 comments

Comments

@tilmanbeck

System Info

Python: 3.11.5
Transformers: 4.37.2

In my setup I am initializing a tokenizer and then passing it to the pipeline. My expectation is that if I set padding_side directly when initializing the tokenizer instance, the pipeline should not print the warning: A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
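
For what it is worth, the padding_side argument passed to from_pretrained does end up on the tokenizer instance, which is what makes the warning surprising (quick check on my setup):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('microsoft/DialoGPT-small', padding_side='left')
print(tokenizer.padding_side)  # prints 'left'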

Alternatively, I would like to pass it as a parameter when instantiating the pipeline, but passing tokenizer parameters through the pipeline factory does not currently seem to be supported, as discussed in #12039, #24707 and #22995 (a hypothetical sketch of what I mean follows below).
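
For illustration only, the kind of interface I have in mind would look roughly like the hypothetical call below; tokenizer_kwargs is not an actual pipeline argument today, per the issues linked above:

# hypothetical, currently unsupported: forwarding tokenizer init kwargs through the pipeline factory
chatbot = pipeline(
    task='conversational',
    model='microsoft/DialoGPT-small',
    tokenizer_kwargs={'padding_side': 'left'},  # hypothetical parameter
)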

This possibly has the same origin as the issue reported in #29378.

@Narsil

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoTokenizer, pipeline, AutoModelForCausalLM, Conversation

msg = 'The capital of France '
modelname = 'microsoft/DialoGPT-small'

# init model and tokenizer, requesting left padding at tokenizer init
model = AutoModelForCausalLM.from_pretrained(modelname)
tokenizer = AutoTokenizer.from_pretrained(modelname, padding_side='left')
tokenizer.pad_token_id = tokenizer.eos_token_id

# hand the model & tokenizer instances over to the pipeline
chatbot = pipeline(task='conversational', model=model, tokenizer=tokenizer, framework='pt')
messages = [
    {"role": "system", "content": 'You are a helpful assistant'},
    {"role": "user", "content": msg},
]
response = chatbot(Conversation(messages=messages), pad_token_id=chatbot.tokenizer.eos_token_id)

This prints the warning message:
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
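
If I read the generation code correctly, the warning is emitted by GenerationMixin.generate based only on the encoded inputs: when the last token of any sequence equals pad_token_id, right padding is assumed, regardless of the tokenizer's padding_side setting. A simplified sketch of that check (not the exact transformers source):

import torch

def looks_right_padded(input_ids: torch.Tensor, pad_token_id: int) -> bool:
    # if any sequence in the batch ends with the pad token, assume right padding
    return bool((input_ids[:, -1] == pad_token_id).any())

Since pad_token_id is set to eos_token_id here, my guess is that an input which simply ends with the EOS token is enough to trip this check, even though no padding happened at all.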

Expected behavior

The warning message should not be printed, since padding_side was already set during tokenizer initialization.
Alternatively, I would expect to be able to provide the padding_side parameter directly to the pipeline.
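
As a sanity check, calling the tokenizer directly (outside the pipeline) shows that left padding is honored, which is why the warning from the pipeline call is confusing; assuming the setup from the reproduction above:

batch = tokenizer(['Hello', 'A much longer second sentence'], padding=True, return_tensors='pt')
print(tokenizer.padding_side)   # 'left'
print(batch['input_ids'][0])    # pad tokens (== eos) are prepended to the shorter sequence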

@ArthurZucker
Collaborator

cc @Rocketknight1


github-actions bot commented Apr 1, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@ArthurZucker
Collaborator

#29614 fixed this, so closing.
