
Misleading warning message about padding_side when passing tokenizer instance to pipeline #29379

Closed
4 tasks
tilmanbeck opened this issue Feb 29, 2024 · 3 comments

Comments

@tilmanbeck

System Info

Python: 3.11.5
Transformers: 4.37.2

In my setup I am initializing a tokenizer and then passing it to the pipeline. My expectation is that if I set padding_side directly when initializing the tokenizer instance, the pipeline should not print the warning: A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
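
For what it is worth, the padding_side argument passed to from_pretrained does end up on the tokenizer instance, which is what makes the warning surprising (quick check on my setup):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('microsoft/DialoGPT-small', padding_side='left')
print(tokenizer.padding_side)  # prints 'left'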

Alternatively, I would like to pass it as a parameter when instantiating the pipeline, but passing tokenizer parameters through the pipeline factory does not currently seem to be supported, as discussed in #12039, #24707 and #22995 (a hypothetical sketch of what I mean follows below).
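
For illustration only, the kind of interface I have in mind would look roughly like the hypothetical call below; tokenizer_kwargs is not an actual pipeline argument today, per the issues linked above:

# hypothetical, currently unsupported: forwarding tokenizer init kwargs through the pipeline factory
chatbot = pipeline(
    task='conversational',
    model='microsoft/DialoGPT-small',
    tokenizer_kwargs={'padding_side': 'left'},  # hypothetical parameter
)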

This possibly has the same origin as the issue reported in #29378.

@Narsil

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoTokenizer, pipeline, AutoModelForCausalLM, Conversation

msg = 'The capital of France '
modelname = 'microsoft/DialoGPT-small'

# init model and tokenizer, requesting left padding at tokenizer init
model = AutoModelForCausalLM.from_pretrained(modelname)
tokenizer = AutoTokenizer.from_pretrained(modelname, padding_side='left')
tokenizer.pad_token_id = tokenizer.eos_token_id

# hand the model & tokenizer instances over to the pipeline
chatbot = pipeline(task='conversational', model=model, tokenizer=tokenizer, framework='pt')
messages = [
    {"role": "system", "content": 'You are a helpful assistant'},
    {"role": "user", "content": msg},
]
response = chatbot(Conversation(messages=messages), pad_token_id=chatbot.tokenizer.eos_token_id)

This prints the warning message:
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
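
If I read the generation code correctly, the warning is emitted by GenerationMixin.generate based only on the encoded inputs: when the last token of any sequence equals pad_token_id, right padding is assumed, regardless of the tokenizer's padding_side setting. A simplified sketch of that check (not the exact transformers source):

import torch

def looks_right_padded(input_ids: torch.Tensor, pad_token_id: int) -> bool:
    # if any sequence in the batch ends with the pad token, assume right padding
    return bool((input_ids[:, -1] == pad_token_id).any())

Since pad_token_id is set to eos_token_id here, my guess is that an input which simply ends with the EOS token is enough to trip this check, even though no padding happened at all.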

Expected behavior

The warning message should not be printed, since padding_side was already set during tokenizer initialization.
Alternatively, I would expect to be able to provide the padding_side parameter directly to the pipeline.
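
As a sanity check, calling the tokenizer directly (outside the pipeline) shows that left padding is honored, which is why the warning from the pipeline call is confusing; assuming the setup from the reproduction above:

batch = tokenizer(['Hello', 'A much longer second sentence'], padding=True, return_tensors='pt')
print(tokenizer.padding_side)   # 'left'
print(batch['input_ids'][0])    # pad tokens (== eos) are prepended to the shorter sequence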

@ArthurZucker
Collaborator

cc @Rocketknight1


github-actions bot commented Apr 1, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@ArthurZucker
Collaborator

#29614 fixed this, so closing.
