In my setup I am initializing a tokenizer and want to pass it to the pipeline. My expectation is that if I set the pad_token_id directly on the tokenizer instance, the pipeline should not print the warning Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation. If I pass the pad_token_id directly to the pipeline's __call__, the warning is not printed. However, I would rather set the pad_token_id once on the tokenizer than have to think about it every time I use the pipeline's __call__ method.
Alternatively, I would like to add it as a parameter at pipeline instantiation, but I think passing tokenizer parameters to the pipeline() factory is currently not envisioned, as discussed in #12039, #24707 and #22995.
@Narsil
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Here is a minimal code example to demonstrate the difference:
from transformers import AutoTokenizer, pipeline, AutoModelForCausalLM, Conversation
msg = 'The capital of France '
modelname = 'microsoft/DialoGPT-small'
# hand over model & tokenizer instances to the pipeline
model = AutoModelForCausalLM.from_pretrained(modelname)
tokenizer = AutoTokenizer.from_pretrained(modelname)
tokenizer.pad_token_id = tokenizer.eos_token_id
chatbot = pipeline(task='conversational', model=model, tokenizer=tokenizer, framework='pt')
messages = [{"role": "system", "content": 'You are a helpful assistant'}, {"role": "user", "content": msg}]
response = chatbot(Conversation(messages=messages))
This prints the warning message: Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The following code does not:
from transformers import AutoTokenizer, pipeline, AutoModelForCausalLM, Conversation
msg = 'The capital of France '
modelname = 'microsoft/DialoGPT-small'
chatbot = pipeline(task='conversational', model=modelname, framework='pt')
messages = [{"role": "system", "content": 'You are a helpful assistant'}, {"role": "user", "content": msg}]
response = chatbot(Conversation(messages=messages), pad_token_id=chatbot.tokenizer.eos_token_id)
Expected behavior
I would expect the first code snippet not to print the warning, since the pad_token_id is set directly on the tokenizer instance.
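For reference, a short inspection sketch (assuming the model and tokenizer from the first snippet are still in scope; the commented values are expectations for this checkpoint, not captured output) shows where generate() actually looks for pad_token_id:

# Sketch: the tokenizer and the model hold separate pad_token_id values.
print(tokenizer.pad_token_id)                # 50256, set explicitly above
print(model.config.pad_token_id)             # typically None for microsoft/DialoGPT-small
print(model.generation_config.pad_token_id)  # expected None, so generate() falls back to eos_token_id and warns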
The root issue is that the tokenizer and the model are two separate objects, so pad_token_id needs to be set on both. In pipeline, IMO the model should inherit pad_token_id from the tokenizer when it is not set there. Opening a PR to fix that :)
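Until such a change lands, a possible workaround (a sketch of the idea only, assuming transformers 4.37.2 as above; this is not the fix from the PR) is to mirror the tokenizer's pad_token_id onto the model's generation config before building the pipeline, so the warning disappears without passing pad_token_id on every call:

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

modelname = 'microsoft/DialoGPT-small'
tokenizer = AutoTokenizer.from_pretrained(modelname)
tokenizer.pad_token_id = tokenizer.eos_token_id

model = AutoModelForCausalLM.from_pretrained(modelname)
# mirror the tokenizer setting onto the object generate() actually consults
model.generation_config.pad_token_id = tokenizer.pad_token_id

chatbot = pipeline(task='conversational', model=model, tokenizer=tokenizer, framework='pt')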
System Info
Python: 3.11.5
Transformers: 4.37.2