Improving T5 Docs #16614
Hi, thanks for your question! To be honest it wasn't clear to me either; I guess it's set because otherwise generation might complain that no padding token is set. I took that snippet from this PR: #7552. It includes the comment: [...]

However, that was for a decoder-only model (GPT-2). I'm not sure whether the same is required for an encoder-decoder model like T5. Maybe @patrickvonplaten can clarify here.
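For context, here is a minimal sketch (not taken from the PR; the checkpoint names are illustrative assumptions) of why that snippet exists for decoder-only models, and why it may be redundant for T5, whose tokenizer already ships with a dedicated `<pad>` token:

```python
# Sketch only: checkpoint names ("gpt2", "t5-small") are illustrative assumptions.
from transformers import GPT2Tokenizer, T5Tokenizer

# Decoder-only GPT-2 has no pad token, so batched generation typically reuses EOS
# as padding and left-pads so that generation continues from real tokens:
gpt2_tok = GPT2Tokenizer.from_pretrained("gpt2")
gpt2_tok.padding_side = "left"
gpt2_tok.pad_token = gpt2_tok.eos_token

# T5's tokenizer already defines <pad> (id 0), so the reassignment above
# should not be needed for an encoder-decoder model:
t5_tok = T5Tokenizer.from_pretrained("t5-small")
print(t5_tok.pad_token, t5_tok.pad_token_id)  # "<pad>" 0
```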
Thanks @NielsRogge, this would be very helpful indeed. I did the following checks this morning: [...]
The results are vastly different. For 1 we get:

```json
{
    "1_00000": {
        "0": {
            "utterance": "hi, could you get me a restaurant booking on the 8th please?",
            "Restaurants_2": {
                "predicted_str": " [states] 10:the 8th [intents] i1 [req_slots] <EOS>"
            }
        },
        "1": {
            "utterance": "could you get me a reservation at p.f. chang's in corte madera at afternoon 12?",
            "Restaurants_2": {
                "predicted_str": " [states] 0:corte madera 2:the 8th 9:p.f. chang's 10:afternoon 12 [intents] i1 [req_slots] <EOS>"
            }
        }
```

For 2 we get:

```json
{
    "1_00000": {
        "0": {
            "utterance": "hi, could you get me a restaurant booking on the 8th please?",
            "Restaurants_2": {
                "predicted_str": " [states] 10:the 8th [intents] i1 [req_slots] i1 [req_slots] i1 [req_slots] i1 [req_slots] i1 [req_slots] i1 [req_slots] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] <EOS>"
            }
        },
        "1": {
            "utterance": "could you get me a reservation at p.f. chang's in corte madera at afternoon 12?",
            "Restaurants_2": {
                "predicted_str": " [states] 0:corte madera 2:the 8th 9:p.f. chang's 10:afternoon 12 [intents] i1 [req_slots] i1 [req_slots] i1 [req_slots] i1 [req_slots] i1 [req_slots] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] [intents] 0 [intents] i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i1 i <EOS>"
            }
        },
}
```

The results above are obtained with [...]. I'll try to figure out why this occurs. I should also state [...]. I can make the code/checkpoints needed to replicate the above issues available to you for debugging. I'll keep you posted.

Note: the results in 2 are after manually post-processing to remove many [...].
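One sanity check that can help when generation degenerates into repeated tokens like this is to confirm that the tokenizer and the model config agree on the special-token ids that `generate` falls back to. This is only a hedged debugging sketch, with `t5-small` as an assumed checkpoint:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# For the stock T5 checkpoints these should print 0/1 on both sides,
# with decoder_start_token_id == pad_token_id == 0.
print("tokenizer pad/eos:", tokenizer.pad_token_id, tokenizer.eos_token_id)
print("model pad/eos:", model.config.pad_token_id, model.config.eos_token_id)
print("decoder_start_token_id:", model.config.decoder_start_token_id)
```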
Ok, I debugged my code and the preliminary test passed. The change has been to change

```python
output_seqs = model.generate(
    input_ids=input_ids.to(DEVICE),
    attention_mask=attention_mask.to(DEVICE),
    max_length=args.decoder_max_seq_len,
    use_cache=True,
)
```

to

```python
output_seqs = model.generate(
    input_ids=input_ids.to(DEVICE),
    attention_mask=attention_mask.to(DEVICE),
    max_length=args.decoder_max_seq_len,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
    use_cache=True,
)
```

I'm not sure if this is an omission in the docs or why this is a fix; an explanation would be appreciated! Maybe something noteworthy is that the last id in the [...]. The batched outputs are then post-processed with

```python
output_strings = tokenizer.batch_decode(output_seqs)
output_strings = remove_padding(output_strings, tokenizer.pad_token)
```

where

```python
def remove_padding(output_strings: list[str], pad_token: str) -> list[str]:
    """Strip every occurrence of the pad token, then append a single trailing one."""
    padding_free = []
    for s in output_strings:
        pad_token_start = s.find(pad_token)
        while pad_token_start != -1:
            # Drop this pad token and any whitespace that followed it
            s = f"{s[:pad_token_start]}{s[pad_token_start + len(pad_token):].lstrip()}"
            pad_token_start = s.find(pad_token)
        padding_free.append(f"{s} {pad_token}")
    return padding_free
```

By contrast, the implementation that does not use batching uses the call

```python
output_seqs = model.generate(
    input_ids.to(DEVICE),
    max_length=args.decoder_max_seq_len,
    do_sample=False,
    temperature=1.0,
    use_cache=True,
    num_beams=1,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
    early_stopping=True,
)
```

and post-processing is simply

```python
output_strings = tokenizer.decode(output_seqs[0])
```
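For anyone landing here, a minimal, self-contained version of the batched setup is sketched below. It is not the author's script: the checkpoint name, input sentences, and `max_length` are illustrative assumptions, and `skip_special_tokens=True` is used instead of manual pad stripping.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").to(DEVICE)

batch = tokenizer(
    ["translate English to German: How are you?",
     "translate English to German: I would like to book a table for two, please."],
    padding=True,            # pad the batch to the longest sequence
    return_tensors="pt",
)

output_seqs = model.generate(
    input_ids=batch.input_ids.to(DEVICE),
    attention_mask=batch.attention_mask.to(DEVICE),  # lets the encoder ignore <pad>
    max_length=64,
    use_cache=True,
)

# skip_special_tokens drops <pad> and </s>, so no manual pad removal is needed
print(tokenizer.batch_decode(output_seqs, skip_special_tokens=True))
```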
Good point @alexcoca, [...]

@NielsRogge @alexcoca Yeah, this looks like it was a bad copy-paste from GPT-2. It should be corrected here: #16646
Thanks @patrickvonplaten! So from your PR I understand that it is not necessary to set the tokenizer padding to [...].

On a different note, I ran a large-scale test on batched inference. I get [...]
It could be related to #14859 (comment), actually.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Who can help
@NielsRogge @patrickvonplaten @sgugger
Documentation: @sgugger
Information
Model I am using: T5ForConditionalGeneration
The problem arises when using:
The task I am working on is:
To reproduce
#13240 is a really nice PR that adds a lot of clarity to the documentation. However, in the examples provided we read [...], and I feel more information could be given to explain to users why this is necessary. I am currently attempting to do batched decoding with T5 and observing very strange outputs, so I'm keen to understand whether this is a problem. Very soon I will test whether the strange behaviour is due to batching or not, but it would be great to enhance the docs to explain the error that would occur and why.
Expected behavior
One or two extra sentences to explain why we left-pad encoder input sequences with [...] when doing batched decoding for T5.