-
Notifications
You must be signed in to change notification settings - Fork 27.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update test_batched_inference_image_captioning_conditioned
#23391
Conversation
The documentation is not available anymore as the PR was closed or merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the PR;
I think that we should educate users that for text-conditioned generation we should never add special tokens to the tokenizer - as introduced in #23004
I think here the model sees An photography of<|endoftext|>'
which leads it to generate weird output right after.
As users might check these tests as a reference, I think we should encode in the test the input without the special tokens (i.e. force add_special_tokens=True
) and document that somewhere (I can take care of that in a follow up PR)
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But in principle, that solution LGTM as well! You can merge that and I can address what I have said later
@younesbelkada If you can take over this PR to avoid |
The test should be now fixed, the generated text produces different output than before, probably due to #23051 that now made the model using a causal attention mask on the text decoder (which was not the case before) |
Thanks a lot! |
@younesbelkada The best place to do this I think is in the example docstring for the model, as this is what a lot of users will reference, and it currently doesn't do that. Could you open a PR to update this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating!
Changes look fine - my only concern is that the generations appear to have become worse. @younesbelkada @ydshieh do we have any other generation samples to make sure the model is behaving as expected?
Sure yes will do!
Yes! I was relieved since we do have the tests |
|
||
predictions = model.generate(**inputs) | ||
|
||
self.assertEqual( | ||
processor.decode(predictions[0], skip_special_tokens=True), "A picture of a stop sign that says yes." | ||
processor.decode(predictions[0], skip_special_tokens=True), | ||
"A picture of a stop sign with a red stop sign on it.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, seeing this image, I would say this new prediction is better than the previous one.
) | ||
|
||
self.assertEqual( | ||
processor.decode(predictions[1], skip_special_tokens=True), | ||
"An photography of the Temple Bar and a few other places.", | ||
"An photography of the Temple Bar and the Temple Bar.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not better, but I would say the previous about a few other places
is not super good neither.
I am going to merge this PR and leave @amyeroberts 's suggestion for @younesbelkada in a separate PR. Thank you for the review and the refine of this PR. |
…face#23391) * fix * fix * fix test + add more docs --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: younesbelkada <younesbelkada@gmail.com>
…face#23391) * fix * fix * fix test + add more docs --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: younesbelkada <younesbelkada@gmail.com>
What does this PR do?
The test
tests/models/pix2struct/test_modeling_pix2struct.py::Pix2StructIntegrationTest::test_batched_inference_image_captioning_conditioned
starts to fail on CI run ofApril 27
which includes the merged PR #23023.@younesbelkada Could you double check if the changes in this PR are reasonable? Thank you.