Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Pix2struct] Simplify generation #22527

Merged
merged 14 commits into from
Apr 13, 2023
Merged

Conversation

NielsRogge
Copy link
Contributor

@NielsRogge NielsRogge commented Apr 3, 2023

What does this PR do?

This PR aims to fix the warning that is currently printed out when generating text with Pix2Struct:

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.

I see that all Pix2Struct models have config.is_encoder_decoder=False, but as Pix2Struct is an encoder-decoder model it'd be great/more logical to have this argument set to True and instead overwrite prepare_inputs_for_generation to have a cleaner way of generating text. This also makes us get rid of the warning.

To do:

  • for the moment there're still one integration test failing (test_batched_inference_image_captioning_conditioned):
AssertionError: 'An photography of the Temple Bar and a collection of other items.' != 'An photography of the Temple Bar and a few other places.'
E       - An photography of the Temple Bar and a collection of other items.
E       ?                                        ^^^^ ^^^^^^^^       ^^ -
E       + An photography of the Temple Bar and a few other places.

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Apr 3, 2023

The documentation is not available anymore as the PR was closed or merged.

Copy link
Member

@gante gante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, LGTM 👍

Added a few comments regarding potential fixes/simplifications, conditional on our ability to retroactively fix them without impacting users.

tests/models/pix2struct/test_modeling_pix2struct.py Outdated Show resolved Hide resolved
decoder_input_ids = torch.cat(
if isinstance(input_ids, torch.Tensor):
# check if the first element of `input_ids` is equal to `input_ids`:
if (input_ids[:, 0] != self.config.decoder_start_token_id).all().item():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if wouldn't be needed if the tokenizer had the BOS token defined 😢 It's too late to change that, correct (users may have cloned the model)?

If it's too late to retroactively fix this, let's add add a comment about why we need this if :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @younesbelkada since the model isn't available in a new PyPi release yet, we can probably update this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a short window where we can do breaking changes yes, until next week.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part still need updating no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinging @younesbelkada here

@NielsRogge NielsRogge marked this pull request as ready for review April 7, 2023 17:55
@NielsRogge
Copy link
Contributor Author

PR is ready for review, however checkpoints on the hub will need to be updated (is_encoder_decoder = True) for this PR to be merged

@NielsRogge NielsRogge requested review from gante and sgugger and removed request for gante April 7, 2023 17:55
Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM and we can make breaking changes to the model before the release (probably next week).

- [Fine-tuning Notebook](https://github.com/huggingface/notebooks/blob/main/examples/image_captioning_pix2struct.ipynb)
- [Fine-tuning Notebook on key-value pair dataset](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Pix2Struct/Fine_tune_Pix2Struct_on_key_value_pair_dataset_(PyTorch_Lightning).ipynb)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this line or showcase a resource using our ecosystem.

decoder_input_ids = torch.cat(
if isinstance(input_ids, torch.Tensor):
# check if the first element of `input_ids` is equal to `input_ids`:
if (input_ids[:, 0] != self.config.decoder_start_token_id).all().item():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a short window where we can do breaking changes yes, until next week.

@NielsRogge
Copy link
Contributor Author

PR is ready, models on the hub don't need to be updated since they don't have is_encoder_decoder set on the model config level (i.e. Pix2StructConfig. They have set it only in Pix2StructTextConfig). cc @younesbelkada

Copy link
Contributor

@younesbelkada younesbelkada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for working on this Niels!

@NielsRogge NielsRogge marked this pull request as draft April 12, 2023 10:05
@NielsRogge NielsRogge marked this pull request as ready for review April 13, 2023 06:58
@sgugger sgugger merged commit 8eb38f6 into huggingface:main Apr 13, 2023
sgugger pushed a commit that referenced this pull request Apr 13, 2023
* Add model to doc tests

* Remove generate and replace by prepare_inputs_for_generation

* More fixes

* Remove print statements

* Update integration tests

* Fix generate

* Remove model from auto mapping

* Use auto processor

* Fix integration tests

* Fix test

* Add inference code snippet

* Remove is_encoder_decoder

* Update docs

* Remove notebook link
novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023
* Add model to doc tests

* Remove generate and replace by prepare_inputs_for_generation

* More fixes

* Remove print statements

* Update integration tests

* Fix generate

* Remove model from auto mapping

* Use auto processor

* Fix integration tests

* Fix test

* Add inference code snippet

* Remove is_encoder_decoder

* Update docs

* Remove notebook link
stevhliu added a commit that referenced this pull request Feb 26, 2024
* [Pix2struct] Simplify generation (#22527)

* Add model to doc tests

* Remove generate and replace by prepare_inputs_for_generation

* More fixes

* Remove print statements

* Update integration tests

* Fix generate

* Remove model from auto mapping

* Use auto processor

* Fix integration tests

* Fix test

* Add inference code snippet

* Remove is_encoder_decoder

* Update docs

* Remove notebook link

* Release: v4.28.0

* Revert (for now) the change on `Deta` in #22437 (#22750)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Patch release: v4.28.1

* update zh chat template.

* Update docs/source/zh/chat_templating.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/zh/_toctree.yml

Co-authored-by: Michael <haifeng.yao@daocloud.io>

* Update docs/source/zh/chat_templating.md

Co-authored-by: Michael <haifeng.yao@daocloud.io>

* Update docs/source/zh/chat_templating.md

Co-authored-by: Michael <haifeng.yao@daocloud.io>

* Update docs/source/zh/chat_templating.md

Co-authored-by: Michael <haifeng.yao@daocloud.io>

* Update docs/source/zh/chat_templating.md

Co-authored-by: Michael <haifeng.yao@daocloud.io>

* Update docs/source/zh/chat_templating.md

Co-authored-by: Michael <haifeng.yao@daocloud.io>

* Update docs/source/zh/chat_templating.md

Co-authored-by: Michael <haifeng.yao@daocloud.io>

---------

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Michael <haifeng.yao@daocloud.io>
itazap pushed a commit that referenced this pull request May 14, 2024
* [Pix2struct] Simplify generation (#22527)

* Add model to doc tests

* Remove generate and replace by prepare_inputs_for_generation

* More fixes

* Remove print statements

* Update integration tests

* Fix generate

* Remove model from auto mapping

* Use auto processor

* Fix integration tests

* Fix test

* Add inference code snippet

* Remove is_encoder_decoder

* Update docs

* Remove notebook link

* Release: v4.28.0

* Revert (for now) the change on `Deta` in #22437 (#22750)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Patch release: v4.28.1

* update zh chat template.

* Update docs/source/zh/chat_templating.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/zh/_toctree.yml

Co-authored-by: Michael <haifeng.yao@daocloud.io>

* Update docs/source/zh/chat_templating.md

Co-authored-by: Michael <haifeng.yao@daocloud.io>

* Update docs/source/zh/chat_templating.md

Co-authored-by: Michael <haifeng.yao@daocloud.io>

* Update docs/source/zh/chat_templating.md

Co-authored-by: Michael <haifeng.yao@daocloud.io>

* Update docs/source/zh/chat_templating.md

Co-authored-by: Michael <haifeng.yao@daocloud.io>

* Update docs/source/zh/chat_templating.md

Co-authored-by: Michael <haifeng.yao@daocloud.io>

* Update docs/source/zh/chat_templating.md

Co-authored-by: Michael <haifeng.yao@daocloud.io>

---------

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Michael <haifeng.yao@daocloud.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants