Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use LayoutLMv2 and LayoutLMv3 in some pipeline tests #22774

Merged
merged 3 commits into from
Apr 17, 2023

Conversation

ydshieh
Copy link
Collaborator

@ydshieh ydshieh commented Apr 14, 2023

What does this PR do?

These 2 models require different input format than those of usual text models. See the relevant code block at the end.
The offline discussion with @NielsRogge is that these 2 models are only for DocQA pipeline, despite they have implementations for different head tasks.

Therefore, this PR removes these 2 models from being tested (pipeline) in the first place, instead of skipping them at later point.
IMO, we should also remove these models being used in the pipeline classes (except DocQA) if they are not going to work. But I don't do anything on this.

LayoutLMv3 with DocumentQuestionAnsweringPipeline (and the pipeline test) is still not working due to some issue. We need to discuss with @NielsRogge to see if it could be fixed, but it's out of this PR's scope.

relevant code block

if text_pair is not None:
# in case text + text_pair are provided, text = questions, text_pair = words
if not _is_valid_text_input(text):
raise ValueError("text input must of type `str` (single example) or `List[str]` (batch of examples). ")
if not isinstance(text_pair, (list, tuple)):
raise ValueError(
"Words must be of type `List[str]` (single pretokenized example), "
"or `List[List[str]]` (batch of pretokenized examples)."
)
else:
# in case only text is provided => must be words
if not isinstance(text, (list, tuple)):
raise ValueError(
"Words must be of type `List[str]` (single pretokenized example), "
"or `List[List[str]]` (batch of pretokenized examples)."
)

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Apr 14, 2023

The documentation is not available anymore as the PR was closed or merged.

@ydshieh ydshieh changed the title Don't use LayoutLMv[23] Don't use LayoutLMv2 and LayoutLMv3 in some pipeline tests Apr 14, 2023
# `DocumentQuestionAnsweringPipeline` is expected to work with this model, but it combines the text and visual
# embedding along the sequence dimension (dim 1), which causes an error during post-processing as `p_mask` has
# the sequence dimension of the text embedding only.
# (see the line `embedding_output = torch.cat([embedding_output, visual_embeddings], dim=1)`)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @NielsRogge We might need to discuss this at some point.


return super().is_pipeline_test_to_skip(
pipeline_test_casse_name, config_class, model_architecture, tokenizer_name, processor_name
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove all these - we just don't add these to be tested in the first place

if model_mapping is not None:
model_mapping = {config: model for config, model in model_mapping.items() if config.__name__ in _TO_SKIP}
if tf_model_mapping is not None:
tf_model_mapping = {config: model for config, model in tf_model_mapping.items() if config.__name__ in _TO_SKIP}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to avoid the changes of pipeline_model_mapping in this PR being reverted next time we update the repository with the script in add_pipeline_model_mapping_to_test.py

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also makes things more explicit: those models are not for this pipeline (test)

@ydshieh ydshieh requested review from NielsRogge and sgugger April 14, 2023 16:27
Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cleaning this up. It looks like the models shouldn't have been added in the auto-mappings then, if they don't have a consistent API to be used in the pipeline. But that's too late to change this now!

@ydshieh
Copy link
Collaborator Author

ydshieh commented Apr 17, 2023

Merge now as this PR only touches tests. Feel free to leave a comment if any @NielsRogge

@ydshieh ydshieh merged commit 5269718 into main Apr 17, 2023
@ydshieh ydshieh deleted the cleanup_layoutlm_tests branch April 17, 2023 15:45
novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023
…ngface#22774)

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants