Don't use `LayoutLMv2` and `LayoutLMv3` in some pipeline tests #22774

ydshieh · 2023-04-14T15:35:41Z

What does this PR do?

These two models require a different input format from that of the usual text models. See the relevant code block at the end.
After an offline discussion with @NielsRogge, the conclusion is that these two models are meant only for the DocQA pipeline, even though they implement heads for other tasks.

Therefore, this PR removes these two models from the pipeline tests in the first place, instead of skipping them at a later point.
IMO, we should also stop using these models in the pipeline classes (except DocQA) if they are not going to work, but this PR doesn't change anything there.

`LayoutLMv3` with `DocumentQuestionAnsweringPipeline` (and its pipeline test) is still not working because of a separate issue. We need to discuss with @NielsRogge whether it can be fixed, but that is outside this PR's scope.

Relevant code block

transformers/src/transformers/models/layoutlmv3/tokenization_layoutlmv3.py

Lines 610 to 625 in daf5324

    if text_pair is not None:
        # in case text + text_pair are provided, text = questions, text_pair = words
        if not _is_valid_text_input(text):
            raise ValueError("text input must of type `str` (single example) or `List[str]` (batch of examples). ")
        if not isinstance(text_pair, (list, tuple)):
            raise ValueError(
                "Words must be of type `List[str]` (single pretokenized example), "
                "or `List[List[str]]` (batch of pretokenized examples)."
            )
    else:
        # in case only text is provided => must be words
        if not isinstance(text, (list, tuple)):
            raise ValueError(
                "Words must be of type `List[str]` (single pretokenized example), "
                "or `List[List[str]]` (batch of pretokenized examples)."
            )
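
To make the difference from ordinary text models concrete, here is a minimal, self-contained sketch of the same checks. `validate_layoutlm_text_inputs`, `accepts`, and the simplified `_is_valid_text_input` are hypothetical stand-ins written for illustration, not the library's implementation, and the sketch leaves out the word-level bounding boxes that the real tokenizer also takes.

```python
def _is_valid_text_input(t):
    # Simplified stand-in (assumption): a string, or a flat list/tuple of strings.
    if isinstance(t, str):
        return True
    return isinstance(t, (list, tuple)) and all(isinstance(x, str) for x in t)


def validate_layoutlm_text_inputs(text, text_pair=None):
    """Hypothetical helper mirroring the checks quoted above."""
    if text_pair is not None:
        # text = question(s), text_pair = pretokenized words
        if not _is_valid_text_input(text):
            raise ValueError("text must be `str` or `List[str]`.")
        if not isinstance(text_pair, (list, tuple)):
            raise ValueError("Words must be `List[str]` or `List[List[str]]`.")
    else:
        # only text is provided, so it must already be a list of words
        if not isinstance(text, (list, tuple)):
            raise ValueError("Words must be `List[str]` or `List[List[str]]`.")


def accepts(text, text_pair=None):
    """Return True if the inputs pass validation, False if a ValueError is raised."""
    try:
        validate_layoutlm_text_inputs(text, text_pair)
        return True
    except ValueError:
        return False


# A raw sentence, which is what a usual text pipeline passes, is rejected.
plain_string_ok = accepts("Hello world")
# Pretokenized words are accepted.
words_ok = accepts(["Hello", "world"])
# A question plus pretokenized words is accepted.
question_words_ok = accepts("What is this?", ["Hello", "world"])
# A question plus a raw string context (the usual QA input) is rejected.
question_string_ok = accepts("What is this?", "Hello world")
```

This is why the generic pipeline tests, which feed plain strings, cannot exercise these tokenizers.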

HuggingFaceDocBuilderDev · 2023-04-14T15:50:41Z

The documentation is not available anymore as the PR was closed or merged.

ydshieh · 2023-04-14T16:05:11Z

tests/models/layoutlmv3/test_modeling_layoutlmv3.py

+        # `DocumentQuestionAnsweringPipeline` is expected to work with this model, but it combines the text and visual
+        # embedding along the sequence dimension (dim 1), which causes an error during post-processing as `p_mask` has
+        # the sequence dimension of the text embedding only.
+        # (see the line `embedding_output = torch.cat([embedding_output, visual_embeddings], dim=1)`)


cc @NielsRogge We might need to discuss this at some point.
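
A rough sketch of the length mismatch described in that comment, using plain shape arithmetic rather than real tensors. The sizes and the helper `concat_along_sequence` are hypothetical and chosen only for illustration; the sketch assumes, as the comment states, that `p_mask` has one entry per text token.

```python
def concat_along_sequence(text_shape, visual_shape):
    """Resulting shape of concatenating two (batch, seq, hidden) arrays on dim 1."""
    batch, text_len, hidden = text_shape
    v_batch, visual_len, v_hidden = visual_shape
    if (batch, hidden) != (v_batch, v_hidden):
        raise ValueError("batch and hidden sizes must match to concatenate on dim 1")
    return (batch, text_len + visual_len, hidden)


# Hypothetical sizes, for illustration only.
text_shape = (1, 8, 16)    # 8 text tokens
visual_shape = (1, 4, 16)  # 4 visual patch tokens
combined_shape = concat_along_sequence(text_shape, visual_shape)

# p_mask is built from the tokenized text only: one entry per text token.
p_mask = [0] * text_shape[1]

# The model output follows the combined sequence, so it is longer than p_mask.
model_seq_len = combined_shape[1]
mismatch = model_seq_len - len(p_mask)
```

With these numbers the model-side sequence is 12 positions long while `p_mask` covers only 8, so any element-wise use of `p_mask` against the model output cannot line up.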

ydshieh · 2023-04-14T16:07:52Z

tests/models/layoutlmv2/test_modeling_layoutlmv2.py

-
-        return super().is_pipeline_test_to_skip(
-            pipeline_test_casse_name, config_class, model_architecture, tokenizer_name, processor_name
-        )


Remove all of this: we simply don't add these models to the tests in the first place.

ydshieh · 2023-04-14T16:10:30Z

tests/pipelines/test_pipelines_question_answering.py

+    if model_mapping is not None:
+        model_mapping = {config: model for config, model in model_mapping.items() if config.__name__ not in _TO_SKIP}
+    if tf_model_mapping is not None:
+        tf_model_mapping = {config: model for config, model in tf_model_mapping.items() if config.__name__ not in _TO_SKIP}


This prevents the changes to `pipeline_model_mapping` in this PR from being reverted the next time we update the repository with the script `add_pipeline_model_mapping_to_test.py`.

It also makes things more explicit: these models are not meant for this pipeline (test).
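
As a small illustration of the filtering idea, the sketch below keeps every mapping entry whose config class name is not listed in `_TO_SKIP`, which matches the PR's stated goal of leaving these two models out of the test. The config classes here are empty stand-ins, the mapping values are plain strings rather than real model classes, and the contents of `_TO_SKIP` are an assumption based on the PR title.

```python
# Empty stand-in classes (assumption): only their __name__ matters for the filter.
class BertConfig:
    pass


class LayoutLMv2Config:
    pass


class LayoutLMv3Config:
    pass


# Assumed contents, based on the two models named in the PR title.
_TO_SKIP = {"LayoutLMv2Config", "LayoutLMv3Config"}

# Stand-in for an auto-mapping: config class -> model class (strings used here).
model_mapping = {
    BertConfig: "BertForQuestionAnswering",
    LayoutLMv2Config: "LayoutLMv2ForQuestionAnswering",
    LayoutLMv3Config: "LayoutLMv3ForQuestionAnswering",
}

# Keep only the entries whose config name is not in the skip set.
if model_mapping is not None:
    model_mapping = {
        config: model for config, model in model_mapping.items() if config.__name__ not in _TO_SKIP
    }

kept_names = sorted(config.__name__ for config in model_mapping)
```

Filtering by `config.__name__` keeps the skip list as plain strings, so the test module does not need to import the config classes it excludes.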

sgugger

Thanks for cleaning this up. It looks like these models shouldn't have been added to the auto-mappings then, since they don't have a consistent API for use in the pipeline. But it's too late to change that now!

ydshieh · 2023-04-17T15:44:58Z

Merging now, as this PR only touches tests. Feel free to leave a comment if you have any, @NielsRogge.

fix

48f733d

fix

72b0690

ydshieh changed the title ~~Don't use LayoutLMv[23]~~ Don't use LayoutLMv2 and LayoutLMv3 in some pipeline tests Apr 14, 2023

fix

50c7aee

ydshieh requested review from NielsRogge and sgugger April 14, 2023 16:27

sgugger approved these changes Apr 14, 2023

ydshieh merged commit 5269718 into main Apr 17, 2023

ydshieh deleted the cleanup_layoutlm_tests branch April 17, 2023 15:45

ydshieh mentioned this pull request Apr 21, 2023

Update tiny models and a few fixes #22928

Merged

novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023

Don't use LayoutLMv2 and LayoutLMv3 in some pipeline tests (huggi…

c6cb8fb

…ngface#22774) * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
