📚Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models

Overview

This repository accompanies the SIGIR 2023 paper "Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models" and analyzes how documents shift across three modal distributions: image, layout, and text. It covers the generation of nine kinds of OOD data, the application of five shifts, access to the manually labeled FUNSD-H and FUNSD-R datasets, the generation of the FUNSD-L dataset, and code for running two OOD baseline methods, Deep Core and Mixup, under all shifts.

The shift types of the Do-GOOD dataset are shown in the figure below.

Requirements

This code was developed with:

transformers              4.24.0 
pytesseract               0.3.9 
tesseract                 0.1.3     
textattack                0.3.7 
python                    3.9.11
yarl                      1.7.2
detectron2                0.6                         
editdistance              0.6.0                    
einops                    0.4.1
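
Most of these dependencies are on PyPI, so (assuming a Python 3.9 environment) they can be installed roughly as follows; note that pytesseract additionally requires the system Tesseract OCR binary, detectron2 is usually installed following its own instructions, and TextAttack is covered in the Installation section below. The pinned versions mirror the list above:

pip install transformers==4.24.0 pytesseract==0.3.9 yarl==1.7.2 editdistance==0.6.0 einops==0.4.1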

Installation

Clone the project. If you want to study the robustness of a model to text shift, you also need to install TextAttack:

git clone https://anonymous.4open.science/r/Do-GOOD-D88A && cd Do-GOOD
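
TextAttack is published on PyPI, so it can be installed with (version pinned to match the requirements above):

pip install textattack==0.3.7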

Datasets

We provide the manually labeled FUNSD-H and FUNSD-R datasets, which can be obtained from the links below, as well as methods for generating the FUNSD-L, CDIP-L, CDIP-I1, and CDIP-I2 datasets.

Dataset    Header   Question   Answer   Other   Total   Link
FUNSD      122      1077       821      312     2332    download
FUNSD-H    126      981        755      380     2304    download
FUNSD-R    90       475        445      471     1487    download

Generate FUNSD-L

First, generate the strong and weak semantic entities, producing the following files: /weak_other_map, /strong_answer_map, /strong_question_map, /weak_Q_map, and /weak_A_map. We provide five strong/weak semantic entity libraries, extracted with our shuffle-layout method on the FUNSD test set, one for each of five pre-trained models. Fill { } with v3, v2, v1, bros, or lilt and run:

python map_{ }_funsd_L.py

Then modify the file paths and generate the FUNSD-L test data, which is saved in mix_test.txt. You can adjust the number of rows and columns in the generated layout, the size of the bounding boxes, the probability of random filling, and the number of documents to generate; a toy sketch of this generation scheme follows the commands below.

# Call inside gen_ood_mix.py: the entity-map paths supply the strong and
# weak entities to mix; the final argument (50) presumably sets how many
# documents to generate.
generate_ood_data("mix_test.txt", "/strong_question_map",
                  "/strong_answer_map", "/weak_Q_map",
                  "/weak_A_map", "/weak_other_map", 50)

python gen_ood_mix.py
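
To make these knobs concrete, here is a minimal Python sketch of the grid-based mixing idea. It is illustrative only, not the repository's actual generate_ood_data; the function names, the (text, label) pool format, and all defaults are assumptions:

import random

def sample_grid_boxes(rows=6, cols=2, box_w=180, box_h=40, fill_prob=0.7):
    """Yield [x0, y0, x1, y1] boxes on a rows x cols grid; each cell is
    kept with probability fill_prob (the 'random filling' knob)."""
    for r in range(rows):
        for c in range(cols):
            if random.random() < fill_prob:
                x0, y0 = c * box_w, r * box_h
                yield [x0, y0, x0 + box_w, y0 + box_h]

def generate_document(entity_pool, rows=6, cols=2):
    """Pair entities sampled from a strong/weak library with grid boxes
    to form one synthetic FUNSD-L-style page."""
    boxes = list(sample_grid_boxes(rows, cols))
    picked = random.sample(entity_pool, k=min(len(boxes), len(entity_pool)))
    return [{"text": text, "label": label, "box": box}
            for (text, label), box in zip(picked, boxes)]

# Hypothetical entities, e.g. drawn from /strong_question_map and /weak_A_map.
pool = [("Name:", "question"), ("Date:", "question"), ("12/03/1998", "answer")]
print(generate_document(pool, rows=2, cols=2))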

Generate CDIP-L

For ease of use, the script is placed separately in the main directory. Two parameters can be adjusted: lamda1 controls the horizontal merging distance and lamda2 controls the vertical merging distance. Merging follows a fixed priority: horizontal first, then vertical (a simplified sketch of this scheme follows the command below).

python merge_layout.py
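
A simplified sketch of this merging scheme, assuming OCR boxes in [x0, y0, x1, y1] form. This is a hypothetical re-implementation for illustration, not the code in merge_layout.py:

def merge_two(a, b):
    """Bounding box covering both input boxes."""
    return [min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])]

def merge_once(boxes, axis, lamda):
    """Merge the first pair whose gap along `axis` is below lamda and
    which overlap on the other axis; report whether anything changed."""
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            a, b = boxes[i], boxes[j]
            if axis == "h":
                gap = max(a[0], b[0]) - min(a[2], b[2])       # horizontal gap
                overlap = min(a[3], b[3]) - max(a[1], b[1])   # shared rows
            else:
                gap = max(a[1], b[1]) - min(a[3], b[3])       # vertical gap
                overlap = min(a[2], b[2]) - max(a[0], b[0])   # shared columns
            if gap < lamda and overlap > 0:
                rest = [x for k, x in enumerate(boxes) if k not in (i, j)]
                return rest + [merge_two(a, b)], True
    return boxes, False

def merge_layout(boxes, lamda1=20, lamda2=10):
    """Apply the stated priority: exhaust horizontal merges, then vertical."""
    for axis, lam in (("h", lamda1), ("v", lamda2)):
        changed = True
        while changed:
            boxes, changed = merge_once(boxes, axis, lam)
    return boxes

# The two side-by-side boxes merge; the distant third box stays separate.
print(merge_layout([[0, 0, 50, 20], [60, 0, 110, 20], [0, 100, 50, 120]]))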

Generate CDIP-I1

Separate the text pixels from the non-text pixels in the document, then overlay the text onto natural-scene images from MSCOCO (a toy version of this overlay is sketched after the command):

python mixup_image.py
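
A toy version of this overlay, assuming a simple intensity threshold separates text (dark) pixels from background. Both the threshold and the file paths are assumptions; the actual separation in mixup_image.py may be more sophisticated:

import numpy as np
from PIL import Image

def overlay_text_on_scene(doc_path, scene_path, thresh=128):
    """Treat dark document pixels as text and stamp them onto a
    natural-scene image (e.g. one sampled from MSCOCO)."""
    doc = np.array(Image.open(doc_path).convert("L"))
    scene = Image.open(scene_path).convert("RGB").resize(doc.shape[::-1])
    out = np.array(scene)
    text_mask = doc < thresh      # text pixels are darker than the page
    out[text_mask] = 0            # render the text in black on the scene
    return Image.fromarray(out)

# overlay_text_on_scene("doc.png", "coco_scene.jpg").save("cdip_i1.png")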

Generate CDIP-I2

Using a pre-trained DocGeoNet (see the DocGeoNet reference for the specific process), run a forward pass over a normal document image to obtain a distorted image, then run OCR again:

python inference.py
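
The re-OCR step can be sketched with pytesseract, which is already in the requirements; the distorted.png filename is an assumption about where inference.py writes its output:

import pytesseract
from PIL import Image

def reocr(image_path="distorted.png"):
    """Re-extract words and their bounding boxes from the distorted image."""
    data = pytesseract.image_to_data(Image.open(image_path),
                                     output_type=pytesseract.Output.DICT)
    return [(word, (left, top, left + w, top + h))
            for word, left, top, w, h in zip(data["text"], data["left"],
                                             data["top"], data["width"],
                                             data["height"])
            if word.strip()]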

Tuning and Testing

Tuning

Select the model and task to fill in: the first { } takes v3, v2, v1, bros, or lilt; the second { } takes funsd or cdip.

Fine-tune your own LayoutLMv3 model or download our fine-tuned model (download), then select the model and task to use:

python -m torch.distributed.launch --nproc_per_node <num_gpus> --use_env finetune_{ }_{ }.py --config config.yaml --output_dir <output_dir>

For VQA tasks, use a separate command line; fill in the selected model at { }:

python docvqa_{}_main.py

Testing

Select the model and task as above: the first { } takes v3, v2, v1, bros, or lilt; the second { } takes funsd or cdip. Modify the following parameters to apply a shift to a single modality: --text_aug, --image_aug, and --aug_layout.

--text_aug={'WordSwapMaskedLM','WordSwapEmbedding','WordSwapHomoglyphSwap','WordSwapChangeNumber','WordSwapRandomCharacterDeletion'}
--image_aug=True/False
--aug_layout=True/False
python demo_{ }_ood_{ }.py
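
For example, an illustrative invocation (the exact flag syntax is an assumption based on the parameter list above) testing LayoutLMv3 on FUNSD with a masked-LM word-swap text shift and no image or layout shift:

python demo_v3_ood_funsd.py --text_aug=WordSwapMaskedLM --image_aug=False --aug_layout=False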

Results

The ID and OOD performance of the existing models

Incremental training results on the FUNSD and CDIP datasets

Algorithm details

Visualize the results
