VSR

This repository contains the implementation of the paper VSR: A Unified Framework for Document Layout Analysis Combining Vision, Semantics and Relations (ICDAR 2021).

Dataset Preparation

The demos are conducted on two public datasets: PubLayNet and DocBank. Due to dataset licensing policies, you should download the original data and annotations from the official websites.

  • PubLayNet: PubLayNet is a large dataset of document images whose layouts are annotated with both bounding boxes and polygonal segmentations. To perform the multimodal layout analysis task, we also need annotations at character granularity, in addition to layout-component granularity. We provide demo examples in demo/text_layout/datalist/PubLayNet; an illustrative character-extraction sketch follows this list.
  • DocBank: DocBank is a large-scale dataset constructed with a weak supervision approach. It enables models to integrate both textual and layout information for downstream tasks. The dataset includes 500K document pages in total: 400K for training, 50K for validation, and 50K for testing. Please download the dataset and convert its annotations to the Davar format (refer to demo/text_layout/datalist/DocBank); a conversion sketch is shown below.
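For the character-granularity annotations mentioned above, the demo datalists in the repository are the reference. Purely as an illustration of what character-level extraction looks like, here is a minimal sketch using the third-party pdfplumber library on a source PDF; the file name is a placeholder, and this is not the repository's own extraction pipeline:

```python
# Illustrative only: per-character text and boxes from a source PDF.
# "sample_article.pdf" is a placeholder; the demo datalists in
# demo/text_layout/datalist/PubLayNet define the actual expected output.
import pdfplumber

with pdfplumber.open("sample_article.pdf") as pdf:
    for page in pdf.pages:
        for ch in page.chars:  # one dict per rendered character
            # x0/top/x1/bottom are PDF coordinates in points.
            print(ch["text"], ch["x0"], ch["top"], ch["x1"], ch["bottom"])
```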

Please format the datalists into the form that davarocr expects, following the instructions in the directories above.
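To make the DocBank conversion concrete, below is a minimal sketch. It assumes DocBank's tab-separated token annotations (token, bbox, color, font, label) and a Davar-style JSON datalist keyed by image path with a content_ann block; the field names follow our reading of the demo datalists and should be checked against demo/text_layout/datalist/DocBank:

```python
# Hedged sketch: DocBank page annotation -> Davar-style datalist entry.
# All paths and LABEL_MAP are placeholders; the authoritative schema
# is in demo/text_layout/datalist/DocBank.
import json

LABEL_MAP = {"paragraph": 0, "title": 1, "figure": 2}  # illustrative subset

def convert_page(txt_path, image_key, height, width):
    bboxes, texts, labels = [], [], []
    with open(txt_path, encoding="utf-8") as f:
        for line in f:
            # DocBank line: token x0 y0 x1 y1 R G B font label (tab-separated).
            fields = line.rstrip("\n").split("\t")
            token = fields[0]
            x0, y0, x1, y1 = (int(v) for v in fields[1:5])
            bboxes.append([x0, y0, x1, y1])
            texts.append(token)
            labels.append([LABEL_MAP.get(fields[-1], -1)])
    return {image_key: {"height": height, "width": width,
                        "content_ann": {"bboxes": bboxes,
                                        "texts": texts,
                                        "labels": labels}}}

if __name__ == "__main__":
    entry = convert_page("page_0.txt", "Images/page_0.jpg", 1000, 1000)
    with open("datalist_train.json", "w", encoding="utf-8") as f:
        json.dump(entry, f, ensure_ascii=False, indent=2)
```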

Train From Scratch

If you want to reproduce the model's performance from scratch, please follow these steps:

1. First, prepare the pretrained models required by the configs (e.g., the COCO-pretrained backbones listed in the table below).

2. Second, modify the paths in the model config (demo/text_layout/VSR/PubLayNet/config/publaynet_x101.py or demo/text_layout/VSR/DocBank/config/docbank_x101.py), including the pretrained model paths, image paths, workspace, etc. A sketch of the typical edits is shown after this list.

3. Third, directly run demo/text_layout/VSR/PubLayNet/dist_train.sh or demo/text_layout/VSR/DocBank/dist_train.sh.
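For step 2, the configs are mmdetection-style Python files, so the edits are plain variable assignments. The excerpt below is illustrative: the key names and nesting are assumptions, and the real ones are defined in publaynet_x101.py / docbank_x101.py (depending on the mmdetection version, the backbone weights may be given as a top-level pretrained field instead of init_cfg):

```python
# Illustrative config excerpt; all paths are placeholders for your setup.
data_root = "/path/to/PubLayNet/"               # images + Davar datalists
work_dir = "/path/to/workspace/vsr_publaynet"   # logs and checkpoints

model = dict(
    backbone=dict(
        # COCO-pretrained backbone weights prepared in step 1.
        init_cfg=dict(type="Pretrained", checkpoint="/path/to/coco_x101.pth"),
    ),
)
```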

Test

Given a trained model, directly run demo/text_layout/VSR/PubLayNet/test.sh or demo/text_layout/VSR/DocBank/test.sh to evaluate it.
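The test scripts are the supported entry point. If you just want to sanity-check that a checkpoint loads and runs, the standard mmdetection inference API is a reasonable starting point. This is a sketch under assumptions: davarocr must be imported so its modules register with mmdetection, the paths are placeholders, and since VSR is multimodal the image-only call may not exercise the full pipeline:

```python
# Hypothetical smoke test; test.sh remains the supported evaluation path.
import davarocr  # noqa: F401 -- importing registers VSR modules with mmdet
from mmdet.apis import inference_detector, init_detector

config = "demo/text_layout/VSR/PubLayNet/config/publaynet_x101.py"
checkpoint = "/path/to/vsr_publaynet.pth"  # downloaded or self-trained

model = init_detector(config, checkpoint, device="cuda:0")
result = inference_detector(model, "/path/to/sample_page.jpg")
print(result)  # per-class detection results
```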

Trained Model Download

All models are re-implemented and trained on top of the open-source framework mmdetection, so the results may differ slightly from those reported in the paper.

Trained models can be downloaded as follows:

| Dataset | Backbone | Pretrained | Test Scale | AP | Links |
|---|---|---|---|---|---|
| PubLayNet (Reported) | ResNeXt-101 | COCO | (1300, 800) | 95.7 | - |
| PubLayNet | ResNeXt-101 | COCO | (1300, 800) | 95.8 | config, pth (Access Code: 8Rm1) |
| DocBank (Reported) | ResNeXt-101 | COCO | (600, 800) | 95.59 | - |
| DocBank | ResNeXt-101 | COCO | (600, 800) | 95.25 | config, pth (Access Code: 6T64) |

Citation

If you find this repository helpful to your research, please feel free to cite us:

@inproceedings{zhang2021vsr,
  title={{VSR:} {A} Unified Framework for Document Layout Analysis Combining Vision, Semantics and Relations},
  author={Zhang, Peng and Li, Can and Qiao, Liang and Cheng, Zhanzhan and Pu, Shiliang and Niu, Yi and Wu, Fei},
  booktitle={16th International Conference on Document Analysis and Recognition ({ICDAR})},
  pages={115--130},
  year={2021}
}

License

This project is released under the Apache 2.0 License.

Contact

If you have any suggestions or problems, please feel free to contact the author at qiaoliang6@hikvision.com.