This repository contains the implementation of the paper End-to-End Compound Table Understanding with Multi-Modal Modeling (ACM MM 2022).
The original images and the complete formatted datalist can be downloaded from: ComFinTab.
An example of the formatted datalist can be found in demo/table_understanding/datalist/.
Modify the paths of "ann_file", "img_prefix", "pretrained_model" and "work_dir" in the config file demo/table_understanding/ctunet/configs_clean/ctunet_chn.py.
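A minimal sketch of these fields is given below. The paths are placeholders, and the exact nesting of the fields inside the mmdetection-style config may differ, so follow the structure of the actual ctunet_chn.py.

```python
# Illustrative placeholders for demo/table_understanding/ctunet/configs_clean/ctunet_chn.py.
# Adapt the paths to your local environment; the exact nesting may differ in the real config.
pretrained_model = '/path/to/pretrained_model.pth'
work_dir = '/path/to/save/ctunet_chn/'

data = dict(
    train=dict(
        ann_file='/path/to/ComFinTab/datalist/train_datalist.json',
        img_prefix='/path/to/ComFinTab/images/'),
    val=dict(
        ann_file='/path/to/ComFinTab/datalist/val_datalist.json',
        img_prefix='/path/to/ComFinTab/images/'))
```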
Run the following commands in the command line:
cd $DAVAR_LAB_OCR_ROOT$/demo/table_understanding/ctunet/
bash dist_train.sh
We provide online evaluation support that can be used to select models during training. Three evaluation metrics are supported:
- "macro_f1": measures the accuracy of cell type classification. The F1-score is calculated for each category separately and then averaged.
- "hard_f1": measures the similarity between the extracted table items and the ground truth. If two items are exactly the same, the similarity is 1; otherwise it is 0.
- "tree_f1": measures the similarity between the extracted table items and the ground truth. If two items are exactly the same, the similarity is 1; otherwise, the similarity is their TEDS score.
To prevent tables with a large number of cells from dominating the overall evaluation, the metrics in this work are calculated per image and then averaged. However, the 'macro_f1' used in online evaluation is still calculated in the traditional way to speed up training.
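For clarity, the per-image averaging described above can be sketched as follows. This is only an illustration, not the repository's actual evaluation code; the function and argument names are assumptions.

```python
# Minimal sketch of per-image metric averaging (illustrative only).
# `score_fn` computes one of the metrics above (macro_f1, hard_f1 or tree_f1)
# for a single image's prediction/ground-truth pair.
def dataset_score(predictions, ground_truths, score_fn):
    per_image = [score_fn(pred, gt) for pred, gt in zip(predictions, ground_truths)]
    # Every image contributes equally, so tables with many cells
    # cannot dominate the overall result.
    return sum(per_image) / len(per_image) if per_image else 0.0
```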
We provide a demo of forward inference and evaluation on the ComFinTab dataset. You can specify the paths of the testing dataset (ann_file, img_prefix) and the path to save the inference results in the config file, and start testing:
cd $DAVAR_LAB_OCR_ROOT$/demo/table_understanding/ctunet/
bash dist_test.sh
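The corresponding test-time fields might look like the sketch below. The paths are placeholders, and the field name used for the result save path (save_result_dir here) is an assumption; use the field actually defined in the config.

```python
# Illustrative placeholders for inference/evaluation (adjust to your local paths).
data = dict(
    test=dict(
        ann_file='/path/to/ComFinTab/datalist/test_datalist.json',
        img_prefix='/path/to/ComFinTab/images/'))

# Hypothetical name for the directory in which inference results are saved --
# use the field actually defined in the config file.
save_result_dir = '/path/to/save/inference_results/'
```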
We provide a visualization tool, which can be found in demo/table_understanding/tools/visualization.py. You can specify ann_file_path, img_prefix and vis_dir, and start the visualization:
python $DAVAR_LAB_OCR_ROOT$/demo/table_understanding/tools/visualization.py
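The three paths might be set as in the sketch below; the exact variable names and whether they are edited inside visualization.py or supplied another way should be checked against the script itself.

```python
# Illustrative values for the three paths mentioned above (placeholders).
ann_file_path = '/path/to/datalist_or_inference_result.json'
img_prefix = '/path/to/ComFinTab/images/'
vis_dir = '/path/to/save/visualization/'
```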
Some visualizations of cell types and relationships are shown below (paleturquoise nodes stand for the generated virtual nodes):
All of the models are re-implemented and trained based on the open-source framework mmdetection, so the results might differ slightly from the reported results.
Results on the datasets and download links for the trained models:
Dataset | Cell-F1 | Tree-P | Tree-R | Tree-F1 | Link |
---|---|---|---|---|---|
ComFinTab-Chinese (reported) | 92.98 | 90.45 | 90.30 | 90.37 | |
ComFinTab-Chinese | 93.59 | 90.38 | 90.69 | 90.44 | config, pth (Access Code: jU2i) |
ComFinTab-English (reported) | 92.45 | 89.25 | 88.55 | 88.90 | |
ComFinTab-English | 92.75 | 89.39 | 88.63 | 88.84 | config, pth (Access Code: stWG) |
If you find this repository helpful to your research, please feel free to cite us:
@inproceedings{li2022acmmm22,
title={End-to-End Compound Table Understanding with Multi-Modal Modeling},
author={Li, Zaisheng and Li, Yi and Qiao, Liang and Li, Pengfei and Cheng, Zhanzhan and Niu, Yi and Pu, Shiliang and Li, Xi},
booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
pages={4112--4121},
year={2022}
}
This project is released under the Apache 2.0 license. The ComFinTab dataset is released under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
If you have any suggestions or problems, please feel free to contact the author at qiaoliang6@hikvision.com.