Table Structure Recognition

SOTAs

This page collects the performance of table structure recognition algorithms on public benchmarks. Data are collected from papers and official code repositories.

🎖️Commonly Used Metrics

F1-score

For comparing two cell structures, we use a method inspired by Hurst's proto-links: for each table region we generate a list of adjacency relations between each content cell and its nearest neighbour in the horizontal and vertical directions. No adjacency relations are generated between blank cells, or between a blank cell and a content cell. This 1-D list of adjacency relations can be compared to the ground truth using precision and recall measures. If both cells are identical and the direction matches, the relation is marked as correctly retrieved; otherwise it is marked as incorrect. Using neighbourhoods makes the comparison invariant to the absolute position of the table (e.g. if everything is shifted by one cell) and also avoids ambiguities when dealing with different types of errors (merged/split cells, an inserted empty column, etc.).

$$ precision = \frac{\text{correct adjacency relations}}{\text{detected adjacency relations}} $$

$$ recall = \frac{\text{correct adjacency relations}}{\text{total adjacency relations}} $$

$$ F1 = \frac{2 \times precision \times recall}{precision + recall} $$

The SciTSR library can be used to calculate the F1-score between tables.
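Under the definitions above, the scoring can be sketched in a few lines of Python. The tuple representation of a relation (the two cell contents plus a direction) is an illustrative assumption, not the SciTSR API:

```python
# Sketch of adjacency-relation F1 scoring. A relation is assumed to be a
# (cell_a_text, cell_b_text, direction) tuple; this representation is
# illustrative, not the SciTSR library's actual data model.

def relation_scores(ground_truth, detected):
    """Compare detected adjacency relations against the ground truth.

    A detected relation counts as correct only if both cell contents and
    the direction ('horizontal' or 'vertical') match a ground-truth
    relation exactly.
    """
    gt = set(ground_truth)
    det = set(detected)
    correct = len(gt & det)
    precision = correct / len(det) if det else 0.0
    recall = correct / len(gt) if gt else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: one cell was mis-recognized, so its relation is incorrect.
gt = [("Name", "Age", "horizontal"), ("Name", "Alice", "vertical")]
det = [("Name", "Age", "horizontal"), ("Name", "Ali", "vertical")]
p, r, f1 = relation_scores(gt, det)
```

Because correctness requires an exact match of contents and direction, a single merged or split cell typically invalidates only its local relations, which is what makes the metric robust to global shifts.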

Tree-Edit-Distance-Based Similarity (TEDS)

Tables are represented as a tree structure in the HTML format. The root has two children, thead and tbody, which group the table header cells and table body cells, respectively. The children of the thead and tbody nodes are table rows (tr). The leaves of the tree are table cells (td). Each cell node has three attributes: 'colspan', 'rowspan', and 'content'. We measure the similarity between two tables using the tree-edit distance proposed by Pawlik and Augsten. The cost of insertion and deletion operations is 1. When the edit substitutes a node $n_o$ with $n_s$, the cost is 1 if either $n_o$ or $n_s$ is not a td. When both $n_o$ and $n_s$ are td, the substitution cost is 1 if the column span or the row span of $n_o$ and $n_s$ differ. Otherwise, the substitution cost is the normalized Levenshtein similarity (∈ [0, 1]) between the contents of $n_o$ and $n_s$. Finally, TEDS between two trees is computed as

$$ TEDS(T_a, T_b) = 1 - \frac{EditDist(T_a, T_b)}{\max(|T_a|, |T_b|)} $$

where $EditDist$ denotes the tree-edit distance, and $|T|$ is the number of nodes in $T$. The table recognition performance of a method on a set of test samples is defined as the mean TEDS score between the recognition result and the ground truth of each sample.

The PubTabNet library can be used to calculate the TEDS between tables.
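The node-substitution cost described above can be sketched as follows. The `(tag, colspan, rowspan, content)` node representation is an illustrative assumption; the actual PubTabNet implementation plugs a cost model like this into a Pawlik–Augsten (APTED-style) tree-edit-distance algorithm, which is not reproduced here:

```python
# Sketch of the TEDS substitution cost, assuming each tree node is a
# (tag, colspan, rowspan, content) tuple. Illustrative only: the real
# PubTabNet code wraps an APTED-style tree-edit-distance implementation.

def levenshtein(a, b):
    """Plain dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def substitution_cost(node_o, node_s):
    """Cost of substituting node_o with node_s, per the TEDS definition."""
    tag_o, colspan_o, rowspan_o, text_o = node_o
    tag_s, colspan_s, rowspan_s, text_s = node_s
    if tag_o != "td" or tag_s != "td":
        return 1.0  # non-cell nodes: any mismatch costs a full edit
    if colspan_o != colspan_s or rowspan_o != rowspan_s:
        return 1.0  # spans differ -> full cost
    if not text_o and not text_s:
        return 0.0  # two empty cells match exactly
    # Normalized Levenshtein distance in [0, 1]; identical content costs 0.
    return levenshtein(text_o, text_s) / max(len(text_o), len(text_s))
```

Insertions and deletions each cost 1, so TEDS penalizes structural errors at full weight while grading content errors smoothly by string similarity.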


🗒️List of Index


PubTabNet

PubTabNet is automatically generated by matching the XML and PDF representations of the scientific articles in the PubMed Central™ Open Access Subset (PMCOA). It uses TEDS and TEDS-S as evaluation metrics, where TEDS-S is the TEDS score computed while ignoring text content (i.e. structure only).

| Approach | Training Dataset | TEDS (%) | TEDS-S (%) |
| --- | --- | --- | --- |
| TSRFormer | PubTabNet | - | 97.5 |
| RobusTabNet | PubTabNet | - | 97.0 |
| TRUST | PubTabNet | 96.2 | 97.1 |
| TableFormer | PubTabNet | 93.6 | 96.75 |
| TableMaster | PubTabNet | - | 96.76 |
| EDD | PubTabNet | 88.3 | - |
| LGPMA | PubTabNet | 94.6 | 96.7 |

SciTSR

SciTSR is a large-scale table structure recognition dataset containing 15,000 tables in PDF format together with high-quality structure labels obtained from the LaTeX source files.

| Approach | Training Dataset | Precision | Recall | F1 |
| --- | --- | --- | --- | --- |
| TSRFormer | SciTSR | 99.5 | 99.4 | 99.4 |
| RobusTabNet | SciTSR | 99.4 | 99.1 | 99.3 |
| SEM | SciTSR | 97.70 | 96.52 | 97.11 |
| NCGM | SciTSR | 99.7 | 99.6 | 99.6 |
| FLAGNet | SciTSR | 99.7 | 99.3 | 99.5 |
| LGPMA | SciTSR | 98.2 | 99.3 | 98.8 |

ICDAR2013

These documents have been collected systematically from European Union and US Government websites, and are therefore expected to have public domain status. Each PDF document is accompanied by three XML (or CSV) files containing its ground truth in the following models:

- table regions (for evaluating table location)
- cell structures (for evaluating table structure recognition)
- functional representation (for evaluating table interpretation)

The dataset can be downloaded from here.

| Approach | Training Dataset | Precision | Recall | F1 |
| --- | --- | --- | --- | --- |
| SPLERGE | ICDAR2013 | 94.64 | 95.89 | 95.26 |
| LGPMA | SciTSR+ICDAR2013 | 96.7 | 99.1 | 97.9 |

ICDAR2019

Two new datasets consisting of modern and archival documents were prepared for cTDaR 2019. The historical dataset contains contributions from more than 23 institutions around the world. The images show a great variety of tables, from hand-drawn accounting books to stock exchange lists and train timetables, from record books to prisoner lists, simple tabular prints in books, production censuses, and many more. The modern dataset comes from different kinds of PDF documents, such as scientific journals, forms, and financial statements. It consists of Chinese and English documents in various formats, including document images and born-digital files. The annotations cover the table entities and cell entities in each document.


WTW

WTW contains a total of 14,581 images from a wide range of real business scenarios, along with full annotations of the tables (including cell coordinates and row/column information). The images are mainly collected from natural images that contain at least one table. Since the goal is to parse table structures regardless of the image source, archival document images and printed document images are also included. Statistically, the proportions of images from natural scenes, archival documents, and printed documents are 50%, 30%, and 20%, respectively. Across all the images, 7 challenging cases were identified, and the WTW dataset covers all of them in reasonable proportion.

| Approach | Training Dataset | Precision | Recall | F1 |
| --- | --- | --- | --- | --- |
| TSRFormer | WTW | 93.7 | 93.2 | 93.4 |
| NCGM | WTW | 94.7 | 95.5 | 95.1 |