TableNet-pytorch

PyTorch implementation of the TableNet research paper: https://arxiv.org/abs/2001.01469

Description

This project implements an end-to-end deep learning architecture that not only localizes tables in an image but also recovers each table's structure by segmenting its columns. Once the table structure has been detected, the Pytesseract OCR package reads the contents of the table.

To learn more about the approach, see my Medium blog posts:

Part 1: https://asagar60.medium.com/tablenet-deep-learning-model-for-end-to-end-table-detection-and-tabular-data-extraction-from-b1547799fe29

Part 2: https://asagar60.medium.com/tablenet-deep-learning-model-for-end-to-end-table-detection-and-tabular-data-extraction-from-a49ac4cbffd4

Data

We use both the Marmot and Marmot Extended datasets for table recognition. The Marmot dataset contains table bounding box coordinates, and the extended version of the dataset adds column bounding box coordinates.
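Since both datasets supply bounding boxes while segmentation models train on pixel masks, the boxes have to be rasterized at some point. The following is a minimal sketch of that step, assuming the annotations have already been parsed into `(x_min, y_min, x_max, y_max)` pixel coordinates. The function name `boxes_to_mask` and the sample boxes are hypothetical, and parsing the actual annotation files is out of scope here.

```python
import numpy as np


def boxes_to_mask(boxes, height, width):
    """Rasterize (x_min, y_min, x_max, y_max) boxes into a binary mask."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for x_min, y_min, x_max, y_max in boxes:
        # Clip to the image so out-of-range annotations do not wrap around.
        x0, x1 = max(0, x_min), min(width, x_max)
        y0, y1 = max(0, y_min), min(height, y_max)
        if x1 > x0 and y1 > y0:
            mask[y0:y1, x0:x1] = 255
    return mask


# Hypothetical annotations for one 100x200 page (not taken from Marmot).
table_boxes = [(20, 10, 180, 90)]
column_boxes = [(20, 10, 95, 90), (105, 10, 180, 90)]

table_mask = boxes_to_mask(table_boxes, height=100, width=200)
column_mask = boxes_to_mask(column_boxes, height=100, width=200)
```

The table mask and the column mask are kept separate because the model predicts them as two distinct outputs.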

Marmot dataset: https://www.icst.pku.edu.cn/cpdp/docs/20190424190300041510.zip

Marmot Extended dataset: https://drive.google.com/drive/folders/1QZiv5RKe3xlOBdTzuTVuYRxixemVIODp

Download processed Marmot dataset: https://drive.google.com/file/d/1irIm19B58-o92IbD9b5qd6k3F31pqp1o/view?usp=sharing

Model

We use DenseNet121 as the encoder and build the model on top of it.

Trainable Params

Download saved model: https://drive.google.com/file/d/1TKALmlwUM_n4gULh6A6Q35VPRUpWDmJZ/view?usp=sharing

Performance compared to other encoder models (ResNet18, EfficientNet-B0, EfficientNet-B1, VGG19)

Table Detection - F1

Table Detection - Loss

Column Detection - F1

Column Detection - Loss
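For readers comparing the F1 curves above, the following sketch shows one common way to score a predicted mask against a ground-truth mask. It assumes a pixel-wise F1 on binarized masks. The exact metric used in the training code may differ, and the name `pixel_f1` and the `eps` smoothing term are illustrative choices.

```python
import numpy as np


def pixel_f1(pred_mask, true_mask, eps=1e-7):
    """Pixel-wise F1 between two masks; any non-zero pixel counts as foreground."""
    pred = np.asarray(pred_mask) > 0
    true = np.asarray(true_mask) > 0
    tp = np.logical_and(pred, true).sum()    # foreground in both
    fp = np.logical_and(pred, ~true).sum()   # predicted but not annotated
    fn = np.logical_and(~pred, true).sum()   # annotated but missed
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    # eps keeps the result finite (and zero) when both masks are empty.
    return float(2 * precision * recall / (precision + recall + eps))
```

Applied once to the table masks and once to the column masks, this gives a separate score for each task.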

Predictions

Predictions from the model

After fixing table mask using contours

After fixing column mask using contours

After processing it through pytesseract

Deployed application

https://vimeo.com/577282006

Future Work

Deploy this application on a remote server using AWS, Streamlit Sharing, or Heroku.
Quantize the model for faster inference.
Train for more epochs and compare performance.
Increase the data size by adding data from the ICDAR 2013 Table Recognition dataset.
