TFLOP: Table Structure Recognition Framework with Layout Pointer Mechanism

Official implementation of "TFLOP: Table Structure Recognition Framework with Layout Pointer Mechanism" (IJCAI 2024)

📣 Latest Updates

💻 [17/01/2025] Release of TFLOP code!
🚀 [15/10/2024] Try out the enterprise-grade integration of TFLOP within Upstage’s Document Parse -- [Link]
⚡️ [03/08/2024] Presentation of TFLOP at IJCAI 2024 -- [Paper]

🚀 Getting Started

Installation

# Create a new conda environment with Python 3.9
conda create -n tflop python=3.9
conda activate tflop

# Clone the TFLOP repository
git clone https://github.com/UpstageAI/TFLOP

# Install required packages
cd TFLOP
pip install torch==2.0.1 torchmetrics==1.6.0 torchvision==0.15.2
pip install -r requirements.txt
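
After installation, it can be worth confirming that the pinned versions were actually picked up, since a later `pip install -r requirements.txt` may replace them. The following is a minimal sketch and not part of the repository: the helper name `find_version_mismatches` is hypothetical, and it relies only on the standard-library `importlib.metadata`.

```python
from importlib import metadata

# Versions pinned in the install command above.
PINNED = {"torch": "2.0.1", "torchmetrics": "1.6.0", "torchvision": "0.15.2"}


def find_version_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for packages that differ or are missing.

    Hypothetical helper for illustration; it is not provided by TFLOP.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build tags such as "+cu117" when comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = find_version_mismatches(PINNED)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found}")
    if not problems:
        print("All pinned packages match.")
```

Run it inside the activated `tflop` environment; any line it prints names a package whose installed version differs from the pin.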

Download required files

Install the Hugging Face CLI and log in

Reference: https://huggingface.co/docs/huggingface_hub/en/guides/cli

pip install -U "huggingface_hub[cli]"
huggingface-cli login

Install Git LFS

sudo apt install git-lfs
git lfs install

Download the dataset from Hugging Face

git clone https://huggingface.co/datasets/upstage/TFLOP-dataset

Directory Layout

├── images
│   ├── test.tar.gz
│   ├── train.tar.gz
│   └── validation.tar.gz
├── meta_data
│   ├── erroneous_pubtabnet_data.json
│   ├── final_eval_v2.json
│   └── PubTabNet_2.0.0.jsonl
└── pse_results
    ├── test
    │   └── end2end_results.pkl
    ├── train
    │   ├── detection_results_0.pkl
    │   ├── detection_results_1.pkl
    │   ├── detection_results_2.pkl
    │   ├── detection_results_3.pkl
    │   ├── detection_results_4.pkl
    │   ├── detection_results_5.pkl
    │   ├── detection_results_6.pkl
    │   └── detection_results_7.pkl
    └── val
        └── detection_results_0.pkl
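
A quick way to confirm that the clone is complete is to compare it against the layout listed above. This is a minimal sketch, assuming the dataset was cloned into a directory named `TFLOP-dataset`; the helper `missing_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Relative paths taken from the directory layout listed above.
EXPECTED_FILES = [
    "images/test.tar.gz",
    "images/train.tar.gz",
    "images/validation.tar.gz",
    "meta_data/erroneous_pubtabnet_data.json",
    "meta_data/final_eval_v2.json",
    "meta_data/PubTabNet_2.0.0.jsonl",
    "pse_results/test/end2end_results.pkl",
    "pse_results/val/detection_results_0.pkl",
] + [f"pse_results/train/detection_results_{i}.pkl" for i in range(8)]


def missing_files(root):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files("TFLOP-dataset")
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected files are present.")
```

The check only tests for existence. A file that is present but only a few hundred bytes long is probably a Git LFS pointer rather than the real data, which usually means `git lfs install` was not run before cloning.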

Extract the image archives

cd TFLOP-dataset
cd images
tar -xvzf train.tar.gz
tar -xvzf validation.tar.gz
tar -xvzf test.tar.gz
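
On systems without a `tar` binary, the same extraction can be done with Python's standard-library `tarfile` module. The sketch below is an assumption-labelled alternative, not a script shipped with TFLOP; `extract_archives` is a hypothetical helper name.

```python
import tarfile
from pathlib import Path


def extract_archives(images_dir, names=("train", "validation", "test")):
    """Extract <name>.tar.gz files found in `images_dir` into that directory.

    Returns the names of the archives that were extracted; archives that
    are not present are skipped.
    """
    images_dir = Path(images_dir)
    extracted = []
    for name in names:
        archive = images_dir / f"{name}.tar.gz"
        if not archive.is_file():
            continue
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(path=images_dir)
        extracted.append(name)
    return extracted


if __name__ == "__main__":
    done = extract_archives("TFLOP-dataset/images")
    print("Extracted:", ", ".join(done) if done else "nothing")
```

Only extract archives from a source you trust, since `extractall` writes whatever paths the archive contains.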

Download the pretrained weights

mkdir pretrain_weights
cd pretrain_weights
git clone --branch official https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v2

Data preprocessing

Preprocess the dataset with the PSE results

bash scripts/preprocess_data/preprocess_pubtabnet.sh

Preprocessing produces TFLOP-dataset/meta_data/dataset_train.jsonl and TFLOP-dataset/meta_data/dataset_validation.jsonl:

TFLOP-dataset
├── images
│   ├── test
│   ├── train
│   └── validation
├── meta_data
│   ├── dataset_train.jsonl
│   ├── dataset_validation.jsonl
│   ├── erroneous_pubtabnet_data.json
│   ├── final_eval_v2.json
│   └── PubTabNet_2.0.0.jsonl
└── pse_results
    ├── test
    ├── train
    └── val
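
Before starting a long training run, it may help to sanity-check the generated JSONL files. The record schema is not documented here, so this sketch makes no assumption about field names: it only counts the records and reports the top-level keys of the first one. `summarize_jsonl` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path


def summarize_jsonl(path):
    """Count records in a JSON Lines file and collect the keys of the first one."""
    count = 0
    first_keys = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if count == 0 and isinstance(record, dict):
                first_keys = sorted(record)
            count += 1
    return count, first_keys


if __name__ == "__main__":
    meta_dir = Path("TFLOP-dataset/meta_data")
    for name in ("dataset_train.jsonl", "dataset_validation.jsonl"):
        path = meta_dir / name
        if path.is_file():
            count, keys = summarize_jsonl(path)
            print(f"{name}: {count} records, keys of first record: {keys}")
        else:
            print(f"{name}: not found")
```

A file that is missing, empty, or fails to parse suggests that the preprocessing script did not finish.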

Training

bash scripts/training/train_pubtabnet.sh

Evaluation

bash scripts/testing/test_pubtabnet.sh <bin_idx> <total_bin_cnt> <experiment_savedir> <epoch_step>
python evaluate_ted.py --model_inference_pathdir <experiment_savedir>/<epoch_step> \
                       --output_savepath <experiment_savedir>/<epoch_step>

# Example
bash scripts/testing/test_pubtabnet.sh 0 1 results/pubtabnet_experiment/expv1 epoch_29_step_231000
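
When the test set is split across several bins, the full sequence of commands can be generated rather than typed by hand. The sketch below rests on two assumptions that the text above does not confirm: that `<bin_idx>` is zero-based (the example uses `0` with a total of `1`), and that running every index from `0` to `<total_bin_cnt> - 1` covers the whole test set. `build_eval_commands` is a hypothetical helper, not part of the repository.

```python
import shlex


def build_eval_commands(total_bin_cnt, experiment_savedir, epoch_step):
    """Build one inference command per bin plus the final TED scoring command.

    Assumes zero-based bin indices; this is inferred from the example above.
    """
    if total_bin_cnt < 1:
        raise ValueError("total_bin_cnt must be at least 1")
    commands = [
        ["bash", "scripts/testing/test_pubtabnet.sh",
         str(bin_idx), str(total_bin_cnt), experiment_savedir, epoch_step]
        for bin_idx in range(total_bin_cnt)
    ]
    result_dir = f"{experiment_savedir}/{epoch_step}"
    commands.append(
        ["python", "evaluate_ted.py",
         "--model_inference_pathdir", result_dir,
         "--output_savepath", result_dir]
    )
    return commands


if __name__ == "__main__":
    for cmd in build_eval_commands(
        1, "results/pubtabnet_experiment/expv1", "epoch_29_step_231000"
    ):
        print(shlex.join(cmd))
```

The commands are returned as argument lists, so they can be printed for review as shown or passed one at a time to `subprocess.run(cmd, check=True)`.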

Contributors

Khang, Minsoo
Joo, SeHwan
Hong, Teakgyu

Acknowledgement

We would like to express our gratitude for the outstanding works that have served as valuable references in this research:

Donut repository for architecture implementation
SupContrast repository for Contrastive Learning implementation
PubTabNet repository for TED implementation

Citation

@inproceedings{khang2024tflop,
  title={TFLOP: table structure recognition framework with layout pointer mechanism},
  author={Khang, Minsoo and Hong, Teakgyu},
  booktitle={Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence},
  pages={947--955},
  year={2024}
}
