LineFormer - Rethinking Chart Data Extraction as Instance Segmentation

Official repository for the ICDAR 2023 Paper

[Link] to the paper.

Quantitative Results

Method	AdobeSynth19: Visual Element Detection¹	AdobeSynth19: Data Extraction²	UB-PMC22: Visual Element Detection	UB-PMC22: Data Extraction	LineEX: Visual Element Detection	LineEX: Data Extraction
ChartOCR	84.67	55	83.89	72.9	86.47	78.25
Lenovo	99.29	98.81	84.03	67.01	-	-
LineEX	82.52	81.97	50.23	47.03	71.13	71.08
LineFormer (Ours)	97.51	97.02	93.1	88.25	99.20	97.57

Model Usage

Install Environment

This code is based on the MMDetection framework.

The code has been tested with PyTorch 1.13.1 and CUDA 11.7.

Create a Conda environment and install the dependencies:

conda create -n LineFormer python=3.8
conda activate LineFormer
bash install.sh

Inference

Download the trained model checkpoint here.
Use the demo inference snippet shown below:

import infer
import cv2
import line_utils

img_path = "demo/PMC5959982___3_HTML.jpg"
img = cv2.imread(img_path) # BGR format

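CKPT = "iter_3000.pth"  # path to the downloaded checkpoint
CONFIG = "lineformer_swin_t_config.py"  # model config shipped with this repo
DEVICE = "cpu"

# Load the model, then extract the line data series from the image
infer.load_model(CONFIG, CKPT, DEVICE)
line_dataseries = infer.get_dataseries(img, to_clean=False)

# Visualize extracted line keypoints by drawing them onto the input image
img = line_utils.draw_lines(img, line_utils.points_to_array(line_dataseries))

# Save the annotated image
cv2.imwrite('demo/sample_result.png', img)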
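
If you want to keep the extracted points rather than only the visualization, one option is to flatten them into CSV rows. The sketch below is not part of this repo. It assumes that `line_dataseries` is a list with one entry per line, and that each entry is a list of `{"x": ..., "y": ...}` dicts in image pixel coordinates. Please check `infer.py` to confirm the format before relying on it. The helper names `dataseries_to_rows` and `write_csv` are hypothetical, and `sample` is made-up data standing in for real model output.

```python
import csv
import io


def dataseries_to_rows(line_dataseries):
    """Flatten per-line point lists into (line_id, x, y) rows.

    ASSUMPTION: each series is a list of {"x": ..., "y": ...} dicts
    holding image pixel coordinates.
    """
    rows = []
    for line_id, series in enumerate(line_dataseries):
        for pt in series:
            rows.append((line_id, pt["x"], pt["y"]))
    return rows


def write_csv(rows, fileobj):
    """Write a header plus one CSV row per extracted point."""
    writer = csv.writer(fileobj)
    writer.writerow(["line_id", "x_px", "y_px"])
    writer.writerows(rows)


# Hypothetical stand-in for the output of infer.get_dataseries(...)
sample = [
    [{"x": 10, "y": 200}, {"x": 20, "y": 180}],
    [{"x": 10, "y": 50}],
]

rows = dataseries_to_rows(sample)
buf = io.StringIO()  # swap for open("lines.csv", "w", newline="") to write a file
write_csv(rows, buf)
print(buf.getvalue())
```

The `line_id` column keeps points from different lines apart, so the file can be grouped again later.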

Example extraction result:

Citation

If you found our work useful, please cite us as follows:

@InProceedings{10.1007/978-3-031-41734-4_24,
author="Lal, Jay
and Mitkari, Aditya
and Bhosale, Mahesh
and Doermann, David",
editor="Fink, Gernot A.
and Jain, Rajiv
and Kise, Koichi
and Zanibbi, Richard",
title="LineFormer: Line Chart Data Extraction Using Instance Segmentation",
booktitle="Document Analysis and Recognition - ICDAR 2023",
year="2023",
publisher="Springer Nature Switzerland",
address="Cham",
pages="387--400",
abstract="Data extraction from line-chart images is an essential component of the automated document understanding process, as line charts are a ubiquitous data visualization format. However, the amount of visual and structural variations in multi-line graphs makes them particularly challenging for automated parsing. Existing works, however, are not robust to all these variations, either taking an all-chart unified approach or relying on auxiliary information such as legends for line data extraction. In this work, we propose LineFormer, a robust approach to line data extraction using instance segmentation. We achieve state-of-the-art performance on several benchmark synthetic and real chart datasets. Our implementation is available at https://github.com/TheJaeLal/LineFormer.",
isbn="978-3-031-41734-4"
}

Full Plot Data Extraction

Note: LineFormer returns data as (x, y) points in image coordinates. To recover the actual data values, you also need to extract axis information. Please refer to the following resources:

E2E Line Chart Data extraction implementation put together by @tdsone
Chart Element Detection: this repo
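
Once axis information is available, mapping pixel points to data values can be as simple as a linear interpolation per axis. The following is a minimal sketch and not the method used by either resource above. It assumes linear (not logarithmic) axes, two known reference ticks per axis, and points given as `{"x": ..., "y": ...}` dicts in pixel coordinates. `AxisCalibration`, `pixels_to_data`, and all numbers are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass
class AxisCalibration:
    """Two reference ticks on one axis: pixel position and the data value at each."""
    px0: float
    val0: float
    px1: float
    val1: float

    def to_value(self, px):
        # Linear interpolation (or extrapolation) between the two reference ticks.
        if self.px1 == self.px0:
            raise ValueError("reference ticks must be at different pixel positions")
        return self.val0 + (px - self.px0) * (self.val1 - self.val0) / (self.px1 - self.px0)


def pixels_to_data(series, x_axis, y_axis):
    """Convert a list of {"x", "y"} pixel points into (x, y) data-value tuples."""
    return [(x_axis.to_value(p["x"]), y_axis.to_value(p["y"])) for p in series]


# Hypothetical calibration: ticks that would come from axis/tick-label detection.
x_axis = AxisCalibration(px0=100, val0=0.0, px1=500, val1=10.0)
# Image y grows downward, so the larger data value sits at the smaller pixel y.
y_axis = AxisCalibration(px0=400, val0=0.0, px1=100, val1=30.0)

# Hypothetical extracted line in pixel coordinates.
series = [{"x": 100, "y": 400}, {"x": 300, "y": 250}, {"x": 500, "y": 100}]
print(pixels_to_data(series, x_axis, y_axis))
```

For a logarithmic axis, the same interpolation would have to be applied to the logarithm of the tick values instead.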
