cBAD Dataset

See.

The database of Track A [Simple Documents] consists of 755 images extracted from 9 different archival collections. The dataset comprises images with additional PAGE XMLs. The PAGE XML files contain text regions, e.g. paragraphs, so layout analysis or text detection does not need to be performed on this dataset. Only handwritten text is present, and the dataset contains no tables. The ground truth of the test set will be released after all submitted methods have been evaluated and the final results have been made public.

Track B [Complex Documents] contains mixed documents. Although most documents are handwritten, this track also contains printed documents, book covers, empty pages, and tables. While Track A has locally skewed text lines, text lines in Track B are rotated by up to 180°.

In this example, only the Complex Track is used.

Usage:

./run.sh

The dataset is larger than 2 GB; make sure you have at least 6 GB of free disk space to store the whole experiment.
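The free-space requirement above can be checked before launching the experiment. The following is a minimal sketch using only the Python standard library; the helper name `has_enough_space` is hypothetical and is not part of run.sh or of this repository.

```python
import shutil

REQUIRED_GB = 6  # free space recommended by this README


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    # Report the free space of the current working directory.
    free_gb = shutil.disk_usage(".").free / 1024**3
    status = "OK" if has_enough_space() else "not enough space"
    print(f"free: {free_gb:.1f} GB ({status})")
```

Run it from the directory where the experiment will be stored, since that is the filesystem that matters.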

See config.txt for details about the training parameters.

ICDAR 2017 Results

Complex Track

The following table shows the results published in the ICDAR 2017 proceedings plus the results of this experiment (P2PaLA row). The P2PaLA confidence intervals are nonparametric bootstrap intervals at 95% confidence, computed with 10000 repetitions.

Method	P	R	F1
DMRZ	85.4	86.3	85.9
P2PaLA	84.8 [83.9, 85.7]	85.4 [84.4, 86.4]	85.1
BYU	77.3	82.0	79.6
IRISA	69.2	77.2	73.0
UPVLC	83.3	60.6	70.2

The results are close to those of the competition winner (DMRZ), even though no hyperparameter tuning was performed.
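For readers unfamiliar with the intervals in the P2PaLA row, the sketch below shows one common way to obtain a nonparametric bootstrap confidence interval (the percentile method) and how F1 follows from P and R. It assumes that the resampled statistic is the mean of per-page scores; that assumption, the function names, and the example values are illustrative and are not taken from the evaluation code used for this experiment.

```python
import random


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


def bootstrap_ci(values, repetitions=10000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(repetitions)
    )
    alpha = 1.0 - confidence
    lo = means[int((alpha / 2) * repetitions)]
    hi = means[min(repetitions - 1, int((1 - alpha / 2) * repetitions))]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical per-page precision values, for illustration only.
    page_precision = [0.91, 0.84, 0.78, 0.88, 0.95, 0.69, 0.86, 0.90]
    print(bootstrap_ci(page_precision))
    # F1 from the P and R reported in the P2PaLA row of the table.
    print(round(f1(84.8, 85.4), 1))
```

The fixed seed only makes the sketch reproducible; any seed gives a comparable interval at 10000 repetitions.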

Corpus Notes

Complex Track

Train data

number of pages: 270
color space: 60 Gray, 210 sRGB
size: 209 different sizes, from 1504x1194 to 7456x6104
orientation: both portrait and landscape
Baselines:
- total: 21684
- average per page: 80.3
- min: 0
- max: 472
- histogram
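The baseline statistics above can be recomputed from the PAGE XML files by counting `Baseline` elements per page. The following is a minimal sketch, assuming one PAGE XML file per page stored in a single directory; the directory path and the function names are hypothetical, and the namespace is stripped so the code does not depend on a particular PAGE schema version.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def count_baselines(xml_source):
    """Count Baseline elements in one PAGE XML file, ignoring the namespace."""
    root = ET.parse(xml_source).getroot()
    # Tags look like "{namespace}Baseline"; keep only the local name.
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "Baseline")


def baseline_stats(xml_dir):
    """Summarize baseline counts over all *.xml files in `xml_dir`."""
    counts = [count_baselines(p) for p in sorted(Path(xml_dir).glob("*.xml"))]
    if not counts:
        return None
    return {
        "pages": len(counts),
        "total": sum(counts),
        "average": sum(counts) / len(counts),
        "min": min(counts),
        "max": max(counts),
    }


if __name__ == "__main__":
    # Hypothetical location of the training PAGE XML files.
    print(baseline_stats("path/to/train/page"))
```

As a consistency check on the figures above, 21684 baselines over 270 pages gives the reported average of about 80.3 per page.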

Test data

number of pages: 1010
color space: 163 Gray, 847 sRGB
size: 678 different sizes, from 982x3127 to 7472x6088
orientation: both portrait and landscape
Baselines: blind test
