Keyword Spotting Data

## Task

Your task is to develop a machine learning approach for spotting keywords in the provided documents. You can test your approach on the provided training and validation sets. A list of keywords is provided, each of which is guaranteed to occur at least once in each set.

Data

This repository contains all the data needed for the keyword spotting task.

It contains the following folders:

ground-truth

Contains ground-truth data.

transcription.txt

Contains the character-level transcription of every word in the whole dataset. The format is as follows:

- XXX-YY-ZZ: XXX = document number, YY = line number, ZZ = word number
- Contains the character-wise transcription of the word (letters separated by dashes)
- Special characters are denoted with the prefix s_
	- numbers (s_x)
	- punctuation (s_pt, s_cm, ...)
	- strong s (s_s)
	- hyphen (s_mi)
	- semicolon (s_sq)
	- apostrophe (s_qt)
	- colon (s_qo)
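
As an illustration of the format above, here is a minimal Python sketch for parsing one line of transcription.txt. It assumes that the word ID and the transcription are separated by whitespace, which the description above does not state explicitly. The names `WordEntry` and `parse_transcription_line` are hypothetical helpers, not part of the provided data or code.

```python
from typing import List, NamedTuple


class WordEntry(NamedTuple):
    document: str      # XXX
    line: str          # YY
    word: str          # ZZ
    tokens: List[str]  # one entry per character, e.g. "e" or "s_cm"


def parse_transcription_line(raw: str) -> WordEntry:
    # Assumption: "<XXX-YY-ZZ><whitespace><dash-separated characters>"
    word_id, transcription = raw.split(maxsplit=1)
    document, line, word = word_id.split("-")
    tokens = transcription.strip().split("-")
    return WordEntry(document, line, word, tokens)


def is_special(token: str) -> bool:
    # Special characters carry the s_ prefix (s_pt, s_cm, s_mi, ...)
    return token.startswith("s_")


entry = parse_transcription_line("270-01-02 L-e-t-t-e-r-s-s_cm")
letters = "".join(t for t in entry.tokens if not is_special(t))
```

Note that filtering out every `s_` token also drops digits and the strong s, so whether to strip or map special tokens depends on how you want to compare words.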

locations

Contains bounding boxes for all words in SVG format.

- XXX.svg: file containing the bounding boxes for document XXX
- The **id** attribute uses the same XXX-YY-ZZ naming as above
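
The following Python sketch shows one possible way to read the word IDs and an axis-aligned box from such an SVG file using only the standard library. It makes assumptions that are not confirmed by the description above: that each word is an element carrying both an `id` and a `d` (path data) attribute, and that the path data consists of absolute x/y coordinate pairs in plain decimal notation. The function name `word_boxes` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

WORD_ID = re.compile(r"^\d+-\d+-\d+$")      # XXX-YY-ZZ
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")     # plain decimals only (assumption)


def word_boxes(svg_text):
    """Map each word id to (min_x, min_y, max_x, max_y)."""
    boxes = {}
    root = ET.fromstring(svg_text)
    for element in root.iter():
        word_id = element.get("id", "")
        path_data = element.get("d")
        if not WORD_ID.match(word_id) or path_data is None:
            continue
        # Assumption: numbers alternate x, y, x, y, ... in absolute coordinates
        values = [float(v) for v in NUMBER.findall(path_data)]
        xs, ys = values[0::2], values[1::2]
        if not xs or len(xs) != len(ys):
            continue  # skip path data that does not fit the assumed layout
        boxes[word_id] = (min(xs), min(ys), max(xs), max(ys))
    return boxes


sample = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<path d="M 10.0 20.0 L 50.0 20.0 L 50.0 40.0 L 10.0 40.0 Z" id="270-01-01"/>'
    "</svg>"
)
boxes = word_boxes(sample)
```

The resulting boxes can be used to crop word images from the corresponding page image. If the outlines are polygons rather than rectangles, the box is only the enclosing rectangle.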

images

Contains the original images in JPG format.

task

Contains three files:

#### train.txt / valid.txt

Contain the split of the documents into a training and a validation set.

keywords.txt

Contains a list of keywords, each of which occurs at least once in both the training and the validation set.
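
As a sanity check on the split and keyword files, the sketch below looks up where a keyword occurs within one split using the ground-truth transcription. It is not a keyword spotting method, only a way to verify the data. It assumes that train.txt and valid.txt list one document number per line, that keywords.txt lists one keyword per line in the same dash-separated encoding as transcription.txt, and that each transcription line separates the word ID from the transcription by whitespace. None of these details are stated above, and the function names are hypothetical.

```python
def read_entries(text):
    """Return the non-empty, stripped lines of a text file's contents."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def find_keyword(transcription_lines, documents, keyword):
    """Return the ids of all words in `documents` whose transcription equals `keyword`."""
    hits = []
    for raw in transcription_lines:
        word_id, transcription = raw.split(maxsplit=1)
        document = word_id.split("-")[0]
        if document in documents and transcription.strip() == keyword:
            hits.append(word_id)
    return hits


# Toy data standing in for the real files (contents are made up for illustration)
transcription = read_entries(
    "270-01-01 O-r-d-e-r-s\n"
    "270-01-02 a-n-d\n"
    "300-01-01 O-r-d-e-r-s\n"
)
train_documents = set(read_entries("270\n"))
valid_documents = set(read_entries("300\n"))

train_hits = find_keyword(transcription, train_documents, "O-r-d-e-r-s")
valid_hits = find_keyword(transcription, valid_documents, "O-r-d-e-r-s")
```

Running this for every keyword and both splits should yield at least one hit each if the assumptions about the file formats hold.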

pattern-recognition-mcs/PatRec19_KWS_Data

Folders and files

Latest commit

History

Repository files navigation

Keyword Spotting Data

Data

ground-truth

transcription.txt

locations

images

task

keywords.txt

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages