Curated datasets for TEA

This repository contains the curation data and the target articles for the Pathogen Identifier and Strain Tagger datasets.

The curation data is stored in JSON format under curation_data, which is divided into two subfolders: pathogens for the Pathogen Identifier dataset and strains for the Strain Tagger dataset.
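The folder layout above could be read with a few lines of Python. This is only a minimal sketch: the function name load_curation is hypothetical, and it assumes that the files carry a .json extension and that each file holds a single top-level JSON object. Neither assumption is confirmed here, so check the files before relying on it.

```python
import json
from pathlib import Path
from typing import Dict


def load_curation(root: Path, dataset: str) -> Dict[str, dict]:
    """Merge all JSON files under root/curation_data/<dataset> into one dict.

    Assumption: every file is a JSON object, so the top-level keys of
    all files can be combined with dict.update().
    """
    merged: Dict[str, dict] = {}
    folder = root / "curation_data" / dataset
    # sorted() keeps the load order stable between runs
    for path in sorted(folder.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            merged.update(json.load(handle))
    return merged
```

With a local clone in the current directory, load_curation(Path("."), "pathogens") would collect the Pathogen Identifier files and load_curation(Path("."), "strains") the Strain Tagger files.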

Each JSON file uses the hash of a curated article as a key, followed by IOB-format compatible tag/location key-value pairs. Tag locations are given in an array at word level: a zero-based start index, a plus sign, and the length of the tag in words. For example, strains/purple: [42+2, 153+1] means that there are two strains/purple tags: one starting at word number 43 and spanning two words, and one starting at word number 154 and spanning one word (i.e. that word only).
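The location notation can be expanded into per-word IOB labels roughly as follows. This is an illustrative sketch, not the code used by TEA: the helper names parse_location and to_iob are hypothetical, the locations are assumed to be stored as strings such as "42+2", the start index is taken to be zero-based (index 42 is word number 43, as in the example above), and the B-/I- label prefixes are a common IOB convention chosen here for illustration.

```python
from typing import Dict, List, Tuple


def parse_location(location: str) -> Tuple[int, int]:
    """Split a 'start+length' string such as '42+2' into (start, length)."""
    start, length = location.split("+")
    return int(start), int(length)


def to_iob(n_words: int, tags: Dict[str, List[str]]) -> List[str]:
    """Expand tag/location pairs into one IOB label per word.

    Assumption: start indices are zero-based word positions.
    """
    labels = ["O"] * n_words  # every word starts as "outside"
    for tag, locations in tags.items():
        for location in locations:
            start, length = parse_location(location)
            labels[start] = f"B-{tag}"  # first word of the span
            for i in range(start + 1, start + length):
                labels[i] = f"I-{tag}"  # remaining words of the span
    return labels


labels = to_iob(200, {"strains/purple": ["42+2", "153+1"]})
print(labels[42], labels[43], labels[153])
```

For the example above, words at indices 42 and 43 receive the begin and inside labels of the first tag, and the word at index 153 receives a begin label on its own.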

See the TEA repository for a usage example.

Entity categories

Pathogen Identifier

Entity	Number of tags
commensals	25
negatives	9322
opportunistics	37
pathogens	859
probiotics	20
strains	123

Strain Tagger

Entity	Number of tags
negatives	13181
species	837
strains	1788

The curated data is provided under the MIT licence.

Third-party licences

The target article texts are found under source_articles. They are licensed under various Creative Commons licences (i.e. BY/NC/SA).

See attribution.txt for article-level details and general attribution.
