PsyTAR preprocessed files

This repository contains:

the original PsyTAR dataset, as downloaded from Ask a Patient, on August 1st, 2019;
a Python script to convert it to CSV and CoNLL format;
the converted data.

The folder structure is the following:

data/binary contains the annotations from the Sentence_Labeling sheet;
data/all contains the annotations from the {ADR, WD, SSI, DI}_Identified sheets, in CoNLL format;
data/conflated contains the same data as data/all, but with all entity types conflated into a single type.
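As a rough illustration of how data/all relates to data/conflated, the sketch below reads CoNLL-style lines and collapses every entity type into one label. It is not the code from psytar2conll.py. It assumes whitespace-separated columns with the token first and the tag last, blank lines between sentences, and BIO-style tags such as B-ADR; the label name ENTITY and the sample lines are hypothetical.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Parse CoNLL-style lines into sentences of (token, tag) pairs.

    Assumption: token is the first column, tag is the last column,
    and a blank line separates sentences.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


def conflate_tag(tag: str, label: str = "ENTITY") -> str:
    """Map any entity tag to a single type, keeping a BIO prefix if present.

    Assumption: "O" marks tokens outside any entity; `label` is a
    hypothetical name for the conflated type.
    """
    if tag == "O":
        return tag
    if "-" in tag:
        prefix, _ = tag.split("-", 1)
        return f"{prefix}-{label}"
    return label


# Hypothetical sample, not taken from the corpus.
sample = ["I O", "felt O", "dizzy B-ADR", "", "withdrawal B-WD", "symptoms I-WD"]
sentences = read_conll(sample)
conflated = [[(tok, conflate_tag(tag)) for tok, tag in sent] for sent in sentences]
```

To apply the same idea to a real file, pass an open file handle to `read_conll`, since iterating over a text file yields its lines.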

The corpus is available as a whole in each full.txt file. For the sake of reproducibility, I also provide training, development, and test splits, with an 80-10-20 ratio. The code that generates the splits is intended to be deterministic: if you run the Python script, you should obtain exactly the same splits as those in this repository.
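One common way to make splits reproducible is to shuffle with a fixed random seed before slicing. The sketch below shows that general technique only; it is not the code from psytar2conll.py. The function name `split_dataset`, the seed value, and the default fractions are all assumptions chosen for illustration, not the ones used to produce the files in this repository.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    items: Sequence[T],
    train_frac: float = 0.8,  # illustrative fraction, not the repository's
    dev_frac: float = 0.1,    # illustrative fraction, not the repository's
    seed: int = 42,           # hypothetical seed
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle with a seeded RNG and cut into train, dev, and test parts.

    Whatever remains after the train and dev slices goes to the test set.
    """
    if train_frac < 0 or dev_frac < 0 or train_frac + dev_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(items)
    # A private Random instance keeps the result independent of global state.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


# Running the split twice with the same seed gives identical partitions.
first = split_dataset(list(range(100)))
second = split_dataset(list(range(100)))
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions means other code that touches the global random state cannot change the resulting splits.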

License

The PsyTAR dataset is released under the CC BY 4.0 Data license.

Please cite the original paper if you use the corpus. If you use the splits provided here, please also provide a pointer to this repository.
