Examples

All examples are placed under the directory egs, and each one is named after its dataset. Data-sets whose names start with "mock" are data-sets for testing.
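The naming convention above can be checked mechanically. The following is a minimal Python sketch, assuming each example is a sub-directory of egs named after its dataset. The function name split_examples is hypothetical and is not part of the toolkit.

```python
from pathlib import Path


def split_examples(egs_dir: str = "egs") -> tuple[list[str], list[str]]:
    """Hypothetical helper: split example directories into real and mock data-sets.

    Assumes every example lives in a sub-directory of egs_dir named after its
    dataset, and that test data-sets are those whose names start with "mock".
    Plain files (e.g. a README) are ignored.
    """
    names = sorted(p.name for p in Path(egs_dir).iterdir() if p.is_dir())
    mock = [n for n in names if n.startswith("mock")]
    real = [n for n in names if not n.startswith("mock")]
    return real, mock
```

Calling split_examples("egs") from the repository root would return the dataset examples and the mock (test) examples as two sorted lists.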

Examples for NLP

DataSet	Supported Tasks	Description
ATIS	Sequence labeling / Text classification / NLU joint learning	Air Travel Information System (ATIS) pilot corpus.
CoNLL2003	Sequence labeling	The CoNLL 2003 NER task consists of newswire text from the Reuters RCV1 corpus tagged with four different entity types (PER, LOC, ORG, MISC).
MSRA_NER	Sequence labeling	The MSRA dataset is a Chinese named entity recognition (NER) dataset in the news domain.
SNLI	Sentence Matching	The Stanford Natural Language Inference (SNLI) corpus is a freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning.
Quora_QP	Sentence Matching	Question pairs collected from the Quora platform, a place to gain and share knowledge about anything.
Yahoo_Answer	Document Classification	The Yahoo! Answers data is obtained from (Zhang et al., 2015). This is a topic classification task with 10 classes. Each document we use includes the question title, the question content and the best answer.
Trec	Document Classification	This collection contains the data used in the original learning question classification experiments, together with the question class definitions.

Examples for Speech

DataSet	Supported Tasks	Description
hkust	ASR	HKUST Mandarin Telephone Speech
voxceleb	Speaker Verification	VoxCeleb is an audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube.
iemocap	Emotion Recognition	The Interactive Emotional Dyadic Motion Capture (IEMOCAP) database is an acted, multimodal and multispeaker database collected at the SAIL lab at USC.