GH-243: dataset downloader #247

alanakbik · 2018-11-26T16:44:20Z

Addresses #243.

DataFetcher can now download Universal Dependencies corpora for 30 languages, WikiNER corpora for 8 languages, and some CoNLL tasks.
If a corpus has only a training data file, the fetcher now samples both dev and test data from it (Train NER for Swedish #3).
All fetch_* methods are renamed to load_* methods.
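
To illustrate the sampling behaviour described above, here is a minimal sketch of carving dev and test sets out of a training set. It is not the code in this PR: the function name `sample_dev_and_test`, the 10% fraction, and the fixed seed are all assumptions made for the example.

```python
import random


def sample_dev_and_test(train, fraction=0.1, seed=0):
    """Split dev and test portions off a list of training sentences.

    Hypothetical helper: the name, the 10% fraction and the seed are
    illustrative assumptions, not taken from flair.
    """
    shuffled = list(train)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for repeatability
    n = max(1, int(len(shuffled) * fraction))
    test = shuffled[:n]                    # first slice becomes the test set
    dev = shuffled[n:2 * n]                # second slice becomes the dev set
    remaining = shuffled[2 * n:]           # everything else stays as training data
    return remaining, dev, test


sentences = [f"sentence {i}" for i in range(100)]
train, dev, test = sample_dev_and_test(sentences)
print(len(train), len(dev), len(test))  # prints: 80 10 10
```

The three parts are disjoint, so no sentence sampled for dev or test remains in the training data.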

So now you can load a dataset like this:

from flair.data_fetcher import NLPTaskDataFetcher, NLPTask

# load one corpus
corpus = NLPTaskDataFetcher.load_corpus(NLPTask.UD_ENGLISH)
print(corpus)

# load a MultiCorpus of two UD corpora
corpus = NLPTaskDataFetcher.load_corpora([NLPTask.UD_ENGLISH, NLPTask.UD_GERMAN])
print(corpus)

You no longer need to download the UD corpus yourself. The method checks whether the corpus is already present and downloads it if it is not.
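
The check-then-download idea can be sketched roughly as follows. This is an illustration only and not how the fetcher is implemented: `ensure_file`, its parameters, and the URL and directory in the usage note are hypothetical, and the sketch uses only the standard library.

```python
from pathlib import Path
import urllib.request


def ensure_file(url, cache_dir, download=urllib.request.urlretrieve):
    """Return the local path for `url`, downloading it only if it is missing.

    Hypothetical helper for illustration; `download` can be swapped out,
    for example in tests, and is called as download(url, destination).
    """
    cache_dir = Path(cache_dir)
    target = cache_dir / url.rsplit("/", 1)[-1]  # local file name = last URL segment
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        download(url, str(target))               # network access happens only here
    return target
```

A call such as `ensure_file("https://example.org/corpus/train.conllu", "datasets/ud_english")` (placeholder URL and directory) would fetch the file on the first call and reuse the local copy afterwards.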

tabergma · 2018-11-26T18:55:39Z

👍 Great!

aakbik added 17 commits November 26, 2018 14:51

GH-243: added dataset downloader for UD and CoNLL corpora

efc5bb7

GH-243: fixed test

8c0ad4a

GH-243: added wikiner reader

98febfd

GH-3: data fetcher samples test data from train if no test file exists

0de8c0b

GH-243: added WikiNER downloader for all languages

0dcc7de

GH-243: added test for dataset downloader

ce62212

GH-243: clean up data after completing test

015b82e

GH-243: added dataset downloader for UD and CoNLL corpora

879482b

GH-243: fixed test

3ccd5ba

GH-243: added wikiner reader

56054b5

GH-3: data fetcher samples test data from train if no test file exists

dbf593f

GH-243: added WikiNER downloader for all languages

057962c

GH-243: added test for dataset downloader

bb31364

GH-243: clean up data after completing test

b288dec

GH-243: fix test

c6b0447

Merge branch 'GH-243-dataset-downloader' of github.com:zalandoresearc…

2d819e4

…h/flair into GH-243-dataset-downloader

GH-243: fix test

b12b073

tabergma merged commit 65caab9 into release-0.4 Nov 26, 2018

tabergma deleted the GH-243-dataset-downloader branch November 26, 2018 18:55
