GH-243: dataset downloader #247

Merged 17 commits on Nov 26, 2018


alanakbik
Collaborator

addresses #243

  • DataFetcher can now download Universal Dependencies corpora for 30 languages, WikiNER corpora for 8 languages, and some CoNLL tasks.
  • If a corpus provides only a training data file, dev and test data are now sampled from it (Train NER for Swedish #3).
  • Renamed all fetch_* methods to load_* methods.
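
To illustrate the second point, here is a minimal sketch of how dev and test data could be sampled when only a training file exists. This is not the code from this PR: the helper name `sample_dev_and_test`, the 10% fraction, and the fixed seed are all assumptions made for illustration.

```python
import random


def sample_dev_and_test(train, fraction=0.1, seed=0):
    """Hypothetical helper: carve dev and test sets out of training data.

    The 10% fraction and the fixed seed are assumptions for illustration,
    not values taken from this pull request.
    """
    sentences = list(train)
    # Shuffle a copy so the held-out sets are a random sample.
    random.Random(seed).shuffle(sentences)
    n_held_out = int(len(sentences) * fraction)
    dev = sentences[:n_held_out]
    test = sentences[n_held_out:2 * n_held_out]
    remaining_train = sentences[2 * n_held_out:]
    return remaining_train, dev, test


train, dev, test = sample_dev_and_test([f"sentence {i}" for i in range(100)])
print(len(train), len(dev), len(test))  # prints "80 10 10"
```

The three sets are disjoint slices of one shuffled list, so no sentence ends up in more than one split.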

So now you can load a dataset like this:

from flair.data_fetcher import NLPTaskDataFetcher, NLPTask

# load one corpus
corpus = NLPTaskDataFetcher.load_corpus(NLPTask.UD_ENGLISH)
print(corpus)

# load a MultiCorpus of two UD corpora
corpus = NLPTaskDataFetcher.load_corpora([NLPTask.UD_ENGLISH, NLPTask.UD_GERMAN])
print(corpus)

You no longer need to download the UD corpus yourself. The method checks whether the corpus is already present locally and downloads it if it is not.
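
The check-then-download behavior can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this PR: `ensure_downloaded`, the injectable `fetch` parameter, and the example URL are hypothetical. The demo passes a fake fetcher so that it runs without network access.

```python
import tempfile
import urllib.request
from pathlib import Path


def ensure_downloaded(url, cache_dir, fetch=urllib.request.urlretrieve):
    """Hypothetical helper: return the local path, downloading only if missing."""
    cache_dir = Path(cache_dir)
    target = cache_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        fetch(url, str(target))  # called only when the file is absent
    return target


# Offline demo: a fake fetcher records calls and writes a placeholder file.
calls = []


def fake_fetch(url, filename):
    calls.append(url)
    Path(filename).write_text("# placeholder corpus\n")


with tempfile.TemporaryDirectory() as tmp:
    url = "https://example.org/corpora/en-ud-train.conllu"  # placeholder URL
    first = ensure_downloaded(url, tmp, fetch=fake_fetch)
    second = ensure_downloaded(url, tmp, fetch=fake_fetch)
    print(first == second, len(calls))  # prints "True 1"
```

The second call finds the file already in place and skips the fetch, which is the behavior described above.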

@tabergma
Collaborator

👍 Great!

@tabergma tabergma merged commit 65caab9 into release-0.4 Nov 26, 2018
@tabergma tabergma deleted the GH-243-dataset-downloader branch November 26, 2018 18:55