
Run Extraction Framework #8

Open
mgns opened this issue Jan 23, 2018 · 2 comments
Labels
warmup-task Warmup task to practice before applying for GSoC.

Comments

@mgns
Member

mgns commented Jan 23, 2018

Effort

1-2 days

Skills

basic maven, executing README file

Description

The DBpedia extraction framework can download a set of Wikipedia XML dumps and extract facts from them. There is a configuration file where you specify the language(s) you want, and then you just run it. Set up your download and extraction configuration files and run a simple dump-based extraction.
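For anyone picking this up: the download step is driven by a Java-style properties file. Below is a minimal sketch of what such a config looks like; the property names and values here are illustrative assumptions and should be checked against the download.minimal.properties example shipped with the framework.

```properties
# Hypothetical download config sketch — verify property names against
# the framework's own download.minimal.properties before using.

# Directory where the Wikipedia XML dumps will be stored.
# It must exist on disk before the download is started.
base-dir=/data/extraction-data/2018-10

# Language edition(s) to download (comma-separated wiki codes).
languages=en

# Run the download, then the extraction, from the dump/ directory:
#   ../run download download.minimal.properties
#   ../run extraction extraction.default.properties
```

The two `../run` commands in the comment above are the ones used in the reports below; the extraction step reads dumps from the same base directory the download step wrote to.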

Impact

Get to know the way the extraction framework works.

@mgns mgns added gsoc-2018 Google Summer of Code 2018. warmup-task Warmup task to practice before applying for GSoC. labels Jan 23, 2018
@mommi84 mommi84 removed gsoc-2018 Google Summer of Code 2018. labels Dec 2, 2018
@AnubhavUjjawal

Hi. This is in reference to issue #24. I downloaded the project and ran a dump-based extraction. Everything went well; I only hit a Java version issue before the extraction (I had to make sure to use Java 1.8, and used jenv for this). However, I had to stop the ../run download download.10000.properties command at
date page 'https://dumps.wikimedia.org/wikidatawiki/20190101/' has all files [pages-articles-multistream.xml.bz2]
downloading 'https://dumps.wikimedia.org/wikidatawiki/20190101/wikidatawiki-20190101-pages-articles-multistream.xml.bz2' to '/Users/anubhavujjawal/Desktop/data/extraction-data/2018-10/wikidatawiki/20190101/wikidatawiki-20190101-pages-articles-multistream.xml.bz2' read 28.0153 MB of 58.74201 GB in 01:52 min
since I didn't have the bandwidth and disk space (I use a MacBook Air 128 GB model) to complete it. After that, ../run extraction extraction.default.properties ran fine. Have I messed anything up?

@joshuabezaleel

joshuabezaleel commented Mar 28, 2019

Hi everyone and @mgns. I tried the dump-based extraction instructions here with both the download.10000.properties and download.minimal.properties download config files, and got the error "Caused by: java.lang.IllegalArgumentException: Base directory does not exist yet: \data\extraction-data\2018-10" for both.

I tried creating the directories /data/extraction-data/2018-10 from the root, but still got the error.

Is there any solution to this?
Thank you very much.
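(Editor's note, hedged: the backslashes in "\data\extraction-data\2018-10" suggest the path in the config was resolved as a Windows-style path rather than the Unix directory that was created. A sketch of the kind of config change that usually addresses this — the property name base-dir is taken from the error context, and the exact path is machine-specific:)

```properties
# Hypothetical fix sketch: point base-dir at an absolute path that
# already exists on disk (create it first, e.g. with `mkdir -p`),
# using forward slashes even on Windows.
base-dir=C:/data/extraction-data/2018-10
```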

4 participants