
Run Extraction Framework #8

Open
mgns opened this issue Jan 23, 2018 · 2 comments
Labels
warmup-task Warmup task to practice before applying for GSoC.

Comments

@mgns
Member

mgns commented Jan 23, 2018

Effort

1-2 days

Skills

basic maven, executing README file

Description

The DBpedia extraction framework can download a set of Wikipedia XML dumps and extract facts from them. There is a configuration file where you specify the language(s) you want, and then you just run it. Set up your download and extraction configuration files and run a simple dump-based extraction.
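For anyone picking this up: the download step is driven by a Java-style properties file. Below is a minimal sketch of what such a config looks like; the property names and values here are illustrative assumptions and should be checked against the download.minimal.properties example shipped with the framework.

```properties
# Hypothetical download config sketch — verify property names against
# the framework's own download.minimal.properties before using.

# Directory where the Wikipedia XML dumps will be stored.
# It must exist on disk before the download is started.
base-dir=/data/extraction-data/2018-10

# Language edition(s) to download (comma-separated wiki codes).
languages=en

# Run the download, then the extraction, from the dump/ directory:
#   ../run download download.minimal.properties
#   ../run extraction extraction.default.properties
```

The two `../run` commands in the comment above are the ones used in the reports below; the extraction step reads dumps from the same base directory the download step wrote to.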

Impact

Get to know the way the extraction framework works.

@mgns mgns added gsoc-2018 Google Summer of Code 2018. warmup-task Warmup task to practice before applying for GSoC. labels Jan 23, 2018
@mommi84 mommi84 removed gsoc-2018 Google Summer of Code 2018. labels Dec 2, 2018
@AnubhavUjjawal

Hi. This is in reference to issue #24. I downloaded the project and ran a dump-based extraction. Everything went well; I only hit a Java version issue before the extraction (I had to make sure to use Java 1.8, and used jenv for this). However, I had to stop the ../run download download.10000.properties command at
date page 'https://dumps.wikimedia.org/wikidatawiki/20190101/' has all files [pages-articles-multistream.xml.bz2]
downloading 'https://dumps.wikimedia.org/wikidatawiki/20190101/wikidatawiki-20190101-pages-articles-multistream.xml.bz2' to '/Users/anubhavujjawal/Desktop/data/extraction-data/2018-10/wikidatawiki/20190101/wikidatawiki-20190101-pages-articles-multistream.xml.bz2' read 28.0153 MB of 58.74201 GB in 01:52 min
since I didn't have the bandwidth and disk space (I use a MacBook Air 128 GB model) to complete it. After that, ../run extraction extraction.default.properties ran fine. Have I messed anything up?

@joshuabezaleel

joshuabezaleel commented Mar 28, 2019

Hi everyone and @mgns. I tried the dump-based extraction instructions here with both the download.10000.properties and download.minimal.properties download config files, and got the error "Caused by: java.lang.IllegalArgumentException: Base directory does not exist yet: \data\extraction-data\2018-10" for both.

I tried creating the directories /data/extraction-data/2018-10 from the root, but still got the error.

Is there any solution to this?
Thank you very much.
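(Editor's note, hedged: the backslashes in "\data\extraction-data\2018-10" suggest the path in the config was resolved as a Windows-style path rather than the Unix directory that was created. A sketch of the kind of config change that usually addresses this — the property name base-dir is taken from the error context, and the exact path is machine-specific:)

```properties
# Hypothetical fix sketch: point base-dir at an absolute path that
# already exists on disk (create it first, e.g. with `mkdir -p`),
# using forward slashes even on Windows.
base-dir=C:/data/extraction-data/2018-10
```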

4 participants