wikidata_taxonomy_extraction

Script to extract Wikidata's taxonomy and the corresponding class objects from a JSON dump.

JSON dumps are available at https://dumps.wikimedia.org/wikidatawiki/entities/.

Any dump format provided by Wikidata can be used as input: .bz2, .gz, and uncompressed .json files are all accepted.
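As an illustration of how the three input formats can be handled uniformly, here is a minimal sketch that picks a decompressor from the file extension and streams entities one at a time. The helper names `open_dump` and `iter_entities` are hypothetical and are not taken from this package; the sketch relies on the dump being a single JSON array with one entity per line.

```python
import bz2
import gzip
import json
from typing import Iterator


def open_dump(path: str):
    """Open a dump as text, choosing the decompressor from the extension."""
    if path.endswith(".bz2"):
        return bz2.open(path, mode="rt", encoding="utf-8")
    if path.endswith(".gz"):
        return gzip.open(path, mode="rt", encoding="utf-8")
    return open(path, mode="r", encoding="utf-8")


def iter_entities(path: str) -> Iterator[dict]:
    """Yield one entity dict per line without loading the whole dump."""
    with open_dump(path) as dump_file:
        for line in dump_file:
            # Each entity line ends with a comma; the first and last
            # lines hold only the array brackets.
            line = line.strip().rstrip(",")
            if line in ("", "[", "]"):
                continue
            yield json.loads(line)
```

Streaming line by line keeps memory use flat, which matters because the full dump does not fit in memory on most machines.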

Because the dump is very large, usually compressed, and processed in a single process, the script runs for multiple hours. Running the script in parallel on subsets of the dump and then merging the resulting taxonomies is a good way to reduce the wall-clock time.
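The usage text lists no merge option, so merging would be a separate step. A minimal sketch, assuming each partial result has already been loaded into a set of class IDs and a set of (subclass, superclass) pairs; `merge_taxonomies` is a hypothetical helper, not part of this package:

```python
from typing import Iterable, Set, Tuple

# A partial taxonomy: (class IDs, (subclass, superclass) edges).
Taxonomy = Tuple[Set[str], Set[Tuple[str, str]]]


def merge_taxonomies(parts: Iterable[Taxonomy]) -> Taxonomy:
    """Union the nodes and edges of taxonomies extracted from dump subsets."""
    nodes: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    for part_nodes, part_edges in parts:
        nodes |= part_nodes
        edges |= part_edges
    return nodes, edges
```

A plain union should be sufficient here because an entity counts as a class as soon as a single statement anywhere in the dump supports it, so no subset can invalidate what another subset found.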

Taxonomy in Wikidata

A taxonomy is a directed acyclic graph (DAG) with classes (concepts) as vertices and subclass-of relations as edges. In Wikidata, the subclass-of relation is the property subclass-of (P279). This tool identifies classes with the following rule:

A Wikidata entity X is a class if it is an item (its ID starts with Q) and at least one of the following statements is fulfilled:

X is a subclass iff X has at least one statement with property subclass-of (P279)
X is a superclass iff there exists a class Y, which has a statement with property subclass-of (P279) and value X
X has instances iff there exists an item Y, which has a statement with property instance-of (P31) and value X
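The rule above can be sketched as a single pass over the entities. This is an illustration under stated assumptions, not the package's actual implementation: `item_values` and `extract_taxonomy` are hypothetical names, and the code assumes the standard Wikidata entity JSON layout (`claims`, `mainsnak`, `datavalue`).

```python
from typing import Iterable, Iterator, Set, Tuple

SUBCLASS_OF = "P279"
INSTANCE_OF = "P31"


def item_values(entity: dict, prop: str) -> Iterator[str]:
    """Yield the item IDs used as values of `prop` on this entity."""
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "somevalue"/"novalue" snaks carry no target item
        value = snak.get("datavalue", {}).get("value")
        if isinstance(value, dict) and value.get("entity-type") == "item":
            yield "Q{}".format(value["numeric-id"])


def extract_taxonomy(
    entities: Iterable[dict],
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Return (classes, edges), with edges as (subclass, superclass) pairs."""
    classes: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    for entity in entities:
        entity_id = entity.get("id", "")
        if not entity_id.startswith("Q"):
            continue  # only items are considered
        # Condition 1: the item has at least one P279 statement.
        if entity.get("claims", {}).get(SUBCLASS_OF):
            classes.add(entity_id)
        # Condition 2: every P279 target is a superclass.
        for superclass in item_values(entity, SUBCLASS_OF):
            classes.add(superclass)
            edges.add((entity_id, superclass))
        # Condition 3: every P31 target has an instance.
        for cls in item_values(entity, INSTANCE_OF):
            classes.add(cls)
    return classes, edges
```

Since all three conditions can be decided from the statements of the item being read, one pass with no lookups of other entities is enough.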

Installation

pip install git+https://github.com/AlexBaier/wikidata_taxonomy_extraction.git

Script

usage: extract-wd-taxonomy [-h] [-v] dump

positional arguments:
  dump           Path to Wikidata JSON dump.

optional arguments:
  -h, --help     Show help message
  -v, --verbose  Prints progress log to stdout.

output:

Given dump path path/to/dump.json.bz2, the following files are generated:

path/to/dump.nodes.taxonomy.csv: the nodes, a table with a single column, class
path/to/dump.edges.taxonomy.csv: the edges, a table with two columns, subclass and superclass
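To show how the edge table might be consumed, here is a minimal sketch that loads it into a mapping from each class to its direct superclasses. It assumes the file is comma-separated and starts with a header row naming the columns `subclass` and `superclass`; neither detail is confirmed above, so check the generated file first. `load_superclasses` is a hypothetical helper.

```python
import csv
from collections import defaultdict
from typing import Dict, Set


def load_superclasses(edges_path: str) -> Dict[str, Set[str]]:
    """Map each class ID to the set of its direct superclass IDs."""
    superclasses: Dict[str, Set[str]] = defaultdict(set)
    with open(edges_path, newline="", encoding="utf-8") as edges_file:
        # Assumption: comma delimiter and a "subclass,superclass" header row.
        for row in csv.DictReader(edges_file):
            superclasses[row["subclass"]].add(row["superclass"])
    return dict(superclasses)
```

For example, `load_superclasses("path/to/dump.edges.taxonomy.csv")` would return a dictionary suitable for walking upward through the taxonomy.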
