simonepri/edgelist-mapper

edgelist-mapper

📊 Maps nodes and edges of a multi-relational graph to integers

Synopsis

edgelist-mapper is a simple tool that reads an edge-list file representing a graph and maps each node (entity) and relation to an integer. IDs are assigned by frequency: entities and relations that appear more often in the graph receive smaller values.

This tool is particularly useful for pre-processing publicly available knowledge graph datasets that are commonly used for the machine learning task of relation prediction.

Do you believe that this is useful? Has it saved you time? Or maybe you simply like it?
If so, support this work with a Star ⭐️.

Input format

The tool takes as input a file (edgelist.tsv) that represents a graph as tab-separated triples of the form (head, relation, tail) and generates three new files, namely mapped_edgelist.tsv, entities_map.tsv, and relations_map.tsv.

san_marino	locatedin	europe
belgium	locatedin	europe
russia	locatedin	europe
monaco	locatedin	europe
croatia	locatedin	europe
poland	locatedin	europe

Example content of the edgelist.tsv file.

0	europe
1	san_marino
2	russia
3	poland
4	monaco
5	croatia
6	belgium

Content of the entities_map.tsv generated from the edgelist.tsv file.

0	locatedin

Content of the relations_map.tsv generated from the edgelist.tsv file.

1	0	0
6	0	0
2	0	0
4	0	0
5	0	0
3	0	0

Content of the mapped_edgelist.tsv generated from the edgelist.tsv file.
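
The frequency-based mapping shown in the example files can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function name `map_edgelist` is hypothetical, an entity is assumed to be counted once for every appearance as head or tail, and ties are assumed to be broken by descending name, which is merely consistent with the example output above.

```python
from collections import Counter


def map_edgelist(triples):
    """Map entities and relations to integers, most frequent first.

    Hypothetical helper for illustration; not part of edgelist-mapper's API.
    """
    entity_counts = Counter()
    relation_counts = Counter()
    for head, relation, tail in triples:
        # Assumption: heads and tails both count as entity occurrences.
        entity_counts[head] += 1
        entity_counts[tail] += 1
        relation_counts[relation] += 1

    def assign_ids(counts):
        # Assumption: ties are ordered by descending name. Python's sort is
        # stable, so the second sort keeps that order among equal counts.
        names = sorted(counts, reverse=True)
        names.sort(key=lambda name: counts[name], reverse=True)
        return {name: idx for idx, name in enumerate(names)}

    entities = assign_ids(entity_counts)
    relations = assign_ids(relation_counts)
    mapped = [(entities[h], relations[r], entities[t]) for h, r, t in triples]
    return entities, relations, mapped


triples = [
    ("san_marino", "locatedin", "europe"),
    ("belgium", "locatedin", "europe"),
    ("russia", "locatedin", "europe"),
    ("monaco", "locatedin", "europe"),
    ("croatia", "locatedin", "europe"),
    ("poland", "locatedin", "europe"),
]
entities, relations, mapped = map_edgelist(triples)
```

Here `europe` appears in every triple, so it receives ID 0, and each mapped row keeps the (head, relation, tail) column order of the input.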

CLI Usage

The CLI takes the following positional arguments:

  edgelist    Path of the edgelist file
  output      Path of the output directory

Example usage:

pip install edgelist-mapper
python -m edgelist_mapper.bin.run \
    edgelist.tsv \
    .

NB: You need Python 3 to run the CLI.
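
Once the CLI has written its three files, they can be read back to recover the original names. The sketch below assumes the layout shown in the examples above (tab-separated, no header row, `id<TAB>name` in the map files); the helpers `load_map` and `decode_edgelist` are hypothetical and not part of the package.

```python
from pathlib import Path


def load_map(path):
    """Read an id<TAB>name file into a dict {id: name}."""
    mapping = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            idx, name = line.split("\t")
            mapping[int(idx)] = name
    return mapping


def decode_edgelist(output_dir):
    """Yield (head, relation, tail) name triples from the mapped files."""
    out = Path(output_dir)
    entities = load_map(out / "entities_map.tsv")
    relations = load_map(out / "relations_map.tsv")
    with open(out / "mapped_edgelist.tsv", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            head, relation, tail = (int(x) for x in line.split("\t"))
            yield entities[head], relations[relation], entities[tail]
```

For example, `list(decode_edgelist("."))` would rebuild the named triples if the CLI was run with `.` as the output directory.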

Showcase

This tool has been used to create this collection of datasets.

Authors

See also the list of contributors who participated in this project.

License

This project is licensed under the MIT License - see the license file for details.