Knowledge Graph and Linked Data uniformity analysis with affiliation networks
- Install Python development headers
apt install python3.7-dev
- Install Python C++11 bindings
pip install pybind11
- Clone the repo
git clone https://github.com/davidweisscode/kg-analysis.git
- Create and activate a virtual environment (optional, but recommended)
python3 -m venv venv
source venv/bin/activate
- Install project dependencies
pip install -r requirements.txt
- Download HDT-serialized datasets, index files, and ontologies from dbpedia.org and rdfhdt.org to start analyzing classes
mkdir kg/ && cd kg/
wget -c http://fragments.dbpedia.org/hdt/dbpedia2016-04en.hdt
wget -c http://fragments.dbpedia.org/hdt/dbpedia2016-04en.hdt.index.v1-1
wget -c http://downloads.dbpedia.org/2016-04/dbpedia_2016-04.owl
Specify your runs with a custom configuration of classes, data size, and projection approach.
config = {
"classes": ["Athlete", "Artist"], # List of DBpedia class names to analyze
"project_method": "intersect", # Choose between 'dot', 'hop', 'intersect', or 'nx'
"kg_source": "kg/dbpedia2016-04en.hdt", # Relative path to .hdt serialized Knowledge Graph
"kg_ontology": "kg/dbpedia.owl", # Relative path to respective Knowledge Graph ontology
"subject_limit": 0, # SPARQL subject limit for each subclass (0 for unlimited)
"predicate_limit": 0, # SPARQL predicate limit for each subject (0 for unlimited)
}
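Before starting a long run, a configuration like the one above can be sanity-checked with a few lines of Python. The following is a minimal sketch and not part of the repository: the helper name validate_config is hypothetical, and the set of allowed projection methods is taken only from the comment in the config above.

```python
import os

# Projection methods named in the config comment above.
ALLOWED_METHODS = {"dot", "hop", "intersect", "nx"}


def validate_config(config, check_files=True):
    """Return a list of problems found in a run config (empty list = looks usable).

    Hypothetical helper for illustration; the repository may validate differently.
    """
    problems = []

    classes = config.get("classes")
    if not isinstance(classes, list) or not classes:
        problems.append("'classes' must be a non-empty list of class names")

    if config.get("project_method") not in ALLOWED_METHODS:
        problems.append(
            "'project_method' must be one of " + ", ".join(sorted(ALLOWED_METHODS))
        )

    for key in ("subject_limit", "predicate_limit"):
        value = config.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value < 0:
            problems.append(f"'{key}' must be a non-negative integer (0 for unlimited)")

    if check_files:
        for key in ("kg_source", "kg_ontology"):
            path = config.get(key)
            if not path or not os.path.isfile(path):
                problems.append(f"'{key}' does not point to an existing file: {path!r}")

    return problems
```

Calling validate_config(config) and printing the returned messages would, for instance, flag a kg_ontology path that does not match the filename of the ontology you actually downloaded.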
Analyze classes and their subclasses from the DBpedia class mappings.
Run the following four scripts in sequence, each with your run configuration:
- Building
  - Query your dataset and build a bipartite Knowledge Graph for each Superclass specified in your config file
  - Run python3 build_graph.py run_config.py to output an edgelist in out/Superclass.g.csv
- Projecting
  - Project your bipartite graph into its two onemode representations
  - Run python3 project_graph.py run_config.py to output onemode edgelists in out/Superclass.t and out/Superclass.b
- Computing
  - Compute a KNC (k-neighborhood-connectivity) plot based on onemode graphs
  - Run python3 compute_knc.py run_config.py to output a KNC list in out/Superclass.k.csv
- Analyzing
  - Get properties of the KNC plots computed beforehand
  - Run python3 analyze_knc.py run_config.py to save properties in your run's result file out/_results_run_config.py
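To make the projecting and computing steps more concrete, here is a small plain-Python sketch on toy data. It is not the repository's implementation and rests on two assumptions: that a onemode edge is weighted by the number of neighbors two nodes share in the bipartite graph, and that a KNC plot tracks how connectivity (here simply edge density) falls as the required number of shared neighbors k grows. The function names and the toy edges are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def project_onemode(edges):
    """Project bipartite (top, bottom) edges onto the top nodes.

    Two top nodes are linked when they share at least one bottom node;
    the edge weight is the number of bottom nodes they share.
    """
    neighbors = defaultdict(set)
    for top, bottom in edges:
        neighbors[top].add(bottom)

    weights = {}
    # Pairwise comparison is quadratic: fine for a toy graph, too slow at DBpedia scale.
    for u, v in combinations(sorted(neighbors), 2):
        shared = len(neighbors[u] & neighbors[v])
        if shared:
            weights[(u, v)] = shared
    return weights


def density_per_k(weights, node_count):
    """For each k, the share of node pairs connected by an edge of weight >= k.

    Assumed connectivity measure; the repository may use a different one.
    """
    possible = node_count * (node_count - 1) // 2
    if possible == 0 or not weights:
        return {}
    return {
        k: sum(1 for w in weights.values() if w >= k) / possible
        for k in range(1, max(weights.values()) + 1)
    }


# Toy bipartite graph: subjects on the left, predicates on the right.
toy_edges = [
    ("Alice", "birthPlace"), ("Alice", "team"),
    ("Bob", "birthPlace"), ("Bob", "team"), ("Bob", "height"),
    ("Carol", "height"),
]
onemode = project_onemode(toy_edges)
curve = density_per_k(onemode, node_count=3)
```

Here Alice and Bob share two predicates and Bob and Carol share one, so the density drops from two of three possible pairs at k = 1 to one of three at k = 2. Swapping the order of each pair in toy_edges projects onto the other side of the bipartite graph.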
To run all four steps in one go and append their output to log.txt:
python3 build_graph.py run_config.py >> log.txt && python3 project_graph.py run_config.py >> log.txt && python3 compute_knc.py run_config.py >> log.txt && python3 analyze_knc.py run_config.py >> log.txt
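The same sequence can also be driven from Python, which may be easier to adapt than a long shell line. This is a sketch under assumptions: the helper names build_commands and run_pipeline are hypothetical, and unlike the shell command, which redirects only standard output, this version also sends error output to the log.

```python
import subprocess
import sys

# Script names as listed in the four steps, in execution order.
PIPELINE = ["build_graph.py", "project_graph.py", "compute_knc.py", "analyze_knc.py"]


def build_commands(run_config, python=sys.executable):
    """Return one argument list per pipeline step."""
    return [[python, script, run_config] for script in PIPELINE]


def run_pipeline(run_config="run_config.py", log_path="log.txt"):
    """Run each step in order, appending its output to log_path.

    Stops at the first failing step and returns its exit code,
    mirroring the short-circuit behavior of && in the shell.
    """
    with open(log_path, "a") as log:
        for command in build_commands(run_config):
            result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
            if result.returncode != 0:
                return result.returncode
    return 0
```

Calling run_pipeline("run_config.py") from the repository root would then run all four steps and return 0 only if every one of them succeeded.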
The results of your run_config.py runs are saved in out/_results_run_config.py