TKG Batch Data Loading Scripts

Richard Bruskiewich edited this page Apr 12, 2018 · 1 revision

Objective

We want a high-throughput data loading standard for the TKB.

Strategy

One strategy is to adapt the existing Neo4j Cypher batch loading scripts that were originally developed to load the Knowledge Bio database into Neo4j. That database is now the RKB, which is being adapted for use as the TKG. These scripts load a small set of simple tab-separated values (TSV) text files: one file for concepts, one for predicates, and one for relationship statements with their evidence. In principle, evidence is "many-to-one" with respect to statements (several evidence items can support a single statement), so a separate "evidence" flat file may make sense.
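
To make the many-to-one point concrete, here is a minimal Python sketch of how a combined statements-with-evidence file might be split into a statements file and a separate evidence file. The column names (`statement_id`, `subject_id`, `predicate_id`, `object_id`, `evidence_id`), the sample rows, and the helper functions are all assumptions made for illustration. They are not the actual layout used by the existing loading scripts.

```python
import csv
import io

# Hypothetical combined input: one row per (statement, evidence) pair.
# Column names and values are illustrative assumptions, not the real file layout.
COMBINED_TSV = (
    "statement_id\tsubject_id\tpredicate_id\tobject_id\tevidence_id\n"
    "S1\tC1\tP1\tC2\tE1\n"
    "S1\tC1\tP1\tC2\tE2\n"
    "S2\tC2\tP2\tC3\tE3\n"
)

STATEMENT_FIELDS = ["statement_id", "subject_id", "predicate_id", "object_id"]
EVIDENCE_FIELDS = ["evidence_id", "statement_id"]


def split_evidence(combined_text):
    """Split combined rows into unique statements and many-to-one evidence rows."""
    statements = {}  # statement_id -> statement row, deduplicated in input order
    evidence = []    # one row per evidence item, each pointing at its statement
    reader = csv.DictReader(io.StringIO(combined_text), delimiter="\t")
    for row in reader:
        sid = row["statement_id"]
        # Keep only the first copy of each statement.
        statements.setdefault(sid, {f: row[f] for f in STATEMENT_FIELDS})
        evidence.append({f: row[f] for f in EVIDENCE_FIELDS})
    return list(statements.values()), evidence


def to_tsv(rows, fields):
    """Serialize a list of dict rows as TSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


statements, evidence = split_evidence(COMBINED_TSV)
print(to_tsv(statements, STATEMENT_FIELDS))
print(to_tsv(evidence, EVIDENCE_FIELDS))
```

With this split, each statement appears once in the statements file, and each evidence row carries the identifier of the statement it supports, so a batch loader could create statements first and attach evidence in a second pass.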