<img src="goku_rdf_slurp.png" width="10%" height="10%">

RDFIngest

tests Coverage Status PyPI version license: GPL v3

RDFIngest - A simple tool for ingesting local and remote RDF data sources into a triplestore.

WARNING: This project is in an early stage of development and should be used with caution.

Requirements

  • Python >= 3.11

Installation

RDFIngest is available on PyPI:

pip install rdfingest

The RDFIngest CLI can also be installed with pipx:

pipx install rdfingest

To install from source, either use Poetry or run pip install . from the package folder.

Usage

RDFIngest reads two YAML files:

  • a config file for obtaining triplestore credentials and
  • a registry which defines the RDF sources to be ingested.

Example config:

service:
  endpoint: "https://sometriplestore.endpoint"
  user: "admin"
  password: "supersecretpassword123"

Example registry:

graphs:
  - source: https://someremote.ttl
    graph_id: https://somenamedgraph.id

  - source: [
    somelocal.ttl,
    https://someotherremote.ttl
    ]
    graph_id: https://someothernamedgraph.id
    
  - source: https://someremote.trig
  
  - source: [
    https://someotherremote.trig,
    someotherlocal.ttl,
    yetanotherremote.ttl	
    ]
    graph_id: https://yetanothernamedgraph.id

RDFIngest parses all registered RDF sources and ingests the data as named graphs into the specified triplestore by executing POST requests for every source.

By default, a SPARQL DROP operation is also run for every graph ID before POSTing.
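The DROP-then-POST flow for a single source can be sketched roughly as follows. This is a minimal, illustrative sketch assuming the SPARQL 1.1 Graph Store HTTP Protocol; the function names are hypothetical and authentication handling is omitted, so it is not RDFIngest's actual code:

```python
# Illustrative sketch only, not RDFIngest's implementation.
import urllib.parse
import urllib.request

def drop_update(graph_id: str) -> str:
    """SPARQL Update that clears the target named graph before re-ingesting."""
    return f"DROP SILENT GRAPH <{graph_id}>"

def ingest_source(endpoint: str, graph_id: str, path: str) -> None:
    # 1. DROP the named graph so stale triples do not linger.
    update = urllib.parse.urlencode({"update": drop_update(graph_id)}).encode()
    urllib.request.urlopen(
        urllib.request.Request(endpoint, data=update, method="POST")
    )
    # 2. POST the serialized triples into the named graph
    #    (?graph=<iri> selects the target per the Graph Store Protocol).
    url = f"{endpoint}?{urllib.parse.urlencode({'graph': graph_id})}"
    with open(path, "rb") as f:
        req = urllib.request.Request(
            url,
            data=f.read(),
            headers={"Content-Type": "text/turtle"},
            method="POST",
        )
        urllib.request.urlopen(req)
```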

A graph_id is required for contextless RDF sources; RDF Datasets/quad formats do not require a graph_id field.

For Datasets, the default graph (at least for now) is ignored. Running automated DROP and/or POST operations on a remote default graph is considered somewhat dangerous.

Namespaces are one honking great idea -- let's do more of those!

The tool accepts both local and remote RDF data sources.

Entry example

Consider the following entry:

graphs:
  - source: [
    https://someremote.trig,
    somelocal.ttl,
    anotherremote.ttl
    ]
    graph_id: https://somenamedgraph.id/

In this case, every named graph in the Dataset https://someremote.trig is ingested using its respective named graph identifier, while somelocal.ttl and anotherremote.ttl are ingested into the named graph https://somenamedgraph.id/.
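The dispatch rule above boils down to checking whether a source carries its own graph context. A rough sketch, with a hypothetical helper and a hand-picked suffix set (not RDFIngest's actual logic):

```python
# Serializations that carry their own named-graph context (illustrative set).
QUAD_SUFFIXES = (".trig", ".nq", ".jsonld")

def needs_graph_id(sources: list[str]) -> bool:
    """An explicit graph_id is required when any source is a plain triple format."""
    return any(not s.endswith(QUAD_SUFFIXES) for s in sources)
```

For the entry above, needs_graph_id returns True because somelocal.ttl and anotherremote.ttl lack their own graph context, so the entry must supply a graph_id.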

CLI

Run the rdfingest command.

rdfingest --config ./config.yaml --registry ./registry.yaml

Default values for config and registry are ./config.yaml and ./registry.yaml.

Also see rdfingest --help.

RDFIngest class

Point an RDFIngest instance to a config file and a registry and invoke run_ingest.

rdfingest = RDFIngest(
    config="./config.yaml",
    registry="./registry.yaml",
    drop=True,
    debug=False
)

rdfingest.run_ingest()