- VoID stands for "Vocabulary of Interlinked Datasets". It is a specification for describing the metadata of RDF datasets. For example, in our LinkedMusic datalake, there are various RDF graphs in one SPARQL endpoint. VoID can serve to describe the metadata for the different graphs and their interrelationships. This is useful for managing our datalake as well as play a role in the workflow to faciliate NLQ2SPARQL using LLMs, particularly to generate "federated SPARQL query" (across different databases).
- This is transferred from https://github.com/JervenBolleman/void-generator, which is referenced from the paper LLM-based SPARQL Query Generation from Natural Language over Federated Knowledge Graphs (https://arxiv.org/abs/2410.06062).
[Junjun Cao (junjun.cao@mail.mcgill.ca) commented]
This CLI tool will automatically generates VoID description as RDF for a public endpoint given its URL.
Building uses java 17 and maven 3.6+
mvn package
Or you can use one of the pre build releases.
- Always have permission from the endpoint hoster to run this program
- Always run this locally and directly on the endpoint, without cache/proxy servers in between
- If using virtuoso connecting via jdbc is much faster than connecting via http sparql protocol
- Check the help option for exact options.
Runs on a java 17+ jdk, see the help for how to use
java -jar target/void-generator-*uber.jar --help
java -jar target/void-generator-*uber.jar \
--user dba \
--password dba \
--virtuoso-jdbc=jdbc:virtuoso://localhost:1111/charset=UTF-8 \ # note the localhost and "isql-t" port
-r "https://YOUR_SPARQL_ENDPOINT/sparql" \
-s void-file-locally-stored.ttl \
-i "https://YOUR_SPARQL_ENDPOINT/.well-known/void"
The IRI is supposed to be the URL of the published location of the void file/service description.
For any endpoint supporting the SPARQL protocol.
Example given here for the WikiPathways SPARQL endpoint:
java -jar target/void-generator-*-uber.jar -r https://sparql.wikipathways.org/sparql \
-p https://sparql.wikipathways.org/sparql \
--void-file void-wikipathway.ttl \
--iri-of-void 'https://rdf.wikipathway.org/.well-known/void#' \
-g http://rdf.wikipathways.org/
The command line options are there to turn off certain queries/void features that may not run in a reasonable time on specific endpoints. The default should be not to use any options.
When looking to generate shapes or code from a VoID file the main thing to look for are the void:classPartition
s. For most shape and programming languages when generating code you would want one shape
or object oriented class
for each of them.
These void:classPartitions
s will have void:predicatePartition
s and void:datatypePartition
s. The predicatePartions will lead to other resources (objects that are denoted by an IRI or BNode) while the datatypePartitions will lead to literal values.
The void:predicatePartition
s are objects of a triple where a void:subjectTarget
will be that triples predicate. The subject of that triple will also be a type void:LinkSet
and the void:objectTarget
will point to a different object of a void:classPartition
triple.
-Dorg.slf4j.simpleLogger.defaultLogLevel=debug -Dorg.slf4j.simpleLogger.showDateTime=true -Dorg.slf4j.simpleLogger.log.org.apache=info
Once the VoID description turtle file has been generated you can upload it to your endpoint and retrieve its information with the SPARQL queries below.
Without subject/objects count:
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX void-ext: <http://ldf.fi/void-ext#>
SELECT DISTINCT ?subjectClass ?prop ?objectClass ?objectDatatype
WHERE {
{
?cp void:class ?subjectClass ;
void:propertyPartition ?pp .
?pp void:property ?prop .
OPTIONAL {
{
?pp void:classPartition [ void:class ?objectClass ] .
} UNION {
?pp void-ext:datatypePartition [ void-ext:datatype ?objectDatatype ] .
}
}
} UNION {
?linkset void:subjectsTarget [ void:class ?subjectClass ] ;
void:linkPredicate ?prop ;
void:objectsTarget [ void:class ?objectClass ] .
}
}
With subject/objects count:
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX void-ext: <http://ldf.fi/void-ext#>
SELECT DISTINCT ?subjectsCount ?subjectClass ?prop ?objectClass ?objectsCount ?objectDatatype
WHERE {
{
?cp void:class ?subjectClass ;
void:entities ?subjectsCount ;
void:propertyPartition ?pp .
?pp void:property ?prop .
OPTIONAL {
{
?pp void:classPartition [ void:class ?objectClass ; void:triples ?objectsCount ] .
} UNION {
?pp void-ext:datatypePartition [ void-ext:datatype ?objectDatatype ] .
}
}
} UNION {
?linkset void:subjectsTarget [ void:class ?subjectClass ; void:entities ?subjectsCount ] ;
void:linkPredicate ?prop ;
void:objectsTarget [ void:class ?objectClass ; void:entities ?objectsCount ] .
}
}