RDF-star for recording commit deltas to an RDF graph

Peter F. Patel-Schneider edited this page Nov 30, 2023 · 10 revisions

From https://github.com/w3c/rdf-ucr/issues/13. This use case is based on https://github.com/w3c/rdf-star/issues/29#issuecomment-728334375.

When recording commit deltas it is useful to have the deltas be quoted triples, as in

r:47e1cf2 a :Commit ; 
     :delete <<:bob :age 23>> ;
     :add <<:bob :age 24>>, <<:bob :gender :male>> .

Motivation

There is a need to store commit deltas on RDF graphs, for example to regenerate a previous version of the graph. Storing the commit deltas as RDF further allows for searching through them using SPARQL, for example to find when triples associated with a node were added or deleted.

Need for quoted triples

Using quoted triples, rather than something like reification, makes commits simpler to record and easier to search across the commit history, given a version of SPARQL that supports quoted triples.

With quoted triples there is also the possibility of better handling of blank nodes in commits, so that the actual blank node can be distinguished.

An example RDF graph that shows part of the use case

There is a tension in this use case between representation fidelity and ease of access. The add and delete lists for a commit are fixed and thus would be better represented as something like an RDF list. However, searching is easier if the add and delete lists are represented as values of triples associated directly with the commit, as is done here.

Here is a representation of several small deltas, showing the simple case (ground triples with no embedded quoted triples), triples that embed quoted triples, and triples with blank nodes.

r:47e1cf2 a :Commit ; 
     :graph r:geneology;
     :time "2002-05-30T09:00:00"^^xsd:dateTime;
     :delete <<a:bob b:age 23>> ;
     :add <<a:bob b:age 24>>, <<a:bob b:gender b:male>> .

r:47a54ad a :Commit ; 
     :graph r:geneology;
     :time "2002-06-07T09:00:00"^^xsd:dateTime;
     :add << <<a:bob b:gender b:male>> b:certainty 0.1 >>.

r:47a54ae a :Commit ; 
     :graph r:geneology;
     :time "2002-06-07T09:00:01"^^xsd:dateTime;
     :add << <<a:bob b:gender b:male>> b:support _:x >> ,
          <<_:x b:source b:news-of-the-world >> ,
          <<_:x b:date "1999-04-01"^^xsd:date >> ,
          << <<a:bob b:gender b:male>> b:support _:y >> ,
          <<_:y b:source b:weekly-world-news >> ,
          <<_:y b:date "2001-08-09"^^xsd:date >> .
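
As a rough illustration of how such deltas could be used to regenerate an earlier version of a graph, here is a minimal Python sketch. It is not part of the use case itself: the `Commit` class and the `apply`, `revert`, and `graph_before` helpers are hypothetical names, triples are modelled as plain tuples (a quoted triple is a nested tuple), and reverting is only exact under the assumption that every added triple was new and every deleted triple was present when the commit was made.

```python
from typing import NamedTuple


class Commit(NamedTuple):
    """Hypothetical in-memory form of a :Commit resource."""
    name: str
    time: str             # ISO 8601 strings in one format sort chronologically
    delete: frozenset     # triples removed by the commit
    add: frozenset        # triples added by the commit


def apply(graph, commit):
    """Return the graph after the commit: remove deletions, then add additions."""
    return (graph - commit.delete) | commit.add


def revert(graph, commit):
    """Return the graph before the commit (assumes the delta was exact)."""
    return (graph - commit.add) | commit.delete


def graph_before(graph, commits, time):
    """Undo, newest first, every commit made at or after `time`."""
    for commit in sorted(commits, key=lambda c: c.time, reverse=True):
        if commit.time >= time:
            graph = revert(graph, commit)
    return graph


# Triples as tuples; a quoted triple is simply a tuple used as a term.
bob_23 = ("a:bob", "b:age", 23)
bob_24 = ("a:bob", "b:age", 24)
bob_male = ("a:bob", "b:gender", "b:male")
certainty = (bob_male, "b:certainty", 0.1)

c1 = Commit("r:47e1cf2", "2002-05-30T09:00:00",
            frozenset({bob_23}), frozenset({bob_24, bob_male}))
c2 = Commit("r:47a54ad", "2002-06-07T09:00:00",
            frozenset(), frozenset({certainty}))

v0 = {bob_23}          # assumed starting graph, not given in the use case
v1 = apply(v0, c1)
v2 = apply(v1, c2)
```

Calling `graph_before(v2, [c1, c2], "2002-06-07T09:00:00")` undoes only the second commit, while an earlier cutoff undoes both.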

Desired behaviour

The identity of blank nodes is important in this use case, as the added and deleted triples should exactly match triples in the graph. This implies that a commit graph is in a context where blank nodes in it are shared with the blank nodes in the target graph and that a blank node identifier identifies a particular blank node. So the above graph entails

r:47a54ae :add <<_:x b:source b:news-of-the-world >> .

but not

r:47a54ae :add <<_:xx b:source b:news-of-the-world >> .

and using the second triple as a SPARQL query would produce no matches.

The SPARQL query

r:47a54ae :add <<?x b:source b:news-of-the-world >> .

would produce one match, binding ?x to the blank node _:x, and the blank node identifier produced would be significant.
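
The matching behaviour described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SPARQL or any RDF library: blank nodes are modelled as strings whose labels are significant (which is exactly the behaviour this use case asks for, and not what standard RDF semantics gives), quoted triples are nested tuples, and `Var`, `match`, and `query` are hypothetical helper names.

```python
class Var:
    """A query variable such as ?x."""
    def __init__(self, name):
        self.name = name


def match(pattern, term, bindings):
    """Return bindings extended so that pattern equals term, or None."""
    if isinstance(pattern, Var):
        if pattern.name in bindings:
            return bindings if bindings[pattern.name] == term else None
        return {**bindings, pattern.name: term}
    if isinstance(pattern, tuple):
        # A (quoted) triple pattern only matches a triple, position by position.
        if not isinstance(term, tuple) or len(term) != len(pattern):
            return None
        for p, t in zip(pattern, term):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    # Ground term: IRIs, literals, and blank node labels compare by equality.
    return bindings if pattern == term else None


def query(triples, pattern):
    """Return one binding dict per triple that matches the pattern."""
    return [b for t in triples if (b := match(pattern, t, {})) is not None]


male = ("a:bob", "b:gender", "b:male")
commit_graph = {
    ("r:47a54ae", ":add", (male, "b:support", "_:x")),
    ("r:47a54ae", ":add", ("_:x", "b:source", "b:news-of-the-world")),
    ("r:47a54ae", ":add", ("_:x", "b:date", "1999-04-01")),
    ("r:47a54ae", ":add", (male, "b:support", "_:y")),
    ("r:47a54ae", ":add", ("_:y", "b:source", "b:weekly-world-news")),
    ("r:47a54ae", ":add", ("_:y", "b:date", "2001-08-09")),
}

# Ground pattern with the right label matches; a different label does not.
hit = query(commit_graph,
            ("r:47a54ae", ":add", ("_:x", "b:source", "b:news-of-the-world")))
miss = query(commit_graph,
             ("r:47a54ae", ":add", ("_:xx", "b:source", "b:news-of-the-world")))

# Variables bind to the blank node labels themselves.
sources = query(commit_graph,
                ("r:47a54ae", ":add", (Var("x"), "b:source", Var("src"))))
```

Because labels are compared directly, the bindings in `sources` can be reused in later queries against the target graph, which is the sense in which the identifiers are significant.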

Representing the use case without quoted triples

RDF reification

Without quoted triples the commit deltas could use something like RDF reification, as in

r:47a54ae a :Commit ; 
     :graph r:geneology;
     :time "2002-06-07T09:00:01"^^xsd:dateTime;
     :add [ rdf:subject [ rdf:subject a:bob; rdf:predicate b:gender; rdf:object b:male ];
            rdf:predicate b:support;
            rdf:object _:a ],
...

This is more verbose. Further, blank nodes representing triples would have to be treated differently from blank nodes in the original triples.
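
To make the verbosity concrete, the following Python sketch expands a (possibly nested) quoted triple into reification-style triples. It is an illustration only: `reify` is a hypothetical helper, triples are modelled as tuples, the `rdf:type rdf:Statement` triples are omitted to mirror the snippet above, and the generated labels `_:r0`, `_:r1`, ... are assumed not to clash with blank nodes already in the data.

```python
import itertools


def reify(triple, counter=None, out=None):
    """Expand a triple into rdf:subject/predicate/object triples.

    Returns the blank node standing for the triple and the list of
    generated triples. Nested (quoted) triples are reified recursively.
    """
    counter = itertools.count() if counter is None else counter
    out = [] if out is None else out
    node = f"_:r{next(counter)}"   # fresh blank node representing the triple
    for prop, term in zip(("rdf:subject", "rdf:predicate", "rdf:object"), triple):
        if isinstance(term, tuple):
            term, _ = reify(term, counter, out)
        out.append((node, prop, term))
    return node, out


male = ("a:bob", "b:gender", "b:male")
node, triples = reify((male, "b:support", "_:a"))
```

The single quoted triple from the commit becomes six triples, and the nodes `_:r0` and `_:r1` that stand for triples sit in the same label space as `_:a`, which is a blank node of the original data.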

Named graphs

It might be possible to use named graphs: a named graph for the deleted triples and a named graph for the added triples. This has the advantage that there is no need to represent a set of triples inside an RDF graph using either lists or a multi-valued property. If all the graphs are in the same RDF dataset, SPARQL can effectively be used, except that there is still the problem of searching for a specific blank node. As RDF entailment does not do anything with named graphs, there is no problem with devising a semantics that works correctly for this use case.
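
A minimal Python sketch of this layout, for the first commit only, might look as follows. Everything here is an assumption made for illustration: the dataset is modelled as a default graph plus a dictionary of named graphs, and the linking properties `:addGraph` and `:deleteGraph`, the `g:` graph names, and the `commits_touching` helper are hypothetical.

```python
# Default graph: links each commit to its two named graphs.
default_graph = {
    ("r:47e1cf2", ":deleteGraph", "g:47e1cf2-delete"),
    ("r:47e1cf2", ":addGraph", "g:47e1cf2-add"),
}

# Named graphs: the deleted and added triples themselves.
named_graphs = {
    "g:47e1cf2-delete": {("a:bob", "b:age", 23)},
    "g:47e1cf2-add": {("a:bob", "b:age", 24), ("a:bob", "b:gender", "b:male")},
}


def commits_touching(subject, link):
    """Commits whose graph reached via `link` holds a triple with this subject."""
    return sorted(
        commit
        for commit, prop, graph_name in default_graph
        if prop == link
        and any(s == subject for s, _, _ in named_graphs[graph_name])
    )


added_for_bob = commits_touching("a:bob", ":addGraph")
```

No quoted triples or reification are needed in this form, but the sketch says nothing about whether a blank node label in one named graph denotes the same node as in the target graph.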

Analysis

This is an obvious use of quoted triples, but one in which the identity of blank nodes should be preserved between different graphs.