RDF‐star for Wikidata

Peter F. Patel-Schneider edited this page Nov 30, 2023 · 2 revisions

This use case concerns representing (much of) the information in Wikidata. See https://github.com/w3c/rdf-ucr/issues/24 for discussion.

Wikibase is the software that powers Wikidata. Wikibase uses its own data model but provides an RDF mapping. Wikibase contains a native reification system. Each main "snak" (aka triple) like "USA president JoeBiden" can be annotated with "qualifiers" like "start date January 20th 2021" or "predecessor DonaldTrump", "references" (i.e. blank nodes describing a source) and a "rank" (a processing annotation that can have one of three values: "preferred", "normal" or "deprecated"). Wikibase calls this full construction a "statement".

The current RDF encoding uses a specific RDF node to encode each statement. For example (Wikibase uses opaque identifiers, I have tweaked the RDF to make it more readable):

wd:USA a wikibase:Item ;
    p:president wds:JoeBidenPresidencyStatement , wds:DonaldTrumpPresidencyStatement . # p:X are relations between a subject and a statement. The statement subject is the triple subject (here "USA") and the statement predicate is the relation predicate (here "president")

wds:JoeBidenPresidencyStatement a wikibase:Statement  ;
     ps:president wd:JoeBiden ; # ps:X are relations between a statement and an object. The statement object is the triple object (here "JoeBiden") and the statement predicate is the relation predicate (here "president")
     wikibase:rank wikibase:PreferredRank ;
     pq:start_date "2021-01-20"^^xsd:dateTime ; # A qualifier
     pq:predecessor wd:DonaldTrump ; # A qualifier
     prov:wasDerivedFrom wdref:a_reference , wdref:an_other_reference .

wds:DonaldTrumpPresidencyStatement a wikibase:Statement  ;
     ps:president wd:DonaldTrump ;
     wikibase:rank wikibase:NormalRank ;
     pq:start_date "2017-01-20"^^xsd:dateTime ;
     pq:end_date "2021-01-20"^^xsd:dateTime .

wd:USA wdt:president wd:JoeBiden . # For statements with the "best" rank a direct edge is inserted in the RDF with the "wdt:" prefix.

Note that in the previous example the direct triple wd:USA wdt:president wd:JoeBiden has been generated because the statement's rank is "preferred". Statements about the older presidencies also exist but have only the "normal" rank, so no direct triples are generated for them.
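The rank rule can be sketched in a few lines of Python. This is only an illustration, not Wikibase code: the Statement class and the direct_triples function are hypothetical names, and the sketch assumes that "best" means the preferred statements when any exist for a subject and property, otherwise the normal ones, with deprecated statements never producing a direct triple.

```python
from dataclasses import dataclass

# Rank names follow the three values described above.
PREFERRED, NORMAL, DEPRECATED = "preferred", "normal", "deprecated"


@dataclass(frozen=True)
class Statement:
    """Hypothetical, simplified stand-in for a Wikibase statement."""
    subject: str
    prop: str
    value: str
    rank: str = NORMAL


def direct_triples(statements):
    """Return (subject, wdt:property, value) triples for best-rank statements."""
    # Group statements by subject and property, keeping input order.
    groups = {}
    for st in statements:
        groups.setdefault((st.subject, st.prop), []).append(st)

    triples = []
    for (subject, prop), group in groups.items():
        preferred = [st for st in group if st.rank == PREFERRED]
        # Assumption: preferred wins; otherwise fall back to normal.
        best = preferred or [st for st in group if st.rank == NORMAL]
        triples.extend((subject, "wdt:" + prop, st.value) for st in best)
    return triples


statements = [
    Statement("wd:USA", "president", "wd:JoeBiden", PREFERRED),
    Statement("wd:USA", "president", "wd:DonaldTrump", NORMAL),
]
print(direct_triples(statements))
```

With the two presidency statements from the example, only the Joe Biden statement yields a direct triple.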

Paper about the design of the Wikibase RDF encoding: "Reifying RDF: What Works Well With Wikidata?"

Representing Wikidata information in RDF-star

For more information on the language of Wikidata, see https://www.mediawiki.org/wiki/Wikibase/DataModel

The obvious way to represent a Wikidata statement is to create a quoted triple with the item that the statement is on as the subject, the property of the statement's main Snak as the predicate, and the value of the Snak as the object. This runs into the first problem - there can be multiple statements on one item whose main Snaks have both the same property and the same value, and these would all map to the same quoted triple. The solution is the usual one - to have a special link from quoted triples to something like occurrences and to attach the other information to these occurrences. Then the "best" information becomes asserted triples and the "other" information is not asserted.

There are other constructs in Wikidata, such as some-value snaks and no-value snaks. The former can be captured using a blank node for the object. The latter requires the use of an OWL construct and is likely out of the scope of the working group.

So the above information can be represented as:

_:s1 a wikibase:Statement  ;
     rdfstar:occurence_of << wd:USA wd:president wd:JoeBiden >> ;
     wikibase:rank wikibase:PreferredRank ;
     wd:start_date "2021-01-20"^^xsd:dateTime ;
     wd:predecessor wd:DonaldTrump ;
     prov:wasDerivedFrom wdref:a_reference , wdref:an_other_reference .

_:s2 a wikibase:Statement  ;
     rdfstar:occurence_of << wd:USA wd:president wd:DonaldTrump >> ;
     wikibase:rank wikibase:NormalRank ;
     wd:start_date "2017-01-20"^^xsd:dateTime ;
     wd:end_date "2021-01-20"^^xsd:dateTime .

wd:USA wd:president wd:JoeBiden .

It appears that there is no need to have the different namespaces for properties, but it might be necessary to have a separate namespace for properties that can only occur in triples whose subject is a quoted triple.
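The occurrence-based mapping can be sketched as a small Python function that emits Turtle-star text. This is a hedged illustration only: to_rdf_star and the dictionary layout are hypothetical, rdfstar:occurence_of is the placeholder property used in the example above, and the sketch assumes that a triple is asserted only when its statement has the best rank present for that subject and property.

```python
def to_rdf_star(statements):
    """Serialise simplified statements in the occurrence style shown above."""
    # Subject/property pairs that have at least one preferred statement.
    has_preferred = {
        (s["subject"], s["property"])
        for s in statements
        if s["rank"] == "PreferredRank"
    }

    blocks, asserted = [], []
    for i, s in enumerate(statements, start=1):
        triple = f"{s['subject']} wd:{s['property']} {s['value']}"
        body = [
            "a wikibase:Statement",
            f"rdfstar:occurence_of << {triple} >>",
            f"wikibase:rank wikibase:{s['rank']}",
        ]
        # Qualifiers reuse the wd: namespace, as in the example above.
        body += [f"wd:{name} {value}" for name, value in s.get("qualifiers", [])]
        blocks.append(f"_:s{i} " + " ;\n     ".join(body) + " .")

        # Assumption: assert the triple only for best-rank statements.
        key = (s["subject"], s["property"])
        best = "PreferredRank" if key in has_preferred else "NormalRank"
        line = triple + " ."
        if s["rank"] == best and line not in asserted:
            asserted.append(line)

    return "\n\n".join(blocks + asserted)


example = [
    {
        "subject": "wd:USA", "property": "president", "value": "wd:JoeBiden",
        "rank": "PreferredRank",
        "qualifiers": [("start_date", '"2021-01-20"^^xsd:dateTime'),
                       ("predecessor", "wd:DonaldTrump")],
    },
    {
        "subject": "wd:USA", "property": "president", "value": "wd:DonaldTrump",
        "rank": "NormalRank",
        "qualifiers": [("start_date", '"2017-01-20"^^xsd:dateTime')],
    },
]
print(to_rdf_star(example))
```

Both statements get their own occurrence node, so two statements with the same main Snak would stay distinct even though they share one quoted triple.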

Analysis

Wikidata does not store the string version of literals, instead using an internal form, so literals are treated transparently. Wikidata does sort of have a way of asserting equality between nodes, via redirection (the result of a merge). This is modelled in the RDF dump using owl:sameAs. However, redirection is later removed, which in effect results in rewriting one IRI as another. In Wikidata, merging removes information about the item being merged, so there is little guidance on whether Wikidata is best modelled by transparent or opaque IRIs.
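To make "rewriting one IRI as another" concrete, here is a minimal Python sketch under stated assumptions. The resolve and rewrite functions, the redirect table, and the item names wd:OldItem and wd:MergedItem are all hypothetical. It does not describe how Wikidata implements merges.

```python
def resolve(iri, redirects):
    """Follow redirects until an IRI with no further redirect is reached."""
    seen = set()
    while iri in redirects and iri not in seen:
        seen.add(iri)  # guard against accidental redirect cycles
        iri = redirects[iri]
    return iri


def rewrite(triples, redirects):
    """Replace every redirected IRI in the triples; duplicates collapse."""
    return {tuple(resolve(term, redirects) for term in triple) for triple in triples}


# Hypothetical merge: wd:OldItem now redirects to wd:MergedItem.
redirects = {"wd:OldItem": "wd:MergedItem"}
triples = [
    ("wd:OldItem", "wd:president", "wd:JoeBiden"),
    ("wd:MergedItem", "wd:president", "wd:JoeBiden"),
]
print(rewrite(triples, redirects))
```

After the rewrite the two triples are indistinguishable, which is the loss of information about the merged item noted above.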