Where should my ontology go? Data graph versus shapes graph #185

wouterbeek · 2024-02-18T11:09:44Z

Originally posed over at #155; also see the comments by others over there.

Observation

According to the SHACL standard, two graphs are relevant for validation: the data graph and the shapes graph. The ontology should be part of the data graph:

The data graph is expected to include all the ontology axioms related to the data and especially all the rdfs:subClassOf triples in order for SHACL to correctly identify class targets and validate Core SHACL constraints.

This seems counter-intuitive to me, since I associate the ontology more with the shapes graph. For example, a shapes graph can owl:imports an ontology.

Example

To illustrate my unease, let's take the following data graph:

prefix id: <https://example.com/>
prefix foaf: <http://xmlns.com/foaf/0.1/>

id:john a foaf:Person.

And the following shapes graph:

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix sh: <http://www.w3.org/ns/shacl#>

[] sh:targetClass foaf:Agent;
   sh:property
     [ sh:minCount 1;
       sh:path foaf:name ].

Adding the following ontology graph is crucial. Without it, id:john is not recognized as a foaf:Agent, so the missing foaf:name statement goes unreported:

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

foaf:Person rdfs:subClassOf foaf:Agent.
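To make the effect of the ontology triple concrete, here is a minimal sketch in plain Python. It is not a SHACL engine: graphs are modelled as sets of string triples, and the helper names (superclasses, class_targets, missing_name) are made up for this illustration. It only mimics how class targets follow rdfs:subClassOf triples found in the data graph, and hard-codes the sh:minCount 1 check on foaf:name.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def superclasses(cls: str, graph: Graph) -> Set[str]:
    """Return cls plus every class reachable via rdfs:subClassOf in graph."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def class_targets(target_class: str, data: Graph) -> Set[str]:
    """Nodes in data typed with target_class or with one of its subclasses."""
    return {
        s for s, p, o in data
        if p == RDF_TYPE and target_class in superclasses(o, data)
    }


def missing_name(target_class: str, data: Graph) -> Set[str]:
    """Focus nodes that would violate sh:minCount 1 on foaf:name."""
    named = {s for s, p, _ in data if p == "foaf:name"}
    return class_targets(target_class, data) - named


data = {("id:john", RDF_TYPE, "foaf:Person")}
ontology = {("foaf:Person", SUBCLASS_OF, "foaf:Agent")}

# Data graph alone: id:john is never selected, so nothing is reported.
without_ontology = missing_name("foaf:Agent", data)
# Data graph plus ontology: id:john becomes a focus node and fails the check.
with_ontology = missing_name("foaf:Agent", data | ontology)
```

Here without_ontology is empty, while with_ontology contains id:john.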

Use case

I have a specific use case where this comes up: in TriplyETL we stream through the instance data. The stream passes along millions of small data graphs. For each of these data graphs, we have to add the ontology before the data graph can be validated in-stream. In this use case, it makes more sense to add the ontology to the shapes graph once, and use that same shapes graph to validate all data graphs that pass by.
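The streaming use case can be sketched as follows. This is not the TriplyETL API; subclass_closure and validate_stream are hypothetical names, and graphs are again plain sets of string triples. The sketch assumes the behaviour requested in this issue: the class hierarchy is derived once from the ontology and reused for every chunk, rather than being merged into each small data graph. Note that it deliberately ignores any rdfs:subClassOf triples inside the chunks, which differs from what the current standard prescribes.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]


def subclass_closure(ontology: Graph) -> Dict[str, Set[str]]:
    """Map each class with a declared superclass to itself plus all ancestors."""
    direct: Dict[str, Set[str]] = {}
    for s, p, o in ontology:
        if p == "rdfs:subClassOf":
            direct.setdefault(s, set()).add(o)
    closure: Dict[str, Set[str]] = {}
    for cls in direct:
        seen, stack = {cls}, [cls]
        while stack:
            for parent in direct.get(stack.pop(), set()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        closure[cls] = seen
    return closure


def validate_stream(
    chunks: Iterable[Graph],
    closure: Dict[str, Set[str]],
    target_class: str,
    required_path: str,
) -> Iterator[Set[str]]:
    """Yield, per chunk, the focus nodes that lack a value for required_path."""
    for data in chunks:
        targets = {
            s for s, p, o in data
            if p == "rdf:type" and target_class in closure.get(o, {o})
        }
        with_value = {s for s, p, _ in data if p == required_path}
        yield targets - with_value


ontology = {("foaf:Person", "rdfs:subClassOf", "foaf:Agent")}
closure = subclass_closure(ontology)  # computed once, reused for every chunk

chunks = [
    {("id:john", "rdf:type", "foaf:Person")},
    {("id:jane", "rdf:type", "foaf:Person"), ("id:jane", "foaf:name", "Jane")},
]
violations = list(validate_stream(chunks, closure, "foaf:Agent", "foaf:name"))
```

The first chunk reports id:john and the second reports nothing, without the ontology ever being copied into a chunk.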

Expected

I expect either of the following:

1. The SHACL standard requires that the ontology graph is added to the shapes graph, and not to the data graph.
2. The SHACL standard requires that the ontology graph is added to the data graph and/or the shapes graph.


HolgerKnublauch · 2024-02-19T12:20:32Z

I guess rdfs:subClassOf triples are what matters here. They impact sh:class and class-based targets.

I believe we could change SHACL Core so that these triples will be considered from the union of data and shapes graphs.

Would this address your concern or are there other triples in the data graph that should also be in the shapes graph and vice versa?
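One possible reading of the union proposal above, as a rough sketch only: rdfs:subClassOf triples are collected from both the data graph and the shapes graph, while rdf:type triples are still taken from the data graph alone. The function names are hypothetical and the tuple-based graphs stand in for real RDF graphs; the actual specification text may end up drawing the line differently.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]

SUBCLASS_OF = "rdfs:subClassOf"


def subclass_triples(*graphs: Graph) -> Graph:
    """Collect only the rdfs:subClassOf triples from the given graphs."""
    return {t for g in graphs for t in g if t[1] == SUBCLASS_OF}


def is_subclass_or_self(cls: str, ancestor: str, hierarchy: Graph) -> bool:
    """True if ancestor is cls or is reachable from cls in hierarchy."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        for s, _, o in hierarchy:
            if s == current and o not in seen:
                seen.add(o)
                stack.append(o)
    return False


def class_targets(target_class: str, data: Graph, shapes: Graph) -> Set[str]:
    """Focus nodes come from data; the hierarchy comes from data and shapes."""
    hierarchy = subclass_triples(data, shapes)
    return {
        s for s, p, o in data
        if p == "rdf:type" and is_subclass_or_self(o, target_class, hierarchy)
    }


data = {("id:john", "rdf:type", "foaf:Person")}
shapes = {
    ("_:shape", "sh:targetClass", "foaf:Agent"),
    ("foaf:Person", SUBCLASS_OF, "foaf:Agent"),
    # Typed node in the shapes graph: must not become a focus node.
    ("id:shapeOnlyNode", "rdf:type", "foaf:Agent"),
}

targets = class_targets("foaf:Agent", data, shapes)
```

Only id:john is targeted. The typed node that lives in the shapes graph is ignored because instance triples are not unioned in this reading.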

bergos · 2024-02-19T20:50:28Z

Can't we define that all rdfs:subClassOf reasoning must happen in the shapes graph?

A union graph could make some edge cases, like validating constraints on a SHACL shape, difficult to process. A flag to enable that feature could solve it, but we should only consider it if there are other use cases than the rdfs:subClassOf reasoning.

HolgerKnublauch · 2024-02-20T08:48:58Z

If we were to ignore rdfs:subClassOf triples from the data graph then we would introduce a breaking change to SHACL, which is something we definitely want to avoid for this (incremental) release. Adding the shapes graph as an extra graph to process is less likely to break existing use cases. But even that is potentially breaking.

wouterbeek · 2024-03-04T13:50:43Z

An alternative solution is to introduce a new 3rd graph:

1. Data graph, containing instances of the classes defined in (2) and (3).
2. Shapes graph, containing node shapes and property shapes for the instance data in (1). This is the closed half of the data model (SHACL).
3. Ontology graph, optionally containing classes and properties for the instance data in (1). This is the open half of the data model (RDFS/OWL).

In SHACL 1.0 graph 3 is never given.

In SHACL 1.1, it becomes possible to optionally specify graph 3. If graph 3 is specified, then all class and property statements (RDFS/OWL, including rdfs:subClassOf) are assumed to be in that graph. A user can choose to specify the same graph for (2) and (3).
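The three-graph idea could look roughly like this from a caller's point of view. This is a sketch under assumptions: the optional ontology parameter and the function names are invented for illustration and do not correspond to any existing SHACL API. When the ontology graph is omitted, the class hierarchy is read from the data graph, as in SHACL 1.0; when it is given, class statements are read from that graph only.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]


def superclasses(cls: str, class_graph: Graph) -> Set[str]:
    """Return cls plus all ancestors via rdfs:subClassOf in class_graph."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in class_graph:
            if s == current and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def class_targets(
    target_class: str,
    data: Graph,
    ontology: Optional[Graph] = None,
) -> Set[str]:
    """Select focus nodes from data, resolving classes in the chosen graph."""
    # Hypothetical rule: graph 3, if given, is the only source of class axioms.
    class_graph = data if ontology is None else ontology
    return {
        s for s, p, o in data
        if p == "rdf:type" and target_class in superclasses(o, class_graph)
    }


data = {("id:john", "rdf:type", "foaf:Person")}
ontology = {("foaf:Person", "rdfs:subClassOf", "foaf:Agent")}

legacy = class_targets("foaf:Agent", data)                 # no graph 3 given
three_graph = class_targets("foaf:Agent", data, ontology)  # graph 3 given
merged = class_targets("foaf:Agent", data | ontology)      # SHACL 1.0 workaround
```

In this sketch legacy is empty, while three_graph and merged both contain id:john. Passing the shapes graph as the ontology argument would correspond to choosing the same graph for (2) and (3).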

VladimirAlexiev · 2025-01-20T14:17:19Z

Related: #183

ajnelson-nist · 2025-01-20T19:11:21Z

> An alternative solution is to introduce a new 3rd graph:
>
> 1. Data graph, containing instances of the classes defined in (2) and (3).
> 2. Shapes graph, containing node shapes and property shapes for the instance data in (1). This is the closed half of the data model (SHACL).
> 3. Ontology graph, optionally containing classes and properties for the instance data in (1). This is the open half of the data model (RDFS/OWL).
>
> In SHACL 1.0 graph 3 is never given.
>
> In SHACL 1.1, it becomes possible to optionally specify graph 3. If graph 3 is specified, then all class and property statements (RDFS/OWL, including rdfs:subClassOf) are assumed to be in that graph. A user can choose to specify the same graph for (2) and (3).

FWIW, I'm aware of at least one tool that follows this practice, keeping data and ontology graphs separate as tool inputs but merging them in memory.

On the other hand, I work with an ontology community that uses shapes as part of its ontology specification, keeping the ontology graph and shapes graph together. This makes significant use of Implicit Class Targets in co-typing sh:NodeShape and owl:Class.

That community happens to use that tool I noted, so the end result is two graphs (shapes graph S, and ontology graph O) reviewing three (shapes graph S, ontology graph O, and data graph D - and yes, S=S and O=O).

I just wanted to leave this user story as "anecdata," which might or might not help with @wouterbeek's original either-or guess:

> I expect either of the following:
>
> 1. The SHACL standard requires that the ontology graph is added to the shapes graph, and not to the data graph.
> 2. The SHACL standard requires that the ontology graph is added to the data graph and/or the shapes graph.

I think we will learn the right way forward more from the inferencing work. My hunch is that the ontology and data graphs will typically have different update rhythms. If data updates, the ontology graph probably(?) wouldn't need to re-run inferencing. If the ontology graph updates, the data graph would probably need to re-run inferencing.

wouterbeek mentioned this issue Feb 18, 2024

Where should my ontology go? Data graph versus shapes graph #155

Closed

tpluscode mentioned this issue Sep 12, 2024

Inheritance of property classes through rdfs:subClassOf zazuko/rdf-validate-shacl#144

Closed

HolgerKnublauch transferred this issue from w3c/shacl Jan 20, 2025

HolgerKnublauch added the Core For SHACL 1.2 Core spec label Jan 20, 2025

VladimirAlexiev mentioned this issue Jan 20, 2025

define SHACL Validation API #214

Open

ajnelson-nist mentioned this issue Jan 21, 2025

Enable "Shapes as Data" Paradigm #189

Open
