Where should my ontology go? Data graph versus shapes graph #185

Open
wouterbeek opened this issue Feb 18, 2024 · 6 comments
Labels
Core For SHACL 1.2 Core spec

Comments

@wouterbeek

wouterbeek commented Feb 18, 2024

Originally posed over at #155; also see the comments by others over there.

Observation

According to the SHACL standard, two graphs are relevant for validation: the data graph and the shapes graph. The ontology should be part of the data graph:

The data graph is expected to include all the ontology axioms related to the data and especially all the rdfs:subClassOf triples in order for SHACL to correctly identify class targets and validate Core SHACL constraints.

This seems counter-intuitive to me, since I associate the ontology more with the shapes graph. For example, a shapes graph can use owl:imports to include an ontology.

Example

To illustrate my unease, let's take the following data graph:

prefix id: <https://example.com/>
prefix foaf: <http://xmlns.com/foaf/0.1/>

id:john a foaf:Person.

And the following shapes graph:

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix sh: <http://www.w3.org/ns/shacl#>

[] sh:targetClass foaf:Agent;
   sh:property
     [ sh:minCount 1;
       sh:path foaf:name ].

Adding the following ontology graph is crucial; otherwise the data graph, which is missing a foaf:name statement, cannot be reported as invalid:

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

foaf:Person rdfs:subClassOf foaf:Agent.
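
The effect of that single rdfs:subClassOf triple on class targets can be sketched in a few lines of Python. This is a simplified illustration and not a SHACL implementation: triples are plain tuples of prefixed names, and the helper names `superclasses` and `target_nodes` are made up for this sketch. It only models how a class target picks its focus nodes.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

def superclasses(cls, triples):
    """Return cls plus every class reachable from it via rdfs:subClassOf."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen

def target_nodes(target_class, triples):
    """Nodes typed as target_class or as one of its subclasses."""
    return {
        s for s, p, o in triples
        if p == RDF_TYPE and target_class in superclasses(o, triples)
    }

data = {("id:john", RDF_TYPE, "foaf:Person")}
ontology = {("foaf:Person", SUBCLASS_OF, "foaf:Agent")}

# Without the ontology triple, id:john is never selected as a focus node.
without_ontology = target_nodes("foaf:Agent", data)
# With the ontology merged into the data graph, id:john becomes a target.
with_ontology = target_nodes("foaf:Agent", data | ontology)
```

In the first case the shape has no focus nodes, so the missing foaf:name goes unreported. In the second case id:john is selected and the sh:minCount constraint can fail as intended.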

Use case

I have a specific use case where this comes up: in TriplyETL we stream through the instance data. The stream carries millions of small data graphs, and the ontology has to be added to each of them before it can be validated in-stream. In this use case, it makes more sense to add the ontology to the shapes graph once, and to use that same shapes graph to validate every data graph that passes by.
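
A rough sketch of the cost argument, assuming nothing about how TriplyETL works internally: the subclass closure is computed once from the ontology and then reused for every small data graph in the stream. The helper names, the tuple encoding of triples, and the second record (`id:mary`) are hypothetical and exist only for this illustration.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

def subclasses_of(target, ontology):
    """Return target plus all of its transitive subclasses in the ontology."""
    found, stack = {target}, [target]
    while stack:
        current = stack.pop()
        for s, p, o in ontology:
            if p == SUBCLASS_OF and o == current and s not in found:
                found.add(s)
                stack.append(s)
    return found

def missing_names(data_graph, target_classes):
    """Focus nodes in one small data graph that lack a foaf:name."""
    focus = {s for s, p, o in data_graph if p == RDF_TYPE and o in target_classes}
    named = {s for s, p, o in data_graph if p == "foaf:name"}
    return focus - named

ontology = [("foaf:Person", SUBCLASS_OF, "foaf:Agent")]
# Computed once, before the stream starts.
agent_classes = subclasses_of("foaf:Agent", ontology)

stream = [
    [("id:john", RDF_TYPE, "foaf:Person")],
    [("id:mary", RDF_TYPE, "foaf:Person"), ("id:mary", "foaf:name", "Mary")],
]
# Each small graph is checked against the precomputed class set only.
results = [missing_names(graph, agent_classes) for graph in stream]
```

The ontology is read once, and each data graph is checked without first being merged with it.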

Expected

I expect either of the following:

  • The SHACL standard requires that the ontology graph is added to the shapes graph, and not to the data graph.
  • The SHACL standard requires that the ontology graph is added to the data graph and/or the shapes graph.

@HolgerKnublauch
Contributor

I guess rdfs:subClassOf triples are what matters here. They impact sh:class and class-based targets.

I believe we could change SHACL Core so that these triples will be considered from the union of data and shapes graphs.

Would this address your concern or are there other triples in the data graph that should also be in the shapes graph and vice versa?

@bergos

bergos commented Feb 19, 2024

Can't we define that all rdfs:subClassOf reasoning must happen in the shapes graph?

A union graph could make some edge cases, like validating constraints on a SHACL shape, difficult to process. A flag to enable that feature could solve it, but we should only consider it if there are other use cases than the rdfs:subClassOf reasoning.

@HolgerKnublauch
Contributor

If we were to ignore rdfs:subClassOf triples from the data graph then we would introduce a breaking change to SHACL, which is something we definitely want to avoid for this (incremental) release. Adding the shapes graph as an extra graph to process is less likely to break existing use cases. But even that is potentially breaking.
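
The compatibility argument can be made concrete with a toy comparison of the three options discussed so far: reading rdfs:subClassOf from the data graph only (current behaviour), from the shapes graph only, or from their union. The `subclass_source` switch and the tuple encoding are assumptions of this sketch and are not part of any SHACL API.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

def class_targets(target_class, data_graph, shapes_graph, subclass_source):
    """Resolve a class target, reading rdfs:subClassOf from the chosen source.

    subclass_source is a hypothetical switch: "data", "shapes", or "union".
    rdf:type triples are always read from the data graph only.
    """
    sources = {
        "data": data_graph,
        "shapes": shapes_graph,
        "union": data_graph | shapes_graph,
    }
    edges = {(s, o) for s, p, o in sources[subclass_source] if p == SUBCLASS_OF}
    accepted = {target_class}
    changed = True
    while changed:  # naive fixpoint over the subclass edges
        changed = False
        for sub, sup in edges:
            if sup in accepted and sub not in accepted:
                accepted.add(sub)
                changed = True
    return {s for s, p, o in data_graph if p == RDF_TYPE and o in accepted}

# An existing setup: the ontology triple lives in the data graph.
data = {
    ("id:john", RDF_TYPE, "foaf:Person"),
    ("foaf:Person", SUBCLASS_OF, "foaf:Agent"),
}
shapes = {("_:shape", "sh:targetClass", "foaf:Agent")}

outcomes = {
    mode: class_targets("foaf:Agent", data, shapes, mode)
    for mode in ("data", "shapes", "union")
}
```

For this input, "data" and "union" both select id:john, while "shapes" selects nothing, which is the breaking case. The union can still change results when the shapes graph carries subclass triples that the data graph lacks.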

@wouterbeek
Author

wouterbeek commented Mar 4, 2024

An alternative solution is to introduce a new 3rd graph:

  1. Data graph, containing instances of the classes defined in (2) and (3).
  2. Shapes graph, containing node shapes and property shapes for the instance data in (1). This is the closed half of the data model (SHACL).
  3. Ontology graph, optionally containing classes and properties for the instance data in (1). This is the open half of the data model (RDFS/OWL).

In SHACL 1.0 graph 3 is never given.

In SHACL 1.1, it becomes possible to optionally specify graph 3. If graph 3 is specified, then all class and property statements (RDFS/OWL, including rdfs:subClassOf) are assumed to be in that graph. A user can choose to specify the same graph for (2) and (3).
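
As a minimal sketch of this proposal, the ontology graph can be modelled as an optional third argument. The function name `focus_nodes`, its `ontology_graph` parameter, and the tuple encoding of triples are hypothetical. The sketch covers only class-target resolution, not full validation.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

def focus_nodes(target_class, data_graph, ontology_graph=None):
    """Resolve a class target with an optional third (ontology) graph.

    With ontology_graph=None the subclass triples are looked up in the
    data graph, as in SHACL 1.0. Otherwise they are read from the
    ontology graph only.
    """
    schema = data_graph if ontology_graph is None else ontology_graph
    accepted, stack = {target_class}, [target_class]
    while stack:
        current = stack.pop()
        for s, p, o in schema:
            if p == SUBCLASS_OF and o == current and s not in accepted:
                accepted.add(s)
                stack.append(s)
    return {s for s, p, o in data_graph if p == RDF_TYPE and o in accepted}

data = {("id:john", RDF_TYPE, "foaf:Person")}
shapes = {
    ("_:shape", "sh:targetClass", "foaf:Agent"),
    ("foaf:Person", SUBCLASS_OF, "foaf:Agent"),
}

# Graph 3 not given: subclass triples are expected in the data graph.
legacy = focus_nodes("foaf:Agent", data)
# The same graph is used for (2) and (3).
combined = focus_nodes("foaf:Agent", data, ontology_graph=shapes)
```

Leaving the third argument out keeps the existing behaviour, so the extra graph is opt-in.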

@VladimirAlexiev

Related: #183

@ajnelson-nist

An alternative solution is to introduce a new 3rd graph:

  1. Data graph, containing instances of the classes defined in (2) and (3).
  2. Shapes graph, containing node shapes and property shapes for the instance data in (1). This is the closed half of the data model (SHACL).
  3. Ontology graph, optionally containing classes and properties for the instance data in (1). This is the open half of the data model (RDFS/OWL).

In SHACL 1.0 graph 3 is never given.

In SHACL 1.1, it becomes possible to optionally specify graph 3. If graph 3 is specified, then all class and property statements (RDFS/OWL, including rdfs:subClassOf) are assumed to be in that graph. A user can choose to specify the same graph for (2) and (3).

FWIW, I'm aware of at least one tool that follows this practice, keeping data and ontology graphs separate as tool inputs but merging them in memory.

On the other hand, I work with an ontology community that uses shapes as part of its ontology specification, keeping the ontology graph and shapes graph together. This makes significant use of Implicit Class Targets in co-typing sh:NodeShape and owl:Class.

That community happens to use that tool I noted, so the end result is two graphs (shapes graph S, and ontology graph O) reviewing three (shapes graph S, ontology graph O, and data graph D - and yes, S=S and O=O).

I just wanted to leave this user story as "anecdata," which might or might not help with @wouterbeek's original either-or expectation:

I expect either of the following:

  • The SHACL standard requires that the ontology graph is added to the shapes graph, and not to the data graph.
  • The SHACL standard requires that the ontology graph is added to the data graph and/or the shapes graph.

I think we will learn the right way forward more from the inferencing work. My hunch is that the ontology and data graphs will typically have different update rhythms. If data updates, the ontology graph probably(?) wouldn't need to re-run inferencing. If the ontology graph updates, the data graph would probably need to re-run inferencing.
