Enable "Shapes as Data" Paradigm #189

Open
mgberg opened this issue Feb 19, 2024 · 2 comments
Labels
Core For SHACL 1.2 Core spec

Comments

mgberg commented Feb 19, 2024

I've had an idea for a possible extension to SHACL for a while and I'm wondering what others think about it.

Over the past couple years, I have run into several situations where constraints are part of the domain of interest and those constraints should apply to other data in the domain. In those cases, it would be helpful to have shapes be defined as part of data instead of at the schema level, and it would be helpful if the SHACL engine knew how data were connected to these shapes they should be validated against via some existing path expressed in domain terminology.

Doing this would prevent users from needing to extend the ontology/schema to add new constraints. Also, it could prevent the use of metamodeling to accomplish a similar goal, which can get messy and confusing for users.

Here are three generic examples where this feature could be helpful, to convey the idea:

Example 1

Consider the Function Ontology (https://fno.io/spec/#ontology-abstract). If you look at the documentation for fno:Parameter and fno:Output they look very similar to sh:PropertyShape in spirit, and the class fno:Function is therefore like sh:NodeShape. It might be useful to use SHACL to validate that function arguments and outputs match what is expected based on the function definition.

However, if instances of fno:Function were Node Shapes, then there would be no convenient way to configure each fno:Function instance to target the right nodes with current SHACL. You'd have to either make each one a class and have the corresponding instances of fno:Execution be instances of each (which might tempt the introduction of metamodeling similar to SPIN Functions), write a clunky custom target type using SHACL-AF that wouldn't be supported by all SHACL engines, or use sh:targetNode to connect each fno:Function instance to the corresponding fno:Execution instances instead of the domain property fno:executes (or in addition to it, which would be redundant).
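To illustrate why the sh:targetNode workaround is redundant, here is a minimal sketch in plain Python, with triples modelled as tuples and no SHACL engine involved. All ex: names are hypothetical; the only point is that every sh:targetNode triple you would have to add is just the inverse of an fno:executes triple that is already in the data.

```python
# Triples are (subject, predicate, object) tuples; all ex: names are hypothetical.
data = {
    ("ex:exec1", "fno:executes", "ex:sumFunction"),
    ("ex:exec2", "fno:executes", "ex:sumFunction"),
    ("ex:exec3", "fno:executes", "ex:concatFunction"),
}

def derive_target_nodes(triples):
    """Mirror every fno:executes triple as a sh:targetNode triple on the function."""
    return {
        (function, "sh:targetNode", execution)
        for execution, predicate, function in triples
        if predicate == "fno:executes"
    }

target_triples = derive_target_nodes(data)
for triple in sorted(target_triples):
    print(triple)
```

Every derived triple duplicates information the domain property already carries, and the two sets have to be kept in sync whenever an execution is added or removed.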

In this case, it would be convenient if each fno:Execution could be validated against whatever node it was connected to via fno:executes.

Example 2

Consider some future state of the W3C Data Cube ontology. Data Structure Definitions (https://www.w3.org/TR/vocab-data-cube/#dsd-dsd) and Component Specifications are data in this domain. However, they could be modified to be represented as Node Shapes and Property Shapes respectively such that SHACL could be used to validate that the Observations that are part of DataSets that have that Data Structure Definition actually conform to that structure.

The same challenges exist for trying to validate a qb:DataStructureDefinition as a Node Shape as for fno:Function; there is no convenient way to configure each qb:DataStructureDefinition instance to target the right nodes with current SHACL.

In this case, it would be convenient if each qb:Observation could be validated against whatever node it was connected to via the path qb:dataSet/qb:structure.
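As a rough sketch of what "connected via the path qb:dataSet/qb:structure" means, the following plain Python walks a sequence path over a toy set of triples. The ex: names and the follow_sequence_path helper are hypothetical and only cover simple sequence paths, not the full SHACL path syntax.

```python
# Triples are (subject, predicate, object) tuples; all ex: names are hypothetical.
data = {
    ("ex:obs1", "qb:dataSet", "ex:dataset1"),
    ("ex:obs2", "qb:dataSet", "ex:dataset1"),
    ("ex:dataset1", "qb:structure", "ex:dsd1"),
}

def follow_sequence_path(triples, start, predicates):
    """Return the nodes reached from `start` by following each predicate in order."""
    frontier = {start}
    for predicate in predicates:
        frontier = {o for s, p, o in triples if s in frontier and p == predicate}
    return frontier

# The Data Structure Definition(s) that ex:obs1 would be validated against.
structures = follow_sequence_path(data, "ex:obs1", ["qb:dataSet", "qb:structure"])
print(structures)
```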

Furthermore, this would allow more fancy data cube behavior more easily, like how shapes are used for datatypes of QB components here: https://docs.allotrope.org/ADF%20Data%20Cube%20Ontology.html (see examples 5 and 11)

Example 3

Consider the EP-PLAN ontology (https://trustlens.github.io/EP-PLAN/, documentation: https://trustlens.github.io/EP-PLAN/widoco_output/index-en.html), an extension to W3C PROV for capturing in detail the plans that go along with the Activities in PROV. It may be desired to use SHACL to determine whether an activity went according to plan or if some deviation occurred. Note that ep-plan:Step and ep-plan:Variable both could be similar in spirit to sh:NodeShape.

The same challenges exist for trying to validate instances of these classes as Node Shapes as for fno:Function; there is no convenient way to configure each ep-plan:Step and ep-plan:Variable instance to target the right nodes with current SHACL.

In this case, it would be convenient if each ep-plan:Activity could be validated against whatever node it was connected to via ep-plan:correspondsToStep and if each ep-plan:Entity could be validated against whatever node it was connected to via ep-plan:correspondsToVariable.

Possible Implementation

I've thought of a few different ways to implement this behavior, but I think the simplest and most efficient way I've thought of so far is to create a new Constraint Component.

This new Constraint Component would function somewhat like the one for sh:node. However, instead of specifying the URI of a Node Shape that value nodes must also conform to, it specifies a SHACL path using a parameter called, e.g., sh:nodesPath. For each value node for the shape with a value for sh:nodesPath, that value node is also validated against any Node Shape(s) found at the specified path from the value node (if any resources at that path exist and are Node Shapes).
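The intended behavior can be sketched in plain Python as follows. This is not the pyshacl prototype, just a toy model under stated assumptions: triples are tuples, a resource only counts as a Node Shape if it is explicitly typed sh:NodeShape, the path is a single predicate, and "conforming to a shape" is reduced to "has a value for every property the shape requires" (the required_properties table is a hypothetical stand-in for real shape validation). All ex: names are hypothetical.

```python
# Triples are (subject, predicate, object) tuples; all ex: names are hypothetical.
data = {
    ("ex:sumFunction", "rdf:type", "sh:NodeShape"),
    ("ex:exec1", "fno:executes", "ex:sumFunction"),
    ("ex:exec1", "ex:a", "1"),
    ("ex:exec1", "ex:b", "2"),
    ("ex:exec2", "fno:executes", "ex:sumFunction"),
    ("ex:exec2", "ex:a", "1"),
    ("ex:exec3", "fno:executes", "ex:notAShape"),
}

# Stand-in for real shape validation: properties each shape requires a value for.
required_properties = {"ex:sumFunction": ["ex:a", "ex:b"]}

def objects(triples, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def conforms(triples, node, shape):
    """Toy conformance check: the node has a value for every required property."""
    return all(objects(triples, node, prop) for prop in required_properties.get(shape, []))

def validate_nodes_path(triples, value_node, path_predicate):
    """Return the shapes found at the path that `value_node` fails to conform to."""
    failures = []
    for candidate in sorted(objects(triples, value_node, path_predicate)):
        if "sh:NodeShape" not in objects(triples, candidate, "rdf:type"):
            continue  # resources at the path that are not Node Shapes are ignored
        if not conforms(triples, value_node, candidate):
            failures.append(candidate)
    return failures

for execution in ["ex:exec1", "ex:exec2", "ex:exec3"]:
    print(execution, validate_nodes_path(data, execution, "fno:executes"))
```

Note that ex:exec3 produces no violation: the resource at the path is not a Node Shape, so there is nothing to validate against.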

This would enable the following addition for the Function Ontology in order to validate that all instances of fno:Execution conform to any corresponding instance of fno:Function:

fno:Execution
  sh:nodesPath fno:executes ;
.

And this addition for the Data Cube Ontology in order to validate that all instances of qb:Observation conform to any corresponding instance of qb:DataStructureDefinition:

qb:Observation
  sh:nodesPath (
    qb:dataSet
    qb:structure
  ) ;
.

And these additions for the EP-PLAN Ontology in order to validate that all instances of ep-plan:Activity conform to any corresponding instance(s) of ep-plan:Step and that all instances of ep-plan:Entity conform to any corresponding instance(s) of ep-plan:Variable:

ep-plan:Activity
  sh:nodesPath ep-plan:correspondsToStep ;
.
ep-plan:Entity
  sh:nodesPath ep-plan:correspondsToVariable ;
.

My main reservation with this approach is that I'm not a huge fan of how if sh:node fails validation, many SHACL engines don't include the nested results via sh:detail in their reports, and this constraint would probably function the same way. I hope that more validators would use/take advantage of sh:detail in the future in general.
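To make the sh:detail concern concrete, here is a small sketch of a report structure that keeps the nested results, written in plain Python rather than against any engine's API. The ValidationResult class and render helper are hypothetical; the constraint component names are the standard SHACL ones, and the ex: names are made up.

```python
from dataclasses import dataclass, field

@dataclass
class ValidationResult:
    focus_node: str
    component: str
    message: str
    # Nested results explaining the failure, as sh:detail would link them.
    detail: list = field(default_factory=list)

def render(result, indent=0):
    """Return one line per result, indenting nested (sh:detail) results."""
    lines = [" " * indent + f"{result.component} on {result.focus_node}: {result.message}"]
    for nested in result.detail:
        lines.extend(render(nested, indent + 2))
    return lines

nested = ValidationResult(
    "ex:exec2", "sh:MinCountConstraintComponent", "missing value for ex:b"
)
top = ValidationResult(
    "ex:exec2", "sh:NodeConstraintComponent", "does not conform to ex:sumFunction", [nested]
)
report_lines = render(top)
print("\n".join(report_lines))
```

Without the nested entry, the user only learns that ex:exec2 failed the referenced shape, not which of that shape's constraints was violated.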

I have added a prototype implementation of this to this branch in this fork of pyshacl (just because I happen to be the most familiar with the internals of that SHACL engine) and have been playing around with it. Included in this folder in the repo is a file with example data and shapes that demonstrates how it works, as well as the output from the modified version of pyshacl (cleaned up a bit for readability).

I'm curious to know what the community thinks of this, both as a concept and also this particular method of implementation.

@HolgerKnublauch HolgerKnublauch transferred this issue from w3c/shacl Jan 20, 2025
@HolgerKnublauch HolgerKnublauch added the Core For SHACL 1.2 Core spec label Jan 20, 2025
@ajnelson-nist

It looks to me like this Issue veers close to a philosophy-of-SHACL question, which I'm not sure would be in scope of the WG or not. (I know scoping like this was mentioned in the meeting yesterday, but that was an early hour for me, so apologies if I misremember.)

#185 poses a question about how to carve up the graphs involved in a SHACL validation process. For the duration of this comment, I'll assume there is a divide, but not necessarily a partitioning, into a data graph (to be reviewed), a shapes graph (providing review rules), and an ontology graph (which helps with the data to be reviewed, but incidentally also gets reviewed due to mix-in). It's not "partitioning" because triples could be in multiple of these graphs simultaneously.

SHACL does support reviewing SHACL. SHACL-SHACL specifically does that.

Having an ontology that uses and extends SHACL as a more-foundational model doesn't seem inconsistent with the nature of RDF modeling. (Apologies for the double-negative.) At some point, the ontology developer (and/or data implementer) would need to decide on whether there would be shapes that need to review only the "TBox" -- but, at least the Function Ontology you noted sounds like a case where "ABox" and "TBox" have a pretty blurry divide.

Is there any change to the core SHACL specification suggested by these use cases? There's already discussion on #215 related to sh:path.

mgberg commented Jan 26, 2025

Here's my two cents at least:

The targetShape/node expression targets discussed in #215 are a great way to add dynamic targeting capability to SHACL imo, and I would greatly enjoy that capability as part of the core spec.

I think the main difference between that capability (and SHACL/SHACL-AF's current custom targeting capabilities) and this proposal is that here the shapes are applied from another shape instead of specifying their own targets. Using the current/proposed target capability, each of these shapes created (e.g. each instance of fno:Function) would have to have a target defined that refers to itself in some capacity (e.g. ex:MyFunc rsx:targetShape [sh:path fno:executes ; sh:hasValue ex:myFunc] using the syntax shown here). This means each user-provided "shape as data" would need a specialized shape created for it automatically in order for it to function as intended.

In this approach, the shapes are applied externally similar to how sh:node currently works. Using the Function Ontology example above, fno:Execution states that all instances of that class must conform to the shape at the path fno:executes, and therefore nothing additional or specialized would need to be added to each instance of fno:Function to apply those shapes to the appropriate targets.

As a side note, if using node expressions is an option on the table, being able to use node expressions instead of just a path for this would be a great extension/alternative to this proposal.

I realize that this may not be as efficient for the SHACL engine as the other approach, but it is simpler in terms of enabling this "shapes as data" paradigm if that is a path that is considered acceptable or in scope. I'm not a part of the WG (although I would find it very interesting to be involved in some capacity) so I wasn't involved in any of the conversations mentioned and don't have the full context here.
