Enable "Shapes as Data" Paradigm #189
Comments
It looks to me like this Issue veers close to a philosophy-of-SHACL question, which I'm not sure would be in scope of the WG or not. (I know scoping like this was mentioned in the meeting yesterday, but that was an early hour for me, so apologies if I misremember.)
#185 poses a question about how to carve up the graphs involved in a SHACL validation process. For the duration of this comment, I'll assume there is a divide, but not necessarily a partitioning, into a data graph (to be reviewed), a shapes graph (providing review rules), and an ontology graph (which helps with the data to be reviewed, but incidentally also gets reviewed due to mix-in). It's not "partitioning" because triples could be in multiple of these graphs simultaneously.
SHACL does support reviewing SHACL. SHACL-SHACL specifically does that. Having an ontology that uses and extends SHACL as a more-foundational model doesn't seem inconsistent with the nature of RDF modeling. (Apologies for the double-negative.) At some point, the ontology developer (and/or data implementer) would need to decide on whether there would be shapes that need to review only the "TBox" -- but, at least the Function Ontology you noted sounds like a case where "ABox" and "TBox" have a pretty blurry divide.
Is there any change to the core SHACL specification suggested by these use cases? There's already discussion on #215 related to
Here's my two cents at least: The targetShape/node expression targets discussed in #215 are a great way to add dynamic targeting capability to SHACL imo, and I would greatly enjoy that capability as part of the core spec. I think the main difference between that capability (and SHACL/SHACL-AF's current custom targeting capabilities) is that this proposes that the shapes are applied from another shape instead of specifying their own targets. Using the current/proposed target capability, each of these shapes created (e.g. each instance of In this approach, the shapes are applied externally similar to how As a side note, if using node expressions is an option on the table, being able to use node expressions instead of just a path for this would be a great extension/alternative to this proposal. I realize that this may not be as efficient for the SHACL engine as the other approach, but it is simpler in terms of enabling this "shapes as data" paradigm if that is a path that is considered acceptable or in scope. I'm not a part of the WG (although I would find it very interesting to be involved in some capacity) so I wasn't involved in any of the conversations mentioned and don't have the full context here. |
I've had an idea for a possible extension to SHACL for a while and I'm wondering what others think about it.
Over the past couple years, I have run into several situations where constraints are part of the domain of interest and those constraints should apply to other data in the domain. In those cases, it would be helpful to have shapes be defined as part of data instead of at the schema level, and it would be helpful if the SHACL engine knew how data were connected to these shapes they should be validated against via some existing path expressed in domain terminology.
Doing this would prevent users from needing to extend the ontology/schema to add new constraints. Also, it could prevent the use of metamodeling to accomplish a similar goal, which can get messy and confusing for users.
Here are three generic examples, to help convey the idea, where this feature could be helpful:
Example 1
Consider the Function Ontology (https://fno.io/spec/#ontology-abstract). If you look at the documentation for fno:Parameter and fno:Output, they look very similar to sh:PropertyShape in spirit, and the class fno:Function is therefore like sh:NodeShape. It might be useful to use SHACL to validate that function arguments and outputs match what is expected based on the function definition.
However, if instances of fno:Function were Node Shapes, then there would be no convenient way to configure each fno:Function instance to target the right nodes with current SHACL. You'd have to either make each one a class and have the corresponding instances of fno:Execution be instances of each (which might tempt the introduction of metamodeling similar to SPIN Functions), write a clunky custom target type using SHACL-AF that wouldn't be supported by all SHACL engines, or use sh:targetNode to connect each fno:Function instance to the corresponding fno:Execution instances instead of the domain property fno:executes (or in addition to it, which would be redundant).
In this case, it would be convenient if each fno:Execution could be validated against whatever node it was connected to via fno:executes.
Example 2
Consider some future state of the W3C Data Cube ontology. Data Structure Definitions (https://www.w3.org/TR/vocab-data-cube/#dsd-dsd) and Component Specifications are data in this domain. However, they could be modified to be represented as Node Shapes and Property Shapes respectively such that SHACL could be used to validate that the Observations that are part of DataSets that have that Data Structure Definition actually conform to that structure.
The same challenges exist for trying to validate a qb:DataStructureDefinition as a Node Shape as for fno:Function; there is no convenient way to configure each qb:DataStructureDefinition instance to target the right nodes with current SHACL.
In this case, it would be convenient if each qb:Observation could be validated against whatever node it was connected to via the path qb:dataSet/qb:structure.
Furthermore, this would allow more fancy data cube behavior more easily, like how shapes are used for datatypes of QB components here: https://docs.allotrope.org/ADF%20Data%20Cube%20Ontology.html (see examples 5 and 11)
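To make the path lookup in this example concrete, here is a minimal sketch in plain Python (standard library only). It is not SHACL and not a pyshacl or rdflib API: triples are modeled as tuples, every ex: identifier is hypothetical, and follow_path is a helper invented for illustration. It only shows how an observation would be connected to its Data Structure Definition by following qb:dataSet and then qb:structure.

```python
# Toy data graph as (subject, predicate, object) tuples.
# All ex: identifiers are hypothetical and exist only for this illustration.
TRIPLES = {
    ("ex:obs1", "qb:dataSet", "ex:dataset1"),
    ("ex:obs2", "qb:dataSet", "ex:dataset1"),
    ("ex:dataset1", "qb:structure", "ex:dsd1"),
    ("ex:obs3", "qb:dataSet", "ex:dataset2"),  # ex:dataset2 has no qb:structure
}


def follow_path(triples, start, path):
    """Return the set of nodes reached from `start` by following each
    predicate in `path` in order (a SHACL-style sequence path)."""
    current = {start}
    for predicate in path:
        current = {o for (s, p, o) in triples if s in current and p == predicate}
    return current


# The sequence path from the example: qb:dataSet/qb:structure
DSD_PATH = ["qb:dataSet", "qb:structure"]

for obs in ("ex:obs1", "ex:obs2", "ex:obs3"):
    # Each node reached here is the one the observation would be validated against.
    print(obs, "->", sorted(follow_path(TRIPLES, obs, DSD_PATH)))
```

An observation whose data set has no structure reaches no node at all, so under the proposal there would be nothing to validate it against.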
Example 3
Consider the EP-PLAN ontology (https://trustlens.github.io/EP-PLAN/, documentation: https://trustlens.github.io/EP-PLAN/widoco_output/index-en.html), an extension to W3C PROV for capturing in detail the plans that go along with the Activities in PROV. It may be desired to use SHACL to determine whether an activity went according to plan or if some deviation occurred. Note that ep-plan:Step and ep-plan:Variable both could be similar in spirit to sh:NodeShape.
The same challenges exist for trying to validate instances of these classes as Node Shapes as for fno:Function; there is no convenient way to configure each ep-plan:Step and ep-plan:Variable instance to target the right nodes with current SHACL.
In this case, it would be convenient if each ep-plan:Activity could be validated against whatever node it was connected to via ep-plan:correspondsToStep and if each ep-plan:Entity could be validated against whatever node it was connected to via ep-plan:correspondsToVariable.
Possible Implementation
I've thought of a few different ways to implement this behavior, but I think the simplest and most efficient way I've thought of so far is to create a new Constraint Component.
This new Constraint Component would function somewhat like the one for sh:node. However, instead of specifying the URI of a Node Shape that value nodes must also conform to, it specifies a SHACL path using a parameter called, e.g., sh:nodesPath. For each value node for the shape with a value for sh:nodesPath, that value node is also validated against any Node Shape(s) found at the specified path from the value node (if any resources at that path exist and are Node Shapes).
This would enable the following addition for the Function Ontology in order to validate that all instances of fno:Execution conform to any corresponding instance of fno:Function:
And this addition for the Data Cube Ontology in order to validate that all instances of qb:Observation conform to any corresponding instance of qb:DataStructureDefinition:
And these additions for the EP-PLAN Ontology in order to validate that all instances of ep-plan:Activity conform to any corresponding instance(s) of ep-plan:Step and that all instances of ep-plan:Entity conform to any corresponding instance(s) of ep-plan:Variable:
My main reservation with this approach is that when sh:node fails validation, many SHACL engines don't include the nested results via sh:detail in their reports, and this constraint would probably behave the same way. I hope that more validators would take advantage of sh:detail in the future in general.
I have added a prototype implementation of this to this branch in this fork of pyshacl (just because I happen to be the most familiar with the internals of that SHACL engine) and have been playing around with it. Included in this folder in the repo is a file with example data and shapes that demonstrates how it works, as well as the output from the modified version of pyshacl (cleaned up a bit for readability).
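The behavior described for the proposed constraint component can be sketched roughly as follows. This is a toy model in plain Python, not the pyshacl prototype and not real SHACL validation: a "shape" is reduced to a set of predicates that a node must have at least one value for, all ex: identifiers are hypothetical, and check_nodes_path is a name invented for this illustration. It only shows the control flow: follow the path from each value node, keep the reached nodes that are shapes, and validate the value node against each of them.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Toy data graph; all ex: identifiers are hypothetical.
TRIPLES: Set[Triple] = {
    ("ex:exec1", "fno:executes", "ex:sumFunction"),
    ("ex:exec1", "ex:a", "2"),
    ("ex:exec1", "ex:b", "3"),
    ("ex:exec2", "fno:executes", "ex:sumFunction"),
    ("ex:exec2", "ex:a", "2"),  # ex:b is missing on purpose
    ("ex:exec3", "fno:executes", "ex:notAShape"),
}

# Stand-in for node shapes: shape name -> predicates the value node must have.
# (Assumption for this sketch; real shapes carry arbitrary constraints.)
SHAPES: Dict[str, Set[str]] = {"ex:sumFunction": {"ex:a", "ex:b"}}


def objects(triples: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """All objects of triples with the given subject and predicate."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def conforms(triples: Set[Triple], node: str, required: Set[str]) -> bool:
    """Toy conformance check: the node has a value for every required predicate."""
    return all(objects(triples, node, p) for p in required)


def check_nodes_path(
    triples: Set[Triple],
    shapes: Dict[str, Set[str]],
    value_node: str,
    path: List[str],
) -> List[str]:
    """Return the shapes, reached from `value_node` via `path`, that it fails."""
    reached = {value_node}
    for predicate in path:
        reached = {o for n in reached for o in objects(triples, n, predicate)}
    failures = []
    for candidate in sorted(reached):
        if candidate not in shapes:
            continue  # resources at the path that are not shapes are skipped
        if not conforms(triples, value_node, shapes[candidate]):
            failures.append(candidate)
    return failures


for execution in ("ex:exec1", "ex:exec2", "ex:exec3"):
    print(execution, check_nodes_path(TRIPLES, SHAPES, execution, ["fno:executes"]))
```

In this sketch only the failing shape names are returned; a real engine would presumably report the nested results as well, which is where the sh:detail concern applies.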
I'm curious to know what the community thinks of this, both as a concept and also this particular method of implementation.