Correct OWL 2 DL syntax of enumerations of literals #435

Closed
13 tasks done
ajnelson-nist opened this issue Aug 8, 2022 · 3 comments · Fixed by #427 or #436

Comments

@ajnelson-nist
Contributor

ajnelson-nist commented Aug 8, 2022

Background

This proposal is an upgrade to UCO CP-90, and is in part a transcription to re-capture original motivations. The CP-90 Confluence page should now be considered superseded by this Github Issue.

The construction of UCO vocabularies does not conform to the OWL 2 mechanism for custom datatype definitions. The non-conformance issue harkens back to the definition of datatypes in RDF Schema. This proposal adds and encodes two requirements for the UCO vocabulary namespace, which apply equally to CASE’s vocabulary namespace.

For illustration, we will upgrade UCO's vocabulary:BitnessVocab to become conformant. (This vocabulary was selected because it has just two members.)

Here is the current definition of BitnessVocab, omitting the rdfs:label and rdfs:comment so we may focus on the OWL syntax:

vocabulary:BitnessVocab
	a rdfs:Datatype ;
	rdfs:subClassOf rdfs:Resource ;
	owl:oneOf (
		"32"^^vocabulary:BitnessVocab
		"64"^^vocabulary:BitnessVocab
	) ;
	.

Requirements

Requirement 1

UCO must conform with OWL 2 syntactic requirements. This entails requiring conformance with RDF syntactic requirements. (These requirements are not distinct to this proposal, and they are agnostic to serialization language, e.g. application/rdf+xml, text/turtle, et al.)

Requirement 2

Remove all statements of the form vocabulary:SomeVocab rdfs:subClassOf rdfs:Resource . from the vocabulary namespace.

Requirement 2 applicability

This syntax is present in the vocabulary definitions, and it has confused some tools used in earlier attempts to review the Datatype issues. This statement:

vocabulary:BitnessVocab rdfs:subClassOf rdfs:Resource .

is entailed by RDF Schema, Sections 2.3 and 2.4, which respectively state:

rdfs:Literal is a subclass of rdfs:Resource.

Each instance of rdfs:Datatype is a subclass of rdfs:Literal.

At least one OWL Profile validation tool is confused by this redundant statement. An error declaration is made by the tool ROBOT and its verb validate-profile, that the custom vocabulary declares itself an individual and a class. This violates OWL 1 DL, and is not excepted by OWL 2 DL's Punning.

Hence, with no loss of semantics, we propose this requirement to prevent tool confusion.
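
The redundancy can be sketched in Turtle. This is only an illustration of the entailment chain quoted above from RDF Schema Sections 2.3 and 2.4; the vocabulary namespace IRI is an assumption, inferred from the observable namespace IRIs that appear in the query results of this issue.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# Assumed namespace IRI, patterned after the observable namespace.
@prefix vocabulary: <https://ontology.unifiedcyberontology.org/uco/vocabulary/> .

# Asserted in the vocabulary definition, and kept by this proposal:
vocabulary:BitnessVocab a rdfs:Datatype .

# Stated by RDF Schema, Section 2.3:
rdfs:Literal rdfs:subClassOf rdfs:Resource .

# Follows from RDF Schema, Section 2.4, for any instance of rdfs:Datatype:
vocabulary:BitnessVocab rdfs:subClassOf rdfs:Literal .

# Follows from the two statements above, so asserting it explicitly adds nothing:
vocabulary:BitnessVocab rdfs:subClassOf rdfs:Resource .
```

Only the first statement needs to be written down in UCO; an RDFS-aware consumer can derive the rest.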

Requirement 3

In conformance with OWL 2 DL and RDF datatype definitions, UCO's datatypes that are enumerations of string-literals must conform with this definition from RDF 1.1 Concepts, Section 5:

A datatype consists of a lexical space, a value space and a lexical-to-value mapping, and is denoted by one or more IRIs.

Requirement 3 applicability

For some of the original XML Schema datatypes, the value space draws from abstract and/or platonic concepts, such as xsd:boolean containing the values true and false, distinct from the lexical values "true" and "false".

As spelled in UCO 0.9.0 and earlier, BitnessVocab confuses the lexical space and value space, and provides no mapping. The syntax for defining a datatype is given in functional syntax in OWL 2 Syntax, Section 9.4; see especially the example a:SSN. Also note that RDF syntax can be toggled "On" with that document's "Show RDF in Examples" button, which shows this demonstration:

a:SSN rdf:type rdfs:Datatype .

a:SSN owl:equivalentClass _:x .
_:x rdf:type rdfs:Datatype .
_:x owl:onDatatype xsd:string .
_:x owl:withRestrictions ( _:y ) .
_:y xsd:pattern "[0-9]{3}-[0-9]{2}-[0-9]{4}" .

a:hasSSN rdfs:range a:SSN .

(OWL 1 made lexical vs. value Datatypes somewhat less confusing to discuss, by way of the concept owl:DataRange. However, owl:DataRange was dropped in the transition to OWL 2. So, we must make do with discussing "Value space" Datatypes and "Lexical space" Datatypes.)

Two less-obvious syntax components are pertinent to UCO:

  • The IRI-identified rdfs:Datatype (i.e. the value-space Datatype) is defined as equivalent to another rdfs:Datatype (i.e. the lexical-space Datatype).
  • _:x, especially the underscore as a node-identifier prefix, is a designation of a blank node. This text pattern holds both in examples in the OWL 2 Syntax document, and in the strict parsing rules in the OWL 2 to RDF mapping.

The OWL 2 to RDF mapping defines the required syntax (Section 3.2.4, Table 12, row 4) for the enumeration-based lexical-space rdfs:Datatype:

_:x rdf:type rdfs:Datatype .
_:x owl:oneOf T(SEQ lt1 ... ltn) .
{ n ≥ 1 }

UCO's vocabularies are currently incorrect in two manners according to this syntax:

  1. The subject-node bearing owl:oneOf is an IRI, not a blank node.
  2. No distinction is made between value space and lexical space.
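
Applying the a:SSN pattern and the Table 12 mapping to the BitnessVocab illustration gives roughly the following form. This is a sketch and not necessarily the exact text of the merged fix; the vocabulary namespace IRI is an assumption, and the literal typing inside the owl:oneOf sequence is left as it was because this proposal treats it as out of scope.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# Assumed namespace IRI, patterned after the observable namespace.
@prefix vocabulary: <https://ontology.unifiedcyberontology.org/uco/vocabulary/> .

# Value-space Datatype: identified by an IRI.
# The rdfs:subClassOf rdfs:Resource statement is gone (Requirement 2).
vocabulary:BitnessVocab
	a rdfs:Datatype ;
	owl:equivalentClass [
		# Lexical-space Datatype: a blank node bearing owl:oneOf (Requirement 3).
		a rdfs:Datatype ;
		owl:oneOf (
			"32"^^vocabulary:BitnessVocab
			"64"^^vocabulary:BitnessVocab
		) ;
	] ;
	.
```

The square brackets are Turtle's shorthand for a blank node, so the subject of owl:oneOf is no longer an IRI, and the owl:equivalentClass link keeps the two Datatypes distinct.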

Risk / Benefit analysis

Benefits

  • This brings UCO into conformance with OWL 2 DL's datatype syntax. Without this syntax revision, UCO is in OWL 2 Full, because it does not match the strict mapping requirements in the OWL 2 to RDF mapping document.
  • UCO will have a better path to being open-able in other OWL review tools.

Risks

  • Correction of required syntax is seen as bearing no appreciable risks.
  • It is not currently clear whether it is safe to leave the literal-typing declarations upon the strings within the owl:oneOf sequences. Their usage induces a circular definition, but to date this usage has not caused confusion within graph engines. Hence, review of the strings' type annotations is left as out of scope.

Competencies demonstrated

Competency 1

A general OWL 2 consumer is interested in seeing all literals that are members of enumerated vocabularies.

Competency Question 1.1

What datatypes are based on fixed sets of literals, and what are their members?

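PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?nDatatype ?lValue
WHERE {
  # The IRI-identified (value-space) Datatype.
  ?nDatatype
    a rdfs:Datatype ;
    owl:equivalentClass ?nLexicalValueSpace ;
    .

  # The (lexical-space) Datatype that enumerates the literals.
  ?nLexicalValueSpace
    a rdfs:Datatype ;
    owl:oneOf/(rdf:rest*)/rdf:first ?lValue ;
    .
}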

Result 1.1

Run against UCO 0.9.0, these are the results of that query:

?nDatatype ?lValue
0 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketAddressFamily af_appletalk
1 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketAddressFamily af_bth
2 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketAddressFamily af_inet
3 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketAddressFamily af_inet6
4 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketAddressFamily af_ipx
5 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketAddressFamily af_irda
6 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketAddressFamily af_netbios
7 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketAddressFamily af_unspec
8 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketProtocolFamily pf_appletalk
9 https://ontology.unifiedcyberontology.org/uco/observable/NetworkSocketProtocolFamily pf_ash
.. (snip) ...

Nothing from the vocabulary namespace is yet returned.

Solution suggestion

  • Define new SHACL-SPARQL test for owl:oneOf on rdfs:Datatype nodes, adding to OWL review suite started in Issue 406.
  • Apply fixes to UCO vocabularies in the vocabulary namespace.
    • Aside: The observable namespace also defines Datatypes that are enumerations of literals, e.g. NetworkSocketAddressFamily. These are already in the correct OWL 2 DL syntax.
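
A SHACL-SPARQL test of the kind suggested above might look like the following sketch. The shape IRI, the ex: namespace, and the message text are hypothetical, and the test actually merged for this issue may be written differently. The PREFIX clause is inlined in the query rather than declared through sh:declare.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
# Hypothetical namespace for this illustration only.
@prefix ex: <http://example.org/shapes/> .

ex:datatype-oneOf-shape
	a sh:NodeShape ;
	# Visit every node that is the subject of an owl:oneOf statement.
	sh:targetSubjectsOf owl:oneOf ;
	sh:sparql [
		a sh:SPARQLConstraint ;
		sh:message "An rdfs:Datatype bearing owl:oneOf must be a blank node."@en ;
		# Each returned $this is reported as a violation:
		# a Datatype that bears owl:oneOf while being identified by an IRI.
		sh:select """
			PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
			SELECT $this
			WHERE {
				$this a rdfs:Datatype .
				FILTER isIRI($this)
			}
		""" ;
	] ;
	.
```

Run against the UCO 0.9.0 form of BitnessVocab, a shape like this would be expected to flag vocabulary:BitnessVocab, since that IRI node is both typed rdfs:Datatype and the subject of owl:oneOf. Enumerated owl:Class nodes are also targeted but pass, because the query matches only rdfs:Datatype nodes.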

Coordination

  • Tracking in Jira ticket OC-192
  • Administrative review completed
  • Requirements to be discussed in OC meeting, 2022-08-09
  • Requirements Review vote occurred, passing, on 2022-08-16
  • Requirements development phase completed.
  • Solution announced to OCs on 2022-08-21
  • Solutions Approval to be discussed in OC meeting, 2022-08-25
  • Solutions Approval vote occurred, passing, on 2022-08-25
  • Solutions development phase completed.
  • Implementation merged into develop (Note: Implementation was accidentally merged before Solutions Approval vote)
  • Implementation (part 2) merged into develop (Note: Implementation slated to merge before Solutions Approval vote due to 427 already being merged)
  • Implementation for CASE merged into CASE's develop (Note: Implementation slated to merge before Solutions Approval vote due to 427 already being merged)
  • Milestone linked
  • Documentation logged in pending release page
@ajnelson-nist ajnelson-nist linked a pull request Aug 8, 2022 that will close this issue
10 tasks
ajnelson-nist added a commit that referenced this issue Aug 8, 2022
This test builds on the PR for Issue 406, and will fail CI as it is
currently filed.  The failure is an intentional demonstration of
non-conformance.  This test will need to be merged into another branch
that had applied the syntax fix.

References:
* #406
* #435

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit that referenced this issue Aug 8, 2022
References:
* #435

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist ajnelson-nist added this to the UCO 1.0.0 milestone Aug 18, 2022
@sbarnum
Contributor

sbarnum commented Aug 18, 2022

The actual PR of the proposed solution for this issue contained a significant and widespread change that was not identified or addressed in the actual change proposal.
The CP identified needed changes to the vocabulary datatype definitions in the vocabulary namespace. These were clearly identified and explained in the CP and were widely discussed and agreed to in the ontology committee.
The solution PR included the additional significant change of duplicating out all vocabulary value lists from not only their position within the vocabulary datatype definition (as identified in the CP) but also into the property shapes for EVERY point of use for any vocabulary throughout all of UCO.
The current approach in UCO supports simple maintenance and evolution of vocabularies. They are all located in one namespace, so you always know where to find them, and their value spaces are defined in one place (their datatype definition), so changes/additions/deletions can be made in one place. This eliminates the risk of conflicting definitions due to changes being made in some places and not in others.
The proposed changes in the PR eliminate this practicality. With the proposed change there is now significantly more work required and risk imparted to manage vocabularies. Now for any changes/additions/deletions the entire ontology must be searched and ALL occurrences must be maintained consistently. This becomes increasingly problematic when application domain ontologies using UCO utilize UCO vocabularies resulting in required consistency across multiple scopes of authoritative control.
OWL DL is a significant objective to shoot for but should not outweigh the critical needs for practicality in the development and use of UCO. Given the choice between the two, practicality should always win. This is specifically codified as a foundational principle for UCO within the CDO charter.
I see no issues with the proposed changes at the vocabulary datatype definitions but the proliferation of vocabulary value duplication should not be pursued even if it is at the expense of full OWL DL conformance.

This issue was voted on at the August 9th ontology committee meeting and passed over my strong objections. I do not believe that the ontology members voting in the affirmative fully comprehend the potential impact of this change and I think this will likely come back to bite us at a later date where it will be much more difficult to reverse.

This comment is added as a record of these objections and their rationale.

@ajnelson-nist
Contributor Author

@sbarnum Thank you for logging your objection.

Slight correction - your comment should have been posted on Issue #406, not #435.

I agree with this effect on RDF Lists being a significantly unpleasant consequence of OWL 2 DL conformance.

However, the Ontology Committees were warned several times that the engineering convenience of making the shared -members rdf:List concepts was a convenience made in spite of a known potential incompatibility with OWL 2 DL. The first warning was explicit, recorded in Proposal 100, and called out in the meeting just before that proposal was voted on. The committee was warned in the Risks of the Issue page of 406 that, having found the explicit citation in the OWL 2 to RDF mapping, that convenience was being rolled back. And, the first commit documented that the unpleasant change was being carried out.

This is an unfortunate interaction of OWL and SHACL. However, be aware also that the "Semi-open vocabulary" design of UCO is, by my experience within the ontology space, possibly a novel feature, and I suspect is certainly a novel implementation as it is now. The most graceful implementation of the goal of semi-open vocabularies might not be lists of strings. The MIME Taxonomy proposal may lead us to an alternative implementation that still allows for user extensibility with gentle warnings about set-membership; allows definitions of semantics for the members; and further, doesn't itself encourage any of our committee members to jettison our underlying standards for the sake of a "foundational principle" that is meeting harsh realities of technology implementation and interoperability of predating standards.

The design document itself is still undergoing review and testing. The lack of workflow time being devoted to it should not be mistaken for acceptance of the document as a whole.

ajnelson-nist added a commit to frederich-stine/UCO that referenced this issue Aug 19, 2022
A pattern implemented in PR 463 is to avoid `sh:declare`, and where
prefixes are needed in SHACL-SPARQL queries, to inline a PREFIX clause.

This patch removes one instance of `sh:prefixes` that is not addressed
by PR 463.

References:
* ucoProject#435
* ucoProject#463

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE that referenced this issue Aug 19, 2022
This is a downstream application of two proposals:

* UCO CP-100 implemented the suggested-value enforcement pattern for
  semi-open vocabularies.
  This was a part of UCO 0.8.0.
* UCO Issue 406 adjusted the implementation pattern from UCO CP-100 to
  account for an OWL requirement on `rdf:List` usage.
  This has been approved for UCO 1.0.0.

A third proposal adjusting the CASE vocabulary namespace, UCO Issue 435,
would also apply, but has not had an approval vote yet.  Due to vote and
other Git logistics, that will be handled separately.

References:
* [UCO OC-12] (CP-100) UCO's idea of "Open vocabulary" does not agree
  with its implementation with owl:oneOf
* ucoProject/UCO#406
* ucoProject/UCO#435

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE that referenced this issue Aug 19, 2022
This is a downstream application of two proposals:

* UCO CP-100 implemented the suggested-value enforcement pattern for
  semi-open vocabularies.
  This was a part of UCO 0.8.0.
* UCO Issue 406 adjusted the implementation pattern from UCO CP-100 to
  account for an OWL requirement on `rdf:List` usage.
  This has been approved for UCO 1.0.0.

A third proposal adjusting the CASE vocabulary namespace, UCO Issue 435,
would also apply, but has not had an approval vote yet.  Due to vote and
other Git logistics, that will be handled separately.

No effects were observed on Make-managed files.

References:
* [UCO OC-12] (CP-100) UCO's idea of "Open vocabulary" does not agree
  with its implementation with owl:oneOf
* ucoProject/UCO#406
* ucoProject/UCO#435

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE that referenced this issue Aug 21, 2022
A follow-on patch will fix a realized copy-paste error in the CASE
implementation of the new vocabulary form.

No effects were observed on Make-managed files.

References:
* [ONT-467] (CP-43) Vocabulary datatypes are OWL-syntactically
  incomplete
* ucoProject/UCO#435

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE that referenced this issue Aug 21, 2022
This patch corrects a nesting error in the original application of the
syntax form.

The syntax form now matches the form implemented in the resolution of
UCO Issue 435.

No effects were observed on Make-managed files.

References:
* [ONT-467] (CP-43) Vocabulary datatypes are OWL-syntactically
  incomplete
* ucoProject/UCO#435

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Examples that referenced this issue Aug 21, 2022
Both pointers need to be updated in order to catch an issue raised in
CASE by a new test in UCO.

References:
* casework/CASE#76
* ucoProject/UCO#435

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Examples that referenced this issue Aug 21, 2022
Both pointers need to be updated in order to catch an issue raised in
CASE by a new test in UCO.

This patch applies a needed fix for a new vocabulary.

With the vocabulary fix, no effects were observed on Make-managed files.

References:
* [UCO OC-119] (CP-43) Represent recoverability of unallocated files
* casework/CASE#76
* ucoProject/UCO#435

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/casework.github.io that referenced this issue Aug 21, 2022
Both pointers need to be updated in order to catch an issue raised in
CASE by a new test in UCO.

A follow-on patch will regenerate Make-managed files.

References:
* casework/CASE#76
* ucoProject/UCO#435

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/casework.github.io that referenced this issue Aug 21, 2022
References:
* casework/CASE#76
* ucoProject/UCO#435

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist
Contributor Author

This proposal is ready for a Solutions Approval vote on 2022-08-25. Note that there was a procedural error in processing this proposal, and it was prematurely merged into UCO's develop. Also, an effect realized on a vocabulary in CASE c/o a new test in this proposal brought to light that CASE had fallen behind on implementing upstream UCO features. The meeting on Thursday will cover updates to the review process implemented to handle both of these matters.

ajnelson-nist added a commit that referenced this issue Aug 22, 2022
With this change, no effects were observed on Make-managed files.

References:
* #435
* [OC-119] (CP-43) Represent recoverability of unallocated files

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to casework/CASE-Utilities-Python that referenced this issue Sep 2, 2022
It's unclear what causes these to appear, but it is believed to be
something between UCO Issue 435 where `rdf:List`s became fully
duplicated in SHACL shapes, the recent release of rdflib 6.2.0, and/or
potentially a bug in the rdf-toolkit normalization.

References:
* ucoProject/UCO#435

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
ajnelson-nist added a commit to ajnelson-nist/CASE-Examples-QC that referenced this issue Sep 22, 2022
A follow-on patch will regenerate Make-managed files.

References:
* ucoProject/UCO#435

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>